[GitHub] storm pull request: STORM-933:NullPointerException during KafkaSpo...
Github user asfgit closed the pull request at: https://github.com/apache/storm/pull/660 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: [DISCUSS] Backport bugfixes (to 0.10.x / 0.9.x)
Given how huge the 0.10 release was, I feel trying to back-port all bug fixes and testing that it does not break something else might turn out to be a huge PITA. I think going with a stable 0.10 release might be the best solution for now. I don’t think back-porting requires confirmation; however, given we will probably have to do a release for each version where back-porting was done, it is probably best to notify the release manager and discuss options. I agree having a rule/bylaw would help clarify things for the future. Thanks Parth On 8/2/15, 4:30 PM, 임정택 kabh...@gmail.com wrote: Bump. Does anyone have opinions about this? I already did back-port some bugfixes (not in the list) into the 0.10.x and 0.9.x lines, but I'm not 100% sure that it is the preferred way. It seems we don't have explicit rules about doing back-ports. The only thing I know is that Taylor was (or has been) a gatekeeper. Now I really want to know whether a back-port still needs to be confirmed by Taylor. Thanks, Jungtaek Lim (HeartSaVioR) 2015-07-28 8:27 GMT+09:00 임정택 kabh...@gmail.com: Hi all, Recently I see many bugfixes are only merged to master, or the 0.10.x-branch. Since 0.10.0-beta1 introduces a huge changeset, and it contains a lot of bugfixes, I think we can consider backporting them to the 0.9.x-branch before releasing 0.9.6. I created a sheet listing the bugfix issues which could be backported, along with the status of each issue (what versions it is applied to, and what versions it can be applied to): https://docs.google.com/spreadsheets/d/1KQrOlqk1hlE2oDmXFY34lJaY0PU7V5uxq9U1vfIhLq4/edit?usp=sharing Please let me know whenever you find missing spots or wrong contents. There seems to be another approach: - release a stable version of 0.10.0, and drop the plan to release 0.9.6, so that all users who want a bugfix release move to 0.10.0. Since a lot of bugfix issues are waiting for backporting, the alternative approach may make sense. I'm open to hearing any thoughts, so please share your opinions. Thanks, Jungtaek Lim (HeartSaVioR) To Taylor: I don't know whether I can do a back-port without your confirmation (for each issue). If you want to decide about backporting yourself, I'll follow you. -- Name : 임 정택 Blog : http://www.heartsavior.net / http://dev.heartsavior.net Twitter : http://twitter.com/heartsavior LinkedIn : http://www.linkedin.com/in/heartsavior
[GitHub] storm pull request: STORM-966 ConfigValidation.DoubleValidator is ...
Github user caofangkun commented on a diff in the pull request: https://github.com/apache/storm/pull/658#discussion_r36149821 --- Diff: storm-core/src/jvm/backtype/storm/ConfigValidation.java --- @@ -28,7 +28,6 @@ /** * Declares methods for validating configuration values. */ -public static interface FieldValidator { --- End diff -- This interface should not be annotated. Could you please take a look?
[jira] [Commented] (STORM-966) ConfigValidation.DoubleValidator doesn't really validate whether the type of the object is a double
[ https://issues.apache.org/jira/browse/STORM-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652914#comment-14652914 ] ASF GitHub Bot commented on STORM-966: -- Github user caofangkun commented on a diff in the pull request: https://github.com/apache/storm/pull/658#discussion_r36149821 --- Diff: storm-core/src/jvm/backtype/storm/ConfigValidation.java --- @@ -28,7 +28,6 @@ /** * Declares methods for validating configuration values. */ -public static interface FieldValidator { --- End diff -- This interface should not be annotated. Could you please take a look? ConfigValidation.DoubleValidator doesn't really validate whether the type of the object is a double --- Key: STORM-966 URL: https://issues.apache.org/jira/browse/STORM-966 Project: Apache Storm Issue Type: Improvement Reporter: Boyang Jerry Peng Assignee: Boyang Jerry Peng Priority: Minor The ConfigValidation.DoubleValidator code only checks whether the object is null or an instance of Number, which is a parent class of Double. DoubleValidator is used only once, in Config.java, where: public static final Object TOPOLOGY_STATS_SAMPLE_RATE_SCHEMA = ConfigValidation.DoubleValidator; can simply be set to: public static final Object TOPOLOGY_STATS_SAMPLE_RATE_SCHEMA = Number.class; We can then get rid of the misleading ConfigValidation.DoubleValidator, since it doesn't actually check whether an object is of double type; the validator doesn't really do anything and its name is misleading. In the previous commit https://github.com/apache/storm/commit/214ee7454548b884c591991b1faea770d1478cec Number.class was used anyway -- This message was sent by Atlassian JIRA (v6.3.4#6332)
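The JIRA description above can be illustrated with a small stand-alone sketch. This is hypothetical code, not Storm's actual ConfigValidation implementation: the class and method names are invented, and it shows the stricter check that the name DoubleValidator implies (reject anything that is not a Double, rather than accepting any Number):

```java
// Hypothetical strict validator, illustrating the behavior the name
// DoubleValidator implies; not the actual storm-core implementation.
public class StrictDoubleValidator {

    public static void validateField(String name, Object o) {
        if (o == null) {
            return; // null is accepted, matching the existing validators
        }
        if (!(o instanceof Double)) {
            throw new IllegalArgumentException(
                "Field " + name + " must be a Double, got "
                + o.getClass().getName());
        }
    }

    public static void main(String[] args) {
        // A Double passes; an Integer (which is a Number but not a Double)
        // is rejected, unlike with the instanceof-Number check in the JIRA.
        validateField("topology.stats.sample.rate", 0.05);
        try {
            validateField("topology.stats.sample.rate", 1);
            throw new AssertionError("expected IllegalArgumentException");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected non-Double as expected");
        }
    }
}
```

This makes concrete why the existing validator is misleading: an Integer is an instance of Number, so the instanceof-Number check lets it through even though the name promises a double check.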
Re: [DISCUSS] Backport bugfixes (to 0.10.x / 0.9.x)
Good catch on storm-903. I'll take a closer look. -Taylor On Aug 3, 2015, at 7:25 PM, 임정택 kabh...@gmail.com wrote: Thanks all. I also think that backporting is really painful. That's why I asked in the other thread which version lines we'll consider. Seems like we're sure about releasing an official version of 0.10.0 and phasing out the 0.9.x lines. I'll backport bugfixes only to the 0.10.x-branch and let you know when I've finished. Before we start releasing 0.10.0, we may want to take a look at STORM-903 https://issues.apache.org/jira/browse/STORM-903, which seems not to be finished. Thanks, Jungtaek Lim (HeartSaVioR) 2015-08-04 5:36 GMT+09:00 P. Taylor Goetz ptgo...@gmail.com: Thanks for putting together this list, Jungtaek. Back-porting is a pain, and the more the 0.9.x, 0.10.x and master lines diverge, the harder it gets. I propose we back-port the 4 fixes you identified for the 0.10 branch, and start discussing releasing 0.10.0 (final, not beta). Once 0.10.0 is out, I think we can start phasing out the 0.9.x line. The idea was to continue to support 0.9.x while 0.10.0 stabilized, giving early upgraders a chance to kick the tires and report any glaring issues. IMO more than enough time has passed, and we should move forward with a 0.10.0 release. In terms of the who and when of back-porting, the general principle I’ve followed is that once a patch has been merged, it is a candidate for back-porting, and any committer can do that, since the patch has already been reviewed and accepted. I don’t think a separate pull request is necessary. In fact, I think extra pull requests for back-porting make JIRA/GitHub issues a little messy and confusing. IMO the only times we need back-port pull requests are: a) A non-committer contributor is requesting a patch be applied to an earlier version. b) A committer back-ported a patch with a lot of conflicts, and feels it warrants further review before committing. Basically a way of saying “This merge was messy. 
Could others check my work?” If things go wrong at any time, there’s always “git revert”. I don’t think we need to codify any of this in our BYLAWS unless there is some sort of conflict, which for now there isn’t. If we feel the need to document the process, a README/wiki entry should suffice. I’m more in favor of mutual trust among committers than hard and fast rules. Once a particular practice gets formalized in our bylaws, it can be very difficult to change. -Taylor On Aug 3, 2015, at 12:56 PM, Derek Dagit der...@yahoo-inc.com.INVALID wrote: Dealing with branches is a pain, and it is good we are paying attention to back-porting. It is good to bring it up for discussion, and I agree that checking with those who do releases is a reasonable thing to do. I do not think there are special restrictions on back-porting fixes to previous branches. I would be comfortable with the normal rules for a pull request. Effort is one cost, and we could eventually run into some more challenging merge conflicts as well. There are multiple things to consider, and I think it is a judgment call. On the other hand, if it does become clear that clarifying principles would be helpful in our BYLAWS, then I am all for it. If we commit to supporting specific branches with certain kinds of fixes, then we need to stick to such a commitment. -- Derek - Original Message - From: Parth Brahmbhatt pbrahmbh...@hortonworks.com To: dev@storm.apache.org dev@storm.apache.org Cc: Sent: Monday, August 3, 2015 11:26 AM Subject: Re: [DISCUSS] Backport bugfixes (to 0.10.x / 0.9.x) Given how huge the 0.10 release was, I feel trying to back-port all bug fixes and testing that it does not break something else might turn out to be a huge PITA. I think going with a stable 0.10 release might be the best solution for now. 
I don’t think back-porting requires confirmation; however, given we will probably have to do a release for each version where back-porting was done, it is probably best to notify the release manager and discuss options. I agree having a rule/bylaw would help clarify things for the future. Thanks Parth
[jira] [Commented] (STORM-851) Storm Solr connector
[ https://issues.apache.org/jira/browse/STORM-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653083#comment-14653083 ] ASF GitHub Bot commented on STORM-851: -- GitHub user hmcl opened a pull request: https://github.com/apache/storm/pull/665 STORM-851: Storm Solr Connector 1. SolrUpdate Bolt 2. Trident State implementation 3. Fields Mapper 4. JSON Mapper 5. Integration Tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/hmcl/storm-apache STORM-851 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/665.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #665 commit 0c5e00caf801e8a7455f0bc7976a7dcd0f8ab335 Author: Hugo Louro hmclo...@gmail.com Date: 2015-07-17T02:10:30Z STORM-851: Storm Solr Connector 1. SolrUpdate Bolt 2. Trident State implementation 3. Fields Mapper 4. JSON Mapper 5. Integration Tests Storm Solr connector Key: STORM-851 URL: https://issues.apache.org/jira/browse/STORM-851 Project: Apache Storm Issue Type: Improvement Reporter: Sriharsha Chintalapani Assignee: Hugo Louro Storm solr connector should provide bolt and trident implementation to allow users to index data coming through the topology into solr.
[GitHub] storm pull request: STORM-851: Storm Solr Connector
GitHub user hmcl opened a pull request: https://github.com/apache/storm/pull/665 STORM-851: Storm Solr Connector 1. SolrUpdate Bolt 2. Trident State implementation 3. Fields Mapper 4. JSON Mapper 5. Integration Tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/hmcl/storm-apache STORM-851 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/665.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #665 commit 0c5e00caf801e8a7455f0bc7976a7dcd0f8ab335 Author: Hugo Louro hmclo...@gmail.com Date: 2015-07-17T02:10:30Z STORM-851: Storm Solr Connector 1. SolrUpdate Bolt 2. Trident State implementation 3. Fields Mapper 4. JSON Mapper 5. Integration Tests
[GitHub] storm pull request: STORM-851: Storm Solr Connector
Github user hmcl commented on the pull request: https://github.com/apache/storm/pull/665#issuecomment-127483109 I am planning on pushing a few more unit tests while the community does the review. I have provided a set of functional tests that use the Solr gettingstarted example. I ran the functional tests successfully with Solr running locally.
[jira] [Commented] (STORM-851) Storm Solr connector
[ https://issues.apache.org/jira/browse/STORM-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653085#comment-14653085 ] ASF GitHub Bot commented on STORM-851: -- Github user hmcl commented on the pull request: https://github.com/apache/storm/pull/665#issuecomment-127483109 I am planning on pushing a few more unit tests while the community does the review. I have provided a set of functional tests that use the Solr gettingstarted example. I ran the functional tests successfully with Solr running locally. Storm Solr connector Key: STORM-851 URL: https://issues.apache.org/jira/browse/STORM-851 Project: Apache Storm Issue Type: Improvement Reporter: Sriharsha Chintalapani Assignee: Hugo Louro Storm solr connector should provide bolt and trident implementation to allow users to index data coming through the topology into solr.
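As a rough illustration of what a fields mapper in such a connector does, here is a hedged, self-contained sketch. The class name and the representation (a plain Map standing in for both a Storm tuple and a SolrInputDocument) are invented for illustration and are not code from the STORM-851 pull request:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative-only fields mapper: picks a configured subset of tuple
// fields and copies them into a Solr-style document map. Not actual
// storm-solr code; names and types are invented for this sketch.
public class SolrFieldsMapperSketch {

    private final List<String> fields;

    public SolrFieldsMapperSketch(List<String> fields) {
        this.fields = fields;
    }

    public Map<String, Object> toSolrDocument(Map<String, Object> tuple) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (String field : fields) {
            // Only configured fields end up in the indexed document.
            if (tuple.containsKey(field)) {
                doc.put(field, tuple.get(field));
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        SolrFieldsMapperSketch mapper =
            new SolrFieldsMapperSketch(java.util.Arrays.asList("id", "title"));
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", "doc-1");
        tuple.put("title", "storm solr connector");
        tuple.put("internal", "not indexed");
        System.out.println(mapper.toSolrDocument(tuple)); // only id and title survive
    }
}
```

In a real connector, a bolt or Trident state would build such documents from incoming tuples and hand them to a Solr client for indexing; this sketch only shows the field-selection step.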
[jira] [Issue Comment Deleted] (STORM-770) NullPointerException in consumeBatchToCursor
[ https://issues.apache.org/jira/browse/STORM-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng updated STORM-770: -- Comment: was deleted (was: hi all, we encountered the same problem on 0.9.2-incubating. It happens when blt0 (140 executors on 18 workers, processing asynchronously) tries to send a tuple to blt1, grouped by one field, once every two days or more. I am wondering whether this is a problem with the worker trying to init the executor data concurrently. executor.clj : mk-executor-data/ :stream-component-grouper (outbound-components worker-context component-id) - outbound-components/ - outbound-groupings/ (.getComponentTasks worker-context component) (line: 106) - mk-grouper/ target-tasks (vec (sort target-tasks)) (line: 57) Could the target-tasks elements obtained by this code be null when the target-tasks (sort target-tasks) belong to the worker? thanks! ) NullPointerException in consumeBatchToCursor Key: STORM-770 URL: https://issues.apache.org/jira/browse/STORM-770 Project: Apache Storm Issue Type: Bug Affects Versions: 0.9.2-incubating Reporter: Stas Levin We got the following exception after our topology had been up for ~2 days, and I was wondering if it might be related. Looks like task in mk-transfer-fn is null, making (.add remote (TaskMessage. 
task (.serialize serializer tuple))) fail on NPE (worker.clj:128, storm-core-0.9.2-incubating.jar):
java.lang.RuntimeException: java.lang.NullPointerException
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
	at backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
	at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
	at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
Caused by: java.lang.NullPointerException: null
	at clojure.lang.RT.intCast(RT.java:1087) ~[clojure-1.5.1.jar:na]
	at backtype.storm.daemon.worker$mk_transfer_fn$fn__5748.invoke(worker.clj:128) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
	at backtype.storm.daemon.executor$start_batch_transfer_GT_worker_handler_BANG$fn__5483.invoke(executor.clj:256) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
	at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
	... 6 common frames omitted
Any ideas? P.S. Also saw it here: http://mail-archives.apache.org/mod_mbox/storm-user/201501.mbox/%3CCABcMBhCusXXU=v1e66wfuatgyh1euqnd1siog65-tp8xlwx...@mail.gmail.com%3E https://mail-archives.apache.org/mod_mbox/storm-user/201408.mbox/%3ccajuqm_4kxhsh2_x08ujuqr76m2c+dswp0fcijbmfcaeyqgs...@mail.gmail.com%3E Comment from Bobby http://mail-archives.apache.org/mod_mbox/storm-user/201501.mbox/%3c574363643.2791948.1420470097280.javamail.ya...@jws10027.mail.ne1.yahoo.com%3E {quote} What version of storm are you using? Are any of the bolts shell bolts? There is a known issue where this can happen if two shell bolts share an executor, because they are multi-threaded. - Bobby {quote}
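The trace in this issue bottoms out in clojure.lang.RT.intCast on a null value, i.e. a null task id reaching the transfer path (worker.clj:128). A minimal sketch of that failure mode and a defensive check, written in illustrative Java rather than Storm's actual Clojure worker code (all class and method names here are invented):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the transfer-path failure: unboxing/casting a null
// task id to int throws an NPE deep in the queue handler. Not Storm's real
// transfer fn, which lives in worker.clj.
public class TransferSketch {

    static class TaskMessage {
        final int task;       // unboxing a null Integer into this would NPE
        final byte[] payload;

        TaskMessage(int task, byte[] payload) {
            this.task = task;
            this.payload = payload;
        }
    }

    static List<TaskMessage> transfer(List<Integer> taskIds, byte[] payload) {
        List<TaskMessage> out = new ArrayList<>();
        for (Integer taskId : taskIds) {
            if (taskId == null) {
                // Defensive check: surface a descriptive error instead of a
                // bare NullPointerException from an implicit int cast.
                throw new IllegalStateException(
                    "null task id in outbound grouping; tuple not transferred");
            }
            out.add(new TaskMessage(taskId, payload));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};
        System.out.println(transfer(java.util.Arrays.asList(1, 2), payload).size());
        try {
            transfer(java.util.Arrays.asList(1, null), payload);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only the failure mode: a grouping that ever yields a null task id will crash at the int conversion, which matches the RT.intCast frame in the reported trace.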
[jira] [Commented] (STORM-837) HdfsState ignores commits
[ https://issues.apache.org/jira/browse/STORM-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651543#comment-14651543 ] ASF GitHub Bot commented on STORM-837: -- Github user arunmahadevan commented on the pull request: https://github.com/apache/storm/pull/644#issuecomment-127146640 Hi @harshach, added a note to the storm-hdfs README.md and modified the code to disable exactly-once based on file size as well. HdfsState ignores commits - Key: STORM-837 URL: https://issues.apache.org/jira/browse/STORM-837 Project: Apache Storm Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Arun Mahadevan Priority: Critical HdfsState works with Trident, which is supposed to provide exactly-once processing. It does this in two ways: first by informing the state about commits so it can be sure the data is written out, and second by having a commit id, so that double commits can be handled. HdfsState ignores the beginCommit and commit calls, and with that ignores the ids. This means that if you use HdfsState and your worker crashes, you may both lose data and get some data twice. At a minimum, the flush and file rotation should be tied to the commit in some way. The commit id should at a minimum be written out with the data, so someone reading the data has a hope of deduping it themselves. Also, with the rotationActions it is possible for a partially written file to be leaked and never moved to its final location, because it is not rotated. I personally think the actions are too generic for this case and need to be deprecated.
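For context, Trident informs a State about commits via beginCommit(txid)/commit(txid), and the JIRA above asks that HdfsState honor those calls. Below is a hedged sketch, with invented names and in-memory lists standing in for HDFS files; it is not the actual HdfsState internals. It shows a state that ties writes to the commit and records the txid with each record, so replayed batches do not double data and readers can dedupe:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a state honoring the Trident commit protocol.
// A replayed batch (same txid as the current one) discards its partial
// writes, and every record is written with its txid for deduplication.
public class TxAwareStateSketch {

    private Long currentTxId;
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    public void beginCommit(Long txId) {
        if (txId.equals(currentTxId)) {
            // Replay of a failed batch: drop the partial writes so the
            // batch's records are not written twice.
            pending.clear();
        }
        currentTxId = txId;
    }

    public void write(String record) {
        // Tag each record with its txid; readers can dedupe on it.
        pending.add(currentTxId + "\t" + record);
    }

    public void commit(Long txId) {
        // Flush only on commit, so nothing is visible before the batch
        // is complete (standing in for HdfsState's flush/rotation).
        committed.addAll(pending);
        pending.clear();
    }

    public List<String> committedRecords() {
        return committed;
    }

    public static void main(String[] args) {
        TxAwareStateSketch state = new TxAwareStateSketch();
        state.beginCommit(1L);
        state.write("first");
        state.commit(1L);
        state.beginCommit(2L);
        state.write("second");
        state.beginCommit(2L); // simulated crash + replay of batch 2
        state.write("second");
        state.commit(2L);
        System.out.println(state.committedRecords());
    }
}
```

The design choice the JIRA argues for is visible here: because flushing happens only in commit and replays clear their own partial output, a worker crash between write and commit neither loses a committed batch nor duplicates a replayed one.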
[GitHub] storm pull request: [STORM-837] Support for exactly once semantics...
Github user arunmahadevan commented on the pull request: https://github.com/apache/storm/pull/644#issuecomment-127146640 Hi @harshach, added a note to the storm-hdfs README.md and modified the code to disable exactly-once based on file size as well.
[jira] [Assigned] (STORM-963) Frozen topology (KafkaSpout + Multilang bolt)
[ https://issues.apache.org/jira/browse/STORM-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Sobrino reassigned STORM-963: -- Assignee: Alex Sobrino Frozen topology (KafkaSpout + Multilang bolt) - Key: STORM-963 URL: https://issues.apache.org/jira/browse/STORM-963 Project: Apache Storm Issue Type: Bug Components: storm-kafka Affects Versions: 0.9.4, 0.9.5, 0.9.6 Environment: - VMware ESX 5.5 - Ubuntu Server 14.04 LTS (kernel 3.16.0-41-generic) - Java (TM) SE Runtime Environment (build 1.8.0_45-b14) - Python 2.7.6 (default, Jun 22 2015, 17:58:13) - Zookeeper 3.4.6 Reporter: Alex Sobrino Assignee: Alex Sobrino Labels: multilang Hi, We've got a pretty simple topology running with Storm 0.9.5 (tried also with 0.9.4 and 0.9.6-INCUBATING) in a 3-machine cluster: {code}kafkaSpout (3) -> processBolt (12){code} Some info: - kafkaSpout reads from a topic with 3 partitions and 2 replications - processBolt iterates through the message and saves the results in MongoDB - processBolt is implemented in Python and calls storm.log("I'm doing something") just to add a simple debug message to the logs - The messages can be quite big (~25-40 MB) and are in JSON format - The kafka topic has a retention of 2 hours - We use the same ZooKeeper cluster for both Kafka and Storm. The topology freezes after several hours (not days) of running. We don't see any messages in the logs... In fact, the periodic messages from s.k.KafkaUtils and s.k.ZkCoordinator disappear. As you can imagine, the messages from the bolt also disappear. Logs are copy/pasted further on. If we redeploy the topology, everything starts to work again until it freezes again. 
Our kafkaSpout config is:
{code}
ZkHosts zkHosts = new ZkHosts("zkhost01:2181,zkhost02:2181,zkhost03:2181");
SpoutConfig kafkaConfig = new SpoutConfig(zkHosts, "topic", "/topic/ourclientid", "ourclientid");
kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
kafkaConfig.fetchSizeBytes = 50*1024*1024;
kafkaConfig.bufferSizeBytes = 50*1024*1024;
{code}
We've also tried setting the following options:
{code}
kafkaConfig.forceFromStart = true;
kafkaConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime(); // Also with kafka.api.OffsetRequest.LatestTime();
kafkaConfig.useStartOffsetTimeIfOffsetOutOfRange = true;
{code}
Right now the topology is running without acking the messages, since there's a bug in kafkaSpout with failed messages and deleted offsets in Kafka. This is what can be seen in the logs in one of the workers:
{code}
2015-07-23T12:37:38.008+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:37:39.079+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:37:51.013+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:37:51.091+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:38:02.684+0200 s.k.ZkCoordinator [INFO] Task [2/3] Refreshing partition manager connections
2015-07-23T12:38:02.687+0200 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=kafka1:9092, 1=kafka2:9092, 2=kafka3:9092}}
2015-07-23T12:38:02.687+0200 s.k.KafkaUtils [INFO] Task [2/3] assigned [Partition{host=kafka2, partition=1}]
2015-07-23T12:38:02.687+0200 s.k.ZkCoordinator [INFO] Task [2/3] Deleted partition managers: []
2015-07-23T12:38:02.687+0200 s.k.ZkCoordinator [INFO] Task [2/3] New partition managers: []
2015-07-23T12:38:02.687+0200 s.k.ZkCoordinator [INFO] Task [2/3] Finished refreshing
2015-07-23T12:38:09.012+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:38:41.878+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:39:02.688+0200 s.k.ZkCoordinator [INFO] Task [2/3] Refreshing partition manager connections
2015-07-23T12:39:02.691+0200 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=kafka1:9092, 1=kafka2:9092, 2=kafka3:9092}}
2015-07-23T12:39:02.691+0200 s.k.KafkaUtils [INFO] Task [2/3] assigned [Partition{host=kafka2:9092, partition=1}]
2015-07-23T12:39:02.691+0200 s.k.ZkCoordinator [INFO] Task [2/3] Deleted partition managers: []
2015-07-23T12:39:02.691+0200 s.k.ZkCoordinator [INFO] Task [2/3] New partition managers: []
2015-07-23T12:39:02.691+0200 s.k.ZkCoordinator [INFO] Task [2/3] Finished refreshing
2015-07-23T12:40:02.692+0200 s.k.ZkCoordinator [INFO] Task [2/3] Refreshing partition manager connections
2015-07-23T12:40:02.695+0200
[jira] [Commented] (STORM-966) ConfigValidation.DoubleValidator doesn't really validate whether the type of the object is a double
[ https://issues.apache.org/jira/browse/STORM-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652116#comment-14652116 ] ASF GitHub Bot commented on STORM-966: -- Github user jerrypeng commented on the pull request: https://github.com/apache/storm/pull/658#issuecomment-127331868 Added unit tests. Please merge? Or do I need to squash some of the commits? ConfigValidation.DoubleValidator doesn't really validate whether the type of the object is a double --- Key: STORM-966 URL: https://issues.apache.org/jira/browse/STORM-966 Project: Apache Storm Issue Type: Improvement Reporter: Boyang Jerry Peng Assignee: Boyang Jerry Peng Priority: Minor The ConfigValidation.DoubleValidator code only checks whether the object is null or an instance of Number, which is a parent class of Double. DoubleValidator is used only once, in Config.java, where: public static final Object TOPOLOGY_STATS_SAMPLE_RATE_SCHEMA = ConfigValidation.DoubleValidator; can simply be set to: public static final Object TOPOLOGY_STATS_SAMPLE_RATE_SCHEMA = Number.class; We can then get rid of the misleading ConfigValidation.DoubleValidator, since it doesn't actually check whether an object is of double type; the validator doesn't really do anything and its name is misleading. In the previous commit https://github.com/apache/storm/commit/214ee7454548b884c591991b1faea770d1478cec Number.class was used anyway
[GitHub] storm pull request: STORM-966 ConfigValidation.DoubleValidator is ...
Github user jerrypeng commented on the pull request: https://github.com/apache/storm/pull/658#issuecomment-127331868 Added unit tests. Please merge? Or do I need to squash some of the commits?
Re: [DISCUSS] Backport bugfixes (to 0.10.x / 0.9.x)
Dealing with branches is a pain, and it is good we are paying attention to back-porting. It is good to bring it up for discussion, and I agree that checking with those who do releases is a reasonable thing to do. I do not think there are special restrictions on back-porting fixes to previous branches. I would be comfortable with the normal rules for a pull request. Effort is one cost, and we could eventually run into some more challenging merge conflicts as well. There are multiple things to consider, and I think it is a judgment call. On the other hand, if it does become clear that clarifying principles would be helpful in our BYLAWS, then I am all for it. If we commit to supporting specific branches with certain kinds of fixes, then we need to stick to such a commitment. -- Derek - Original Message - From: Parth Brahmbhatt pbrahmbh...@hortonworks.com To: dev@storm.apache.org dev@storm.apache.org Cc: Sent: Monday, August 3, 2015 11:26 AM Subject: Re: [DISCUSS] Backport bugfixes (to 0.10.x / 0.9.x) Given how huge the 0.10 release was, I feel trying to back-port all bug fixes and testing that it does not break something else might turn out to be a huge PITA. I think going with a stable 0.10 release might be the best solution for now. I don’t think back-porting requires confirmation; however, given we will probably have to do a release for each version where back-porting was done, it is probably best to notify the release manager and discuss options. I agree having a rule/bylaw would help clarify things for the future. Thanks Parth
[jira] [Commented] (STORM-963) Frozen topology (KafkaSpout + Multilang bolt)
[ https://issues.apache.org/jira/browse/STORM-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651841#comment-14651841 ] Alex Sobrino commented on STORM-963: Hi [~kabhwan], We're not able to reproduce it on demand. It just happens every now and then, but it's quite frequent, so we're able to provide some test results. Executing {{kill -SIGABRT PID}} in one of the Python processes writes this into the worker's log:
{code}
2015-08-03T14:41:30.315+0200 b.s.t.ShellBolt [ERROR] Halting process: ShellBolt died.
java.lang.RuntimeException: subprocess heartbeat timeout
	at backtype.storm.task.ShellBolt$BoltHeartbeatTimerTask.run(ShellBolt.java:305) [storm-core-0.9.5.jar:0.9.5]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_45]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_45]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_45]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2015-08-03T14:41:30.315+0200 b.s.d.executor [ERROR] java.lang.RuntimeException: subprocess heartbeat timeout
	at backtype.storm.task.ShellBolt$BoltHeartbeatTimerTask.run(ShellBolt.java:305) [storm-core-0.9.5.jar:0.9.5]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_45]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_45]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_45]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2015-08-03T14:41:30.317+0200 b.s.t.ShellBolt [ERROR] Halting process: ShellBolt died.
java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read. Serializer Exception:
	at backtype.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:101) ~[storm-core-0.9.5.jar:0.9.5]
	at backtype.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:318) ~[storm-core-0.9.5.jar:0.9.5]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2015-08-03T14:41:30.318+0200 b.s.d.executor [ERROR] java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read. Serializer Exception:
	at backtype.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:101) ~[storm-core-0.9.5.jar:0.9.5]
	at backtype.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:318) ~[storm-core-0.9.5.jar:0.9.5]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2015-08-03T14:41:30.320+0200 b.s.t.ShellBolt [ERROR] Halting process: ShellBolt died.
java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.8.0_45]
	at java.io.FileOutputStream.write(FileOutputStream.java:326) ~[na:1.8.0_45]
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_45]
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_45]
	at java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[na:1.8.0_45]
	at backtype.storm.multilang.JsonSerializer.writeString(JsonSerializer.java:96) ~[storm-core-0.9.5.jar:0.9.5]
	at backtype.storm.multilang.JsonSerializer.writeMessage(JsonSerializer.java:89) ~[storm-core-0.9.5.jar:0.9.5]
	at backtype.storm.multilang.JsonSerializer.writeBoltMsg(JsonSerializer.java:74) ~[storm-core-0.9.5.jar:0.9.5]
	at backtype.storm.utils.ShellProcess.writeBoltMsg(ShellProcess.java:106) ~[storm-core-0.9.5.jar:0.9.5]
	at backtype.storm.task.ShellBolt$BoltWriterRunnable.run(ShellBolt.java:355) ~[storm-core-0.9.5.jar:0.9.5]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
2015-08-03T14:41:30.320+0200 b.s.d.executor [ERROR] java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.8.0_45]
	at java.io.FileOutputStream.write(FileOutputStream.java:326)
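For context on the first trace: ShellBolt runs a timer task that compares the subprocess's last heartbeat timestamp against a timeout and halts the worker when it is exceeded. Below is a minimal, deterministic sketch of that watchdog pattern. This is illustrative only, not Storm's actual ShellBolt code; the class name, method names, and the injected clock are invented for the example.

```java
import java.util.function.LongSupplier;

// Sketch of the heartbeat-watchdog pattern behind ShellBolt's
// BoltHeartbeatTimerTask (hypothetical names, not Storm's API).
public class HeartbeatWatchdog {
    private final LongSupplier clock;   // injected clock, in milliseconds
    private final long timeoutMillis;
    private long lastHeartbeat;
    private boolean dead = false;

    public HeartbeatWatchdog(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastHeartbeat = clock.getAsLong();
    }

    // Called whenever the subprocess reports a heartbeat.
    public void onHeartbeat() { lastHeartbeat = clock.getAsLong(); }

    // Called periodically by a timer; ShellBolt would throw
    // "subprocess heartbeat timeout" where we merely set a flag.
    public void check() {
        if (clock.getAsLong() - lastHeartbeat > timeoutMillis) {
            dead = true;
        }
    }

    public boolean isDead() { return dead; }

    public static void main(String[] args) {
        long[] now = {0};
        HeartbeatWatchdog w = new HeartbeatWatchdog(30_000, () -> now[0]);
        now[0] = 20_000; w.onHeartbeat();   // subprocess heartbeats at t=20s
        now[0] = 40_000; w.check();
        System.out.println(w.isDead());     // false: 20s since heartbeat, within timeout
        now[0] = 60_000; w.check();
        System.out.println(w.isDead());     // true: 40s of silence exceeds the timeout
    }
}
```

A Python bolt that spends too long inside a single tuple without letting the multilang heartbeat be answered produces exactly this failure mode, which matches the large (~25-40 MB) messages described in the ticket.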
[jira] [Commented] (STORM-963) Frozen topology (KafkaSpout + Multilang bolt)
[ https://issues.apache.org/jira/browse/STORM-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651859#comment-14651859 ] Jungtaek Lim commented on STORM-963: jstack with the -F option forces a thread dump. When you run into this again, try jstack -F; it should be a good bet.

Frozen topology (KafkaSpout + Multilang bolt) - Key: STORM-963 URL: https://issues.apache.org/jira/browse/STORM-963 Project: Apache Storm Issue Type: Bug Components: storm-kafka Affects Versions: 0.9.4, 0.9.5, 0.9.6 Environment: - VMware ESX 5.5 - Ubuntu Server 14.04 LTS (kernel 3.16.0-41-generic) - Java(TM) SE Runtime Environment (build 1.8.0_45-b14) - Python 2.7.6 (default, Jun 22 2015, 17:58:13) - Zookeeper 3.4.6 Reporter: Alex Sobrino Labels: multilang
Hi, We've got a pretty simple topology running with Storm 0.9.5 (also tried with 0.9.4 and 0.9.6-INCUBATING) in a 3-machine cluster: {code}kafkaSpout (3) - processBolt (12){code}
Some info:
- kafkaSpout reads from a topic with 3 partitions and a replication factor of 2
- processBolt iterates through the message and saves the results in MongoDB
- processBolt is implemented in Python and calls storm.log("I'm doing something") just to add a simple debug message to the logs
- The messages can be quite big (~25-40 MB) and are in JSON format
- The Kafka topic has a retention of 2 hours
- We use the same ZooKeeper cluster for both Kafka and Storm
The topology freezes after several hours (not days) of running. We don't see any messages in the logs... In fact, the periodic messages from s.k.KafkaUtils and s.k.ZkCoordinator disappear. As you can imagine, the messages from the bolt also disappear. Logs are copy/pasted further on. If we redeploy the topology, everything starts to work again until it freezes again.
Our kafkaSpout config is:
{code}
ZkHosts zkHosts = new ZkHosts("zkhost01:2181,zkhost02:2181,zkhost03:2181");
SpoutConfig kafkaConfig = new SpoutConfig(zkHosts, "topic", "/topic/ourclientid", "ourclientid");
kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
kafkaConfig.fetchSizeBytes = 50*1024*1024;
kafkaConfig.bufferSizeBytes = 50*1024*1024;
{code}
We've also tried setting the following options:
{code}
kafkaConfig.forceFromStart = true;
kafkaConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime(); // Also with kafka.api.OffsetRequest.LatestTime();
kafkaConfig.useStartOffsetTimeIfOffsetOutOfRange = true;
{code}
Right now the topology is running without acking the messages, since there's a bug in kafkaSpout with failed messages and deleted offsets in Kafka. This is what can be seen in the logs of one of the workers:
{code}
2015-07-23T12:37:38.008+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:37:39.079+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:37:51.013+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:37:51.091+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:38:02.684+0200 s.k.ZkCoordinator [INFO] Task [2/3] Refreshing partition manager connections
2015-07-23T12:38:02.687+0200 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=kafka1:9092, 1=kafka2:9092, 2=kafka3:9092}}
2015-07-23T12:38:02.687+0200 s.k.KafkaUtils [INFO] Task [2/3] assigned [Partition{host=kafka2, partition=1}]
2015-07-23T12:38:02.687+0200 s.k.ZkCoordinator [INFO] Task [2/3] Deleted partition managers: []
2015-07-23T12:38:02.687+0200 s.k.ZkCoordinator [INFO] Task [2/3] New partition managers: []
2015-07-23T12:38:02.687+0200 s.k.ZkCoordinator [INFO] Task [2/3] Finished refreshing
2015-07-23T12:38:09.012+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:38:41.878+0200 b.s.t.ShellBolt [INFO] ShellLog pid:28364, name:processBolt I'm doing something
2015-07-23T12:39:02.688+0200 s.k.ZkCoordinator [INFO] Task [2/3] Refreshing partition manager connections
2015-07-23T12:39:02.691+0200 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=kafka1:9092, 1=kafka2:9092, 2=kafka3:9092}}
2015-07-23T12:39:02.691+0200 s.k.KafkaUtils [INFO] Task [2/3] assigned [Partition{host=kafka2:9092, partition=1}]
2015-07-23T12:39:02.691+0200 s.k.ZkCoordinator [INFO] Task [2/3] Deleted partition managers: []
2015-07-23T12:39:02.691+0200 s.k.ZkCoordinator [INFO] Task [2/3] New partition managers: []
2015-07-23T12:39:02.691+0200 s.k.ZkCoordinator [INFO] Task [2/3] Finished refreshing
{code}
[jira] [Commented] (STORM-963) Frozen topology (KafkaSpout + Multilang bolt)
[ https://issues.apache.org/jira/browse/STORM-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651850#comment-14651850 ] Alex Sobrino commented on STORM-963: No luck with a {{jstack}} execution: {code} ps aux|grep java| grep storm | grep worker storm 850 24.9 46.1 11092720 4736148 ?Sl 15:04 3:58 /opt/java/latest/bin/java -server -Xmx6144m -Djava.library.path=/var/lib/storm/supervisor/stormdist/my-topology-48-1438607054/resources/Linux-amd64:/var/lib/storm/supervisor/stormdist/my-topology-48-1438607054/resources:/usr/local/lib:/opt/local/lib:/usr/lib -Dlogfile.name=worker-6700.log -Dstorm.home=/opt/mydir/apache-storm-0.9.5 -Dstorm.conf.file= -Dstorm.options= -Dstorm.log.dir=/opt/mydir/apache-storm-0.9.5/logs -Dlogback.configurationFile=/opt/mydir/apache-storm-0.9.5/logback/cluster.xml -Dstorm.id=my-topology-48-1438607054 -Dworker.id=3904443c-7532-4b76-a0f3-63a873bae8f0 -Dworker.port=6700 -cp /opt/mydir/apache-storm-0.9.5/lib/minlog-1.2.jar:/opt/mydir/apache-storm-0.9.5/lib/carbonite-1.4.0.jar:/opt/mydir/apache-storm-0.9.5/lib/json-simple-1.1.jar:/opt/mydir/apache-storm-0.9.5/lib/slf4j-api-1.7.5.jar:/opt/mydir/apache-storm-0.9.5/lib/kryo-2.21.jar:/opt/mydir/apache-storm-0.9.5/lib/tools.logging-0.2.3.jar:/opt/mydir/apache-storm-0.9.5/lib/ring-servlet-0.3.11.jar:/opt/mydir/apache-storm-0.9.5/lib/tools.cli-0.2.4.jar:/opt/mydir/apache-storm-0.9.5/lib/disruptor-2.10.1.jar:/opt/mydir/apache-storm-0.9.5/lib/clj-stacktrace-0.2.2.jar:/opt/mydir/apache-storm-0.9.5/lib/math.numeric-tower-0.0.1.jar:/opt/mydir/apache-storm-0.9.5/lib/ring-jetty-adapter-0.3.11.jar:/opt/mydir/apache-storm-0.9.5/lib/commons-io-2.4.jar:/opt/mydir/apache-storm-0.9.5/lib/servlet-api-2.5.jar:/opt/mydir/apache-storm-0.9.5/lib/log4j-over-slf4j-1.6.6.jar:/opt/mydir/apache-storm-0.9.5/lib/core.incubator-0.1.0.jar:/opt/mydir/apache-storm-0.9.5/lib/asm-4.0.jar:/opt/mydir/apache-storm-0.9.5/lib/hiccup-0.3.6.jar:/opt/mydir/apache-storm-0.9.5/lib/jetty-util-6.1.26.jar:/op
t/mydir/apache-storm-0.9.5/lib/tools.macro-0.1.0.jar:/opt/mydir/apache-storm-0.9.5/lib/ring-devel-0.3.11.jar:/opt/mydir/apache-storm-0.9.5/lib/commons-exec-1.1.jar:/opt/mydir/apache-storm-0.9.5/lib/ring-core-1.1.5.jar:/opt/mydir/apache-storm-0.9.5/lib/clout-1.0.1.jar:/opt/mydir/apache-storm-0.9.5/lib/jetty-6.1.26.jar:/opt/mydir/apache-storm-0.9.5/lib/objenesis-1.2.jar:/opt/mydir/apache-storm-0.9.5/lib/logback-core-1.0.13.jar:/opt/mydir/apache-storm-0.9.5/lib/jgrapht-core-0.9.0.jar:/opt/mydir/apache-storm-0.9.5/lib/commons-codec-1.6.jar:/opt/mydir/apache-storm-0.9.5/lib/commons-lang-2.5.jar:/opt/mydir/apache-storm-0.9.5/lib/clojure-1.5.1.jar:/opt/mydir/apache-storm-0.9.5/lib/storm-core-0.9.5.jar:/opt/mydir/apache-storm-0.9.5/lib/chill-java-0.3.5.jar:/opt/mydir/apache-storm-0.9.5/lib/reflectasm-1.07-shaded.jar:/opt/mydir/apache-storm-0.9.5/lib/joda-time-2.0.jar:/opt/mydir/apache-storm-0.9.5/lib/commons-logging-1.1.3.jar:/opt/mydir/apache-storm-0.9.5/lib/compojure-1.1.3.jar:/opt/mydir/apache-storm-0.9.5/lib/clj-time-0.4.1.jar:/opt/mydir/apache-storm-0.9.5/lib/jline-2.11.jar:/opt/mydir/apache-storm-0.9.5/lib/commons-fileupload-1.2.1.jar:/opt/mydir/apache-storm-0.9.5/lib/logback-classic-1.0.13.jar:/opt/mydir/apache-storm-0.9.5/lib/snakeyaml-1.11.jar:/opt/mydir/apache-storm-0.9.5/conf:/var/lib/storm/supervisor/stormdist/my-topology-48-1438607054/stormjar.jar backtype.storm.daemon.worker my-topology-48-1438607054 63b8c93e-7d6e-4d67-b4f8-957c9650e7ba 6700 3904443c-7532-4b76-a0f3-63a873bae8f0 jstack 850 850: Unable to open socket file: target process not responding or HotSpot VM not loaded The -F option can be used when the target process is not responding {code} And after some minutes the worker process dies, as can be seen in the supervisor's log: {code} 2015-08-03T15:20:56.584+0200 b.s.d.supervisor [INFO] Shutting down and clearing state for id 3904443c-7532-4b76-a0f3-63a873bae8f0. Current supervisor time: 1438608056. 
State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1438608025, :storm-id my-topology-48-1438607054, :executors #{[2 2] [5 5] [8 8] [11 11] [14 14] [-1 -1]}, :port 6700}
2015-08-03T15:20:56.584+0200 b.s.d.supervisor [INFO] Shutting down 63b8c93e-7d6e-4d67-b4f8-957c9650e7ba:3904443c-7532-4b76-a0f3-63a873bae8f0
2015-08-03T15:20:57.606+0200 b.s.util [INFO] Error when trying to kill 936. Process is probably already dead.
2015-08-03T15:20:57.610+0200 b.s.util [INFO] Error when trying to kill 937. Process is probably already dead.
2015-08-03T15:20:57.618+0200 b.s.util [INFO] Error when trying to kill 938. Process is probably already dead.
2015-08-03T15:20:57.619+0200 b.s.util [INFO] Error when trying to kill 939. Process is probably already dead.
2015-08-03T15:20:57.623+0200 b.s.d.supervisor [INFO] Shut down 63b8c93e-7d6e-4d67-b4f8-957c9650e7ba:3904443c-7532-4b76-a0f3-63a873bae8f0
2015-08-03T15:20:57.624+0200
[jira] [Commented] (STORM-963) Frozen topology (KafkaSpout + Multilang bolt)
[ https://issues.apache.org/jira/browse/STORM-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651860#comment-14651860 ] Jungtaek Lim commented on STORM-963: Could you please find my e-mail address in my profile and send the full logs? Thanks in advance! Frozen topology (KafkaSpout + Multilang bolt) - Key: STORM-963 URL: https://issues.apache.org/jira/browse/STORM-963
[jira] [Commented] (STORM-966) ConfigValidation.DoubleValidator doesn't really validate whether the type of the object is a double
[ https://issues.apache.org/jira/browse/STORM-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651864#comment-14651864 ] ASF GitHub Bot commented on STORM-966: -- Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/658#issuecomment-127237261 @jerrypeng Great. It can't assert that the value is not a big integer or a big decimal, but it's better than the current behavior. +1. ConfigValidation.DoubleValidator doesn't really validate whether the type of the object is a double --- Key: STORM-966 URL: https://issues.apache.org/jira/browse/STORM-966 Project: Apache Storm Issue Type: Improvement Reporter: Boyang Jerry Peng Assignee: Boyang Jerry Peng Priority: Minor The ConfigValidation.DoubleValidator code only checks whether the object is null or an instance of Number, which is a parent class of Double. DoubleValidator is only used once, in Config.java: {code}public static final Object TOPOLOGY_STATS_SAMPLE_RATE_SCHEMA = ConfigValidation.DoubleValidator;{code} which can simply be changed to: {code}public static final Object TOPOLOGY_STATS_SAMPLE_RATE_SCHEMA = Number.class;{code} We can then get rid of the misleading ConfigValidation.DoubleValidator: since it doesn't actually check whether an object is a Double, the validator doesn't really do anything, and its name is misleading. In a previous commit, https://github.com/apache/storm/commit/214ee7454548b884c591991b1faea770d1478cec, Number.class was used anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
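The gap discussed in the ticket is easy to demonstrate: a check based on `instanceof Number` accepts any numeric type, not just Double. The sketch below is illustrative only; the helper names are invented and this is not Storm's ConfigValidation API.

```java
// Hypothetical sketch of the validation gap described in STORM-966;
// not Storm's ConfigValidation code.
public class DoubleValidatorSketch {

    // Mimics the behavior described in the ticket: null passes, and any
    // Number passes, so a "DoubleValidator" built this way does not
    // actually require a Double.
    public static boolean acceptsAsDouble(Object o) {
        return o == null || o instanceof Number;
    }

    // A stricter check that would justify the name.
    public static boolean isActuallyDouble(Object o) {
        return o == null || o instanceof Double;
    }

    public static void main(String[] args) {
        // The misleading part: a BigDecimal is a Number, so it passes.
        System.out.println(acceptsAsDouble(java.math.BigDecimal.ONE));  // true
        System.out.println(isActuallyDouble(java.math.BigDecimal.ONE)); // false
        System.out.println(isActuallyDouble(0.05));                     // true: 0.05 autoboxes to Double
    }
}
```

This also matches the reviewer's point: without enumerating types, an `instanceof Number` check cannot rule out BigInteger or BigDecimal.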
[GitHub] storm pull request: STORM-966 ConfigValidation.DoubleValidator is ...
Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/658#issuecomment-127237261 @jerrypeng Great. It can't assert that the value is not a big integer or a big decimal, but it's better than the current behavior. +1.
[jira] [Commented] (STORM-966) ConfigValidation.DoubleValidator doesn't really validate whether the type of the object is a double
[ https://issues.apache.org/jira/browse/STORM-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651865#comment-14651865 ] ASF GitHub Bot commented on STORM-966: -- Github user HeartSaVioR commented on the pull request: https://github.com/apache/storm/pull/658#issuecomment-127237670 @jerrypeng Sorry, but could you provide a unit test for PositiveNumberValidator? Thanks! ConfigValidation.DoubleValidator doesn't really validate whether the type of the object is a double --- Key: STORM-966 URL: https://issues.apache.org/jira/browse/STORM-966
[jira] [Commented] (STORM-961) Investigate adding squall and trident-ml as modules
[ https://issues.apache.org/jira/browse/STORM-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651903#comment-14651903 ] Mohammed commented on STORM-961: Hello Taylor, I am one of the main authors of Squall. We also get interest from many people who contact us privately or publicly. We are very open to integrating into Apache Storm and participating in the process. Recently, we have also incorporated a functional Scala interface with SBT REPL support, like that of Spark. Looking forward to hearing from you :) p.s. Greetings to Bobby! Cheers, Investigate adding squall and trident-ml as modules --- Key: STORM-961 URL: https://issues.apache.org/jira/browse/STORM-961 Project: Apache Storm Issue Type: New Feature Reporter: Xin Wang Add squall (https://github.com/epfldata/squall) as Storm-SQL and trident-ml (https://github.com/pmerienne/trident-ml) as Storm-ML to the Storm external modules.
Re: [DISCUSS] Backport bugfixes (to 0.10.x / 0.9.x)
Thanks for putting together this list, Jungtaek. Back-porting is a pain, and the more the 0.9.x, 0.10.x and master lines diverge, the harder it gets. I propose we back-port the 4 fixes you identified for the 0.10 branch, and start discussing releasing 0.10.0 (final, not beta). Once 0.10.0 is out, I think we can start phasing out the 0.9.x line. The idea was to continue supporting 0.9.x while 0.10.0 stabilized, giving early upgraders a chance to kick the tires and report any glaring issues. IMO more than enough time has passed, and we should move forward with a 0.10.0 release.
In terms of the who and when of back-porting, the general principle I’ve followed is that once a patch has been merged, it is a candidate for back-porting, and any committer can do that, since the patch has already been reviewed and accepted. I don’t think a separate pull request is necessary. In fact, I think extra pull requests for back-porting make JIRA/GitHub issues a little messy and confusing. IMO the only times we need back-port pull requests are:
a) A non-committer contributor is requesting a patch be applied to an earlier version.
b) A committer back-ported a patch with a lot of conflicts and feels it warrants further review before committing. Basically a way of saying “This merge was messy. Could others check my work?”
If things go wrong at any time, there’s always “git revert”. I don’t think we need to codify any of this in our BYLAWS unless there is some sort of conflict, which for now there isn’t. If we feel the need to document the process, a README/wiki entry should suffice. I’m more in favor of mutual trust among committers than hard and fast rules. Once a particular practice gets formalized in our bylaws, it can be very difficult to change. -Taylor
On Aug 3, 2015, at 12:56 PM, Derek Dagit der...@yahoo-inc.com.INVALID wrote: Dealing with branches is a pain, and it is good we are paying attention to back-porting.
It is good to bring it up for discussion, and I agree checking with those who do releases is a reasonable thing to do. I do not think there are special restrictions on back-porting fixes to previous branches. I would be comfortable with the normal rules for a pull request. Effort is one cost, and we could eventually run into some more challenging merge conflicts as well. There are multiple things to consider, and I think it is a judgment call. On the other hand, if it does become clear that clarifying the principles in our BYLAWS would be helpful, then I am all for it. If we commit to supporting specific branches with certain kinds of fixes, then we need to stick to such a commitment. -- Derek
- Original Message - From: Parth Brahmbhatt pbrahmbh...@hortonworks.com To: dev@storm.apache.org Cc: Sent: Monday, August 3, 2015 11:26 AM Subject: Re: [DISCUSS] Backport bugfixes (to 0.10.x / 0.9.x)
Given how huge the 0.10 release was, I feel trying to back-port all bug fixes and testing that they don't break something else might turn out to be a huge PITA. I think going with a stable 0.10 release might be the best solution for now. I don’t think back-porting requires confirmation; however, given we will probably have to do a release for each version where back-porting was done, it is probably best to notify the release manager and discuss options. I agree having a rule/bylaw would help clarify things for the future. Thanks, Parth
On 8/2/15, 4:30 PM, 임정택 kabh...@gmail.com wrote: Bump. Does anyone have opinions about this? I already did back-port some bugfixes (not in the list) into the 0.10.x and 0.9.x lines, but I'm not 100% sure that that is the preferred way. It seems we don't have explicit rules about back-porting. The only thing I know is that Taylor was (or has been) the gatekeeper. Now I really want to know whether it still needs to be confirmed by Taylor before back-porting.
Thanks, Jungtaek Lim (HeartSaVioR)
2015-07-28 8:27 GMT+09:00 임정택 kabh...@gmail.com: Hi all, Recently I've seen many bugfixes merged only to master or the 0.10.x branch. Since 0.10.0-beta1 introduces a huge changeset, and it contains a lot of bugfixes, I think we can consider backporting them to the 0.9.x branch before releasing 0.9.6. I created a sheet listing the bugfix issues that could be backported, along with the status of each issue (which versions it is applied to, and which versions it could be applied to): https://docs.google.com/spreadsheets/d/1KQrOlqk1hlE2oDmXFY34lJaY0PU7V5uxq9U1vfIhLq4/edit?usp=sharing Please let me know whenever you find missing spots or wrong contents. There seems to be another approach: - release a stable version of 0.10.0 and drop the plan to release 0.9.6, so that all users who want a bugfix release move to 0.10.0. Since a lot of bugfix issues are waiting to be backported, the alternative approach may make sense. I'm open to hearing any thoughts, so please share your opinions. Thanks,