Re: Storm 0.9.0.1 and Zookeeper 3.4.5 hung issue.
Some changes to the Storm code are necessary for this. See https://github.com/apache/incubator-storm/pull/29/files -- Derek On 2/14/14, 11:50, Saurabh Agarwal (BLOOMBERG/ 731 LEXIN) wrote: Thanks, Bijoy, for the reply. We can't downgrade to 3.3.3 because our system has a ZooKeeper 3.4.5 server running, and we would like to keep the same version of the ZooKeeper client to avoid any incompatibility issues. The error we are getting with 3.4.5 is: Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.NIOServerCnxn$Factory After looking at the ZooKeeper code, I found that the static Factory class within the NIOServerCnxn class has been removed in version 3.4.5. ZooKeeper 3.3.3 is 3 years old. Shouldn't the Storm code be updated to run with the latest ZooKeeper version? Should I create a JIRA for this? - Original Message - From: user@storm.incubator.apache.org To: SAURABH AGARWAL (BLOOMBERG/ 731 LEXIN), user@storm.incubator.apache.org At: Feb 14 2014 11:45:50 Hi, We also downgraded ZooKeeper from 3.4.5 to 3.3.3 due to issues with Storm. But we are not facing any issues related to Kafka after the downgrade. We are using Storm 0.9.0-rc2 and Kafka 0.8.0. Thanks, Bijoy On Fri, Feb 14, 2014 at 9:57 PM, Saurabh Agarwal (BLOOMBERG/ 731 LEXIN) sagarwal...@bloomberg.net wrote: Hi, A Storm 0.9.0.1 client linked with the ZooKeeper 3.4.5 library hangs during ZooKeeper initialization. Is this a known issue? 453 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - init - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 4 datadir /tmp/7b520ac7-ff87-4eb6-9fc5-3a16deec0272/version-2 snapdir /tmp/7b520ac7-ff87-4eb6-9fc5-3a16deec0272/version-2 The client works fine with ZooKeeper 3.3.3. We are using Storm with Kafka, and Kafka does not work with ZooKeeper 3.3.3 but works with 3.4.5. Any help is appreciated... Thanks, Saurabh.
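The ClassNotFoundException quoted above can be checked directly against whatever ZooKeeper jar ends up on the classpath. The following is a minimal sketch, not code from Storm or ZooKeeper: ZkFactoryProbe and isLoadable are hypothetical names, and the top-level NIOServerCnxnFactory class name is an assumption about what ZooKeeper 3.4.x ships in place of the removed nested class.

```java
// Hypothetical helper (not part of Storm or ZooKeeper): reports which
// ZooKeeper server-factory class names are loadable from the current classpath.
final class ZkFactoryProbe {
    // Nested class named in the ClassNotFoundException from the thread.
    static final String OLD_FACTORY = "org.apache.zookeeper.server.NIOServerCnxn$Factory";
    // Assumption: the top-level replacement class found in ZooKeeper 3.4.x jars.
    static final String NEW_FACTORY = "org.apache.zookeeper.server.NIOServerCnxnFactory";

    /** Returns true when the named class can be found, without initializing it. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only test for presence, do not run static initializers.
            Class.forName(className, false, ZkFactoryProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(OLD_FACTORY + " loadable: " + isLoadable(OLD_FACTORY));
        System.out.println(NEW_FACTORY + " loadable: " + isLoadable(NEW_FACTORY));
    }
}
```

Running this with the same classpath as the Storm client should show which of the two names the linked ZooKeeper jar provides, which in turn indicates whether the Storm code on that classpath still refers to the older class.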
Re: How to consume from last offset when topology restarts(STORM-KAFKA)
I have exactly the same question. I am using the Kafka spout from https://github.com/wurstmeister/storm-kafka-0.8-plus.git with the Kafka 0.8 release and an ordinary (non-Trident) Storm topology. How can I guarantee processing of messages sent while the topology was down, or while e.g. the Storm cluster was down for maintenance? -- Andrey Yegorov On Wed, Feb 12, 2014 at 8:05 AM, Danijel Schiavuzzi dschi...@gmail.com wrote: Hi Chitra, Which Kafka spout version exactly are you using, and what spout type -- Trident or the ordinary Storm spout? I ask because, unfortunately, there are multiple Kafka spout versions around the web. According to my research, your best bet is the one in storm-contrib if you use Kafka version 0.7, and storm-kafka-0.8-plus if you use Kafka 0.8. Best regards, Danijel Schiavuzzi www.schiavuzzi.com On Wed, Feb 12, 2014 at 8:42 AM, Chitra Raveendran chitra.raveend...@flutura.com wrote: Hi, I have a topology in production which uses the default Kafka spout. I have set this parameter: *spoutConfig.forceStartOffsetTime(-1);* The value -1 makes the spout consume from the latest message instead of reading data from Kafka right from the beginning (which would be unnecessary and redundant in my use case). But in production, whenever a new release goes in, I stop and start the topology, which takes a few seconds to minutes. I have been losing some data during the time that the topology is down. How can I avoid this? I have tried running without the forceStartOffsetTime parameter, but that did not work. What am I doing wrong, and how can I continue reading from the last offset? Thanks, Chitra -- Danijel Schiavuzzi
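The behaviour described in this thread can be pictured as a small decision rule. The sketch below is a hypothetical model, not the storm-kafka spout's code: StartOffsetPolicy and choose are invented names, and it assumes that a forced start time takes precedence over any offset saved by a previous run, which would match the data loss Chitra reports after a restart. The -1 and -2 values are Kafka's "latest" and "earliest" offset-time sentinels.

```java
import java.util.OptionalLong;

// Hypothetical model of a start-offset policy; not the storm-kafka spout's code.
final class StartOffsetPolicy {
    static final long LATEST = -1L;   // Kafka's "latest" offset-time sentinel
    static final long EARLIEST = -2L; // Kafka's "earliest" offset-time sentinel

    /**
     * @param committed      offset saved by a previous run, if any
     * @param forceStartTime sentinel to force, or empty when nothing is forced
     * @return a concrete offset to resume from (non-negative), or a negative
     *         sentinel that still has to be resolved against the broker
     */
    static long choose(OptionalLong committed, OptionalLong forceStartTime) {
        if (forceStartTime.isPresent()) {
            // Assumption: forcing wins, so messages sent during downtime are skipped.
            return forceStartTime.getAsLong();
        }
        if (committed.isPresent()) {
            // Resume exactly where the previous run stopped.
            return committed.getAsLong();
        }
        // First start with no saved state: begin at the newest message.
        return LATEST;
    }

    public static void main(String[] args) {
        System.out.println(choose(OptionalLong.of(1200L), OptionalLong.of(LATEST)));
        System.out.println(choose(OptionalLong.of(1200L), OptionalLong.empty()));
        System.out.println(choose(OptionalLong.empty(), OptionalLong.empty()));
    }
}
```

Under this model, resuming from the last offset requires both that nothing is forced and that a saved offset from the earlier run is actually found, so it may be worth checking where the spout stores its offsets between deployments.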
Re: Storm 0.9.0.1 and Zookeeper 3.4.5 hung issue.
Thanks Derek. I think that is what I need. Let me build with your suggested changes... - Original Message - From: user@storm.incubator.apache.org To: SAURABH AGARWAL (BLOOMBERG/ 731 LEXIN), user@storm.incubator.apache.org At: Feb 14 2014 13:17:31 Some changes to storm code are necessary for this. See https://github.com/apache/incubator-storm/pull/29/files -- Derek
Re: Storm 0.9.0.1 and Zookeeper 3.4.5 hung issue.
Look for project.clj in storm-core. -Taylor On Feb 14, 2014, at 6:12 PM, Saurabh Agarwal (BLOOMBERG/ 731 LEXIN) sagarwal...@bloomberg.net wrote: After adding the changes, I am building the Storm 0.9.0.1 source code using Leiningen. Can anyone point out where lein picks up the classpath? I am trying to change from ZooKeeper 3.3.3 to 3.4.5, but I am unable to find how to change it. - Original Message - From: der...@yahoo-inc.com To: SAURABH AGARWAL (BLOOMBERG/ 731 LEXIN), user@storm.incubator.apache.org At: Feb 14 2014 13:17:04 Some changes to storm code are necessary for this. See https://github.com/apache/incubator-storm/pull/29/files -- Derek
Streaming DRPC?
Hello, I noticed that the DRPC client only allows a single response (and requires that response to be JSON encoded). I was hoping to implement some sort of DRPC stream where I receive a constant stream of data based on a given query. Also, is there a way to serialize the response using Kryo instead of JSON? The specific format of my data is not very JSON friendly. Cheers, Carl
Re: storm jar slowness
thanks for the pointer...it looks like it is using default 15K chunk sizes, I'll see if tweaking that has any effect. On Fri, Feb 14, 2014 at 5:48 PM, Jon Logan jmlo...@buffalo.edu wrote: The code that uploads it is at backtype.storm.StormSubmitter#submitJar(java.util.Map, java.lang.String). It looks like it's just a simple upload over Thrift...SCP is specifically designed for file uploads, and is probably better-tuned for large transfers, through compression, or whatever other means. You could just upload the Jar using SCP, and then submit it from the server itself. I think many (most?) use cases are submitted on a local network, where upload speed is not a concern. On Fri, Feb 14, 2014 at 5:18 PM, Adam Lewis superca...@gmail.com wrote: I've seen this issue raised on the list in the past, but with no clear suggestions: storm jar is very slow at sending the jar file, averaging about 250 KB/s between my system and EC2...is there some reason for this in the way storm sends the jar? scp goes about 2.5 MB/s, the same 10x difference I've seen reported previously. I'm using storm 0.9.0.1 Any ideas? Thanks, Adam
Storm topology jar with gradle
Hi, I want to build a Storm topology jar using Gradle. I have seen https://groups.google.com/forum/#!searchin/storm-user/gradle/storm-user/ogRxhLL-pHE/RQ4UcwcISIAJ and http://forums.gradle.org/gradle/topics/removing_dependencies_from_a_jar_file_during_jar_task but my problem is that I have an Eclipse project with dependencies on other projects. My project contains many topologies in different packages. At any given time I want to build only one topology jar, so I want to pass the package name as an argument when building it, something like this: gradle stormjar my.package.topologies.wordcount While building this jar I also want to exclude the Storm jars and include only the necessary dependent jars. As of now I am building a fat jar for the whole project, and its size is more than 80MB including other dependent jars and projects, which I think is not necessary for a Storm topology jar. Gradle should still be able to resolve the dependencies of my topology classes (e.g. DTOs from other projects, log4j and slf4j jars) and add them to the output jar. I hope my question is clear. Any help is very much appreciated.
Re: storm jar slowness
This is promising, 150KB chunk size is giving me over 1 MB/sec; 300KB chunk size up to 2 MB/sec although spikier performance. My experimental setup leaves a lot to be desired here, but it seems pretty conclusive that 15KB is not optimal (at least over-the-internet). As far as I can tell the only downside of larger chunks is that thrift keeps an entire chunk in memory. Something in the 100 to 200 KB range seems reasonable. Any thoughts? Perhaps I should open a JIRA ticket for this. As for use cases: I'm sure in production everything will be on the same network. For me, it is the transition from using local cluster to deploying to a real cluster (which happens to run on AWS across the Internet from my dev machine) and dealing with classpath, serialization and other fun issues that don't crop up in local mode...and my fat jar hasn't been put on a diet yet so it is large...all of which adds up to long code/build/test cycle times. But, increasing chunk size is really helping, and the 15KB default seems arbitrary. If this seems reasonable I'll file a JIRA and a PR. On Fri, Feb 14, 2014 at 7:57 PM, Adam Lewis gm...@adamlewis.com wrote: thanks for the pointer...it looks like it is using default 15K chunk sizes, I'll see if tweaking that has any effect.
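The numbers reported in this thread are roughly what a simple latency model predicts. The sketch below is a back-of-the-envelope calculation under stated assumptions, not a description of how Storm uploads jars: it supposes each chunk costs one fixed delay (for example a round trip) plus its time on the wire, takes the 2.5 MB/s scp rate as the link capacity, and treats 1 MB as 1000 KB. ChunkThroughput and its methods are hypothetical names.

```java
// Back-of-the-envelope model; an assumption, not how Storm measures or uploads anything.
final class ChunkThroughput {
    /** Effective KB/s when each chunk costs one fixed delay plus its wire time. */
    static double effectiveKBps(double chunkKB, double perChunkDelaySec, double linkKBps) {
        return chunkKB / (perChunkDelaySec + chunkKB / linkKBps);
    }

    /** Per-chunk delay implied by an observed rate at a known chunk size. */
    static double impliedDelaySec(double chunkKB, double observedKBps, double linkKBps) {
        return chunkKB / observedKBps - chunkKB / linkKBps;
    }

    public static void main(String[] args) {
        double link = 2500.0; // scp rate from the thread, taken as link capacity
        // 15 KB chunks at 250 KB/s take 60 ms each, of which 6 ms is wire time.
        double delay = impliedDelaySec(15.0, 250.0, link);
        for (double chunk : new double[] {15.0, 150.0, 300.0}) {
            System.out.printf("%.0f KB chunks -> %.0f KB/s%n",
                    chunk, effectiveKBps(chunk, delay, link));
        }
    }
}
```

With these inputs the model gives about 1.3 MB/s for 150 KB chunks and about 1.7 MB/s for 300 KB chunks, in line with the "over 1 MB/sec" and "up to 2 MB/sec" observations, which supports the idea that the fixed per-chunk cost dominates at the 15 KB default.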
Re: conflicting java library problem
It looks like a similar topic was discussed regarding httpclient here: http://mail-archives.apache.org/mod_mbox/incubator-storm-user/201402.mbox/%3CCAKgXx6LWetozLvv6RCHYamD0gXwNvLMi9-NnhM4ghQn%3DC5%2BnfA%40mail.gmail.com%3E I'll look into the shade plugin with Maven. On Fri, Feb 14, 2014 at 7:46 PM, Tom Gullo tomgu...@gmail.com wrote: I've got bolts that are using the Solrj (solr-solrj) library, and the component and version Solrj uses - 'org.apache.httpcomponents:httpclient:4.2.3' - conflicts with the version Storm uses, version 4.1.1. Because Solr is expecting a method (SchemeRegistryFactory.createSystemDefault), it throws a NoSuchMethodError. I can't exclude 4.1.1 as far as I can see because Storm is not part of my uber jar. Any ideas on how to solve this problem? Are the Storm libraries going to be updated anytime soon? Version 4.1.1 of httpclient is about 2 years old. Thanks -Tom
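Before reaching for shading, it can help to confirm which httpclient jar a worker actually loads. The following is a minimal diagnostic sketch using only standard reflection; ClasspathProbe and its methods are hypothetical names, and the package org.apache.http.impl.conn for SchemeRegistryFactory is an assumption, since the thread gives only the simple class name.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Storm, Solr, or HttpClient.
final class ClasspathProbe {
    /** Returns where a class was loaded from, or a placeholder when the JVM does not report it. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "(bootstrap or unknown)"
                : source.getLocation().toString();
    }

    /** True when the named class is loadable and exposes a public method with that name. */
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            for (Method m : type.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: fully qualified name of the class mentioned in the thread.
        String factory = "org.apache.http.impl.conn.SchemeRegistryFactory";
        System.out.println("createSystemDefault present: "
                + hasPublicMethod(factory, "createSystemDefault"));
        try {
            System.out.println("loaded from: " + locationOf(Class.forName(factory)));
        } catch (ClassNotFoundException e) {
            System.out.println(factory + " is not on the classpath");
        }
    }
}
```

Calling the same two helpers from a bolt's prepare method and logging the result would show whether the older jar shipped with Storm is the one being picked up at runtime.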
How to identify Transaction success in IPartitionedTridentSpout?
Hi, I have written a PartitionedSpout implementing the IPartitionedTridentSpout interface. The IBatchSpout interface has a method void ack(long batchId); which indicates completion of a transaction. What is the equivalent mechanism for receiving an ack for transactions when using IPartitionedTridentSpout? I need to build some logic to trigger an external system on completion of a few transactions. Thanks, Regards, MK
Re: storm jar slowness
Great. STORM-241 has been filed. https://issues.apache.org/jira/browse/STORM-241 On Fri, Feb 14, 2014 at 8:42 PM, Nathan Marz nat...@nathanmarz.com wrote: Yes, it is arbitrary. Opening an issue for this is a good idea. -- Twitter: @nathanmarz http://nathanmarz.com
[Need help] [How to benchmark storm topologies]
I wish to benchmark two Storm topologies. One is stateless: it doesn't store any intermediate state, and tuples are replayed and re-processed when a failure occurs. The other is stateful, and intermediate state is saved. I want to see the performance in both cases with an equal number of failures. How should I go about it? Any help is appreciated. Thanks, *Abhishek Bhattacharjee* *Pune Institute of Computer Technology*
Re: [Need help] [How to benchmark storm topologies]
We instrument our spouts and bolts using statsd and Graphite because of their ease of use and flexibility. It's UDP and won't add additional risk or vulnerabilities to your topologies. On Feb 15, 2014 1:05 AM, Abhishek Bhattacharjee abhishek.bhattacharje...@gmail.com wrote: I wish to benchmark two Storm topologies. One is stateless: it doesn't store any intermediate state, and tuples are replayed and re-processed when a failure occurs. The other is stateful, and intermediate state is saved. I want to see the performance in both cases with an equal number of failures. How should I go about it? Any help is appreciated. Thanks, *Abhishek Bhattacharjee* *Pune Institute of Computer Technology*
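The statsd approach mentioned above amounts to sending short text lines over UDP. The following is a minimal sketch, assuming a statsd daemon listening on its usual UDP port 8125; StatsdSketch and the metric names are hypothetical, and a real topology would more likely use an existing statsd client library than this hand-rolled sender.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal client; shown only to illustrate the statsd line format.
final class StatsdSketch {
    /** Counter line, e.g. "bolt.executed:1|c". */
    static String counter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    /** Timer line in milliseconds, e.g. "bolt.latency:12|ms". */
    static String timing(String name, long millis) {
        return name + ":" + millis + "|ms";
    }

    /** Fire-and-forget send; failures are swallowed so metrics never fail a tuple. */
    static void send(String host, int port, String line) {
        // One socket per call keeps the sketch short; reuse a socket in real code.
        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] payload = line.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        } catch (Exception e) {
            // Deliberately ignored: losing a metric is acceptable, blocking a bolt is not.
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // ... the work being measured would go here ...
        send("localhost", 8125, counter("stateless.bolt.executed", 1));
        send("localhost", 8125, timing("stateless.bolt.latency", System.currentTimeMillis() - start));
    }
}
```

Emitting the same counters and timers from both the stateless and the stateful topology, under different metric prefixes, would make the two runs directly comparable in Graphite while the same failures are injected.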