Re: Storm 0.9.0.1 and Zookeeper 3.4.5 hung issue.

2014-02-14 Thread Derek Dagit

Some changes to storm code are necessary for this.

See https://github.com/apache/incubator-storm/pull/29/files
--
Derek

On 2/14/14, 11:50, Saurabh Agarwal (BLOOMBERG/ 731 LEXIN) wrote:

Thanks, Bijoy, for the reply.

We can't downgrade to 3.3.3 because our system has a ZooKeeper 3.4.5 server
running, and we would like to keep the same version of the ZooKeeper client to
avoid any incompatibility issues.

The error we are getting with 3.4.5 is:
Caused by: java.lang.ClassNotFoundException: 
org.apache.zookeeper.server.NIOServerCnxn$Factory

After looking at the ZooKeeper code, I found that the static Factory class
within the NIOServerCnxn class has been removed in version 3.4.5.

ZooKeeper 3.3.3 is three years old. Shouldn't the Storm code be updated to run
with the latest ZooKeeper version? Should I create a JIRA for this?


- Original Message -
From: user@storm.incubator.apache.org
To: SAURABH AGARWAL (BLOOMBERG/ 731 LEXIN), user@storm.incubator.apache.org
At: Feb 14 2014 11:45:50

Hi,

We also downgraded ZooKeeper from 3.4.5 to 3.3.3 due to issues with
Storm, but we are not facing any issues related to Kafka after the
downgrade. We are using Storm 0.9.0-rc2 and Kafka 0.8.0.

Thanks
Bijoy

On Fri, Feb 14, 2014 at 9:57 PM, Saurabh Agarwal (BLOOMBERG/ 731 LEXIN) 
sagarwal...@bloomberg.net wrote:


Hi,

A Storm 0.9.0.1 client linked with the ZooKeeper 3.4.5 library hangs during
ZooKeeper initialization. Is this a known issue?

453  [main] INFO  org.apache.zookeeper.server.ZooKeeperServer - init  -
Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout
4 datadir /tmp/7b520ac7-ff87-4eb6-9fc5-3a16deec0272/version-2 snapdir
/tmp/7b520ac7-ff87-4eb6-9fc5-3a16deec0272/version-2

The client works fine with ZooKeeper 3.3.3. However, we are using Storm with
Kafka, and Kafka does not work with ZooKeeper 3.3.3 but does work with 3.4.5.

Any help is appreciated.
Thanks,
Saurabh.
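For the ClassNotFoundException quoted in this thread, a quick check of which ZooKeeper server API is visible on a given classpath can help confirm the diagnosis before rebuilding Storm. The sketch below is only a probe under stated assumptions: the first class name is taken from the error message, the second (NIOServerCnxnFactory) is the top-level factory class that ZooKeeper 3.4.x ships, and the helper names are made up for illustration.

```java
// Sketch: report which ZooKeeper server connection-factory class is visible
// on the current classpath. Helper names are illustrative, not part of
// Storm or ZooKeeper.
final class ZkFactoryProbe {
    // Nested class named in the ClassNotFoundException from this thread.
    static final String OLD_FACTORY = "org.apache.zookeeper.server.NIOServerCnxn$Factory";
    // Top-level factory class shipped with ZooKeeper 3.4.x.
    static final String NEW_FACTORY = "org.apache.zookeeper.server.NIOServerCnxnFactory";

    /** Returns true if the named class can be found (without initializing it). */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ZkFactoryProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Summarizes which API style, if any, the classpath offers. */
    static String describe() {
        if (isPresent(OLD_FACTORY)) {
            return "nested NIOServerCnxn.Factory present (3.3.x-style API)";
        }
        if (isPresent(NEW_FACTORY)) {
            return "NIOServerCnxnFactory present (3.4.x-style API)";
        }
        return "no ZooKeeper server classes on the classpath";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it with the same classpath the Storm client uses should show whether code compiled against the nested Factory class can still resolve it.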




Re: How to consume from last offset when topology restarts(STORM-KAFKA)

2014-02-14 Thread Andrey Yegorov
I have exactly the same question.
I am using the Kafka spout from
https://github.com/wurstmeister/storm-kafka-0.8-plus.git with the Kafka 0.8
release and an ordinary (non-Trident) Storm topology.

How can I guarantee processing of messages sent while the topology was down or
while, e.g., the Storm cluster was down for maintenance?

--
Andrey Yegorov


On Wed, Feb 12, 2014 at 8:05 AM, Danijel Schiavuzzi dschi...@gmail.com wrote:

 Hi Chitra,

 Which Kafka spout version exactly are you using, and which spout type:
 Trident or the ordinary Storm spout?

 I ask that because, unfortunately, there are multiple Kafka spout versions
 around the web. According to my research, your best bet is the one in
 storm-contrib in case you use Kafka version 0.7, and storm-kafka-0.8-plus
 in case you use Kafka 0.8.

 Best regards,

 Danijel Schiavuzzi
 www.schiavuzzi.com


 On Wed, Feb 12, 2014 at 8:42 AM, Chitra Raveendran 
 chitra.raveend...@flutura.com wrote:

 Hi

 I have a topology in production that uses the default Kafka spout, and I
 have set this parameter:
 spoutConfig.forceStartOffsetTime(-1);

 With the value -1, the spout consumes from the latest message and doesn't
 start reading data from Kafka right from the beginning (that would be
 unnecessary and redundant in my use case).

 But in production, whenever a new release goes in, I stop and start the
 topology, which takes from a few seconds to a few minutes. I have been losing
 some data during the time that the topology is down.

 How can I avoid this? I have tried running without the forceStartOffsetTime
 parameter, but that did not work. What am I doing wrong, and how can I
 continue reading from the last offset?

 Thanks,
 Chitra
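The trade-off described in the question can be illustrated with a small model. This is not the spout's actual code: it is a simplified sketch that assumes the spout records its last consumed offset under its spout id and that forcing a start offset makes it disregard that record. The class, method, and id names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model (not the Kafka spout's real implementation) of how a
// consumer might choose its starting offset after a restart.
final class StartOffsetModel {
    // Stand-in for offsets persisted between runs, keyed by spout id.
    private final Map<String, Long> committed = new HashMap<>();

    /** Records the last offset processed by the given spout id. */
    void commit(String spoutId, long offset) {
        committed.put(spoutId, offset);
    }

    /**
     * @param spoutId      id the offsets were stored under
     * @param forceStart   true models forceStartOffsetTime(...) being set
     * @param latestOffset current end of the partition (what -1 resolves to)
     */
    long startingOffset(String spoutId, boolean forceStart, long latestOffset) {
        Long saved = committed.get(spoutId);
        if (forceStart || saved == null) {
            // Stored position ignored or missing: jump to the latest message.
            return latestOffset;
        }
        // Resume where the previous run stopped, so nothing in between is skipped.
        return saved;
    }

    public static void main(String[] args) {
        StartOffsetModel model = new StartOffsetModel();
        model.commit("my-spout", 100L);   // last offset processed before shutdown
        long latest = 175L;               // messages kept arriving while down

        System.out.println(model.startingOffset("my-spout", true, latest));       // forced start
        System.out.println(model.startingOffset("my-spout", false, latest));      // resume
        System.out.println(model.startingOffset("renamed-spout", false, latest)); // no record
    }
}
```

In this model, resuming without loss needs two things at once: no forced start offset, and the same spout id across deployments. Whether a particular Kafka spout version behaves this way should be checked against its source.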




 --
 Danijel Schiavuzzi



Re: Storm 0.9.0.1 and Zookeeper 3.4.5 hung issue.

2014-02-14 Thread Saurabh Agarwal (BLOOMBERG/ 731 LEXIN)
Thanks Derek. I think that is what I need. Let me build with your suggested 
changes...





Re: Storm 0.9.0.1 and Zookeeper 3.4.5 hung issue.

2014-02-14 Thread P. Taylor Goetz
Look for project.clj in storm-core.

-Taylor

 On Feb 14, 2014, at 6:12 PM, Saurabh Agarwal (BLOOMBERG/ 731 LEXIN) 
 sagarwal...@bloomberg.net wrote:
 
 
 After adding the changes, I am building the Storm 0.9.0.1 source code using
 Leiningen. Can anyone point out where lein picks up the classpath? I am
 trying to change from ZooKeeper 3.3.3 to 3.4.5, but I am unable to find how
 to change it.
 
 
 


Streaming DRPC?

2014-02-14 Thread Carl Lerche
Hello,

I noticed that the DRPC client only allows a single response (and requires
that response to be JSON encoded). I was hoping to implement some sort
of DRPC stream where I receive a constant stream of data based on a given
query. Also, is there a way to serialize the response using Kryo
instead of JSON? The specific format of my data is not very JSON
friendly.

Cheers,
Carl


Re: storm jar slowness

2014-02-14 Thread Adam Lewis
Thanks for the pointer. It looks like it is using a default chunk size of 15K;
I'll see if tweaking that has any effect.


On Fri, Feb 14, 2014 at 5:48 PM, Jon Logan jmlo...@buffalo.edu wrote:

 The code that uploads it is
 at backtype.storm.StormSubmitter#submitJar(java.util.Map,
 java.lang.String). It looks like it's just a simple upload over
 Thrift...SCP is specifically designed for file uploads, and is probably
 better-tuned for large transfers, through compression, or whatever other
 means.

 You could just upload the Jar using SCP, and then submit it from the
 server itself. I think many (most?) use cases are submitted on a local
 network, where upload speed is not a concern.


 On Fri, Feb 14, 2014 at 5:18 PM, Adam Lewis superca...@gmail.com wrote:

 I've seen this issue raised on the list in the past, but with no clear
 suggestions:

 storm jar is very slow at sending the jar file, averaging about 250
 KB/s between my system and EC2...is there some reason for this in the way
 storm sends the jar?  scp goes about 2.5 MB/s, the same 10x difference I've
 seen reported previously.

 I'm using storm 0.9.0.1

 Any ideas?

 Thanks,
 Adam
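The upload described above, one remote call per fixed-size chunk, can be sketched as follows. The Uploader interface is a hypothetical stand-in for the remote call and is not Storm's Thrift API; the point is only to show how the chunk size determines the number of calls.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of a chunked upload loop. Uploader is a made-up stand-in for the
// remote call, not an interface from Storm or Thrift.
final class ChunkedUpload {
    interface Uploader {
        void send(byte[] chunk, int length) throws IOException;
    }

    /** Reads the stream in pieces of at most chunkSize bytes; returns the number of sends. */
    static int upload(InputStream in, int chunkSize, Uploader uploader) throws IOException {
        byte[] buffer = new byte[chunkSize];
        int calls = 0;
        int n;
        // read() may return fewer bytes than requested; each non-empty read is sent as one chunk.
        while ((n = in.read(buffer)) > 0) {
            uploader.send(buffer, n);  // one remote call per chunk
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] jar = new byte[3 * 1024 * 1024];  // pretend 3 MB jar
        for (int kb : new int[] {15, 150, 300}) {
            int calls = upload(new ByteArrayInputStream(jar), kb * 1024, (chunk, len) -> { });
            System.out.println(kb + " KB chunks -> " + calls + " calls");
        }
    }
}
```

If each call waits for a reply before the next chunk is sent, the number of calls multiplies directly into network round trips, which is one plausible reason a small chunk size is slow over a high-latency link.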





Storm topology jar with gradle

2014-02-14 Thread Kavi Kumar
Hi,

I want to build a Storm topology jar using Gradle. I have seen
https://groups.google.com/forum/#!searchin/storm-user/gradle/storm-user/ogRxhLL-pHE/RQ4UcwcISIAJ

and
http://forums.gradle.org/gradle/topics/removing_dependencies_from_a_jar_file_during_jar_task

but the problem is that I have an Eclipse project with dependencies on other
projects. My project contains many topologies in different packages. At any
particular point in time, I want to build only one topology jar.

I want to pass the package name for stormjar as an argument while building
the topology jar.

Something like this: gradle stormjar my.package.topologies.wordcount

While building this jar, I also want to exclude the Storm jars and include only
the necessary dependent jars.
Otherwise, as of now, I am building a fat jar for the whole project; its size is
more than 80MB, including other dependent jars and projects, which I think is
not necessary for a Storm topology jar.

But Gradle should be intelligent enough to resolve dependencies (e.g., DTOs of
other projects, log4j and slf4j jars) for my topology classes and add them to
the output jar.

I hope my question is clear.

Any help is very much appreciated.


Re: storm jar slowness

2014-02-14 Thread Adam Lewis
This is promising: a 150KB chunk size is giving me over 1 MB/sec, and a 300KB
chunk size up to 2 MB/sec, although with spikier performance.  My experimental
setup leaves a lot to be desired here, but it seems pretty conclusive that 15KB
is not optimal (at least over the Internet).  As far as I can tell, the only
downside of larger chunks is that Thrift keeps an entire chunk in memory.

Something in the 100 to 200 KB range seems reasonable.  Any thoughts?
Perhaps I should open a JIRA ticket for this.

As for use cases: I'm sure in production everything will be on the same
network.  For me, it is the transition from using local cluster to
deploying to a real cluster (which happens to run on AWS across the
Internet from my dev machine) and dealing with classpath, serialization and
other fun issues that don't crop up in local mode...and my fat jar hasn't
been put on a diet yet so it is large...all of which adds up to long
code/build/test cycle times.

But, increasing chunk size is really helping, and the 15KB default seems
arbitrary.  If this seems reasonable I'll file a JIRA and a PR.
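A back-of-the-envelope model shows why the numbers reported above are plausible. It assumes each chunk costs one synchronous round trip plus its own transfer time. The 55 ms round-trip time is an assumed figure chosen for illustration, not a measurement from this thread; the 2.5 MB/s link rate is the scp speed mentioned earlier.

```java
// Rough throughput model: time per chunk = round trip + chunk size / link rate.
// The round-trip value used in main() is an assumption, not a measured figure.
final class ChunkThroughput {
    /** Estimated throughput in KB/s for a given chunk size in KB. */
    static double estimateKBps(double chunkKB, double rttSeconds, double linkKBps) {
        double secondsPerChunk = rttSeconds + chunkKB / linkKBps;
        return chunkKB / secondsPerChunk;
    }

    public static void main(String[] args) {
        double rtt = 0.055;         // assumed round-trip time in seconds
        double link = 2.5 * 1024;   // 2.5 MB/s expressed in KB/s
        for (double chunk : new double[] {15, 150, 300}) {
            System.out.printf("%.0f KB chunks: ~%.0f KB/s%n",
                    chunk, estimateKBps(chunk, rtt, link));
        }
    }
}
```

Under these assumptions the model lands near 250 KB/s for 15 KB chunks and between 1 and 2 MB/s for 150 KB and 300 KB chunks, which is the same shape as the measurements in this message. Larger chunks show diminishing returns as transfer time starts to dominate the round trip.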








Re: conflicting java library problem

2014-02-14 Thread Tom Gullo
It looks like a similar topic was discussed regarding httpclient here:

http://mail-archives.apache.org/mod_mbox/incubator-storm-user/201402.mbox/%3CCAKgXx6LWetozLvv6RCHYamD0gXwNvLMi9-NnhM4ghQn%3DC5%2BnfA%40mail.gmail.com%3E

I'll look into the shade plugin with Maven.



On Fri, Feb 14, 2014 at 7:46 PM, Tom Gullo tomgu...@gmail.com wrote:

 I've got bolts that use the SolrJ (solr-solrj) library, and the httpclient
 version that SolrJ depends on,
 'org.apache.httpcomponents:httpclient:4.2.3', conflicts with the version
 Storm uses, 4.1.1.  Because Solr expects a method
 (SchemeRegistryFactory.createSystemDefault) that 4.1.1 lacks, it throws a
 NoSuchMethodError.  As far as I can see, I can't exclude 4.1.1 because Storm
 is not part of my uber jar.

 Any ideas on how to solve this problem?  Are the Storm libraries going to
 be updated anytime soon?  Version 4.1.1 of httpclient is about 2 years old.

 Thanks

  -Tom
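When two versions of a library compete on the classpath, it helps to confirm which copy actually won before changing the build. The diagnostic below is a sketch using only standard reflection. The helper names are illustrative, and the package org.apache.http.impl.conn for SchemeRegistryFactory is my assumption, since the message above gives only the simple class name.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Diagnostic sketch for "which copy of this class did I get?" problems.
// Only standard JDK reflection calls are used; helper names are made up.
final class ClasspathProbe {
    /** Returns the jar or directory a class was loaded from, if the JVM exposes it. */
    static String loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
    }

    /** Returns true if the named class is loadable and exposes a public method with that name. */
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            for (Method m : type.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Package name is an assumption; adjust it if your httpclient differs.
        String cls = "org.apache.http.impl.conn.SchemeRegistryFactory";
        System.out.println("createSystemDefault available: "
                + hasPublicMethod(cls, "createSystemDefault"));
        System.out.println("this class loaded from: " + loadedFrom(ClasspathProbe.class));
    }
}
```

Calling loadedFrom on the conflicting class from inside a bolt (for example in prepare) and logging the result would show whether the worker picked up the copy in Storm's lib directory or the one in the topology jar.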





How to identify Transaction success in IPartitionedTridentSpout?

2014-02-14 Thread Karthikeyan Muthukumarasamy
Hi,
I have written a PartitionedSpout implementing the IPartitionedTridentSpout
interface.
The IBatchSpout interface has a method
void ack(long batchId);
which indicates completion of a transaction. What is the equivalent
mechanism for receiving acks for transactions when using
IPartitionedTridentSpout?
I need to build some logic to trigger an external system on completion of a
few transactions.
Thanks & Regards,
MK


Re: storm jar slowness

2014-02-14 Thread Adam Lewis
Great.  STORM-241 has been filed.

https://issues.apache.org/jira/browse/STORM-241


On Fri, Feb 14, 2014 at 8:42 PM, Nathan Marz nat...@nathanmarz.com wrote:

 Yes, it is arbitrary. Opening an issue for this is a good idea.









 --
 Twitter: @nathanmarz
 http://nathanmarz.com



[Need help] [How to benchmark storm topologies]

2014-02-14 Thread Abhishek Bhattacharjee
I wish to benchmark two Storm topologies. One is stateless: it doesn't
store any intermediate state, and tuples are replayed and re-processed when
a failure occurs. The other is stateful: intermediate state is
saved.

I want to compare the performance in both cases with an equal number of
failures.
How should I go about it?
Any help is appreciated.

Thanks,

*Abhishek Bhattacharjee*
*Pune Institute of Computer Technology*


Re: [Need help] [How to benchmark storm topologies]

2014-02-14 Thread Mark Hu
We instrument our spout and bolts using statsd and Graphite because of their
ease of use and flexibility. statsd uses UDP, so it won't add additional risk
or vulnerabilities to your topologies.
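To make the UDP point concrete, here is a minimal sketch of emitting statsd-style metrics from Java. The line formats (name:value|c for counters, name:value|ms for timings) follow the statsd protocol; the metric names, host, and port are placeholders, and a real topology would more likely use an existing statsd client library.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal statsd-style sender. Metric names, host, and port are placeholders.
final class StatsdSketch {
    /** Formats a counter increment in the statsd line format: name:value|c */
    static String counter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    /** Formats a timing sample in milliseconds: name:value|ms */
    static String timing(String name, long millis) {
        return name + ":" + millis + "|ms";
    }

    /** Sends one metric line as a single UDP datagram; no reply is awaited. */
    static void send(String host, int port, String line) throws IOException {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws IOException {
        // 8125 is the conventional statsd port; adjust for your setup.
        send("localhost", 8125, counter("topology.bolt.tuples_processed", 1));
        send("localhost", 8125, timing("topology.bolt.execute_latency", 12));
    }
}
```

Because the sender never waits for an acknowledgement, a slow or absent metrics server does not block the bolt that emits the metric. The cost is that datagrams can be dropped, so the counts are approximate.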