Storm Performance

2014-01-10 Thread Klausen Schaefersinho
Hi,

How does the performance of a real cluster compare to a local development
cluster? I was trying to benchmark Storm and found that it only manages to
consume 10 events per second in a simple topology consisting of one spout
and one bolt, where the spout creates random objects and the bolt
acknowledges the tuple.

10 events per second is terribly slow, and I am not sure whether this is
related to development mode.

Cheers,

Klaus


Re: Cassandra bolt

2014-01-10 Thread Vladi Feigin
Hi,
If you use Cassandra counters, you will eventually have the value 8 on all
nodes.
3 will not override 5 or vice versa.
It will certainly converge eventually. For some time you may see different
values from different clients, but in the end it will be 8.
Vladi
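
The difference between propagating increments and overwriting values can be
illustrated with a toy model. This is only a sketch in plain Python and not
how Cassandra implements counters internally; the function names and the
timestamps are made up for illustration.

```python
# Toy model of three replicas receiving two concurrent counter updates.
# It only contrasts "apply every increment" with "last write wins".

def apply_increments(deltas):
    """Every replica eventually applies every delta; order does not matter."""
    return sum(deltas)

def last_write_wins(writes):
    """Each write is (timestamp, value); the latest timestamp overwrites."""
    return max(writes)[1]

# Increment 5 arrives first (timestamp 1), increment 3 second (timestamp 2).
# Each node may see the two increments in a different order.
deltas_seen_by = {
    "node1": [3, 5],
    "node2": [5, 3],
    "node3": [5, 3],
}
counter_values = {node: apply_increments(d) for node, d in deltas_seen_by.items()}
overwrite_value = last_write_wins([(1, 5), (2, 3)])

print(counter_values)   # {'node1': 8, 'node2': 8, 'node3': 8}
print(overwrite_value)  # 3
```

With increments every node ends at 8 regardless of arrival order, while the
overwrite model ends at 3, which is the outcome Adrian wants to avoid.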



On Mon, Jan 6, 2014 at 5:21 PM, Adrian Mocanu amoc...@verticalscope.com wrote:

  Hi

 I am actually looking into using CassandraCounterBatchingBolt, but at the
 moment I'm not sure how Cassandra handles these eventual consistency issues,
 so I need to research that. The reason I mention this issue is that I cannot
 find anywhere in the code where a read happens before a write, which bothers
 me. Maybe Cassandra does it with counter columns? I don't know.



 The issue I'm talking about is updating the same counter consecutively, but
 faster than the updates propagate to other Cassandra nodes.



 Example:

 Say I have 3 cassandra nodes. The counters on each of these nodes are 0.

 Node1:0, node2:0, node3:0



 An increment comes: 5

 5 - Node1:0, node2:0, node3:0



 The increment of 5 starts at node2 and still needs to propagate to node1 and node3

 Node1:0, node2:5, node3:0



 In the meantime, another increment arrives before previous increment is
 propagated:

 3 - Node1:0, node2:5, node3:0



 Assuming 3 starts at a different node than where 5 started we have:

 Node1:3, node2:5, node3:0



 Now if 3 gets propagated to the other nodes AS AN INCREMENT and not as a
 new value (and the same for 5) then eventually they would all equal 8 and
 this is what I want.



 If 3 overwrites 5 (because it has a later timestamp) this is problematic –
 not what I want.



 Will see what the Cassandra group says... or if the creators of
 CassandraCounterBatchingBolt are on this group, please let me know :)



 Thanks

 Adrian





 *From:* Vladi Feigin [mailto:vladi...@gmail.com]
 *Sent:* January-04-14 2:00 AM

 *To:* user@storm.incubator.apache.org
 *Subject:* Re: Cassandra bolt



 Hi Adrian,



 Why don't you use C* counters? Your scenario looks like a good fit for them.
 I think CassandraCounterBatchingBolt provides what you need.

 Vladi



 On Fri, Jan 3, 2014 at 11:00 PM, Adrian Mocanu amoc...@verticalscope.com
 wrote:

  Happy New Year all!



 I'm working on a solution for the following scenario: I have tuples coming
 to a Cassandra bolt. The tuples are of this form: TupleData(String name,
 Int count, Long time). The time field is unique only within a batch, not
 overall, because some tuples may arrive late with the same name and time
 but a different count.



 For example:

 I can receive these tuples for the same time: (x1,3,), (x2,4,)

 Then the bolt may receive (x1,5,)

 After these are put in cassandra, column family x1 should have value 8 for
 time  and column family x2 should have value 4 for time 



 Caching aside, the Cassandra bolt needs to check whether there is already a
 count in the db for the tuple with the given name and time. If it exists,
 then retrieve it, increment it by the newly received value, and update the
 db entry with the new value. (At this point I'm not sure if update or
 delete+reinsert is speedier.)

 If no db entry exists, then add the new tuple.
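
The lookup-then-update logic described above can be sketched as follows. This
is a minimal illustration in plain Python, with a dict standing in for the
database; `upsert_count` and the timestamp value `T` are hypothetical (the
time value in the original example is not given), and no real Cassandra or
Storm API is used.

```python
# In-memory stand-in for a table keyed by (name, time). A real bolt would
# issue reads and writes against Cassandra instead of touching a dict.
store = {}

def upsert_count(store, name, time, count):
    """Read the existing count (if any), add the new one, write it back."""
    key = (name, time)
    existing = store.get(key)          # the "db lookup" step
    if existing is None:
        store[key] = count             # no entry yet: insert
    else:
        store[key] = existing + count  # entry exists: increment and update
    return store[key]

T = 1000  # hypothetical timestamp shared by all three tuples
for name, count in [("x1", 3), ("x2", 4), ("x1", 5)]:
    upsert_count(store, name, T, count)

print(store)  # {('x1', 1000): 8, ('x2', 1000): 4}
```

Note that this read-modify-write sequence is not atomic: two writers that read
the same value concurrently can lose one of the updates, which is the reason
counter columns were suggested in the replies.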



 I've looked at cassandra bolts code from
 https://github.com/hmsonline/storm-cassandra/tree/master/src/main/java/com/hmsonline/storm/cassandra/bolt

 which is the same as cassandra bolt from storm-contrib.



 There is a class CassandraCounterBatchingBolt, but after looking at it I
 don't believe it does the look up in db first before saving the value to
 db, which leads me to believe that this will not work.



 What I'm looking for seems pretty basic and I wonder if there is a
 cassandra bolt to do db lookup before updating db. Does such a bolt exist
 open-sourced?

 Otherwise I'm thinking of building mine on top of CassandraBatchingBolt.



 -Adrian







Re: How does one submit a topology in the inactive state?

2014-01-10 Thread Jon Logan
Just keep in mind: if you have a slow prepare method, and this is how you're
trying to work around it (minimizing the topology gap), Storm *will not*
call your prepare method until the topology is active.


Re: Storm Performance

2014-01-10 Thread Jon Logan
Why would you benchmark local mode? It's intended for debugging and
development purposes, not actual production use...


Per the website, Storm itself is benchmarked at 1 million tuples per
second per node.


Re: Large binary payloads with storm

2014-01-10 Thread Jon Logan
You're going to run into issues if you have large tuples, because they are
buffered in memory. I would suggest moving the payload to an external
channel, like Redis, and only passing metadata through Storm.
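
The idea of keeping the payload outside Storm and passing only a reference
can be sketched like this. A plain dict stands in for the external store so
the example runs without a server; `put_payload`, `get_payload`, and the
shape of the reference tuple are assumptions for illustration, not part of
Storm or of any Redis client.

```python
import uuid

# Stand-in for an external store such as Redis.
blob_store = {}

def put_payload(store, payload):
    """Store the payload externally and return a small reference tuple."""
    key = str(uuid.uuid4())
    store[key] = payload
    return (key, len(payload))   # only this metadata would travel through Storm

def get_payload(store, ref):
    """Resolve a reference tuple back to the original bytes."""
    key, _size = ref
    return store[key]

image = bytes(100 * 1024)        # 100 KiB of zero bytes as a dummy image
ref = put_payload(blob_store, image)
restored = get_payload(blob_store, ref)

print(ref[1], restored == image)  # 102400 True
```

The tuple that flows between components is then a few dozen bytes regardless
of the image size, at the cost of one extra round trip to the store per bolt
that needs the data.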

Your other solution is to use quirky things like reflection to prevent your
application from running out of memory when tuples are buffered.


On Fri, Jan 10, 2014 at 8:49 AM, Ruhollah Farchtchi 
ruhollah.farcht...@gmail.com wrote:

 I am using storm to process small ( 100k) image files. I don't have a
 real-time requirement as yet, but my bottleneck is more in the image
 processing than message passing between bolts. I am using the Clojure DSL
 and the python bolt. Everything I've put together right now is very much a
 prototype so my next steps are some further processing and integration.
 Passing byte arrays didn't seem to work so well, so I have had to
 encode/decode into base64, as it seems the JSON parsers on the Python
 side didn't like byte arrays. I plan to go back and perhaps redo the
 integration with a native C++ bolt, however I believe that there are other
 ways to do this integration as well. As with Wilson, I'm interested in
 whether anyone else is using Storm to process binary payloads and what they
 have found works.
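
The base64 workaround mentioned above can be sketched with the Python standard
library. This is only an illustration of the encode/decode round trip and of
its size overhead; the field name "image" and the dummy payload are made up,
and no Storm multilang code is involved.

```python
import base64
import json

payload = bytes(range(256)) * 4          # 1024 bytes of dummy binary data

# Sender: bytes -> base64 text -> JSON string
message = json.dumps({"image": base64.b64encode(payload).decode("ascii")})

# Receiver: JSON string -> base64 text -> bytes
decoded = base64.b64decode(json.loads(message)["image"])

encoded_len = len(base64.b64encode(payload))
print(decoded == payload, len(payload), encoded_len)  # True 1024 1368
```

Base64 turns every 3 bytes into 4 characters, so the encoded form is roughly
a third larger than the raw data, in addition to the CPU time spent encoding
and decoding on each hop.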

 Thanks,

 Ruhollah

 Ruhollah Farchtchi
 ruhollah.farcht...@gmail.com


 On Thu, Jan 9, 2014 at 10:24 PM, Lochlainn Wilson 
 lochlainn.wil...@gmail.com wrote:

 Hi all,

 I am new to Storm and have been tasked with determining whether it is
 feasible for us to use Apache Storm in my company. I have of course
 configured the sample projects and have been poking around. A red flag
 for me is the stream-processing-style JSON parsing.

 I am considering using Storm with real-time image processing bolts in
 C++. Packaging binary data into JSON (by escaping it) looks like it will
 be slow and expensive. Is there a better way? Does anyone have experience
 processing large streams of binary data through Storm?

 How did it go?

 Regards,

 Lochlainn





Re: Storm Performance

2014-01-10 Thread Nathan Leung
I've benched Storm at 1.8 million tuples per second on a big (24 core) box
using local or shuffle grouping between a spout and bolt. If you're only
seeing 10 events per second, make sure you don't have any sleeps (whether in
your code or elsewhere, e.g. in a library, or triggered by a lack of data in
the Storm spout). Also note that the default sleep period in the spout when
there is no data (in Storm 0.9) is, I believe, 1 ms, so even if you're hitting
this condition you should see much more than 10 events per second.
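
The arithmetic behind that last point can be written out as a back-of-the-envelope
check. It assumes, purely for illustration, that exactly one sleep happens per
emitted event; the function name is hypothetical and real sleep granularity
depends on the OS.

```python
def max_events_per_second(sleep_seconds, per_event_work_seconds=0.0):
    """Upper bound on throughput if every event is preceded by one sleep."""
    return 1.0 / (sleep_seconds + per_event_work_seconds)

# A 1 ms sleep per event still allows on the order of 1000 events per second.
bound_with_1ms_sleep = max_events_per_second(0.001)

# Conversely, 10 events per second implies about 100 ms spent per event.
implied_delay = 1.0 / 10

print(round(bound_with_1ms_sleep), implied_delay)  # 1000 0.1
```

So an observed rate of 10 events per second points to a delay of roughly
100 ms somewhere in the loop, two orders of magnitude more than the default
empty-spout sleep would explain.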


Re: Storm Performance

2014-01-10 Thread Klausen Schaefersinho
> I've benched storm at 1.8 million tuples per second on a big (24 core)
> box using local or shuffle grouping between a spout and bolt.

Production or development mode?

> If you're only seeing 10 events per second make sure you don't have any
> sleeps

Yeah, I checked for sleeps etc. and stripped down my code. Now my bolts do
nothing and the spout just creates random data...




Re: Storm Performance

2014-01-10 Thread Michael Ritsema
I deploy with LocalCluster mode. I get several thousand messages processed
per second, but I expect my bottleneck is not at the Storm layer.

It would be interesting to see a full explanation of the differences between
LocalCluster and a real deployment. I expect quite a few people would
benefit from having Storm run in-process in actual deployments.

-Rits


On Fri, Jan 10, 2014 at 9:46 AM, Michael Rose mich...@fullcontact.com wrote:

 Post your code. Even Dev mode is far faster for us.


ZeroMQ Exception causes Topology to get killed

2014-01-10 Thread Gaurav Sehgal
Hi,
 I am getting the following exception in the cluster. These exceptions
appear in the worker log, but they eventually cause the topology to die. Can
anyone please share some input?


java.lang.UnsatisfiedLinkError: org.zeromq.ZMQ$Socket.destroy()V
    at org.zeromq.ZMQ$Socket.destroy(Native Method)
    at org.zeromq.ZMQ$Socket.close(ZMQ.java:432)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
    at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
    at backtype.storm.messaging.zmq.ZMQConnection.close(zmq.clj:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
    at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:298)
    at backtype.storm.daemon.worker$mk_refresh_connections$this__4293.invoke(worker.clj:252)
    at backtype.storm.daemon.worker$mk_refresh_connections$this__4293.invoke(worker.clj:218)
    at backtype.storm.timer$schedule_recurring$this__1776.invoke(timer.clj:69)
    at backtype.storm.timer$mk_timer$fn__1759$fn__1760.invoke(timer.clj:33)
    at backtype.storm.timer$mk_timer$fn__1759.invoke(timer.clj:26)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:662)


Cheers!
Gaurav


Re: ZeroMQ Exception causes Topology to get killed

2014-01-10 Thread Michael Rose
What version of ZeroMQ are you running?

You should be running 2.1.7 with Nathan's fork of JZMQ.

Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com

