Re: Is there any way for my application code to get notified after it gets deserialized on a worker node and before spouts/bolts are opened/prepared ?

2014-06-02 Thread Marc Vaillant
The bolt base classes have a prepare method: https://storm.incubator.apache.org/apidocs/backtype/storm/topology/base/BaseBasicBolt.html and the spout base classes have a similar activate method: https://storm.incubator.apache.org/apidocs/backtype/storm/topology/base/BaseRichSpout.html Is that

Re: Interesting Comparison

2014-05-12 Thread Marc Vaillant
To play devil's advocate, if you believe the stream performance gains, then the 40k will likely pay for itself in needing to deploy a fraction of the resources for the same throughput. On Mon, May 12, 2014 at 09:02:53AM -0400, John Welcher wrote: Hi Streams also cost 40,000 US while Storm

Re: Doubts on Apache Storm

2014-05-06 Thread Marc Vaillant
On Tue, May 06, 2014 at 03:21:13PM +0530, milind.pa...@polarisft.com wrote: Hi, Is Nimbus mandatory for storm? (Our development env is neither using Nimbus nor any other cloud environment) I think you might have misunderstood nimbus. It is a daemon that is part of storm, *not*

Re: PDF processing use case in storm!!

2014-04-28 Thread Marc Vaillant
I think it's important to know whether or not some form of parallelism (other than throughput) is required, otherwise a standard webservice seems sufficient for this use case. On Mon, Apr 28, 2014 at 07:46:35AM -0400, Andrew Perepelytsya wrote: You can build request response type topologies via

Re: PDF processing use case in storm!!

2014-04-28 Thread Marc Vaillant
be using the service at the same time. It may be for different file or the same file. Thanks Deepak On Mon, Apr 28, 2014 at 7:41 PM, Marc Vaillant vaill...@animetrics.com wrote: I think it's important to know whether or not some form of parallelism (other than throughput

One solution to the stdio redirect issue

2014-03-24 Thread Marc Vaillant
I put together a more complete solution to the insidious STDOUT/STDERR buffer filling issue. Basically, if STDOUT/STDERR is not redirected/consumed in cluster mode it will fill the buffer and eventually take down your topology. The original thread on this issue was not migrated to JIRA but
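The failure mode described in this preview (a child process blocking once an unread STDOUT/STDERR pipe fills up) can be illustrated in general terms. The following is a minimal sketch of the generic technique of draining both pipes concurrently. It is not the solution from the message, which is not shown here, and the names `drain`, `run_and_drain`, and `CHILD` are hypothetical.

```python
import subprocess
import sys
import threading


def drain(stream, sink):
    """Read a pipe in chunks until EOF so the writer never blocks on a full buffer."""
    for chunk in iter(lambda: stream.read(65536), b""):
        sink.append(chunk)
    stream.close()


def run_and_drain(cmd):
    """Start cmd, consume both output pipes concurrently, return (exit_code, out, err)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = [], []
    readers = [
        threading.Thread(target=drain, args=(proc.stdout, out)),
        threading.Thread(target=drain, args=(proc.stderr, err)),
    ]
    for reader in readers:
        reader.start()
    code = proc.wait()  # safe here because both pipes are being read in the background
    for reader in readers:
        reader.join()
    return code, b"".join(out), b"".join(err)


# Hypothetical chatty child: writes about 1 MB to each stream,
# far more than a typical OS pipe buffer can hold.
CHILD = "import sys; sys.stdout.write('o' * 1000000); sys.stderr.write('e' * 1000000)"
code, out, err = run_and_drain([sys.executable, "-c", CHILD])
```

If neither pipe were read, the child in this sketch would stall on its first large write, which is the generic form of the hang the preview describes.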

Re: Wirbelsturm released, 1-click deployments of Storm clusters

2014-03-19 Thread Marc Vaillant
Hi Michael, Thanks very much for your hard work on this, your puppet scripts have been very helpful. We are having a specific issue with supervision of zookeeper and I wonder if you have encountered something similar or if we are doing something wrong. Even with the stopasgroup=true

which heartbeat(s) to modify so that debug sessions don't timeout?

2014-02-27 Thread Marc Vaillant
I'm trying to debug some native code that runs in a task using gdb. When I attach to the process, storm holds me to one or more of its 30s heartbeat timeouts while stepping through code, at which point it kills the process and therefore prematurely ends my debugging session. I'm having trouble

Can a topology be configured to force a maximum of 1 executor per worker?

2014-02-05 Thread Marc Vaillant
Suppose that you have a bolt whose tasks are not thread safe but you still want parallelism. It seems that this could be achieved via multiprocessing by forcing a maximum of 1 executor per worker. With this constraint, if you chose a parallelism hint of 4 (with default executors) you would get
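The constraint in this question can be explored with a toy model. The sketch below assumes executors are spread over workers round-robin. That is an assumption made for illustration, not something stated in the message, and the function names are hypothetical. It only counts the executors of a single component.

```python
def assign_round_robin(num_executors, num_workers):
    """Toy model: place executor i on worker i % num_workers."""
    workers = [[] for _ in range(num_workers)]
    for executor in range(num_executors):
        workers[executor % num_workers].append(executor)
    return workers


def max_executors_per_worker(num_executors, num_workers):
    """Largest number of executors that any single worker ends up hosting."""
    return max(len(w) for w in assign_round_robin(num_executors, num_workers))
```

Under this model, `max_executors_per_worker(4, 4)` is 1 while `max_executors_per_worker(4, 2)` is 2, so the one-executor-per-worker property holds only when there are at least as many workers as executors. Executors belonging to other components of the topology would share the same workers and are not modeled here.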