Re: Is there any way for my application code to get notified after it gets deserialized on a worker node and before spouts/bolts are opened/prepared?
The bolt base classes have a prepare method: https://storm.incubator.apache.org/apidocs/backtype/storm/topology/base/BaseBasicBolt.html and spouts have a similar open method (the spout base class also provides activate): https://storm.incubator.apache.org/apidocs/backtype/storm/topology/base/BaseRichSpout.html

Is that sufficient for your needs, or were you thinking of something different?

Marc

On Sun, Jun 01, 2014 at 04:47:03PM -0700, Chris Bedford wrote:
> Hi there -
>
> I would like to set up some state that spouts and bolts share, and I'd like to prepare this state when the StormTopology gets 'activated' on a worker. It would be great if the StormTopology had something like a prepare or open method to indicate when it is starting. I looked, but I could find no such API. Maybe I should submit an enhancement request?
>
> Thanks in advance for your responses,
> - Chris
>
> [If anyone is curious, the shared state lets all my application code decide whether or not to check invariants. The invariant checking takes additional time, so we don't want to do it in production, but during testing/development it helps catch bugs.]
>
> --
> Chris Bedford
> Founder Lead Lackey
> Build Lackey Labs: http://buildlackey.com
> Go Grails!: http://blog.buildlackey.com
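One way to act on the prepare/open suggestion in this thread, sketched under assumptions: because every spout and bolt in a worker runs inside the same JVM, a lazily initialized static holder gives per-worker shared state that is set up exactly once, by whichever component reaches its prepare or open call first. The names InvariantChecks, FakeBolt, and the invariants.enabled system property are hypothetical and not part of the Storm API; FakeBolt only stands in for a real bolt so that the sketch runs without Storm on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedStateDemo {
    public static void main(String[] args) {
        FakeBolt a = new FakeBolt();
        FakeBolt b = new FakeBolt();
        a.prepare();
        b.prepare();
        // Both components see the same instance, and setup ran once.
        System.out.println(a.checks() == b.checks());          // prints true
        System.out.println(InvariantChecks.initCount.get());   // prints 1
    }
}

/** Hypothetical per-JVM (per-worker) shared state; not a Storm class. */
final class InvariantChecks {
    // Counts initializations so the demo can show that setup runs once.
    static final AtomicInteger initCount = new AtomicInteger();

    private final boolean enabled;

    private InvariantChecks(boolean enabled) {
        this.enabled = enabled;
    }

    // Initialization-on-demand holder: the JVM initializes this class
    // once, thread-safely, on the first call to get().
    private static final class Holder {
        static final InvariantChecks INSTANCE = create();
    }

    private static InvariantChecks create() {
        initCount.incrementAndGet();
        // Assumed switch: pass -Dinvariants.enabled=true in test runs.
        return new InvariantChecks(Boolean.getBoolean("invariants.enabled"));
    }

    static InvariantChecks get() {
        return Holder.INSTANCE;
    }

    boolean enabled() {
        return enabled;
    }

    void check(boolean condition, String message) {
        if (enabled && !condition) {
            throw new IllegalStateException(message);
        }
    }
}

/** Stand-in for a bolt; a real bolt would do this inside prepare(). */
final class FakeBolt {
    private InvariantChecks checks;

    void prepare() {
        checks = InvariantChecks.get();
    }

    InvariantChecks checks() {
        return checks;
    }
}
```

This does not provide a topology-level hook; it only guarantees that the first prepare or open call in each worker performs the setup, which may be enough for a development-time flag such as invariant checking.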
Re: Interesting Comparison
To play devil's advocate: if you believe the Streams performance gains, then the 40k will likely pay for itself, because you would need to deploy only a fraction of the resources for the same throughput.

On Mon, May 12, 2014 at 09:02:53AM -0400, John Welcher wrote:
> Hi,
>
> Streams also costs 40,000 USD while Storm is free.
>
> John
>
> On Mon, May 12, 2014 at 3:49 AM, Klausen Schaefersinho klaus.schaef...@gmail.com wrote:
>> Hi,
>>
>> I found an interesting comparison of IBM Streams and Storm: https://www.ibmdw.net/streamsdev/2014/04/22/streams-apache-storm/
>>
>> It also includes an interesting comparison between ZeroMQ and Netty performance.
>>
>> Cheers,
>> Klaus
Re: Doubts on Apache Storm
On Tue, May 06, 2014 at 03:21:13PM +0530, milind.pa...@polarisft.com wrote:
> Hi,
>
> Is Nimbus mandatory for Storm? (Our development env is neither using Nimbus nor any other cloud environment.)

I think you might have misunderstood Nimbus. It is a daemon that is part of Storm, *not* Nimbus from the Nimbus cloud-computing project.

> (I am new to Apache Storm; it would really help me if any basic document is available on Apache Storm.)
>
> Regards,
> Milind Patil | Intellect Liquidity cash management
Re: PDF processing use case in storm!!
I think it's important to know whether or not some form of parallelism (other than throughput) is required; otherwise a standard web service seems sufficient for this use case.

On Mon, Apr 28, 2014 at 07:46:35AM -0400, Andrew Perepelytsya wrote:
> You can build request-response type topologies via DRPC. However, unless we're talking about processing numerous PDFs at once, it's a bad fit, IMO. If parallelism is required, you might be better off with a custom YARN app; it looks like YAYA makes that tolerable to write.
>
> Andrew
>
> On Apr 28, 2014 2:41 AM, Deepak Sharma deepakmc...@gmail.com wrote:
>> Hi All,
>>
>> Just wanted to check whether this can be a valid Storm use case. I want to write one simple Storm topology that can read a PDF file, process it, make some changes (such as converting it to doc), and save the new file. I know this can easily be done in batch mode using Hadoop. But we want to do it in real time, i.e. when the user demands it. We already do it using some Java API, but all the conversions take a lot of time. Can this be achieved in Storm? If yes, is there any pointer to examples similar to this use case?
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
Re: PDF processing use case in storm!!
OK, so why isn't a standard web service (CGI, FastCGI, mod_php, etc.) sufficient for that kind of parallelism? With multiple users making requests, they will all happen in parallel.

On Mon, Apr 28, 2014 at 07:53:37PM +0530, Deepak Sharma wrote:
> We need parallelism here, as a lot of users may be using the service at the same time. It may be for different files or for the same file.
>
> Thanks
> Deepak
>
> On Mon, Apr 28, 2014 at 7:41 PM, Marc Vaillant vaill...@animetrics.com wrote:
>> I think it's important to know whether or not some form of parallelism (other than throughput) is required; otherwise a standard web service seems sufficient for this use case.
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
One solution to the stdio redirect issue
I put together a more complete solution to the insidious STDOUT/STDERR buffer filling issue. Basically, if STDOUT/STDERR is not redirected/consumed in cluster mode, it will fill the buffer and eventually take down your topology. The original thread on this issue was not migrated to JIRA, but related issues can be found here: https://issues.apache.org/jira/browse/STORM-202?jql=project%20%3D%20STORM%20AND%20text%20~%20%22stdout%22

Early on, Nathan added a redirect to Log4j that will work for Java, but not for STDOUT/STDERR coming from native code. So if your spouts and/or bolts use a native JNI library and that library writes to STDOUT or STDERR, you will encounter this problem.

I put together a library that will redirect all (including native) STDOUT and STDERR to Log4j. We have been using it for a couple of weeks now in our topologies: https://github.com/animetrics/STDIORedirect

I hope someone else can find it useful.

Best,
Marc
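To illustrate why a Java-level redirect does not cover native code, here is a minimal sketch of that kind of redirect. It assumes nothing about how Storm or the STDIORedirect library is actually implemented: LineSink is a hypothetical class, and a plain list stands in for Log4j. The point is that System.setOut only swaps the PrintStream object that Java code writes to; a JNI library that writes directly to the process's standard output file descriptor never passes through it.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StdoutRedirectDemo {
    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        PrintStream original = System.out;
        // Replace the Java-level stream; the underlying file
        // descriptor used by native code is left untouched.
        System.setOut(new PrintStream(new LineSink(sink), true));
        try {
            System.out.println("hello from a bolt");
        } finally {
            System.setOut(original);
        }
        original.println(sink); // prints [hello from a bolt]
    }
}

/** Hypothetical sink: collects complete lines instead of logging them. */
final class LineSink extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<String> lines;

    LineSink(List<String> lines) {
        this.lines = lines;
    }

    @Override
    public synchronized void write(int b) {
        if (b == '\n') {
            // A real redirect would hand the line to a logger here.
            lines.add(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
            buffer.reset();
        } else if (b != '\r') {
            buffer.write(b);
        }
    }
}
```

Capturing native output as well requires redirecting at the file-descriptor level, which is outside what this sketch shows.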
Re: Wirbelsturm released, 1-click deployments of Storm clusters
Hi Michael,

Thanks very much for your hard work on this; your Puppet scripts have been very helpful. We are having a specific issue with supervision of ZooKeeper, and I wonder whether you have encountered something similar or whether we are doing something wrong.

Even with the stopasgroup=true supervisord option, there still seems to be a problem with orphaned child processes when the parent process (the zookeeper-server script) goes down because of an external event. Although running supervisorctl stop zookeeper takes down the zookeeper-server script and its child processes, issuing killall zookeeper-server takes down only the script, leaving the child processes running. This sends supervisord into an infinite loop of attempting to restart ZooKeeper and failing, because the child processes are still alive and occupying the required ports.

A fix we've found (refer to the last answer here: http://stackoverflow.com/questions/9090683/supervisord-stopping-child-processes) is to put trap "kill -- -$$" EXIT at the top of the zookeeper-server script. However, it seems like the stopasgroup=true setting was designed to handle this case. I know that stopasgroup was part of the 3.0b2 (05.28.2013) release of supervisord. We are using the 3.0 (07.30.2013) release from your RPM, https://github.com/miguno/wirbelsturm-rpm-supervisord, so I believe it should be available.

Thanks,
Marc

On Mon, Mar 17, 2014 at 09:02:11PM +0100, Michael G. Noll wrote:
> Hi everyone,
>
> I have released a tool called Wirbelsturm (https://github.com/miguno/wirbelsturm) that allows you to perform local and remote deployments of Storm. It's also a small way of saying a big thank you to the Storm community.
>
> Wirbelsturm uses Vagrant for creating and managing machines, and Puppet for provisioning the machines once they're up and running. You can also use Ansible to interact with deployed machines. Deploying Storm is but one example, of course -- you can deploy other software with Wirbelsturm as well (e.g. Graphite, Kafka, Redis, ZooKeeper).
>
> I also wrote a quick intro and behind-the-scenes blog post at [1], which covers, for instance, the motivation behind building Wirbelsturm and lessons learned along the way (read: mistakes made :-P).
>
> Enjoy!
> Michael
>
> [1] http://www.michael-noll.com/blog/2014/03/17/wirbelsturm-one-click-deploy-storm-kafka-clusters-with-vagrant-puppet/
which heartbeat(s) to modify so that debug sessions don't timeout?
I'm trying to debug some native code that runs in a task using gdb. When I attach to the process, Storm still enforces one or more of its 30s heartbeat timeouts while I step through the code; when a timeout expires, it kills the process and prematurely ends my debugging session.

I'm having trouble figuring out how to extend this timeout so that I can debug effectively. I've tried setting supervisor.worker.timeout.secs, but that doesn't do it (I can tell from ps on the worker process that it is indeed set to the value I've given). What other timeout do I need to set?

Thanks,
Marc
Can a topology be configured to force a maximum of 1 executor per worker?
Suppose that you have a bolt whose tasks are not thread safe, but you still want parallelism. It seems that this could be achieved via multiprocessing by forcing a maximum of 1 executor per worker. With this constraint, if you chose a parallelism hint of 4 (with default executors), you would get 4 tasks in 4 executors, each running in a separate worker. Can this constraint be configured?

Thanks,
Marc