Re: storm ui show problem
Hey, have you noticed the Storm UI showing the wrong number of executed tuples? Sometimes the executed-tuple count is bigger than the real number, sometimes smaller. Can anybody help? 2014-07-23 hushengbo

From: 唐思成
Sent: 2014-07-17 19:24:08
To: storm_user
Cc:
Subject: storm upgrade issue

Hi all: I tried to upgrade Storm from 0.9.1 to 0.9.2-incubating, and when the worker node's supervisor starts up, the nimbus process goes down. Here is what nimbus.log says. Before upgrading, I had already changed storm.local.dir to a new location and removed the storm node in ZooKeeper using zkCli.sh, but that didn't help. Any idea?

2014-07-17 19:15:29 b.s.d.nimbus [ERROR] Error when processing event java.lang.RuntimeException: java.io.InvalidClassException: backtype.storm.daemon.common.SupervisorInfo; local class incompatible: stream classdesc serialVersionUID = 7648414326720210054, local class serialVersionUID = 7463898661547835557
at backtype.storm.utils.Utils.deserialize(Utils.java:93) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.cluster$maybe_deserialize.invoke(cluster.clj:200) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.cluster$mk_storm_cluster_state$reify__2284.supervisor_info(cluster.clj:299) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:na]
at java.lang.reflect.Method.invoke(Method.java:597) ~[na:na]
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.5.1.jar:na]
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.nimbus$all_supervisor_info$fn__4715.invoke(nimbus.clj:277) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at clojure.core$map$fn__4207.invoke(core.clj:2487) ~[clojure-1.5.1.jar:na]
at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
at clojure.lang.RT.seq(RT.java:484) ~[clojure-1.5.1.jar:na]
at clojure.core$seq.invoke(core.clj:133) ~[clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
at clojure.core$mapcat.doInvoke(core.clj:2514) ~[clojure-1.5.1.jar:na]
at clojure.lang.RestFn.invoke(RestFn.java:423) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.nimbus$all_supervisor_info.invoke(nimbus.clj:275) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.daemon.nimbus$all_scheduling_slots.invoke(nimbus.clj:288) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.daemon.nimbus$compute_new_topology__GT_executor__GT_node_PLUS_port.invoke(nimbus.clj:580) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:660) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.nimbus$fn__5210$exec_fn__1396__auto5211$fn__5216$fn__5217.invoke(nimbus.clj:905) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.daemon.nimbus$fn__5210$exec_fn__1396__auto5211$fn__5216.invoke(nimbus.clj:904) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.timer$schedule_recurring$this__1134.invoke(timer.clj:99) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.timer$mk_timer$fn__1117$fn__1118.invoke(timer.clj:50) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.timer$mk_timer$fn__1117.invoke(timer.clj:42) [storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:619) [na:na]
Caused by: java.io.InvalidClassException: backtype.storm.daemon.common.SupervisorInfo; local class incompatible: stream classdesc serialVersionUID = 7648414326720210054, local class serialVersionUID = 7463898661547835557
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:562) ~[na:na]
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1583) ~[na:na]
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1496) ~[na:na]
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1732) ~[na:na]
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329) ~[na:na]
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) ~[na:na]
at backtype.storm.utils.Utils.deserialize(Utils.java:89)
How to detect when the Supervisor fails
I have figured out how to track the spouts and bolts, but I only noticed the supervisor had stopped when I submitted a topology and nothing happened. Is there any automatic way of knowing when daemons such as the supervisor, nimbus, or ZooKeeper stop? Thanks, Sai
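One common approach, since Storm daemons are fail-fast by design, is to run nimbus and the supervisors under a process supervisor that restarts them and can alert you when they exit. As a sketch, supervisord program entries might look like this (the paths and log locations are assumptions for illustration, not part of any Storm distribution):

```ini
; Hypothetical supervisord entries for Storm daemons.
; Adjust command paths and log paths to your installation.
[program:storm-nimbus]
command=/opt/storm/bin/storm nimbus
autorestart=true                          ; restart whenever the daemon exits
startsecs=10                              ; must stay up 10s to count as started
stderr_logfile=/var/log/storm/nimbus.err

[program:storm-supervisor]
command=/opt/storm/bin/storm supervisor
autorestart=true
startsecs=10
stderr_logfile=/var/log/storm/supervisor.err
```

An external monitor (monit, Nagios, etc.) watching the processes gives you the notification side; ZooKeeper is typically health-checked separately via its `ruok` four-letter command on the client port.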
use normal storm bolt in trident topology
Is it recommended to use regular Storm bolts in a Trident topology? Is it possible at all? Thanks in advance for your advice. -Costco
Modifying tuples: is the below use case a fit for Storm?
Hi, I am a newbie to Storm and wanted to try Storm in my project since it involves streams of messages and big data. I have the following use case. I have 2 files: one file has 20 million user records, and the other has user IDs and the scores associated with each user. For each user record I need to get the associated scores for that user and append them to the user record. The scores data is around 110 million rows. For example:

User record (EmailAdd, ID): a...@gmail.com 1
Score data (ID, score): 1 80, 1 90, 1 10, 1 20, 2 180, 2 920, 2 110, 2 210
Result: ani...@gmail.com 1 80 90 10 20

Can you please help with how I could do this association using Storm? Thanks! anil
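Since both inputs are keyed by the user ID, the core of a Storm solution would be a fields-grouped join: route user-record tuples and score tuples on the ID so the same task sees both, and accumulate scores per ID. The Storm-independent heart of such a join bolt could look like the sketch below (the class and method names are made up for illustration, not part of any Storm API):

```java
import java.util.*;

// Sketch of the per-task state a join bolt would keep when user records
// and score records are both fields-grouped on the user ID.
public class UserScoreJoin {
    private final Map<Integer, String> users = new HashMap<>();          // id -> email
    private final Map<Integer, List<Integer>> scores = new HashMap<>();  // id -> scores

    // Called for a tuple from the user-record stream.
    public void onUser(int id, String email) {
        users.put(id, email);
    }

    // Called for a tuple from the score stream.
    public void onScore(int id, int score) {
        scores.computeIfAbsent(id, k -> new ArrayList<>()).add(score);
    }

    // Build the joined record: email, id, and all scores seen so far.
    public String result(int id) {
        StringBuilder sb = new StringBuilder(users.get(id)).append(' ').append(id);
        for (int s : scores.getOrDefault(id, Collections.emptyList())) {
            sb.append(' ').append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UserScoreJoin join = new UserScoreJoin();
        join.onUser(1, "a...@gmail.com");
        join.onScore(1, 80);
        join.onScore(1, 90);
        join.onScore(1, 10);
        join.onScore(1, 20);
        System.out.println(join.result(1)); // a...@gmail.com 1 80 90 10 20
    }
}
```

In a real topology both streams would arrive via fieldsGrouping on the ID field; with 110 million score rows you would likely spill this state to an external store rather than hold it on the heap. Note too that for a one-shot join of two static files, a batch tool may be a better fit than a streaming system.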
Re: use normal storm bolt in trident topology
You cannot use regular storm bolts in a trident topology. Regular spouts can be used with trident, but not bolts. - Taylor On Jul 23, 2014, at 1:16 PM, A.M. costco.s...@yahoo.com wrote: Is it recommended to use regular storm bolts in a trident topology? or is it at all possible? Thanks for your advice in advance. -Costco
Re: use normal storm bolt in trident topology
Hi Taylor, thanks for confirming. -Costco On Wednesday, July 23, 2014 10:24 AM, P. Taylor Goetz ptgo...@gmail.com wrote: You cannot use regular storm bolts in a trident topology. Regular spouts can be used with trident, but not bolts. - Taylor On Jul 23, 2014, at 1:16 PM, A.M. costco.s...@yahoo.com wrote: Is it recommended to use regular storm bolts in a trident topology? or is it at all possible? Thanks for your advice in advance. -Costco
Re: Netty transport errors.
No, it's on RHEL on real hardware... although we went through a network upgrade a couple of days ago and all sorts of things started failing. On Wed, Jul 23, 2014 at 1:43 PM, Andrew Montalenti and...@parsely.com wrote: Tomas, You don't happen to be running Ubuntu 14.04 on a Xen kernel, do you? E.g. on Amazon EC2. I discovered an issue where running Storm across many workers on that OS led to me hitting an annoying network driver bug that would cause timeouts and topology freezes like you are seeing. Check dmesg for odd messages from your network stack. Just a guess. (copied from my reply to another similar thread) -AM On Jul 23, 2014 10:07 AM, Tomas Mazukna tomas.mazu...@gmail.com wrote: I am really puzzled why processing stopped in the topology. It looks like the acking threads all stopped communicating. The only hint I saw was this netty exception. Any hints on how to prevent this from happening again?
2014-07-23 08:56:03 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-ndhhadappp3.tsh.mis.mckesson.com/10.48.132.224:9703
2014-07-23 08:56:03 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-ndhhadappp3.tsh.mis.mckesson.com/10.48.132.224:9703..., timeout: 60ms, pendings: 0
2014-07-23 08:56:03 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-ndhhadappp3.tsh.mis.mckesson.com/10.48.132.224:9700
2014-07-23 08:56:03 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-ndhhadappp3.tsh.mis.mckesson.com/10.48.132.224:9700..., timeout: 60ms, pendings: 0
2014-07-23 08:56:03 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.disruptor$consume_loop_STAR_$fn__758.invoke(disruptor.clj:94) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
at backtype.storm.messaging.netty.Client.send(Client.java:194) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927$fn__5928.invoke(worker.clj:322) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5927.invoke(worker.clj:320) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
... 6 common frames omitted
2014-07-23 08:56:03 b.s.util [INFO] Halting process: (Async loop died!)
Configuration:
worker.childopts: -Xmx2048m -Xss256k -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true
supervisor.childopts: -Xmx256m -Djava.net.preferIPv4Stack=true
nimbus.childopts: -Xmx1024m -Djava.net.preferIPv4Stack=true
ui.childopts: -Xmx768m -Djava.net.preferIPv4Stack=true
nimbus.thrift.threads: 256
storm.messaging.transport: backtype.storm.messaging.netty.Context
storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880
storm.messaging.netty.max_retries: 100
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100
Thanks, -- Tomas Mazukna 678-557-3834
intra-topology SSL transport
Hi, I've been working with storm on mesos but I need to make sure all workers are messaging over SSL since streams may contain sensitive information for almost all of my use cases. stunnel seems like a viable option but I dislike having complex port forwarding arrangements and would prefer code to config in this case. As an exercise to see how much work it would be, I forked storm and modified the storm-netty package to use SSL with the existing nio. Not so bad, and lein tests pass. Still wrapping my head around the storm codebase. Would using my modified storm-netty Context as storm.messaging.transport be enough to ensure streams are encrypted, or would I need to also attack the thrift transport plugin? Also, is anyone else interested in locking storm down with SSL?
Re: intra-topology SSL transport
In the security branch of storm, worker-worker communication is encrypted (blowfish) with a shared secret. STORM-348 will add authentication to worker-worker. For thrift (nimbus drpc), the security branch has SASL/kerberos authentication, and you should be able to configure encryption via SASL as well. We have not tried enabling encryption with SASL. -- Derek On 7/23/14, 14:05, Isaac Councill wrote: Hi, I've been working with storm on mesos but I need to make sure all workers are messaging over SSL since streams may contain sensitive information for almost all of my use cases. stunnel seems like a viable option but I dislike having complex port forwarding arrangements and would prefer code to config in this case. As an exercise to see how much work it would be, I forked storm and modified the storm-netty package to use SSL with the existing nio. Not so bad, and lein tests pass. Still wrapping my head around the storm codebase. Would using my modified storm-netty Context as storm.messaging.transport be enough to ensure streams are encrypted, or would I need to also attack the thrift transport plugin? Also, is anyone else interested in locking storm down with SSL?
Re: intra-topology SSL transport
great, thanks! On Wed, Jul 23, 2014 at 3:25 PM, Derek Dagit der...@yahoo-inc.com wrote: In the security branch of storm, worker-worker communication are encrypted (blowfish) with a shared secret. STORM-348 will add authentication to worker-worker. For thrift (nimbus drpc), the security branch has SASL/kerberos authentication, and you should be able to configure encryption via SASL as well. We have not tried enabling encryption with SASL. -- Derek On 7/23/14, 14:05, Isaac Councill wrote: Hi, I've been working with storm on mesos but I need to make sure all workers are messaging over SSL since streams may contain sensitive information for almost all of my use cases. stunnel seems like a viable option but I dislike having complex port forwarding arrangements and would prefer code to config in this case. As an exercise to see how much work it would be, I forked storm and modified the storm-netty package to use SSL with the existing nio. Not so bad, and lein tests pass. Still wrapping my head around the storm codebase. Would using my modified storm-netty Context as storm.messaging.transport be enough to ensure streams are encrypted, or would I need to also attack the thrift transport plugin? Also, is anyone else interested in locking storm down with SSL?
Parallelism for KafkaSpout
Hi, is the number of executors for KafkaSpout dependent on the number of partitions for the topic? E.g., say Kafka topic TopicA has 5 partitions. If I have a KafkaSpout with the parallelism hint set to 10, will all the executors get data? Or am I constrained by the number of partitions declared for the topic? Regards, Kashyap
Re: Parallelism for KafkaSpout
From my understanding, the executors without a partition to read remain idle. It's recommended for efficiency to set the number of executors to the number of partitions. On Wed, Jul 23, 2014 at 5:11 PM, Kashyap Mhaisekar kashya...@gmail.com wrote: Hi, Is the no. of executors for KafkaSpout dependent on the partitions for the topic? For E.g., Say kafka TopicA has 5 partitions. If I have a KafkaSpout that has the parallelism hint set to 10, will all the executors get the data? Or am i constrained by the no. of partitions declared for the topic? Regards, Kashyap
Re: Parallelism for KafkaSpout
In your example, five spouts would get data and the other five would not. On Jul 23, 2014 5:11 PM, Kashyap Mhaisekar kashya...@gmail.com wrote: Hi, Is the no. of executors for KafkaSpout dependent on the partitions for the topic? For E.g., Say kafka TopicA has 5 partitions. If I have a KafkaSpout that has the parallelism hint set to 10, will all the executors get the data? Or am i constrained by the no. of partitions declared for the topic? Regards, Kashyap
Re: Parallelism for KafkaSpout
How do you tell how many partitions are used by a specific bolt in the topology? On Wed, Jul 23, 2014 at 2:15 PM, Nathan Leung ncle...@gmail.com wrote: In your example, five spouts would get data and the other five would not. On Jul 23, 2014 5:11 PM, Kashyap Mhaisekar kashya...@gmail.com wrote: Hi, Is the no. of executors for KafkaSpout dependent on the partitions for the topic? For E.g., Say kafka TopicA has 5 partitions. If I have a KafkaSpout that has the parallelism hint set to 10, will all the executors get the data? Or am i constrained by the no. of partitions declared for the topic? Regards, Kashyap -- Raphael Hsieh
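The behavior described in this thread follows from the one-partition-per-consumer rule: each Kafka partition is read by exactly one spout task, so with more tasks than partitions the extras sit idle. A minimal Storm-free sketch of that assignment rule (this is an illustration, not storm-kafka's actual code):

```java
import java.util.*;

// Illustrates one-partition-per-consumer assignment: partition p goes to
// task (p % numTasks), so with more tasks than partitions, some tasks idle.
public class PartitionAssignment {
    public static Map<Integer, List<Integer>> assign(int numPartitions, int numTasks) {
        Map<Integer, List<Integer>> byTask = new HashMap<>();
        for (int t = 0; t < numTasks; t++) byTask.put(t, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            byTask.get(p % numTasks).add(p);  // each partition has exactly one reader
        }
        return byTask;
    }

    public static void main(String[] args) {
        // 5 partitions, 10 spout executors: tasks 0-4 each read one partition,
        // tasks 5-9 get nothing.
        Map<Integer, List<Integer>> a = assign(5, 10);
        int idle = 0;
        for (List<Integer> parts : a.values()) if (parts.isEmpty()) idle++;
        System.out.println("idle tasks: " + idle); // idle tasks: 5
    }
}
```

On the question above: Kafka partitions apply only to the spout's tasks; downstream bolts receive tuples according to the grouping you declared, not Kafka partitions, and the Storm UI's per-executor emit counts for the spout show which spout tasks are actually reading.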
Re: NotAliveException via Storm UI
I think I solved my own issue. There is a space in my topology name, so I believe the Storm UI is not escaping the link. On Wed, Jul 23, 2014 at 6:49 PM, Vincent Russell vincent.russ...@gmail.com wrote: I'm using Storm 0.9.0.1 and I'm getting a NotAliveException via the Storm UI:
NotAliveException(msg:Ingest-33-1406154838)
at backtype.storm.generated.Nimbus$getTopologyInfo_result.read(Nimbus.java:11330)
at org.apache.thrift7.TServiceClient.receiveBase(TServiceClient.java:78)
at backtype.storm.generated.Nimbus$Client.recv_getTopologyInfo(Nimbus.java:474)
at backtype.storm.generated.Nimbus$Client.getTopologyInfo(Nimbus.java:461)
at backtype.storm.ui.core$topology_page.invoke(core.clj:481)
at backtype.storm.ui.core$fn__7877.invoke(core.clj:745)
at compojure.core$make_route$fn__3855.invoke(core.clj:93)
at compojure.core$if_route$fn__3843.invoke(core.clj:39)
at compojure.core$if_method$fn__3836.invoke(core.clj:24)
at compojure.core$routing$fn__3861.invoke(core.clj:106)
at clojure.core$some.invoke(core.clj:2390)
at compojure.core$routing.doInvoke(core.clj:106)
at clojure.lang.RestFn.applyTo(RestFn.java:139)
at clojure.core$apply.invoke(core.clj:603)
at compojure.core$routes$fn__3865.invoke(core.clj:111)
at ring.middleware.reload$wrap_reload$fn__7170.invoke(reload.clj:14)
at backtype.storm.ui.core$catch_errors$fn__7912.invoke(core.clj:798)
at ring.middleware.keyword_params$wrap_keyword_params$fn__4391.invoke(keyword_params.clj:27)
at ring.middleware.nested_params$wrap_nested_params$fn__4428.invoke(nested_params.clj:65)
at ring.middleware.params$wrap_params$fn__4365.invoke(params.clj:55)
at ring.middleware.multipart_params$wrap_multipart_params$fn__4454.invoke(multipart_params.clj:103)
at ring.middleware.flash$wrap_flash$fn__4729.invoke(flash.clj:14)
at ring.middleware.session$wrap_session$fn__4720.invoke(session.clj:43)
at ring.middleware.cookies$wrap_cookies$fn__4657.invoke(cookies.clj:160)
at ring.adapter.jetty$proxy_handler$fn__4204.invoke(jetty.clj:16)
at ring.adapter.jetty.proxy$org.mortbay.jetty.handler.AbstractHandler$0.handle(Unknown Source)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
but if I get the status from the command line it looks fine:
# storm list
Running: java -client -Dstorm.options= -Dstorm.home=/opt/storm/storm-0.9.0.1 -Djava.library.path=/usr/lib64:/usr/lib -Dstorm.conf.file= -cp
storm reliability
Guys, I am a bit confused about the fault-tolerance feature of Storm. I have read https://storm.incubator.apache.org/documentation/Guaranteeing-message-processing.html My question is: if a bolt fails with some runtime exception, does it mean that the tuple has failed and the same tuple sent to this bolt will be replayed? Or does it mean the message counts as processed even though it failed, already satisfying at-least-once processing, and so will not be replayed? Or does it depend on how and when I call .ack and .fail? Also, about replaying a message on timeout: does it mean that if a downstream bolt takes more than the specified timeout, the message will be replayed? Or is that again dependent on when I call ack and fail? Thanks for the clarification. Chen
Re: storm reliability
All replays occur from the spout. Think of what you emit from the spout as your root tuple and all other tuples generated from that tuple as child tuples. If any downstream tuple fails, you can code your spout to re-emit the same root tuple and replay the entire set. There is no way I know of to replay just the bolt tuples. When all downstream processing is complete, the ack method in the spout is called by the Storm framework, and fail is called when one of the downstream tuples fails or times out. Storm provides you a framework to set up replay, but it is you as the developer who needs to code for it. Hope this helps On Wed, Jul 23, 2014 at 9:13 PM, Chen Wang chen.apache.s...@gmail.com wrote: Guys, i am a bit confused about the fault tolerance feature of storm. I have read https://storm.incubator.apache.org/documentation/Guaranteeing-message-processing.html My question is, if a bolt failed with some runtime exception, does it means that the tuple is failed and the same tuple sent to this bolt will be replayed again? Or it means that the message has been processed although it failed. It already met the at least one processing, and will not be replayed. Or it depends on how and when I call .ack and .fail? Also about replaying message if time out, does it mean that if a downstream bolt takes more than the specified time out, the message will be replayed? Or its again depending on when I call ack and fail.. Thanks for clarification. Chen
Re: storm reliability
So, if my downstream bolt is doing some time-consuming task after receiving a signal (tuple) from the spout, e.g.:

public void execute(Tuple tuple) {
    String sentence = tuple.getString(0);
    while (...) {
        // time consuming job
    }
    _collector.ack(tuple);
}

If the while loop takes longer than the specified timeout (30s), then Storm will consider the signal failed and .fail() will be called on the spout. Is that correct? And whether the tuple will be replayed or not depends entirely on my implementation of fail()? Another case: if I move _collector.ack(tuple); before the while loop, then Storm will consider the tuple to have already succeeded on this bolt? Even if an exception is thrown inside the while loop, fail() will not be called on the spout. Is that right? Thanks, Chen On Wed, Jul 23, 2014 at 6:22 PM, Naresh Kosgi nareshko...@gmail.com wrote: All reply's occur from the spout. Think of what you emit from the spout as your root tuple and all other tuples generated from that tuple as child tuples. If any downstream tuples fail you can code to reemit the same tuple again from your spout and reply the entire set. There is no way I know of to reply just the bolt tuples. When all downstream processing is complete the ack method in the spout is called by the storm framework and fail is called when one of the tuples downstream fails/timesout. Storm provides you a framework to setup reply but it is you as a developer that needs to code for it. Hope this helps On Wed, Jul 23, 2014 at 9:13 PM, Chen Wang chen.apache.s...@gmail.com wrote: Guys, i am a bit confused about the fault tolerance feature of storm. I have read https://storm.incubator.apache.org/documentation/Guaranteeing-message-processing.html My question is, if a bolt failed with some runtime exception, does it means that the tuple is failed and the same tuple sent to this bolt will be replayed again? Or it means that the message has been processed although it failed. It already met the at least one processing, and will not be replayed. Or it depends on how and when I call .ack and .fail? Also about replaying message if time out, does it mean that if a downstream bolt takes more than the specified time out, the message will be replayed? Or its again depending on when I call ack and fail.. Thanks for clarification. Chen
Re: storm reliability
Yes, that is correct. Two quick points: 1. If you are not going to use the fail method for some kind of replay, then you probably should not anchor your tuples, as Storm runs a little bit faster when the replay framework is not used. 2. The timeout setting is configurable, so you can set it to more than 30 seconds if your process takes more time. On Wed, Jul 23, 2014 at 10:46 PM, Chen Wang chen.apache.s...@gmail.com wrote: So if my downstream bolt is doing some time consuming task after receiving a signal(tuple) from the spout. e.g: public void execute(Tuple tuple) { String sentence = tuple.getString(0); while(){ // time consuming job } _collector.ack(tuple); } if the while loop takes longer than the specified time(30s), then the storm will consider the signal failed and the .fail() will be called in the spout. Is that correct? Whether the tuple will be replayed or not entirely depends on my implementation in fail()? Another case, if I move _collector.ack(tuple); before the while loop, then the storm will consider the tuple at least succeeded already on this bolt? Even in while loop there is exception thrown, the fail() will not be called on spout. Is that right? Thanks, Chen On Wed, Jul 23, 2014 at 6:22 PM, Naresh Kosgi nareshko...@gmail.com wrote: All reply's occur from the spout. Think of what you emit from the spout as your root tuple and all other tuples generated from that tuple as child tuples. If any downstream tuples fail you can code to reemit the same tuple again from your spout and reply the entire set. There is no way I know of to reply just the bolt tuples. When all downstream processing is complete the ack method in the spout is called by the storm framework and fail is called when one of the tuples downstream fails/timesout. Storm provides you a framework to setup reply but it is you as a developer that needs to code for it. Hope this helps On Wed, Jul 23, 2014 at 9:13 PM, Chen Wang chen.apache.s...@gmail.com wrote: Guys, i am a bit confused about the fault tolerance feature of storm. I have read https://storm.incubator.apache.org/documentation/Guaranteeing-message-processing.html My question is, if a bolt failed with some runtime exception, does it means that the tuple is failed and the same tuple sent to this bolt will be replayed again? Or it means that the message has been processed although it failed. It already met the at least one processing, and will not be replayed. Or it depends on how and when I call .ack and .fail? Also about replaying message if time out, does it mean that if a downstream bolt takes more than the specified time out, the message will be replayed? Or its again depending on when I call ack and fail.. Thanks for clarification. Chen
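The ack-placement point in this thread can be made concrete with a Storm-free simulation: the spout's fail() only fires for tuples that are still unacked when a timeout or exception hits, so acking before the slow work makes later failures invisible to the spout. This sketch models only that bookkeeping, not the real Storm ack machinery:

```java
import java.util.*;

// Toy model of at-least-once tracking: a tuple is "replayed" only if it is
// still pending (unacked) when processing fails or times out.
public class AckTiming {
    private final Set<String> pending = new HashSet<>();
    public final List<String> replayed = new ArrayList<>();

    public void emit(String id) { pending.add(id); }    // spout emits root tuple
    public void ack(String id)  { pending.remove(id); } // bolt acks the tuple
    public void fail(String id) {                       // timeout or exception
        if (pending.remove(id)) replayed.add(id);       // spout.fail -> replay
    }

    public static void main(String[] args) {
        AckTiming storm = new AckTiming();

        // Case 1: ack would come after the slow loop; the timeout hits first,
        // the tuple is still pending, so the spout sees the failure.
        storm.emit("t1");
        storm.fail("t1");
        System.out.println(storm.replayed.contains("t1")); // true

        // Case 2: ack before the slow loop; a later exception goes unnoticed
        // by the spout because the tuple is no longer pending.
        storm.emit("t2");
        storm.ack("t2");
        storm.fail("t2");
        System.out.println(storm.replayed.contains("t2")); // false
    }
}
```

On the second point in the reply, the setting being described is the topology config topology.message.timeout.secs (default 30), which can be set per topology before submitting it.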