From the Spark Streaming Programming Guide ( http://spark.apache.org/docs/latest/streaming-programming-guide.html#failure-of-a-worker-node ):
*...output operations (like foreachRDD) have at-least once semantics, that
is, the transformed data may get written to an external entity more than
once in the event of a worker failure.*

I think that when a worker fails, the entire graph of
transformations/actions will be reapplied to that RDD. This means that, in
your case, both storing operations will be executed again. For this reason,
in a video I watched on YouTube, they suggest making all output operations
idempotent. Unfortunately, this is not always possible: e.g. if you are
building an analytics system and you need to increment counters. This is
what I've got so far. Does anyone have a different point of view?

On 6 October 2014 08:59, Jahagirdar, Madhu <madhu.jahagir...@philips.com> wrote:

> Given that I have multiple worker nodes, when Spark schedules the job
> again on the worker nodes that are alive, does it then again store the
> data in Elasticsearch and then Flume, or does it only run the functions
> to store in Flume?
>
> Regards,
> Madhu Jahagirdar
>
> ------------------------------
> *From:* Akhil Das [ak...@sigmoidanalytics.com]
> *Sent:* Monday, October 06, 2014 1:20 PM
> *To:* Jahagirdar, Madhu
> *Cc:* user
> *Subject:* Re: Dstream Transformations
>
> AFAIK, Spark doesn't restart worker nodes itself. You can have multiple
> worker nodes, and in that case, if one worker node goes down, Spark will
> try to recompute the lost RDDs on the workers that are still alive.
>
> Thanks
> Best Regards
>
> On Sun, Oct 5, 2014 at 5:19 AM, Jahagirdar, Madhu <
> madhu.jahagir...@philips.com> wrote:
>
>> In my Spark Streaming program I have created Kafka utils to receive
>> data and store it in Elasticsearch and in Flume. The storing function is
>> applied to the same DStream. My question is: what is the behavior of
>> Spark if, after storing the data in Elasticsearch, the worker node dies
>> before storing it in Flume?
>> Does it restart the worker and then again store the data in
>> Elasticsearch and then Flume, or does it only run the functions to
>> store in Flume?
>>
>> Regards
>> Madhu Jahagirdar

--
Massimiliano Tomassi
web: http://about.me/maxtomassi
e-mail: max.toma...@gmail.com
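To make the idempotency suggestion above concrete, here is a minimal,
hypothetical sketch (plain Python, not real Spark/Elasticsearch/Flume
APIs; all names are invented for illustration). It simulates what a
foreachRDD-style output function could do so that at-least-once replay is
harmless: document writes become upserts keyed by a deterministic id, and
counter increments, which are not naturally idempotent, are deduplicated
per (batch time, key) so a replayed batch is skipped rather than
double-counted.

```python
import hashlib

class IdempotentStore:
    """Simulated external store whose operations are safe to replay."""
    def __init__(self):
        self.docs = {}                # doc_id -> record (upsert semantics)
        self.counters = {}            # counter key -> value
        self.applied_batches = set()  # (batch_time, key) pairs already applied

    def upsert(self, doc_id, record):
        # Writing the same record under the same deterministic id twice
        # leaves the store unchanged, so a replay is a no-op.
        self.docs[doc_id] = record

    def increment(self, batch_time, key, delta):
        # Plain increments are NOT idempotent; guard each (batch, key) pair
        # so a replayed batch does not double-count.
        token = (batch_time, key)
        if token in self.applied_batches:
            return
        self.applied_batches.add(token)
        self.counters[key] = self.counters.get(key, 0) + delta

def doc_id_for(record):
    # Deterministic id derived from record content (assumption: records
    # have a stable natural key; here we just hash their repr).
    return hashlib.sha1(repr(record).encode()).hexdigest()

def store_batch(store, batch_time, records):
    # The per-batch output function a streaming job might run.
    for rec in records:
        store.upsert(doc_id_for(rec), rec)
    store.increment(batch_time, "events_seen", len(records))

store = IdempotentStore()
batch = [{"user": "a", "event": "click"}, {"user": "b", "event": "view"}]

store_batch(store, 1000, batch)  # first attempt
store_batch(store, 1000, batch)  # replay after a simulated worker failure

print(len(store.docs))                # 2 -- no duplicate documents
print(store.counters["events_seen"])  # 2 -- counter applied exactly once
```

In a real job the dedup token could be the batch time Spark hands to
foreachRDD, stored transactionally alongside the counter in the external
system; the upsert half maps naturally onto an Elasticsearch index request
with a caller-chosen document id.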