Re: Adding / Removing worker nodes for Spark Streaming

2015-09-29 Thread Adrian Tanase
This is part of the checkpointed metadata in the Spark context. -adrian
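For context, a minimal sketch of that recovery path against the Spark 1.x direct Kafka API; the checkpoint directory, broker list, and topic name below are placeholders rather than values from this thread:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object CheckpointedKafkaApp {
      val checkpointDir = "hdfs:///tmp/streaming-checkpoint"  // placeholder path

      // Builds a fresh context; only invoked when no checkpoint exists yet.
      def createContext(): StreamingContext = {
        val conf = new SparkConf().setAppName("kafka-direct-example")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint(checkpointDir)

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val topics = Set("events")  // placeholder topic
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        stream.map(_._2).count().print()
        ssc
      }

      def main(args: Array[String]): Unit = {
        // On restart this rebuilds the DStream graph, including the Kafka
        // offset metadata, from the checkpoint instead of calling createContext.
        val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
        ssc.start()
        ssc.awaitTermination()
      }
    }

Note that getOrCreate only calls createContext when the checkpoint directory has no existing checkpoint to restore from.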

Adding / Removing worker nodes for Spark Streaming

2015-09-28 Thread Augustus Hong
Hey all, I'm evaluating Spark Streaming with Kafka direct streaming, and I have a couple of questions:
1. Would it be possible to add / remove worker nodes without stopping and restarting the Spark Streaming driver?
2. I understand that we can enable checkpointing to recover from node

Re: Adding / Removing worker nodes for Spark Streaming

2015-09-28 Thread Sourabh Chandak
I also have the same use case as Augustus, and have some basic questions about recovery from checkpoint. I have a 10-node Kafka cluster and a 30-node Spark cluster running a streaming job; how is the (topic, partition) data handled in checkpointing? The scenario I want to understand is, in case of

Re: Adding / Removing worker nodes for Spark Streaming

2015-09-28 Thread Cody Koeninger
If a node fails, the partition / offset range that it was working on will be scheduled to run on another node. This is generally true of Spark, regardless of checkpointing. The offset ranges for a given batch are stored in the checkpoint for that batch. That's relevant if your entire job fails
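For reference, a minimal sketch of how those per-batch offset ranges can be inspected with the Spark 1.x direct Kafka API; here stream stands for the InputDStream returned by KafkaUtils.createDirectStream in the earlier sketch:

    import org.apache.spark.streaming.kafka.HasOffsetRanges

    // Each RDD produced by the direct stream knows exactly which
    // (topic, partition, fromOffset, untilOffset) ranges it covers;
    // these are the ranges written into the checkpoint for that batch.
    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      offsetRanges.foreach { o =>
        println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
      }
    }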