Re: What happens when an executor crashes?

2016-10-12 Thread Cody Koeninger
Yes, partitionBy will shuffle unless it happens to be partitioning with the exact same partitioner the parent RDD had.

On Wed, Oct 12, 2016 at 8:34 AM, Samy Dindane wrote:
> Hey Cody,
>
> I ended up choosing a different way to do things, which is using Kafka to
> commit my

Re: What happens when an executor crashes?

2016-10-12 Thread Samy Dindane
Hey Cody, I ended up choosing a different way to do things, which is using Kafka to commit my offsets. It works fine, except it stores the offset in ZK instead of a Kafka topic (investigating this right now). I understand your explanations, thank you, but I have one question: when you say

Re: What happens when an executor crashes?

2016-10-10 Thread Cody Koeninger
Repartition almost always involves a shuffle. Let me see if I can explain the recovery stuff... Say you start with two Kafka partitions, topic-0 and topic-1. You shuffle those across 3 Spark partitions, we'll label them A, B and C. Your job has written fileA: results for A, offset ranges

Re: What happens when an executor crashes?

2016-10-10 Thread Samy Dindane
On 10/10/2016 8:14 PM, Cody Koeninger wrote:
> Glad it was helpful :) As far as executors, my expectation is that if you have multiple executors running, and one of them crashes, the failed task will be submitted on a different executor. That is typically what I observe in spark apps, if

Re: What happens when an executor crashes?

2016-10-10 Thread Cody Koeninger
Glad it was helpful :) As far as executors, my expectation is that if you have multiple executors running, and one of them crashes, the failed task will be submitted on a different executor. That is typically what I observe in Spark apps; if that's not what you're seeing, I'd try to get help on

Re: What happens when an executor crashes?

2016-10-10 Thread Samy Dindane
I just noticed that you're the author of the code I linked in my previous email. :) It's helpful. When using `foreachPartition` or `mapPartitions`, I noticed I can't ask Spark to write the data to disk using `df.write()`; instead I need to use the iterator to do so, which means losing the

Re: What happens when an executor crashes?

2016-10-10 Thread Cody Koeninger
What is it you're actually trying to accomplish?

On Mon, Oct 10, 2016 at 5:26 AM, Samy Dindane wrote:
> I managed to make a specific executor crash by using
> TaskContext.get.partitionId and throwing an exception for a specific
> executor.
>
> The issue I have now is that the