Re: How does Spark streaming's Kafka direct stream survive from worker node failure?

2016-02-26 Thread yuhang.chenn
Thanks a lot.

Sent from WPS Mail

> On Feb 27, 2016, 1:02 AM, Cody Koeninger wrote:
>
> Yes.
>
> On Thu, Feb 25, 2016 at 9:45 PM, yuhang.chenn wrote:
>
>> Thanks a lot.
>> And I got another question: What would happen if I didn't set
>> "spark.streaming.kafka.maxRatePerPartition"? Will Spark Streaming try to
>> consume all the messages in Kafka?
>>
>> Sent from WPS Mail
>> On Feb 25, 2016, 11:58 AM, Cody Koeninger wrote:
>>
>> The per partition offsets are part of the rdd as defined on the driver.
>> Have you read
>> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
>> and/or watched
>> https://www.youtube.com/watch?v=fXnNEq1v3VA
>>
>> On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen wrote:
>>
>>> Hi, as far as I know, there is a 1:1 mapping between a Spark partition
>>> and a Kafka partition, and in Spark's fault-tolerance mechanism, if a
>>> partition fails, another partition will be used to recompute that data.
>>> My questions are below:
>>>
>>> When a partition (worker node) fails in Spark Streaming,
>>> 1. Is its computation passed to another partition, or does it just wait
>>> for the failed partition to restart?
>>> 2. How does the restarted partition know the offset range it should
>>> consume from Kafka? It should consume the same data as the failed one,
>>> right?




Re: How does Spark streaming's Kafka direct stream survive from worker node failure?

2016-02-26 Thread Cody Koeninger
Yes.

On Thu, Feb 25, 2016 at 9:45 PM, yuhang.chenn 
wrote:

> Thanks a lot.
> And I got another question: What would happen if I didn't set
> "spark.streaming.kafka.maxRatePerPartition"? Will Spark Streaming try to
> consume all the messages in Kafka?
>
> Sent from WPS Mail
> On Feb 25, 2016, 11:58 AM, Cody Koeninger wrote:
>
> The per partition offsets are part of the rdd as defined on the driver.
> Have you read
>
> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
>
> and/or watched
>
> https://www.youtube.com/watch?v=fXnNEq1v3VA
>
> On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen 
> wrote:
>
>> Hi, as far as I know, there is a 1:1 mapping between a Spark partition and
>> a Kafka partition, and in Spark's fault-tolerance mechanism, if a partition
>> fails, another partition will be used to recompute that data. My
>> questions are below:
>>
>> When a partition (worker node) fails in Spark Streaming,
>> 1. Is its computation passed to another partition, or does it just wait
>> for the failed partition to restart?
>> 2. How does the restarted partition know the offset range it should
>> consume from Kafka? It should consume the same data as the failed one,
>> right?
>>
>
>


Re: How does Spark streaming's Kafka direct stream survive from worker node failure?

2016-02-26 Thread yuhang.chenn
Thanks a lot.
And I got another question: What would happen if I didn't set "spark.streaming.kafka.maxRatePerPartition"? Will Spark Streaming try to consume all the messages in Kafka?

Sent from WPS Mail

> On Feb 25, 2016, 11:58 AM, Cody Koeninger wrote:
>
> The per partition offsets are part of the rdd as defined on the driver.
> Have you read
> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
> and/or watched
> https://www.youtube.com/watch?v=fXnNEq1v3VA
>
> On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen wrote:
>
>> Hi, as far as I know, there is a 1:1 mapping between a Spark partition and
>> a Kafka partition, and in Spark's fault-tolerance mechanism, if a partition
>> fails, another partition will be used to recompute that data. My questions
>> are below:
>>
>> When a partition (worker node) fails in Spark Streaming,
>> 1. Is its computation passed to another partition, or does it just wait
>> for the failed partition to restart?
>> 2. How does the restarted partition know the offset range it should
>> consume from Kafka? It should consume the same data as the failed one,
>> right?
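[Cody's "Yes" above means that, with no rate limit set, the first batch after downtime asks Kafka for everything between the last processed offset and the latest one. The sketch below is a plain-Python toy model of that sizing logic, not actual Spark code; the function name and parameters are made up for illustration.]

```python
def next_batch_range(current_offset, latest_offset, batch_interval_secs,
                     max_rate_per_partition=None):
    """Return the (from, until) offsets one partition's next micro-batch covers.

    With no rate limit, the batch covers everything up to the latest
    offset -- after downtime this can be an enormous first batch.
    With a limit, the batch is capped at rate * batch interval messages.
    """
    if max_rate_per_partition is None:
        return (current_offset, latest_offset)
    cap = current_offset + max_rate_per_partition * batch_interval_secs
    return (current_offset, min(latest_offset, cap))

# Unlimited: the first batch after a long outage reads the full backlog.
print(next_batch_range(0, 1_000_000, 5))        # (0, 1000000)
# Capped at 100 msgs/sec/partition, 5 s batches: at most 500 messages.
print(next_batch_range(0, 1_000_000, 5, 100))   # (0, 500)
```

This is why setting the cap matters mainly for recovery: in steady state the backlog is small, but after a restart the uncapped range can be huge.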





Re: How does Spark streaming's Kafka direct stream survive from worker node failure?

2016-02-24 Thread Cody Koeninger
The per partition offsets are part of the rdd as defined on the driver.
Have you read

https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md

and/or watched

https://www.youtube.com/watch?v=fXnNEq1v3VA
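[The point above can be sketched in a few lines. This is a toy model in plain Python, not the Spark API: it only illustrates why storing the exact offset range in each RDD partition's definition makes recomputation deterministic. All names here are invented for the example.]

```python
# Toy model: the driver defines each Kafka-backed partition by an exact
# offset range, so a re-run of a lost task rereads the identical slice.
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetRange:
    topic: str
    partition: int
    from_offset: int
    until_offset: int

def compute_partition(log, rng):
    """Re-read exactly [from_offset, until_offset) from the fake broker."""
    return log[rng.topic][rng.partition][rng.from_offset:rng.until_offset]

# A fake Kafka log: topic -> partition -> list of messages.
log = {"events": {0: [f"msg-{i}" for i in range(10)]}}
rng = OffsetRange("events", 0, 3, 7)

first = compute_partition(log, rng)   # original task
retry = compute_partition(log, rng)   # task re-run after executor loss
assert first == retry == ["msg-3", "msg-4", "msg-5", "msg-6"]
```

Because Kafka retains messages by offset, replaying the same range yields the same data, which is what gives the direct stream its exactly-once semantics on recomputation.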

On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen  wrote:

> Hi, as far as I know, there is a 1:1 mapping between a Spark partition and
> a Kafka partition, and in Spark's fault-tolerance mechanism, if a partition
> fails, another partition will be used to recompute that data. My
> questions are below:
>
> When a partition (worker node) fails in Spark Streaming,
> 1. Is its computation passed to another partition, or does it just wait
> for the failed partition to restart?
> 2. How does the restarted partition know the offset range it should
> consume from Kafka? It should consume the same data as the failed one,
> right?
>


How does Spark streaming's Kafka direct stream survive from worker node failure?

2016-02-24 Thread Yuhang Chen
Hi, as far as I know, there is a 1:1 mapping between a Spark partition and
a Kafka partition, and in Spark's fault-tolerance mechanism, if a partition
fails, another partition will be used to recompute that data. My questions
are below:

When a partition (worker node) fails in Spark Streaming,
1. Is its computation passed to another partition, or does it just wait for
the failed partition to restart?
2. How does the restarted partition know the offset range it should consume
from Kafka? It should consume the same data as the failed one, right?
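[Regarding question 1: it is the failed *task*, not the partition's data, that moves. The driver re-submits the same task (same offset range) to another live executor rather than waiting for the dead node. Below is a hypothetical plain-Python sketch of that scheduling idea, with invented names; it is not how Spark's scheduler is actually implemented.]

```python
# Sketch: a task is (executor, exact offset range). On executor failure
# the driver reassigns the identical range to a surviving executor.
def run_batch(offset_ranges, executors, dead=frozenset()):
    """Assign each partition's offset range to a live executor."""
    results = {}
    for rng in offset_ranges:
        for ex in executors:
            if ex in dead:
                continue              # driver skips the failed node
            results[rng] = (ex, rng)  # same range, different executor
            break
    return results

ranges = [("events", 0, 0, 500), ("events", 1, 0, 500)]
out = run_batch(ranges, ["exec-1", "exec-2"], dead={"exec-1"})
# Both partitions land on the surviving executor with unchanged ranges.
assert all(ex == "exec-2" for ex, _ in out.values())
assert [rng for _, rng in out.values()] == ranges
```

The answer to question 2 then falls out of the design: the offset range travels with the task, so the replacement executor consumes exactly the same data.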