Re: Kafka partition increased while Spark Streaming is running
Makes sense, thank you Cody.

Regards,
Chandan

On Fri, May 13, 2016 at 8:10 PM, Cody Koeninger wrote:
> No, I wouldn't expect it to. Once the stream is defined (at least for
> the direct stream integration for Kafka 0.8), the topicpartitions are
> fixed.
>
> My answer to any question about "but what if checkpoints don't let me
> do this" is always going to be "well, don't rely on checkpoints."
>
> If you want dynamic topicpartitions, see
> https://issues.apache.org/jira/browse/SPARK-12177
>
> On Fri, May 13, 2016 at 4:24 AM, chandan prakash wrote:
> > Follow-up question:
> >
> > If Spark Streaming is using checkpointing (/tmp/checkpointDir) for
> > at-least-once semantics and the number of topics and/or partitions
> > has increased, will gracefully shutting down and restarting from the
> > checkpoint pick up the new topics and/or partitions?
> > If the answer is no, how do I start from the same checkpoint with
> > the new partitions/topics included?
> >
> > Thanks,
> > Chandan
> >
> > On Wed, Feb 24, 2016 at 9:30 PM, Cody Koeninger wrote:
> >> That's correct: when you create a direct stream, you specify the
> >> topicpartitions you want to be part of the stream (the other method
> >> for creating a direct stream is just a convenience wrapper).
> >>
> >> On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航 wrote:
> >>> Here I use 'KafkaUtils.createDirectStream' to integrate Kafka with
> >>> Spark Streaming. I submitted the app, then changed (increased)
> >>> Kafka's partition number after it had been running for a while.
> >>> When I check the input offsets with
> >>> 'rdd.asInstanceOf[HasOffsetRanges].offsetRanges', only the offsets
> >>> of the initial partitions are returned.
> >>>
> >>> Does this mean Spark Streaming's Kafka integration can't update
> >>> its parallelism when Kafka's partition number is changed?
> >
> > --
> > Chandan Prakash
Re: Kafka partition increased while Spark Streaming is running
No, I wouldn't expect it to. Once the stream is defined (at least for the direct stream integration for Kafka 0.8), the topicpartitions are fixed.

My answer to any question about "but what if checkpoints don't let me do this" is always going to be "well, don't rely on checkpoints."

If you want dynamic topicpartitions, see
https://issues.apache.org/jira/browse/SPARK-12177

On Fri, May 13, 2016 at 4:24 AM, chandan prakash wrote:
> Follow-up question:
>
> If Spark Streaming is using checkpointing (/tmp/checkpointDir) for
> at-least-once semantics and the number of topics and/or partitions has
> increased, will gracefully shutting down and restarting from the
> checkpoint pick up the new topics and/or partitions?
> If the answer is no, how do I start from the same checkpoint with the
> new partitions/topics included?
>
> Thanks,
> Chandan
>
> On Wed, Feb 24, 2016 at 9:30 PM, Cody Koeninger wrote:
>> That's correct: when you create a direct stream, you specify the
>> topicpartitions you want to be part of the stream (the other method
>> for creating a direct stream is just a convenience wrapper).
>>
>> On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航 wrote:
>>> Here I use 'KafkaUtils.createDirectStream' to integrate Kafka with
>>> Spark Streaming. I submitted the app, then changed (increased)
>>> Kafka's partition number after it had been running for a while. When
>>> I check the input offsets with
>>> 'rdd.asInstanceOf[HasOffsetRanges].offsetRanges', only the offsets
>>> of the initial partitions are returned.
>>>
>>> Does this mean Spark Streaming's Kafka integration can't update its
>>> parallelism when Kafka's partition number is changed?
>
> --
> Chandan Prakash
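"Don't rely on checkpoints" in practice usually means storing the consumed offsets yourself (e.g. in ZooKeeper or a database) and, on restart, building an explicit starting-offset map that covers whatever partitions Kafka reports at that moment. A self-contained sketch of that merging step — `TopicAndPartition` is modeled here as a plain case class so the snippet runs without Spark on the classpath, and `mergeOffsets` plus the choice of starting new partitions at offset 0 are illustrative assumptions, not Spark API:

```scala
// Sketch: rebuild the starting-offset map for a restart, assuming offsets
// were saved externally rather than in a Spark checkpoint.
case class TopicAndPartition(topic: String, partition: Int)

def mergeOffsets(
    saved: Map[TopicAndPartition, Long], // offsets we stored ourselves
    current: Set[TopicAndPartition]      // partitions Kafka reports now
): Map[TopicAndPartition, Long] =
  current.map { tp =>
    // Known partitions resume where they left off; partitions added to
    // the topic since the last run start from offset 0 (a policy choice:
    // "latest" is equally valid depending on the application).
    tp -> saved.getOrElse(tp, 0L)
  }.toMap

val saved   = Map(TopicAndPartition("events", 0) -> 42L)
val current = Set(TopicAndPartition("events", 0), TopicAndPartition("events", 1))
val fromOffsets = mergeOffsets(saved, current)

println(fromOffsets(TopicAndPartition("events", 1))) // prints 0
```

The resulting map is what you would feed to the `createDirectStream` overload that accepts explicit starting offsets, which is how a restart can include partitions that did not exist when the offsets were saved.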
Re: Kafka partition increased while Spark Streaming is running
Follow-up question:

If Spark Streaming is using checkpointing (/tmp/checkpointDir) for at-least-once semantics and the number of topics and/or partitions has increased, will gracefully shutting down and restarting from the checkpoint pick up the new topics and/or partitions?
If the answer is no, how do I start from the same checkpoint with the new partitions/topics included?

Thanks,
Chandan

On Wed, Feb 24, 2016 at 9:30 PM, Cody Koeninger wrote:
> That's correct: when you create a direct stream, you specify the
> topicpartitions you want to be part of the stream (the other method
> for creating a direct stream is just a convenience wrapper).
>
> On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航 wrote:
>> Here I use 'KafkaUtils.createDirectStream' to integrate Kafka with
>> Spark Streaming. I submitted the app, then changed (increased)
>> Kafka's partition number after it had been running for a while. When
>> I check the input offsets with
>> 'rdd.asInstanceOf[HasOffsetRanges].offsetRanges', only the offsets of
>> the initial partitions are returned.
>>
>> Does this mean Spark Streaming's Kafka integration can't update its
>> parallelism when Kafka's partition number is changed?

--
Chandan Prakash
Re: Kafka partition increased while Spark Streaming is running
That's correct: when you create a direct stream, you specify the topicpartitions you want to be part of the stream (the other method for creating a direct stream is just a convenience wrapper).

On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航 wrote:
> Here I use 'KafkaUtils.createDirectStream' to integrate Kafka with
> Spark Streaming. I submitted the app, then changed (increased) Kafka's
> partition number after it had been running for a while. When I check
> the input offsets with 'rdd.asInstanceOf[HasOffsetRanges].offsetRanges',
> only the offsets of the initial partitions are returned.
>
> Does this mean Spark Streaming's Kafka integration can't update its
> parallelism when Kafka's partition number is changed?
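For context, the check 陈宇航 describes can be sketched as follows against the Spark 1.x / Kafka 0.8 direct stream API. This is not runnable standalone (it needs spark-streaming-kafka on the classpath); the app name, broker address, topic name, and batch interval are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val conf = new SparkConf().setAppName("offset-range-check")
val ssc  = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

// The topicpartition set is resolved once, here, at stream creation.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))

stream.foreachRDD { rdd =>
  // The cast from the question: each RDD produced by the direct stream
  // carries the offset ranges it was built from. Partitions added to the
  // topic after stream creation never appear in this list.
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    println(s"${r.topic} partition ${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
  }
}

ssc.start()
ssc.awaitTermination()
```

Because the partition list is fixed when `createDirectStream` is called, the output above keeps showing only the original partitions, which matches Cody's explanation.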