Follow-up question: if Spark Streaming is using checkpointing (/tmp/checkpointDir) for at-least-once semantics and the number of topics and/or partitions has increased, then...

will gracefully shutting down and restarting from the checkpoint pick up the new topics and/or partitions? If the answer is no, then how can one start from the same checkpoint with the new partitions/topics included?

Thanks,
Chandan

On Wed, Feb 24, 2016 at 9:30 PM, Cody Koeninger <c...@koeninger.org> wrote:

> That's correct: when you create a direct stream, you specify the
> topic-partitions you want to be part of the stream (the other method for
> creating a direct stream is just a convenience wrapper).
>
> On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航 <yuhang.c...@foxmail.com> wrote:
>
>> Here I use *'KafkaUtils.createDirectStream'* to integrate Kafka with
>> Spark Streaming. I submitted the app, then changed (increased) Kafka's
>> partition count after it had been running for a while. When I checked the
>> input offsets with '*rdd.asInstanceOf[HasOffsetRanges].offsetRanges*', I
>> saw that only the offsets of the initial partitions were returned.
>>
>> Does this mean Spark Streaming's Kafka integration can't update its
>> parallelism when Kafka's partition count changes?

--
Chandan Prakash
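Since the direct stream's topic-partitions are fixed at creation time, one commonly discussed workaround is to persist offsets yourself (e.g. in ZooKeeper or a database) instead of relying only on the checkpoint, and on restart rebuild a `fromOffsets` map that merges the saved offsets with the partition list currently reported by Kafka, seeding any newly added partitions at offset 0 (or whatever starting offset your semantics require). The map can then be passed to the `fromOffsets` variant of `KafkaUtils.createDirectStream`. A minimal sketch of the merge step, using a hypothetical `TopicPartition` case class as a stand-in for Kafka's topic-and-partition type:

```scala
// Hypothetical stand-in for Kafka's topic/partition identifier.
case class TopicPartition(topic: String, partition: Int)

// savedOffsets: offsets persisted by the previous run of the job.
// currentPartitions: the full partition list fetched from Kafka at restart.
// Returns a fromOffsets-style map in which pre-existing partitions resume
// from their saved offsets and newly added partitions start at 0.
def buildFromOffsets(
    savedOffsets: Map[TopicPartition, Long],
    currentPartitions: Seq[TopicPartition]): Map[TopicPartition, Long] =
  currentPartitions.map { tp =>
    tp -> savedOffsets.getOrElse(tp, 0L) // new partitions start at offset 0
  }.toMap
```

Note that restarting with an explicit `fromOffsets` map means starting a fresh stream rather than recovering from the old checkpoint directory, so the old checkpoint must be discarded.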