Re: Kafka partition increased while Spark Streaming is running

2016-05-13 Thread chandan prakash
Makes sense. Thank you, Cody.

Regards,
Chandan

On Fri, May 13, 2016 at 8:10 PM, Cody Koeninger  wrote:

> No, I wouldn't expect it to. Once the stream is defined (at least for
> the direct stream integration for Kafka 0.8), the topicpartitions are
> fixed.
>
> My answer to any question about "but what if checkpoints don't let me
> do this" is always going to be "well, don't rely on checkpoints."
>
> If you want dynamic topicpartitions,
> https://issues.apache.org/jira/browse/SPARK-12177
>
>
> On Fri, May 13, 2016 at 4:24 AM, chandan prakash
>  wrote:
> > Follow-up question:
> >
> > If Spark Streaming is using checkpointing (/tmp/checkpointDir) for
> > at-least-once semantics and the number of topics and/or partitions has
> > increased, will gracefully shutting down and restarting from the
> > checkpoint pick up the new topics and/or partitions?
> > If the answer is NO, how can we start from the same checkpoint with the
> > new partitions/topics included?
> >
> > Thanks,
> > Chandan
> >
> >
> > On Wed, Feb 24, 2016 at 9:30 PM, Cody Koeninger 
> wrote:
> >>
> >> That's correct. When you create a direct stream, you specify the
> >> topicpartitions you want to be part of the stream (the other method for
> >> creating a direct stream is just a convenience wrapper).
> >>
> >> On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航  wrote:
> >>>
> >>> Here I use 'KafkaUtils.createDirectStream' to integrate Kafka with
> >>> Spark Streaming. I submitted the app, then increased Kafka's partition
> >>> count after it had been running for a while. When I checked the input
> >>> offsets with 'rdd.asInstanceOf[HasOffsetRanges].offsetRanges', only the
> >>> offsets of the initial partitions were returned.
> >>>
> >>> Does this mean Spark Streaming's Kafka integration can't update its
> >>> parallelism when Kafka's partition number is changed?
> >>
> >>
> >
> >
> >
> > --
> > Chandan Prakash
> >
>



-- 
Chandan Prakash


Re: Kafka partition increased while Spark Streaming is running

2016-05-13 Thread Cody Koeninger
No, I wouldn't expect it to. Once the stream is defined (at least for
the direct stream integration for Kafka 0.8), the topicpartitions are
fixed.

My answer to any question about "but what if checkpoints don't let me
do this" is always going to be "well, don't rely on checkpoints."

If you want dynamic topicpartitions,
https://issues.apache.org/jira/browse/SPARK-12177
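
A minimal sketch of what "don't rely on checkpoints" can look like with the
Kafka 0.8 direct stream, assuming you keep offsets in your own store; the
loadOffsets/saveOffsets helpers, broker address, and app name below are
placeholders, not real APIs:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object ManualOffsets {
  // Hypothetical helpers: back these with your own offset store
  // (ZooKeeper, a database, HBase, ...).
  def loadOffsets(): Map[TopicAndPartition, Long] = Map.empty
  def saveOffsets(offsets: Map[TopicAndPartition, Long]): Unit = ()

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("manual-offsets"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // Starting offsets come from your own store instead of a Spark
    // checkpoint, so picking up newly added partitions is just a matter of
    // adding entries here before restarting the job.
    val fromOffsets: Map[TopicAndPartition, Long] = loadOffsets()

    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))

    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process the batch ...
      saveOffsets(ranges.map(r =>
        TopicAndPartition(r.topic, r.partition) -> r.untilOffset).toMap)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}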


On Fri, May 13, 2016 at 4:24 AM, chandan prakash
 wrote:
> Follow-up question:
>
> If Spark Streaming is using checkpointing (/tmp/checkpointDir) for
> at-least-once semantics and the number of topics and/or partitions has
> increased, will gracefully shutting down and restarting from the
> checkpoint pick up the new topics and/or partitions?
> If the answer is NO, how can we start from the same checkpoint with the
> new partitions/topics included?
>
> Thanks,
> Chandan
>
>
> On Wed, Feb 24, 2016 at 9:30 PM, Cody Koeninger  wrote:
>>
>> That's correct. When you create a direct stream, you specify the
>> topicpartitions you want to be part of the stream (the other method for
>> creating a direct stream is just a convenience wrapper).
>>
>> On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航  wrote:
>>>
>>> Here I use 'KafkaUtils.createDirectStream' to integrate Kafka with
>>> Spark Streaming. I submitted the app, then increased Kafka's partition
>>> count after it had been running for a while. When I checked the input
>>> offsets with 'rdd.asInstanceOf[HasOffsetRanges].offsetRanges', only the
>>> offsets of the initial partitions were returned.
>>>
>>> Does this mean Spark Streaming's Kafka integration can't update its
>>> parallelism when Kafka's partition number is changed?
>>
>>
>
>
>
> --
> Chandan Prakash
>




Re: Kafka partition increased while Spark Streaming is running

2016-05-13 Thread chandan prakash
Follow-up question:

If Spark Streaming is using checkpointing (/tmp/checkpointDir) for
at-least-once semantics and the number of topics and/or partitions has
increased, will gracefully shutting down and restarting from the checkpoint
pick up the new topics and/or partitions?
If the answer is NO, how can we start from the same checkpoint with the new
partitions/topics included?

Thanks,
Chandan


On Wed, Feb 24, 2016 at 9:30 PM, Cody Koeninger  wrote:

> That's correct. When you create a direct stream, you specify the
> topicpartitions you want to be part of the stream (the other method for
> creating a direct stream is just a convenience wrapper).
>
> On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航  wrote:
>
>> Here I use *'KafkaUtils.createDirectStream'* to integrate Kafka with
>> Spark Streaming. I submitted the app, then increased Kafka's partition
>> count after it had been running for a while. When I checked the input
>> offsets with '*rdd.asInstanceOf[HasOffsetRanges].offsetRanges*', only the
>> offsets of the initial partitions were returned.
>>
>> Does this mean Spark Streaming's Kafka integration can't update its
>> parallelism when Kafka's partition number is changed?
>>
>
>


-- 
Chandan Prakash


Re: Kafka partition increased while Spark Streaming is running

2016-02-24 Thread Cody Koeninger
That's correct. When you create a direct stream, you specify the
topicpartitions you want to be part of the stream (the other method for
creating a direct stream is just a convenience wrapper).
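
For reference, a minimal sketch of the two ways to define a 0.8 direct
stream; in both cases the set of topicpartitions is resolved when the
stream is defined. The broker address, topic name, and starting offsets
here are placeholders:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamVariants {
  def defineStreams(ssc: StreamingContext): Unit = {
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // Convenience wrapper: pass topic names; the partitions that exist at
    // this moment are looked up from Kafka and stay fixed for the stream's
    // lifetime.
    val byTopic = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my_topic"))

    // Explicit form: pass the exact topicpartitions (and starting offsets)
    // the stream should cover.
    val fromOffsets = Map(
      TopicAndPartition("my_topic", 0) -> 0L,
      TopicAndPartition("my_topic", 1) -> 0L)
    val byPartition = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))
  }
}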

On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航  wrote:

> Here I use *'KafkaUtils.createDirectStream'* to integrate Kafka with
> Spark Streaming. I submitted the app, then increased Kafka's partition
> count after it had been running for a while. When I checked the input
> offsets with '*rdd.asInstanceOf[HasOffsetRanges].offsetRanges*', only the
> offsets of the initial partitions were returned.
>
> Does this mean Spark Streaming's Kafka integration can't update its
> parallelism when Kafka's partition number is changed?
>
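
A minimal sketch of the offset check described above, assuming a
(String, String) direct stream defined as in the earlier sketches:

import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.HasOffsetRanges

object OffsetRangeCheck {
  // Prints, per batch, the topicpartitions the batch covered and their
  // offset ranges; with the 0.8 direct stream, partitions added after the
  // stream was defined will not show up here.
  def logOffsetRanges(stream: DStream[(String, String)]): Unit =
    stream.foreachRDD { rdd =>
      rdd.asInstanceOf[HasOffsetRanges].offsetRanges.foreach { r =>
        println(s"${r.topic}-${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
      }
    }
}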