Re: repartition on direct kafka stream

2015-09-04 Thread Cody Koeninger
The answer already given is correct.  You shouldn't doubt this, because
you've already seen the shuffle data change accordingly.

On Fri, Sep 4, 2015 at 11:25 AM, Shushant Arora 
wrote:

> But Kafka stream has underlyng RDD which consists of offsets reanges only-
> so how does repartition works ?
>
> 1. First it evaluates the transformation and then repartition
> 2.or first it repartition and then transform. - In this case data should
> not be transformed rather offset ranges only should be repartition and
> shuffled.
>
>
>
> On Fri, Sep 4, 2015 at 10:24 AM, Saisai Shao 
> wrote:
>
>> Yes not the offset ranges, but the real data will be shuffled when you
>> using repartition().
>>
>> Thanks
>> Saisai
>>
>> On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora <
>> shushantaror...@gmail.com> wrote:
>>
>>> 1.Does repartitioning on direct kafka stream shuffles only the offsets
>>> or exact kafka messages across executors?
>>>
>>> Say I have a direct kafkastream
>>>
>>> directKafkaStream.repartition(numexecutors).mapPartitions(new
>>> FlatMapFunction>, String>(){
>>> ...
>>> }
>>>
>>> Say originally I have 5*numexceutor partitons in kafka.
>>>
>>> Now only the offset ranges should be shuffled to executors not exact
>>> kafka messages? But I am seeing a very large size of shuffles data
>>> read/write on streaming ui. When I remove this repartition - shuffle read
>>> /write becomes 0.
>>>
>>>
>>
>


Re: repartition on direct kafka stream

2015-09-04 Thread Shushant Arora
Yes agree shuffle data reveals that offsets+data is transformed.

Wanted to understand mapPartition or any transformation in (
directKafkaStream.repartition(numexecutors).mapPartitions(...))  is
happening before shuffle or after shuffle.

If after shuffle - is this due to the reason that very first
transformation/action on directkafkastream (be it repartition() or
mapPartition()) made directKafkaStream to evaluate and then repartition
made the data to shuffled and then mapPartition is called on shuffled data.

On Fri, Sep 4, 2015 at 10:33 PM, Cody Koeninger  wrote:

> The answer already given is correct.  You shouldn't doubt this, because
> you've already seen the shuffle data change accordingly.
>
> On Fri, Sep 4, 2015 at 11:25 AM, Shushant Arora  > wrote:
>
>> But Kafka stream has underlyng RDD which consists of offsets reanges
>> only- so how does repartition works ?
>>
>> 1. First it evaluates the transformation and then repartition
>> 2.or first it repartition and then transform. - In this case data should
>> not be transformed rather offset ranges only should be repartition and
>> shuffled.
>>
>>
>>
>> On Fri, Sep 4, 2015 at 10:24 AM, Saisai Shao 
>> wrote:
>>
>>> Yes not the offset ranges, but the real data will be shuffled when you
>>> using repartition().
>>>
>>> Thanks
>>> Saisai
>>>
>>> On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora <
>>> shushantaror...@gmail.com> wrote:
>>>
 1.Does repartitioning on direct kafka stream shuffles only the offsets
 or exact kafka messages across executors?

 Say I have a direct kafkastream

 directKafkaStream.repartition(numexecutors).mapPartitions(new
 FlatMapFunction>, String>(){
 ...
 }

 Say originally I have 5*numexceutor partitons in kafka.

 Now only the offset ranges should be shuffled to executors not exact
 kafka messages? But I am seeing a very large size of shuffles data
 read/write on streaming ui. When I remove this repartition - shuffle read
 /write becomes 0.


>>>
>>
>


repartition on direct kafka stream

2015-09-03 Thread Shushant Arora
1.Does repartitioning on direct kafka stream shuffles only the offsets or
exact kafka messages across executors?

Say I have a direct kafkastream

directKafkaStream.repartition(numexecutors).mapPartitions(new
FlatMapFunction>, String>(){
...
}

Say originally I have 5*numexceutor partitons in kafka.

Now only the offset ranges should be shuffled to executors not exact kafka
messages? But I am seeing a very large size of shuffles data read/write on
streaming ui. When I remove this repartition - shuffle read /write becomes
0.


Re: repartition on direct kafka stream

2015-09-03 Thread Saisai Shao
Yes not the offset ranges, but the real data will be shuffled when you
using repartition().

Thanks
Saisai

On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora 
wrote:

> 1.Does repartitioning on direct kafka stream shuffles only the offsets or
> exact kafka messages across executors?
>
> Say I have a direct kafkastream
>
> directKafkaStream.repartition(numexecutors).mapPartitions(new
> FlatMapFunction>, String>(){
> ...
> }
>
> Say originally I have 5*numexceutor partitons in kafka.
>
> Now only the offset ranges should be shuffled to executors not exact kafka
> messages? But I am seeing a very large size of shuffles data read/write on
> streaming ui. When I remove this repartition - shuffle read /write becomes
> 0.
>
>