Re: How does flink read data from kafka number of TM's are more than topic partitions

2018-09-21 Thread Piotr Nowojski
Hi,

Yes, in your case half of the Kafka source tasks wouldn’t read/process any 
records (you can check that in the web UI). This shouldn’t harm you, as long as 
your records are redistributed after the source. For example:

source.keyBy(..).process(new MyVeryHeavyOperator()).print()

Should be fine, because `keyBy(…)` will redistribute records. However

source.map(new MyVeryHeavyOperator()).print()

Will mean that half of the `MyVeryHeavyOperator` instances will be idling as 
well, because each `map` subtask only receives records from its own source 
subtask. To solve that, you might want to consider using 

dataStream.rebalance();
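
To make the effect concrete, here is a small plain-Java sketch (no Flink dependency) of the 5-partition, parallelism-10 case. It is a simplified model under stated assumptions, not Flink's actual code: the class `SourceFanOut` and its methods are hypothetical, partition p is assumed to be read by source subtask p % parallelism, and rebalance is modelled as a plain round-robin.

```java
import java.util.Arrays;

class SourceFanOut {

    // Assumption: partition p is read by source subtask p % parallelism.
    // Flink's real partition assigner may pick different subtasks.
    public static int[] recordsPerSourceSubtask(int partitions, int parallelism,
                                                int recordsPerPartition) {
        int[] counts = new int[parallelism];
        for (int p = 0; p < partitions; p++) {
            counts[p % parallelism] += recordsPerPartition;
        }
        return counts;
    }

    // Models source.map(...): downstream subtask i sees only what
    // source subtask i produced, so idle sources mean idle operators.
    public static int[] forward(int[] sourceCounts) {
        return sourceCounts.clone();
    }

    // Models rebalance() as round-robin: each source subtask deals its
    // records out in turn over all downstream subtasks.
    public static int[] rebalance(int[] sourceCounts) {
        int n = sourceCounts.length;
        int[] counts = new int[n];
        for (int s = 0; s < n; s++) {
            for (int r = 0; r < sourceCounts[s]; r++) {
                counts[(s + r) % n]++;
            }
        }
        return counts;
    }

    // Number of subtasks that received no records at all.
    public static long idle(int[] counts) {
        return Arrays.stream(counts).filter(c -> c == 0).count();
    }

    public static void main(String[] args) {
        int[] source = recordsPerSourceSubtask(5, 10, 100);
        System.out.println("source:    " + Arrays.toString(source));
        System.out.println("forward:   idle=" + idle(forward(source)));
        System.out.println("rebalance: idle=" + idle(rebalance(source)));
    }
}
```

In an actual job the equivalent change would be to put the rebalance between the source and the heavy operator, e.g. source.rebalance().map(new MyVeryHeavyOperator()).print(), at the cost of shipping records over the network between subtasks.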

Piotrek

> On 21 Sep 2018, at 13:25, Taher Koitawala  wrote:
> 
> Hi All,
>  Let's say a topic in Kafka has 5 partitions. If I spawn 10 Task 
> Managers with 1 slot each and the parallelism is 10, how will records be read 
> from the Kafka topic if I use the FlinkKafkaConsumer?
> 
> Will 5 TMs read and the rest be idle in that case? Does oversubscribing, i.e. 
> running more TMs than there are partitions in the Kafka topic, guarantee 
> high throughput?
>  
> Regards,
> Taher Koitawala
> GS Lab Pune
> +91 8407979163



Re: How does flink read data from kafka number of TM's are more than topic partitions

2018-09-21 Thread Taher Koitawala
Thanks a lot for the explanation. That was exactly what I thought should
happen. However, it is always good to have a clear confirmation.


Regards,
Taher Koitawala
GS Lab Pune
+91 8407979163




Re: How does flink read data from kafka number of TM's are more than topic partitions

2018-09-21 Thread Piotr Nowojski
No problem :)

Piotrek
