Hi,
   I am using the KafkaSource API to read from 6 Kafka topics. Flink version:
1.14.3. Every topic my Flink pipeline reads from has a different load but the
same number of partitions (say 30). For example, partition 0 of topic 1 and
partition 0 of topic 2 have different loads, but all partitions of a given
topic have a similar load. So in total my pipeline reads from 30 * 6 = 180
partitions.


  1.  Initially I started with 15 task slots. Each task slot was reading from
12 partitions (2 partitions of topic 1, 2 partitions of topic 2, and so on).
  2.  I took a savepoint and redeployed the application from that savepoint,
scaling it up to 20 task slots.
  3.  I then took another savepoint and redeployed the application from the
new savepoint, scaling it back down to 15 task slots.
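For reference, the even distribution I saw in step 1 matches the initial split assignment in the Flink 1.14 Kafka connector, where (to my reading of the source) KafkaSourceEnumerator.getSplitOwner picks a start index from the topic name's hash and then assigns partitions round-robin. A small Python sketch of that formula (the topic names and the Java-style hashCode helper are my own illustration, not from my job) shows every reader getting exactly 2 partitions of each topic on a fresh start:

```python
from collections import Counter

def java_hash(s: str) -> int:
    """Java String.hashCode(): 32-bit signed wrap-around."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def split_owner(topic: str, partition: int, num_readers: int) -> int:
    # Mirrors KafkaSourceEnumerator.getSplitOwner() in Flink 1.14 (my reading):
    # a per-topic start index, then round-robin over partitions.
    start = ((java_hash(topic) * 31) & 0x7FFFFFFF) % num_readers
    return (start + partition) % num_readers

topics = [f"topic{i}" for i in range(1, 7)]  # hypothetical topic names
num_readers = 15
counts = Counter()
for t in topics:
    for p in range(30):  # 30 partitions per topic
        counts[(split_owner(t, p, num_readers), t)] += 1

# Fresh start: each of the 15 readers owns exactly 2 partitions of every topic.
assert all(counts[(r, t)] == 2 for r in range(num_readers) for t in topics)
print("each reader owns 2 partitions of every topic")
```

Since restoring from a savepoint re-distributes the splits already held in state rather than re-running this fresh assignment, that would be consistent with the skew appearing only after the restore.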

  I then observed that each task slot is still reading from 12 partitions, but
no longer 2 partitions of each topic. For example, task slot 0 is reading from
3 partitions of topic 1, 3 partitions of topic 2, 2 partitions of topic 3, 2
partitions of topic 4, 1 partition of topic 5 and 1 partition of topic 6.
Since the load is not the same across partitions of different topics, the pods
are no longer processing the same number of records, and we are seeing lag on
the few pods that ended up with the high-throughput partitions.

This issue shows up during both scale-up and scale-down. I suspect the
partition distribution goes wrong when the job is rescaled while restoring
from a savepoint.


Can someone please help with this?


Thanks,
Sandeep
