[ 
https://issues.apache.org/jira/browse/FLINK-34096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Morávek resolved FLINK-34096.
-----------------------------------
    Resolution: Duplicate

> Upscale of kafka source operator leads to some splits getting lost
> ------------------------------------------------------------------
>
>                 Key: FLINK-34096
>                 URL: https://issues.apache.org/jira/browse/FLINK-34096
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.18.0
>            Reporter: Yang LI
>            Priority: Critical
>         Attachments: global-configuration-log.txt, 
> image-2024-01-15-15-46-47-104.png, image-2024-01-15-15-47-36-509.png, 
> image-2024-01-15-15-48-07-871.png, source-split-log.txt, substate-log.txt
>
>
> Hello,
> We've been conducting experiments with Autoscaling in Apache Flink version 
> 1.18.0 and encountered a bug associated with the Kafka source split.
> The issue manifested in our system as follows: upon experiencing a sudden 
> spike in traffic, the autoscaler opted to upscale the Kafka source vertex. 
> However, the Kafka source fetcher failed to retrieve all available Kafka 
> partitions. Additionally, we observed duplication in source splits. For 
> example, taskmanager-1 and taskmanager-4 both fetched the same Kafka 
> partition.
> !image-2024-01-15-15-46-47-104.png|width=423,height=381!
> !image-2024-01-15-15-48-07-871.png|width=719,height=329!
> {noformat}
> taskmanager-9 2024-01-09 17:59:37 [Source: kafka_source_input_with_kt -> Flat 
> Map -> session_valid (5/18)#6] INFO  o.a.f.c.b.s.r.SourceReaderBase - Adding 
> split(s) to reader: [[Partition: sf-enriched-4, StartingOffset: 26084169, 
> StoppingOffset: -9223372036854775808], [Partition: sf-anonymized-8, 
> StartingOffset: 46477069, StoppingOffset: -9223372036854775808], [Partition: 
> sf-anonymized-9, StartingOffset: 46121324, StoppingOffset: 
> -9223372036854775808], [Partition: sf-enriched-5, StartingOffset: 26221751, 
> StoppingOffset: -9223372036854775808]] 
> taskmanager-6 2024-01-09 17:59:37 [Source: kafka_source_input_with_kt -> Flat 
> Map -> session_valid (4/18)#6] INFO  o.a.f.c.b.s.r.SourceReaderBase - Adding 
> split(s) to reader: [[Partition: sf-enriched-4, StartingOffset: 26084169, 
> StoppingOffset: -9223372036854775808], [Partition: sf-anonymized-8, 
> StartingOffset: 46477069, StoppingOffset: -9223372036854775808], [Partition: 
> sf-anonymized-20, StartingOffset: 46211745, StoppingOffset: 
> -9223372036854775808], [Partition: sf-anonymized-32, StartingOffset: 
> 46340878, StoppingOffset: -9223372036854775808]] {noformat}
> You can see in these logs, taskmanager-9 and taskmanager-6 has both fetched 
> partition sf-enriched-4 and sf-anonymized-8
> Additional Questions
>  * During some other experiments which also lead to kafka partition issues, 
> we noticed that the autoscaler attempted to increase the parallelism of the 
> source vertex to a value that is not a divisor of the Kafka topic's partition 
> count. For example, it recommended a parallelism of 48 when the total 
> partition count was 72. In such scenarios: 
>  * 
>  ** Does kafka source connector vertex still suppose to works well when its 
> parallelism is not divisor of topic's partition count?
>  ** If this configuration is not ideal, should there be a mechanism within 
> the autoscaler to ensure that the parallelism of the source vertex always 
> matches the topic's partition count?
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to