[ https://issues.apache.org/jira/browse/FLINK-34400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815253#comment-17815253 ]
Alexis Sarda-Espinosa commented on FLINK-34400:
-----------------------------------------------

Ah, let me clarify. The setup is roughly like this:

* Source 1 with parallelism 2 consuming from topic A that continuously receives messages
** I have 2 Task Managers, so each source reader should be consuming approximately 15 partitions in each TM.
* Source 2 with parallelism 1 consuming from topic B that receives messages rarely and has stayed mostly empty during my experiments

So, what I meant is that Source 1 is showing lag in only one of its readers, and the corresponding error logs only show in 1 TM; the other reader and its TM never show errors. On the other hand, why would lag start showing up only after 15 minutes or so?

I will probably enable idleness anyway, but I was testing both scenarios and I think these inconsistencies are kind of unexpected.

> Kafka sources with watermark alignment sporadically stop consuming
> ------------------------------------------------------------------
>
>                 Key: FLINK-34400
>                 URL: https://issues.apache.org/jira/browse/FLINK-34400
>             Project: Flink
>          Issue Type: Bug
>   Affects Versions: 1.18.1
>            Reporter: Alexis Sarda-Espinosa
>            Priority: Major
>         Attachments: alignment_lags.png, logs.txt
>
>
> I have 2 Kafka sources that read from different topics. I have assigned them
> to the same watermark alignment group, and I have _not_ enabled idleness
> explicitly in their watermark strategies. One topic remains pretty much empty
> most of the time, while the other receives a few events per second all the
> time. Parallelism of the active source is 2; for the other one it's 1, and
> checkpoints are once every minute.
> This works correctly for some time (10 - 15 minutes in my case) but then 1 of
> the active sources stops consuming, which causes lag to increase. Weirdly,
> after another 15 minutes or so, all the backlog is consumed at once, and then
> everything stops again.
> I'm attaching some logs from the Task Manager where the issue appears. You
> will notice that the Kafka network client reports disconnections (a long time
> after the deserializer stopped reporting that events were being consumed);
> I'm not sure if this is related.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)