[ https://issues.apache.org/jira/browse/FLINK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Nowojski updated FLINK-20217:
-----------------------------------
    Description: 
Timers are currently processed in one big block under the checkpoint lock 
(in {{InternalTimerServiceImpl#advanceWatermark}}). This can be problematic 
in a number of scenarios during checkpointing and can lead to checkpoints 
timing out (even unaligned checkpoints would not help).

If you have a huge number of timers to process when advancing the watermark and 
the task is also back-pressured, the situation may actually be worse since you 
would block on the checkpoint lock and also wait for buffers/credits from the 
receiver.

I propose to make this loop more fine-grained so that it is interruptible by 
checkpoints, but maybe there is also some other way to improve here.
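
To illustrate the idea of a more fine-grained loop, here is a minimal sketch. It is not Flink's actual implementation: the class {{BatchedTimerSketch}}, its methods, the {{maxBatch}} parameter and the {{betweenBatches}} hook are all hypothetical, and a plain Java monitor stands in for the checkpoint lock. Due timers are fired in bounded batches, and the lock is released between batches so that other work (for example a pending checkpoint) gets a chance to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.LongConsumer;

/**
 * Hypothetical sketch, NOT Flink's InternalTimerServiceImpl: fires due timers
 * in bounded batches and releases the lock between batches.
 */
class BatchedTimerSketch {
    // Stands in for the checkpoint lock (assumption for illustration only).
    private final Object lock = new Object();
    // Timer timestamps, ordered so the earliest timer is at the head.
    private final PriorityQueue<Long> timers = new PriorityQueue<>();

    void register(long timestamp) {
        synchronized (lock) {
            timers.add(timestamp);
        }
    }

    /**
     * Fires at most maxBatch timers with timestamp <= watermark while holding
     * the lock. Returns true if further due timers remain afterwards.
     */
    boolean fireBatch(long watermark, int maxBatch, LongConsumer onTimer) {
        synchronized (lock) {
            int fired = 0;
            while (fired < maxBatch && !timers.isEmpty() && timers.peek() <= watermark) {
                onTimer.accept(timers.poll());
                fired++;
            }
            return !timers.isEmpty() && timers.peek() <= watermark;
        }
    }

    /**
     * Drains all due timers batch by batch. betweenBatches runs with the lock
     * released, which is where a pending checkpoint could be performed.
     */
    void advanceWatermark(long watermark, int maxBatch,
                          LongConsumer onTimer, Runnable betweenBatches) {
        while (fireBatch(watermark, maxBatch, onTimer)) {
            betweenBatches.run();
        }
    }

    public static void main(String[] args) {
        BatchedTimerSketch sketch = new BatchedTimerSketch();
        for (long ts = 1; ts <= 10; ts++) {
            sketch.register(ts);
        }
        List<Long> fired = new ArrayList<>();
        int[] gaps = {0};
        // Fire all timers up to watermark 7, at most 3 per lock acquisition.
        sketch.advanceWatermark(7, 3, ts -> fired.add(ts), () -> gaps[0]++);
        System.out.println(fired + " gaps=" + gaps[0]);
    }
}
```

In this sketch the batch size trades lock hold time against the overhead of re-acquiring the lock. A real change would also have to consider timers that are registered or deleted between batches, which the sketch does not address.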

This issue has been observed, for example, here: 
https://lists.apache.org/thread/f6ffk9912fg5j1rfkxbzrh0qmp4w6qry

  was:
Timers are currently processed in one big block under the checkpoint lock 
(in {{InternalTimerServiceImpl#advanceWatermark}}). This can be problematic 
in a number of scenarios while doing checkpointing which would lead to 
checkpoints timing out (and even unaligned checkpoints would not help).

If you have a huge number of timers to process when advancing the watermark and 
the task is also back-pressured, the situation may actually be worse since you 
would block on the checkpoint lock and also wait for buffers/credits from the 
receiver.

I propose to make this loop more fine-grained so that it is interruptible by 
checkpoints, but maybe there is also some other way to improve here.


> More fine-grained timer processing
> ----------------------------------
>
>                 Key: FLINK-20217
>                 URL: https://issues.apache.org/jira/browse/FLINK-20217
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream, Runtime / Task
>    Affects Versions: 1.10.2, 1.11.2, 1.12.0
>            Reporter: Nico Kruber
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> Timers are currently processed in one big block under the checkpoint lock 
> (in {{InternalTimerServiceImpl#advanceWatermark}}). This can be problematic 
> in a number of scenarios during checkpointing and can lead to checkpoints 
> timing out (even unaligned checkpoints would not help).
> If you have a huge number of timers to process when advancing the watermark 
> and the task is also back-pressured, the situation may actually be worse 
> since you would block on the checkpoint lock and also wait for 
> buffers/credits from the receiver.
> I propose to make this loop more fine-grained so that it is interruptible by 
> checkpoints, but maybe there is also some other way to improve here.
> This issue has been observed, for example, here: 
> https://lists.apache.org/thread/f6ffk9912fg5j1rfkxbzrh0qmp4w6qry



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
