[jira] [Closed] (FLINK-23466) UnalignedCheckpointITCase hangs on Azure

2021-11-16 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-23466.
--
Resolution: Fixed

I've extracted the newly reported issue into FLINK-24919.

> UnalignedCheckpointITCase hangs on Azure
> 
>
> Key: FLINK-23466
> URL: https://issues.apache.org/jira/browse/FLINK-23466
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.0
> Reporter: Dawid Wysakowicz
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.14.1
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=20813&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=16016
> The problem is that the buffer listener is removed from the listener queue 
> when it is notified, and is added back to the queue only if it still needs 
> more buffers. If buffers are recycled in between, the listener is not 
> notified of the newly available buffers. For example:
> 1. Thread 1 calls LocalBufferPool#recycle().
> 2. Thread 1 reaches LocalBufferPool#fireBufferAvailableNotification() and 
> invokes listener.notifyBufferAvailable(), but Thread 1 is suspended before 
> it acquires the lock needed for registeredListeners.add(listener).
> 3. Thread 2 is woken up by the notifyBufferAvailable() call. It takes the 
> buffer, but it needs more buffers.
> 4. Other threads return all buffers, including the one that had just been 
> recycled. None are taken; all of them are now in the LocalBufferPool.
> 5. Thread 1 wakes up and continues the fireBufferAvailableNotification() 
> invocation.
> 6. Thread 1 re-adds the listener, which is still waiting for more buffers, 
> via registeredListeners.add(listener).
> 7. Thread 1 exits the loop inside LocalBufferPool#recycle(MemorySegment, int), 
> as the original memory segment has been used.
> At the end we have a state where all buffers are in the LocalBufferPool, so 
> no new recycle() calls will happen, but there is still one listener waiting 
> for a buffer (despite buffers being available).
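
The interleaving described above can be replayed deterministically in a small
single-threaded model. This is only a sketch under stated assumptions, not
Flink's LocalBufferPool: the classes Listener and Pool, the methods
recycleAndNotify() and reRegister(), and the recheckPool flag are all
hypothetical names made up for illustration, and buffers are modelled as a
plain counter. The "re-check" variant shows one possible way to close the gap
(look at the pool again before re-registering); it is not claimed to be what
the merged fix does.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LostNotificationSketch {

    /** Hypothetical listener that still wants `needed` buffers. */
    static final class Listener {
        int needed;

        Listener(int needed) {
            this.needed = needed;
        }

        /** Takes the offered buffer; returns true if more buffers are still needed. */
        boolean notifyBufferAvailable() {
            needed--;
            return needed > 0;
        }
    }

    /** Heavily simplified pool: counts buffers instead of holding memory segments. */
    static final class Pool {
        final Deque<Listener> registeredListeners = new ArrayDeque<>();
        int availableBuffers;

        /**
         * First half of a recycle call: hand the buffer to a waiting listener,
         * or keep it in the pool. Returns the listener if it must be re-registered.
         */
        Listener recycleAndNotify() {
            Listener listener = registeredListeners.poll();
            if (listener == null) {
                availableBuffers++; // nobody is registered, the buffer stays in the pool
                return null;
            }
            return listener.notifyBufferAvailable() ? listener : null;
        }

        /** Second half of a recycle call: re-register a listener that needs more. */
        void reRegister(Listener listener, boolean recheckPool) {
            if (listener == null) {
                return;
            }
            // Assumed remedy (illustration only): serve the listener from buffers
            // that were returned while it was not registered.
            while (recheckPool && availableBuffers > 0 && listener.needed > 0) {
                availableBuffers--;
                listener.needed--;
            }
            if (listener.needed > 0) {
                registeredListeners.add(listener);
            }
        }
    }

    /** Replays the interleaving; returns {availableBuffers, waitingListeners}. */
    public static int[] simulate(boolean recheckPool) {
        Pool pool = new Pool();
        pool.registeredListeners.add(new Listener(2));

        // Steps 1-3: Thread 1 recycles a buffer and notifies the listener,
        // then is suspended before re-registering it.
        Listener pending = pool.recycleAndNotify();

        // Step 4: other threads return buffers while no listener is registered.
        for (int i = 0; i < 3; i++) {
            pool.recycleAndNotify();
        }

        // Steps 5-7: Thread 1 resumes and re-registers the listener.
        pool.reRegister(pending, recheckPool);

        return new int[] {pool.availableBuffers, pool.registeredListeners.size()};
    }

    public static void main(String[] args) {
        int[] stuck = simulate(false);
        System.out.println("no re-check:   available=" + stuck[0] + ", waiting=" + stuck[1]);
        int[] served = simulate(true);
        System.out.println("with re-check: available=" + served[0] + ", waiting=" + served[1]);
    }
}
```

Without the re-check, the model ends with buffers sitting in the pool and one
listener still registered, which is the stuck state described in the issue.
With the re-check, the listener is served from the pool and nothing is left
waiting.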



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-23466) UnalignedCheckpointITCase hangs on Azure

2021-08-30 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-23466.
--
Resolution: Fixed

Merged to master as 48a384dffc7
Merged to release-1.14 as 0067d35cc0f

> UnalignedCheckpointITCase hangs on Azure
> 
>
> Key: FLINK-23466
> URL: https://issues.apache.org/jira/browse/FLINK-23466
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.0
> Reporter: Dawid Wysakowicz
> Assignee: Yingjie Cao
> Priority: Blocker
> Labels: pull-request-available, test-stability
> Fix For: 1.14.0
>
>


