[ https://issues.apache.org/jira/browse/FLINK-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flink Jira Bot updated FLINK-13203:
-----------------------------------
    Priority: Major  (was: Critical)

> [proper fix] Deadlock occurs when requiring exclusive buffer for 
> RemoteInputChannel
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-13203
>                 URL: https://issues.apache.org/jira/browse/FLINK-13203
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.9.0
>            Reporter: Piotr Nowojski
>            Priority: Major
>              Labels: auto-deprioritized-critical
>
> The issue occurs when requesting exclusive buffers with a timeout. Since the 
> maximum number of buffers and the required number of buffers are currently not 
> the same for local buffer pools, the local buffer pools of the upstream tasks 
> may occupy all the buffers while the downstream tasks fail to acquire exclusive 
> buffers and cannot make progress. For 1.9, 
> https://issues.apache.org/jira/browse/FLINK-12852 avoided the deadlock by adding 
> a timeout: when it expires, the current execution is failed over and the 
> exception message advises users to increase the number of buffers.
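> For illustration, a minimal, self-contained sketch (plain Java, not Flink code, 
> all numbers made up) of how upstream pools that grow to their maximum can starve 
> the downstream exclusive-buffer requests:
> {code:java}
> // Illustrative only -- not Flink code, all numbers are made up. Shows how upstream
> // local buffer pools that may grow beyond their required size can occupy the whole
> // shared pool and leave nothing for the downstream exclusive-buffer requests.
> public class BufferExhaustionSketch {
>     public static void main(String[] args) {
>         int totalNetworkBuffers = 100;   // size of the shared network buffer pool
>         int upstreamPools = 4;           // local buffer pools of the upstream tasks
>         int maxPerUpstreamPool = 25;     // maximum each upstream pool may occupy (> required)
> 
>         // Upstream pools are allowed to grow up to their maximum.
>         int takenByUpstream = Math.min(totalNetworkBuffers, upstreamPools * maxPerUpstreamPool);
>         int remaining = totalNetworkBuffers - takenByUpstream;
> 
>         int downstreamChannels = 8;      // RemoteInputChannels of a downstream task
>         int exclusivePerChannel = 2;     // exclusive buffers each channel must acquire
>         int neededDownstream = downstreamChannels * exclusivePerChannel;
> 
>         System.out.printf("upstream holds %d, %d remain, downstream needs %d -> %s%n",
>                 takenByUpstream, remaining, neededDownstream,
>                 remaining >= neededDownstream ? "ok" : "downstream blocks (deadlock without the timeout)");
>     }
> }
> {code}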
> In the discussion under https://issues.apache.org/jira/browse/FLINK-12852, 
> several proper solutions were discussed, but so far there is no consensus on how 
> to fix it:
> 1. Only allocate the minimum per producer, which is one buffer per channel. 
> This would keep the requirement similar to what we have at the moment, but it is 
> much less than we recommend for the credit-based network data exchange 
> (2 * channels + floating); see the sketch below.
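> A small arithmetic sketch of the gap; the values 2 and 8 for per-channel and 
> floating buffers are assumed defaults, not taken from a concrete setup:
> {code:java}
> // Illustrative only. Compares the option 1 minimum (one buffer per channel) with
> // the recommended credit-based sizing (2 * channels + floating). The values 2 and
> // 8 are assumed defaults for buffers per channel and floating buffers per gate.
> public class Option1Sizing {
>     public static void main(String[] args) {
>         int channels = 100;               // channels of one gate
>         int buffersPerChannel = 2;        // assumed default
>         int floatingBuffersPerGate = 8;   // assumed default
> 
>         int minimum = channels;                                           // option 1
>         int recommended = buffersPerChannel * channels + floatingBuffersPerGate;
> 
>         System.out.printf("minimum (option 1): %d, recommended: %d%n", minimum, recommended);
>     }
> }
> {code}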
> 2a. Coordinate the deployment sink-to-source such that receivers always have 
> their buffers first. This will be complex to implement and coordinate and will 
> break many assumptions about tasks being independent (coordination-wise) on the 
> TaskManagers. Giving up that assumption will be a pretty big step and cause a 
> lot of complexity in the future.
> {quote}
> It will also increase deployment delays. Low deployment delays should be a 
> design goal in my opinion, as it will enable other features more easily, like 
> low-disruption upgrades, etc.
> {quote}
> 2b. Assign extra buffers only once all of the tasks are RUNNING. This is a 
> simplified version of 2a, without tracking the tasks sink-to-source; a possible 
> gating sequence is sketched below.
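> A hypothetical sketch of such gating (made-up names and structure, not Flink 
> APIs): pools start at their required size and may only request extra buffers 
> once every task has reported RUNNING.
> {code:java}
> import java.util.Arrays;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> 
> // Hypothetical sketch of option 2b (made-up names, not Flink APIs): local pools start
> // at their required size, and only once all tasks of the job have reported RUNNING
> // are they allowed to request extra buffers up to their maximum.
> public class ExtraBuffersOnAllRunning {
>     enum TaskState { DEPLOYING, RUNNING }
> 
>     private final Map<String, TaskState> taskStates = new ConcurrentHashMap<>();
>     private volatile boolean extraBuffersReleased;
> 
>     ExtraBuffersOnAllRunning(Iterable<String> taskIds) {
>         taskIds.forEach(id -> taskStates.put(id, TaskState.DEPLOYING));
>     }
> 
>     /** Called when a task switches to RUNNING; may unlock the extra buffers. */
>     void onTaskRunning(String taskId) {
>         taskStates.put(taskId, TaskState.RUNNING);
>         if (taskStates.values().stream().allMatch(s -> s == TaskState.RUNNING)) {
>             extraBuffersReleased = true;   // pools may now grow beyond "required"
>         }
>     }
> 
>     /** Pools check this before requesting more than their required number of buffers. */
>     boolean mayRequestExtraBuffers() {
>         return extraBuffersReleased;
>     }
> 
>     public static void main(String[] args) {
>         ExtraBuffersOnAllRunning gate =
>                 new ExtraBuffersOnAllRunning(Arrays.asList("source", "map", "sink"));
>         gate.onTaskRunning("source");
>         gate.onTaskRunning("map");
>         System.out.println("extra buffers before all RUNNING: " + gate.mayRequestExtraBuffers());
>         gate.onTaskRunning("sink");
>         System.out.println("extra buffers after all RUNNING:  " + gate.mayRequestExtraBuffers());
>     }
> }
> {code}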
> 3. Make buffers always revokable, by spilling.
> This is tricky to implement very efficiently, especially because of the logic 
> that slices buffers for early sends in the low-latency streaming path, and 
> because the spilling request will come from an asynchronous call. That will 
> probably stay like that even with the mailbox, because the main thread will 
> frequently be blocked on buffer allocation when this request comes.
> 4. We allocate the recommended number for good throughput (2 * numChannels + 
> floating) per consumer and per producer.
> No more dynamic rebalancing. This would increase the number of required network 
> buffers quite a bit in certain high-parallelism scenarios with the default 
> config (see the estimate below). Users can down-configure this by setting the 
> per-channel buffers lower, but it would break user setups and require them to 
> adjust the config when upgrading.
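> A back-of-the-envelope estimate of the growth (buffer size, per-channel and 
> floating values are assumptions, not measurements):
> {code:java}
> // Illustrative estimate only. Shows why allocating 2 * numChannels + floating for
> // every consumer *and* every producer grows quickly at high parallelism. Buffer
> // size, per-channel and floating values are assumptions (commonly 32 KiB, 2, 8).
> public class Option4Estimate {
>     public static void main(String[] args) {
>         int parallelism = 500;            // all-to-all exchange: each gate has 500 channels
>         int buffersPerChannel = 2;
>         int floatingPerGate = 8;
>         int bufferSizeBytes = 32 * 1024;
> 
>         int perGate = buffersPerChannel * parallelism + floatingPerGate;
>         // One producer-side gate plus one consumer-side gate per task, no rebalancing.
>         long perTask = 2L * perGate;
>         long tasksPerTaskManager = 8;     // e.g. 8 slots, one task each
>         long bytesPerTaskManager = perTask * tasksPerTaskManager * bufferSizeBytes;
> 
>         System.out.printf("buffers per gate: %d, per task: %d, per TM: %d (%.1f MiB)%n",
>                 perGate, perTask, perTask * tasksPerTaskManager,
>                 bytesPerTaskManager / (1024.0 * 1024.0));
>     }
> }
> {code}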
> 5. We make the network resource per slot and ask the scheduler to attach 
> information about how many producers and how many consumers will be in the 
> slot, worst case. We use that to pre-compute how many excess buffers the 
> producers may take.
> This will also break some assumptions and lead us to the point where we have to 
> pre-compute network buffers in the same way as managed memory. Seeing how much 
> pain that is with the managed memory, this does not seem so great; a per-slot 
> budget sketch follows below.
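> A hypothetical sketch of the per-slot pre-computation (made-up names, not a 
> Flink API): the scheduler declares worst-case producer/consumer counts for a 
> slot and the network stack derives a fixed buffer budget from them.
> {code:java}
> // Hypothetical sketch of option 5 (made-up names, not a Flink API): the scheduler
> // attaches worst-case producer/consumer counts to a slot, and the network stack
> // pre-computes a fixed per-slot buffer budget from them, similar in spirit to how
> // managed memory is pre-computed.
> public class SlotNetworkBudget {
>     static int slotBudget(int maxProducersInSlot, int maxConsumersInSlot,
>                           int channelsPerGate, int buffersPerChannel, int floatingPerGate) {
>         int perGate = buffersPerChannel * channelsPerGate + floatingPerGate;
>         // Worst case: every producer and every consumer in the slot uses a full gate.
>         return (maxProducersInSlot + maxConsumersInSlot) * perGate;
>     }
> 
>     public static void main(String[] args) {
>         // Assumed worst case declared by the scheduler for one slot.
>         int budget = slotBudget(2, 2, 100, 2, 8);
>         System.out.println("pre-computed network buffers for this slot: " + budget);
>     }
> }
> {code}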



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
