[ https://issues.apache.org/jira/browse/FLINK-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-13203:
-----------------------------------
    Priority: Major  (was: Critical)

[proper fix] Deadlock occurs when requiring exclusive buffer for RemoteInputChannel
------------------------------------------------------------------------------------

                 Key: FLINK-13203
                 URL: https://issues.apache.org/jira/browse/FLINK-13203
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Network
    Affects Versions: 1.9.0
            Reporter: Piotr Nowojski
            Priority: Major
              Labels: auto-deprioritized-critical

The issue occurs when requesting exclusive buffers with a timeout. Because the maximum number of buffers and the required number of buffers are currently not the same for local buffer pools, the local buffer pools of the upstream tasks may occupy all the buffers while the downstream tasks fail to acquire exclusive buffers and cannot make progress. For 1.9, the deadlock was avoided in https://issues.apache.org/jira/browse/FLINK-12852 by adding a timeout that fails over the current execution when it expires and by advising users in the exception message to increase the number of network buffers. (A minimal sketch of this situation follows the option list below.)

In the discussion under https://issues.apache.org/jira/browse/FLINK-12852 several proper solutions were discussed, and as of now there is no consensus on how to fix it:

1. Only allocate the minimum per producer, which is one buffer per channel. This keeps the requirement similar to what we have at the moment, but it is much less than we recommend for credit-based network data exchange (2 * channels + floating). (A buffer-budget comparison of this option and option 4 is sketched below.)

2a. Coordinate the deployment sink-to-source such that receivers always have their buffers first. This will be complex to implement and coordinate, and it breaks many assumptions about tasks being independent (coordination-wise) on the TaskManagers. Giving that assumption up will be a pretty big step and cause lots of complexity in the future.
{quote}
It will also increase deployment delays. Low deployment delays should be a design goal in my opinion, as it will enable other features more easily, like low-disruption upgrades, etc.
{quote}

2b. Assign extra buffers only once all of the tasks are RUNNING. This is a simplified version of 2a, without tracking the tasks sink-to-source. (See the sketch below.)

3. Make buffers always revokable, by spilling. This is tricky to implement very efficiently, especially because of the logic that slices buffers for early sends in the low-latency streaming path; the spilling request will come from an asynchronous call. That will probably stay like that even with the mailbox, because the main thread will frequently be blocked on buffer allocation when this request comes. (A simplified revokable-buffer sketch is given below.)

4. We allocate the recommended number for good throughput (2 * numChannels + floating) per consumer and per producer, with no dynamic rebalancing any more. This would increase the number of required network buffers quite a bit in certain high-parallelism scenarios with the default config. Users can down-configure this by setting the per-channel buffers lower, but it would break user setups and require them to adjust the config when upgrading.

5. We make the network resource per slot and ask the scheduler to attach information about how many producers and how many consumers will be in the slot, worst case. We use that to pre-compute how many excess buffers the producers may take. (See the last sketch below.) This will also break some assumptions and lead us to the point where we have to pre-compute network buffers in the same way as managed memory. Seeing how much pain the managed memory is, this seems not so great.
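The following is a minimal, self-contained sketch of the scenario described above. The class and the numbers are illustrative only and do not correspond to Flink's actual NetworkBufferPool / LocalBufferPool API: an upstream local pool that may grow up to its maximum takes every segment from the global pool, and the downstream channel's request for exclusive buffers can only be unblocked by the FLINK-12852 timeout.

{code:java}
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration of the exclusive-buffer deadlock, not Flink code.
 */
public class ExclusiveBufferDeadlockSketch {

    public static void main(String[] args) throws Exception {
        // Global pool with 8 segments in total.
        Semaphore globalPool = new Semaphore(8);

        // Upstream local pool: required = 2, maximum = 8. Because max > required,
        // under load it is allowed to take all 8 segments.
        globalPool.acquire(8);

        // The downstream RemoteInputChannel now needs 2 exclusive buffers, but none
        // are left. Without the timeout this would be a plain acquire() and the task
        // would block forever (the deadlock).
        boolean got = globalPool.tryAcquire(2, 5, TimeUnit.SECONDS);
        if (!got) {
            // FLINK-12852 behaviour: fail the execution and advise the user to
            // increase the number of network buffers.
            throw new IllegalStateException(
                "Timed out requesting exclusive buffers; "
                    + "increase the number of network buffers.");
        }
    }
}
{code}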
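Options 1 and 4 differ mainly in the per-gate buffer budget. A back-of-the-envelope comparison, using the 2 * channels + floating formula quoted above; the parallelism and floating-buffer count are made-up example values:

{code:java}
/**
 * Compares the per-gate buffer budgets of option 1 (minimum) and option 4
 * (throughput-oriented recommendation). Example numbers only.
 */
public class BufferBudgetSketch {

    static int minimumPerGate(int channels) {
        // Option 1: one buffer per channel, nothing extra.
        return channels;
    }

    static int recommendedPerGate(int channels, int floatingBuffers) {
        // Option 4: the recommended budget for good throughput.
        return 2 * channels + floatingBuffers;
    }

    public static void main(String[] args) {
        int channels = 100;   // e.g. an all-to-all exchange at parallelism 100
        int floating = 8;     // example floating-buffer count

        System.out.println("option 1 (minimum):     " + minimumPerGate(channels));
        System.out.println("option 4 (recommended): " + recommendedPerGate(channels, floating));
        // With option 4 the budget is reserved per consumer AND per producer,
        // so the total grows considerably faster than the option 1 minimum.
    }
}
{code}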
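Option 2b boils down to withholding the extra buffers until every task of the job has reported RUNNING. A rough sketch of that idea with hypothetical types; this is not an existing Flink component:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of option 2b: tasks start with only their required buffers, and the
 * extra/floating buffers are handed out once every task is RUNNING, so
 * receivers cannot be starved during deployment.
 */
public class DeferredExtraBufferAssignment {

    enum TaskState { DEPLOYING, RUNNING }

    private final Map<String, TaskState> tasks = new ConcurrentHashMap<>();

    void register(String taskId) {
        tasks.put(taskId, TaskState.DEPLOYING);
    }

    void notifyRunning(String taskId) {
        tasks.put(taskId, TaskState.RUNNING);
        if (tasks.values().stream().allMatch(s -> s == TaskState.RUNNING)) {
            distributeExtraBuffers();
        }
    }

    private void distributeExtraBuffers() {
        // Placeholder: here the local buffer pools would be allowed to grow
        // beyond their required size (towards 2 * channels + floating).
        System.out.println("All tasks RUNNING - releasing extra buffers to the pools.");
    }
}
{code}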
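Option 3 needs buffers that can be taken back by spilling their contents. A much-simplified sketch of a revokable buffer (a hypothetical class; the asynchronous revocation and the buffer-slicing issues mentioned above are not modelled):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Sketch of a revokable buffer: when the pool asks for the memory back, the
 * contents are spilled to a temp file so the segment can be recycled.
 */
public class RevokableBuffer {

    private ByteBuffer data;   // in-memory segment, null once spilled
    private Path spillFile;    // backing file after revocation

    RevokableBuffer(ByteBuffer data) {
        this.data = data;
    }

    /** Called by the pool when the memory is needed elsewhere. */
    synchronized void revoke() throws IOException {
        if (data == null) {
            return; // already spilled
        }
        spillFile = Files.createTempFile("buffer-spill", ".bin");
        try (FileChannel out = FileChannel.open(spillFile, StandardOpenOption.WRITE)) {
            data.flip();
            out.write(data);
        }
        data = null; // the memory segment can now be recycled by the pool
    }
}
{code}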
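Option 5 would pre-compute a per-slot budget from the worst-case producer and consumer counts attached by the scheduler. A sketch of such a pre-computation; the helper, the formula and the numbers are illustrative assumptions, not something agreed on in the discussion:

{code:java}
/**
 * Sketch of the per-slot pre-computation from option 5: reserve the exclusive
 * buffers the consumers need and the per-channel minimum for the producers,
 * and treat the remainder as the excess budget producers may take.
 */
public class SlotNetworkBudget {

    static int producerExcessBudget(
            int slotNetworkBuffers,
            int producerChannels,
            int consumerChannels,
            int exclusiveBuffersPerConsumerChannel) {

        // Consumers must always be able to get their exclusive buffers ...
        int reservedForConsumers = consumerChannels * exclusiveBuffersPerConsumerChannel;
        // ... and producers need at least one buffer per channel to make progress.
        int reservedForProducers = producerChannels;

        // Whatever remains may be taken by producers as excess buffers.
        return Math.max(0, slotNetworkBuffers - reservedForConsumers - reservedForProducers);
    }

    public static void main(String[] args) {
        // Example numbers only: a slot with 1024 network buffers, 200 producer
        // channels, 200 consumer channels, 2 exclusive buffers per consumer channel.
        System.out.println(producerExcessBudget(1024, 200, 200, 2));
    }
}
{code}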
--
This message was sent by Atlassian Jira
(v8.3.4#803005)