[ https://issues.apache.org/jira/browse/FLINK-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997961#comment-15997961 ]
Till Rohrmann commented on FLINK-6434:
--------------------------------------

Thanks for reporting the issue [~tiemsn]. This sounds like a bug and should be fixed.

I think we could solve it in the following way: We generate the {{AllocationID}} in {{ProviderAndOwner#allocateSlot}} and pass it to {{SlotPoolGateway#allocateSlot}}. On the returned future we register an exception handler which will call {{SlotPoolGateway#failAllocation}} with the generated {{AllocationID}}. That way we should be able to deal with timeouts on the {{Execution}} side.

What do you think?

> There may be allocatedSlots leak in SlotPool
> --------------------------------------------
>
>                 Key: FLINK-6434
>                 URL: https://issues.apache.org/jira/browse/FLINK-6434
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management
>            Reporter: shuai.xu
>            Assignee: shuai.xu
>              Labels: flip-6
>
> If the allocateSlot() call from Execution to SlotPool times out, the job will
> begin to fail over, but the pending request is still in the SlotPool. If a
> new slot then registers with the SlotPool, it may fulfill the outdated pending
> request and be added to allocatedSlots, but it will never be used and will
> never be recycled.
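
The fix proposed in the comment above could look roughly like the sketch below. It is only an illustration under simplifying assumptions: {{SlotPool}} here is a hypothetical in-memory stand-in rather than Flink's {{SlotPoolGateway}}, a {{UUID}} plays the role of the {{AllocationID}}, a {{String}} stands in for the allocated slot, and the timeout is simulated by failing the future by hand.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

// Simplified stand-ins for the classes named in the comment; not the real Flink API.
class SlotAllocationSketch {

    /** Hypothetical slot pool that tracks pending requests by allocation id. */
    static class SlotPool {
        final Map<UUID, CompletableFuture<String>> pendingRequests = new ConcurrentHashMap<>();

        // The caller supplies the id, so it can refer to this request later.
        CompletableFuture<String> allocateSlot(UUID allocationId) {
            CompletableFuture<String> future = new CompletableFuture<>();
            pendingRequests.put(allocationId, future);
            return future;
        }

        // Drops the pending request so a slot registered later cannot fulfill it.
        void failAllocation(UUID allocationId, Throwable cause) {
            CompletableFuture<String> pending = pendingRequests.remove(allocationId);
            if (pending != null) {
                // No-op if the future has already failed, e.g. through a timeout.
                pending.completeExceptionally(cause);
            }
        }
    }

    /** Caller side, in the role the comment gives to ProviderAndOwner#allocateSlot. */
    static CompletableFuture<String> allocateSlot(SlotPool pool) {
        // Generate the id before the request is sent, so it is known even on failure.
        UUID allocationId = UUID.randomUUID();
        CompletableFuture<String> future = pool.allocateSlot(allocationId);

        // Exception handler: on any failure, tell the pool to discard the request.
        future.whenComplete((slot, failure) -> {
            if (failure != null) {
                pool.failAllocation(allocationId, failure);
            }
        });
        return future;
    }

    public static void main(String[] args) {
        SlotPool pool = new SlotPool();
        CompletableFuture<String> request = allocateSlot(pool);
        System.out.println("pending before timeout: " + pool.pendingRequests.size());

        // Simulate the timeout seen on the Execution side.
        request.completeExceptionally(new TimeoutException("slot request timed out"));
        System.out.println("pending after timeout: " + pool.pendingRequests.size());
    }
}
```

The point of generating the id on the caller side is that the caller can name the request it wants cancelled even when no answer ever came back from the pool. In the real system the call to {{failAllocation}} would itself be an RPC, so it would also need to tolerate the pool having already fulfilled or dropped the request.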