Till Rohrmann created FLINK-18012:
-------------------------------------

             Summary: Deactivate slot timeout if TaskSlotTable.tryMarkSlotActive is called
                 Key: FLINK-18012
                 URL: https://issues.apache.org/jira/browse/FLINK-18012
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.10.1, 1.9.3, 1.11.0
            Reporter: Till Rohrmann
            Assignee: Till Rohrmann
             Fix For: 1.11.0, 1.10.2, 1.9.4


With FLINK-9932 we loosened the slot allocation protocol so that the 
{{JobMaster}} can deploy {{Tasks}} into a slot which has only been 
{{ALLOCATED}} for a given job but not yet {{ACTIVATED}}. This made it possible 
to better handle the case where the {{JobMasterGateway#offerSlots}} response 
arrived so late that the call timed out. The solution was to add a 
{{TaskSlotTable#tryMarkSlotActive}} method which, in contrast to 
{{TaskSlotTable#markSlotActive}}, does not fail if the requested slot is not 
available.

However, {{TaskSlotTable#tryMarkSlotActive}} does not deactivate the slot 
timeout. Hence, if the {{offerSlots}} response never arrives at the 
{{TaskExecutor}}, an {{ACTIVATED}} slot can still time out.

In order to fix the problem, we should also deactivate the slot timeout when 
{{TaskSlotTable#tryMarkSlotActive}} is called.
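
The intended behaviour can be sketched with a small, self-contained model. 
This is not Flink's actual {{TaskSlotTable}}: the class {{SlotTableSketch}}, 
its fields, and the {{fireTimeout}} helper are hypothetical names used only 
for illustration, and the timeout is simulated by an explicit method call 
instead of a real timer. It only shows the idea that activating a slot should 
also cancel its pending timeout.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified, hypothetical model of a slot table; NOT Flink's TaskSlotTable. */
class SlotTableSketch {
    enum State { ALLOCATED, ACTIVE }

    private final Map<Integer, State> slots = new HashMap<>();
    private final Map<Integer, String> owners = new HashMap<>();
    // Slots that currently have a pending timeout registered.
    private final Set<Integer> pendingTimeouts = new HashSet<>();

    /** Allocates a slot for a job and registers its timeout. */
    void allocateSlot(int index, String jobId) {
        slots.put(index, State.ALLOCATED);
        owners.put(index, jobId);
        pendingTimeouts.add(index);
    }

    /** Returns false instead of failing if the slot is missing or owned by another job. */
    boolean tryMarkSlotActive(String jobId, int index) {
        if (!jobId.equals(owners.get(index))) {
            return false;
        }
        slots.put(index, State.ACTIVE);
        // The proposed fix: activation also deactivates the slot timeout.
        pendingTimeouts.remove(index);
        return true;
    }

    /** Simulates the timeout firing; frees the slot only if the timeout is still pending. */
    boolean fireTimeout(int index) {
        if (!pendingTimeouts.remove(index)) {
            return false; // timeout was deactivated, nothing happens
        }
        slots.remove(index);
        owners.remove(index);
        return true;
    }

    boolean isActive(int index) {
        return slots.get(index) == State.ACTIVE;
    }
}
```

In this model, a slot that was activated through {{tryMarkSlotActive}} 
survives a later {{fireTimeout}} call, whereas a slot that is still only 
{{ALLOCATED}} is freed by it. Without the {{pendingTimeouts.remove(index)}} 
line, the activated slot would be freed as well, which corresponds to the 
behaviour described above.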



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
