陈梓立 created FLINK-9779:
--------------------------

             Summary: Remove SlotRequest timeout
                 Key: FLINK-9779
                 URL: https://issues.apache.org/jira/browse/FLINK-9779
             Project: Flink
          Issue Type: Improvement
          Components: JobManager, ResourceManager, TaskManager
            Reporter: 陈梓立


As discussed in FLINK-8643 and FLINK-8653, we replaced the internal slot request
timeout with an external timeout. This raises a follow-up question: why not
remove the timeout mechanism entirely? In our production use case, this timeout
causes unnecessary failures and makes resource allocation inaccurate.

I propose removing the slot request timeout altogether. Instead, the RM would
handle TM failures by properly cancelling the affected pending requests. If a TM
cannot offer a slot to the JM, a blacklist mechanism would nudge the RM to
reallocate the pending request to a different TM.
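A minimal sketch of the bookkeeping this proposal implies, assuming a per-request
blacklist of TMs and assuming that "cancelling" on TM failure means returning the
request to the pending state so it can be reallocated. All names
(PendingSlotRequests, allocate, offerFailed, taskManagerFailed) are hypothetical
illustrations, not Flink's actual ResourceManager/SlotManager API. Note that no
timer exists anywhere: a request simply stays pending until a suitable TM is found.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: pending slot requests tracked without any timeout. */
public class PendingSlotRequests {
    // Request ID -> TM currently assigned to it (null means still pending).
    private final Map<String, String> assignment = new HashMap<>();
    // Request ID -> TMs that failed to offer a slot to the JM for that request.
    private final Map<String, Set<String>> blacklist = new HashMap<>();

    /** Registers a new pending request (must be called before the other methods). */
    public void add(String requestId) {
        assignment.put(requestId, null);
        blacklist.put(requestId, new HashSet<>());
    }

    /** Assigns the request to the first available TM not blacklisted for it. */
    public Optional<String> allocate(String requestId, List<String> availableTms) {
        Set<String> banned = blacklist.get(requestId);
        for (String tm : availableTms) {
            if (!banned.contains(tm)) {
                assignment.put(requestId, tm);
                return Optional.of(tm);
            }
        }
        // No eligible TM: the request stays pending instead of timing out.
        return Optional.empty();
    }

    /** TM could not offer the slot to the JM: blacklist it and re-pend the request. */
    public void offerFailed(String requestId, String tmId) {
        blacklist.get(requestId).add(tmId);
        assignment.put(requestId, null);
    }

    /** TM failed: every request assigned to it goes back to pending. */
    public void taskManagerFailed(String tmId) {
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            if (tmId.equals(e.getValue())) {
                e.setValue(null);
            }
        }
    }

    public boolean isPending(String requestId) {
        return assignment.containsKey(requestId) && assignment.get(requestId) == null;
    }

    public static void main(String[] args) {
        PendingSlotRequests rm = new PendingSlotRequests();
        rm.add("req-1");
        System.out.println(rm.allocate("req-1", List.of("tm-a", "tm-b"))); // Optional[tm-a]
        rm.offerFailed("req-1", "tm-a");
        System.out.println(rm.allocate("req-1", List.of("tm-a", "tm-b"))); // Optional[tm-b]
        rm.taskManagerFailed("tm-b");
        System.out.println(rm.isPending("req-1")); // true
    }
}
```

The blacklist is kept per request rather than globally, so a TM that fails one
offer is only skipped for that request; whether the real design should be
per-request, per-job or cluster-wide is an open question the sketch does not settle.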



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)