[ 
https://issues.apache.org/jira/browse/FLINK-19069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190129#comment-17190129
 ] 

Till Rohrmann commented on FLINK-19069:
---------------------------------------

I think we should make sure that the user code is run in a non-blocking 
fashion. This means that we should not leave that decision to the user or to 
the implementor of a user-facing interface. At the very least, we have to make 
sure that the {{FinalizeOnMaster.finalizeGlobal}} call is executed outside of 
the main thread.
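
A minimal sketch of that idea, not the actual Flink change: the hook is handed 
to a separate executor and the caller gets a future back instead of blocking. 
{{FinalizeHook}}, {{finalizeAsync}} and {{ioExecutor}} are hypothetical names 
used only for illustration; they stand in for {{FinalizeOnMaster}} and 
whatever executor the JM would use.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;

class AsyncFinalizeSketch {

    /** Hypothetical stand-in for the user hook; not Flink's actual interface. */
    interface FinalizeHook {
        void finalizeGlobal(int parallelism) throws Exception;
    }

    /**
     * Runs the hook on the given executor and returns immediately.
     * The returned future completes (normally or exceptionally) once
     * the hook has finished, so the calling thread is never blocked.
     */
    static CompletableFuture<Void> finalizeAsync(
            FinalizeHook hook, int parallelism, Executor ioExecutor) {
        return CompletableFuture.runAsync(
                () -> {
                    try {
                        hook.finalizeGlobal(parallelism);
                    } catch (Exception e) {
                        // Surface checked exceptions through the future.
                        throw new CompletionException(e);
                    }
                },
                ioExecutor);
    }
}
```

The caller decides what to do with the future; the important part is that a 
slow commit only occupies a thread of {{ioExecutor}}.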

In the {{ExecutionGraph}} we need to handle the concurrent results properly, 
which also means handling concurrent {{JobStatus}} changes of the 
{{ExecutionGraph}}. We also need to think about what happens if the job gets 
cancelled concurrently. Assuming that the {{finalizeOnMaster}} calls belong to 
the lifetime of the {{ExecutionGraph}}, we would have to wait for these calls 
to finish before we can move the {{ExecutionGraph}} into a terminal state.
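
To illustrate the ordering constraint, here is a rough sketch under simplified 
assumptions. {{TerminalStateSketch}}, its {{Status}} enum and all method names 
are hypothetical and far simpler than the real {{ExecutionGraph}} and 
{{JobStatus}}. The point is only that a concurrent cancel is recorded right 
away, while the terminal state is entered only after the finalization call 
has returned.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

class TerminalStateSketch {

    /** Hypothetical, reduced status set; not Flink's JobStatus. */
    enum Status { RUNNING, CANCELLING, FINISHED, FAILED, CANCELED }

    private Status status = Status.RUNNING;
    private final CompletableFuture<Status> terminalState = new CompletableFuture<>();

    synchronized Status status() {
        return status;
    }

    /** Completes only once a terminal state has been reached. */
    CompletableFuture<Status> terminalStateFuture() {
        return terminalState;
    }

    /** A cancel is recorded immediately but does not skip the pending finalization. */
    synchronized void cancel() {
        if (status == Status.RUNNING) {
            status = Status.CANCELLING;
        }
    }

    /** Runs the finalizer off the calling thread, then decides the terminal state. */
    void finalizeThenTerminate(Runnable finalizer, Executor ioExecutor) {
        CompletableFuture.runAsync(finalizer, ioExecutor)
                .whenComplete((ignored, failure) -> onFinalizationDone(failure));
    }

    private synchronized void onFinalizationDone(Throwable failure) {
        if (status == Status.CANCELLING) {
            status = Status.CANCELED;   // cancel arrived while finalizing
        } else if (failure != null) {
            status = Status.FAILED;     // finalizer threw
        } else {
            status = Status.FINISHED;
        }
        terminalState.complete(status);
    }
}
```

In the real code the completion callback would presumably be handed back to 
the JM main thread rather than guarded by a lock; the lock here just keeps the 
sketch self-contained.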

> finalizeOnMaster takes too much time and client timeouts
> --------------------------------------------------------
>
>                 Key: FLINK-19069
>                 URL: https://issues.apache.org/jira/browse/FLINK-19069
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0, 1.10.0, 1.11.0, 1.12.0
>            Reporter: Jiayi Liao
>            Priority: Critical
>             Fix For: 1.12.0, 1.11.2, 1.10.3
>
>
> Currently we execute {{finalizeOnMaster}} in the JM's main thread, which may 
> block the JM for a very long time and eventually cause the client to time out. 
> For example, we'd like to write data to HDFS and commit files on the JM, which 
> takes more than ten minutes when committing tens of thousands of files.



