[ https://issues.apache.org/jira/browse/FLINK-19069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190129#comment-17190129 ]
Till Rohrmann commented on FLINK-19069:
---------------------------------------

I think we should make sure that the user code is run in a non-blocking fashion. This means that we don't leave the decision to the user or implementor of some user interface. At the very least, we have to make sure that the call {{FinalizeOnMaster.finalizeGlobal}} is executed outside of the main thread. In the {{ExecutionGraph}} we need to handle the concurrent results properly, which also means handling concurrent {{JobStatus}} changes of the {{ExecutionGraph}}. Also, one needs to think about what happens if the job gets cancelled concurrently. Assuming that the {{finalizeOnMaster}} calls belong to the lifetime of the {{ExecutionGraph}}, one would have to wait for these calls to finish before we can move the {{ExecutionGraph}} into a terminal state.

> finalizeOnMaster takes too much time and client timeouts
> --------------------------------------------------------
>
>                 Key: FLINK-19069
>                 URL: https://issues.apache.org/jira/browse/FLINK-19069
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0, 1.10.0, 1.11.0, 1.12.0
>            Reporter: Jiayi Liao
>            Priority: Critical
>             Fix For: 1.12.0, 1.11.2, 1.10.3
>
>
> Currently we execute {{finalizeOnMaster}} in the JM's main thread, which may
> block the JM for a very long time, so that the client eventually times out.
> For example, we'd like to write data to HDFS and commit files on the JM, which
> takes more than ten minutes when committing tens of thousands of files.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
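
For illustration only, the idea in the comment above (run the finalize calls outside the main thread, then decide the terminal state back on the main thread once they have all completed) might be sketched roughly as follows. This is not Flink code: {{AsyncFinalizeSketch}}, {{Finalizer}}, {{Status}}, {{finish}} and {{cancel}} are hypothetical names, and the single-threaded executor is only an assumed stand-in for the JM's main thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncFinalizeSketch {

    /** Hypothetical stand-in for user code such as FinalizeOnMaster.finalizeGlobal. */
    interface Finalizer {
        void finalizeGlobal() throws Exception;
    }

    /** Simplified, hypothetical job states; not Flink's actual JobStatus enum. */
    enum Status { RUNNING, CANCELLING, FINISHED, FAILED, CANCELED }

    // Assumed model of the main thread: every state change runs here.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Separate pool for potentially slow user code.
    private final ExecutorService ioExecutor = Executors.newCachedThreadPool();

    // Only read and written from tasks running on mainThread.
    private Status status = Status.RUNNING;

    /** Runs all finalizers off the main thread, then picks the terminal state. */
    CompletableFuture<Status> finish(List<Finalizer> finalizers) {
        CompletableFuture<?>[] calls = finalizers.stream()
                .map(f -> CompletableFuture.runAsync(() -> {
                    try {
                        f.finalizeGlobal();
                    } catch (Exception e) {
                        throw new CompletionException(e);
                    }
                }, ioExecutor))
                .toArray(CompletableFuture[]::new);

        // The terminal transition happens only after every call has finished,
        // and it re-checks the state because it may have changed concurrently.
        return CompletableFuture.allOf(calls).handleAsync((ignored, failure) -> {
            if (status == Status.CANCELLING) {
                status = Status.CANCELED;
            } else if (status == Status.RUNNING) {
                status = (failure == null) ? Status.FINISHED : Status.FAILED;
            }
            return status;
        }, mainThread);
    }

    /** Records a cancel request without skipping the wait for running finalizers. */
    CompletableFuture<Void> cancel() {
        return CompletableFuture.runAsync(() -> {
            if (status == Status.RUNNING) {
                status = Status.CANCELLING;
            }
        }, mainThread);
    }

    void shutdown() {
        mainThread.shutdown();
        ioExecutor.shutdown();
    }

    public static void main(String[] args) {
        AsyncFinalizeSketch sketch = new AsyncFinalizeSketch();
        Finalizer slowCommit = () -> Thread.sleep(100); // simulated slow commit
        Status result = sketch.finish(List.of(slowCommit)).join();
        System.out.println(result); // prints FINISHED
        sketch.shutdown();
    }
}
```

In this sketch a concurrent cancel only moves the state to a non-terminal {{CANCELLING}}; the terminal state is still set after the finalize calls return, which matches the assumption that those calls belong to the lifetime of the graph. Whether the real {{ExecutionGraph}} should behave this way is exactly the open question raised in the comment.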