[
https://issues.apache.org/jira/browse/FLINK-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751246#comment-15751246
]
ASF GitHub Bot commented on FLINK-5332:
---------------------------------------
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2999
Manually merged in 2f3ad58b7b73463aa1827baef0eb2e9d87fdb882
> Non-thread safe FileSystem::initOutPathLocalFS() can cause lost
> files/directories in local execution
> ----------------------------------------------------------------------------------------------------
>
> Key: FLINK-5332
> URL: https://issues.apache.org/jira/browse/FLINK-5332
> Project: Flink
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.2.0
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Priority: Critical
> Fix For: 1.2.0
>
>
> This is mainly relevant to tests and Local Mini Cluster executions.
> The {{FileOutputFormat}} and its subclasses rely on
> {{FileSystem::initOutPathLocalFS()}} to prepare the output directory. When
> multiple parallel output writers call that method, there is a slim chance
> that one parallel thread deletes another's directory. The checks in the
> method are not bulletproof.
> I believe that this is the cause for many Travis test instabilities that we
> observed over time.
> Simply synchronizing that method per process should do the trick. Since it is
> a rare initialization method, and only relevant in tests & local mini cluster
> executions, it should be a price that is okay to pay. I see no other way, as
> we do not have simple access to an atomic "check and delete and recreate"
> file operation.
> The synchronization also makes many "retry" code paths obsolete (no
> retries should be needed on proper file systems).
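
The per-process synchronization proposed above can be sketched roughly as follows. This is a minimal illustration, not the code merged for this issue: the class name `LocalOutputDirInitializer`, the method `initOutputDirectory`, and the choice to keep an already existing directory are assumptions made for the example. It uses only `java.nio.file` rather than Flink's `FileSystem` API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not Flink's actual implementation. */
public final class LocalOutputDirInitializer {

    // One lock per JVM (process): all parallel writers in a local
    // mini cluster run in the same process and therefore share it.
    private static final Object INIT_LOCK = new Object();

    private LocalOutputDirInitializer() {}

    /**
     * Ensures that outPath exists as a directory.
     *
     * Assumed semantics for this sketch: an existing directory is kept,
     * so files already written by other parallel writers survive. A regular
     * file at outPath is replaced by a directory only if overwrite is true.
     *
     * @return true if outPath is a directory afterwards, false otherwise
     */
    public static boolean initOutputDirectory(Path outPath, boolean overwrite)
            throws IOException {
        // The whole "check, delete, recreate" sequence runs under the lock,
        // so no other thread in this process can interleave with it.
        synchronized (INIT_LOCK) {
            if (Files.isDirectory(outPath)) {
                return true;
            }
            if (Files.exists(outPath)) {
                if (!overwrite) {
                    return false;
                }
                Files.delete(outPath);
            }
            Files.createDirectories(outPath);
            return true;
        }
    }
}
```

Because every caller holds the same lock for the full sequence, no retry loop is needed inside the method. The lock only protects against threads in the same JVM; it does nothing against concurrent writers in other processes.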