Matthew Ho created GOBBLIN-1619:
-----------------------------------
Summary: WriterUtils.mkdirsWithRecursivePermission contains race
condition and puts unnecessary load on filesystem
Key: GOBBLIN-1619
URL: https://issues.apache.org/jira/browse/GOBBLIN-1619
Project: Apache Gobblin
Issue Type: Bug
Reporter: Matthew Ho
The current implementation, which recursively calls fs.mkdirs, has the following issues:
* *Race condition when creating parent directories, causing a FileNotFound
exception even when the file exists on the file system*
* *HDFS fs.mkdirs atomically creates missing parent directories. Thus, the
recursive approach is making unnecessary calls.*
HDFS, which the current FileSystem interface is built upon, guarantees the
parents will be created. So all FileSystem class implementations should also
follow this behavior.
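
A minimal sketch of the two approaches, offered as an analogy rather than the actual Gobblin code: it uses java.nio.file on a local file system as a stand-in for the Hadoop FileSystem API, so the class and method names (MkdirsSketch, mkdirsRecursively, mkdirsOnce) are hypothetical, and the exception a lost race raises here (FileAlreadyExistsException) is not the FileNotFound failure reported above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of Gobblin or Hadoop.
final class MkdirsSketch {

    // Racy pattern: check each ancestor, then create it in a separate step.
    // Another writer can create the directory between the check and the
    // create, in which case Files.createDirectory throws
    // FileAlreadyExistsException (wrapped here as UncheckedIOException).
    static void mkdirsRecursively(Path dir) {
        if (dir == null || Files.isDirectory(dir)) {
            return;
        }
        mkdirsRecursively(dir.getParent());
        try {
            Files.createDirectory(dir); // one create call per missing level
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Single call: missing ancestors are created by the library, and
    // directories that already exist (including ones created concurrently
    // by another writer) are tolerated.
    static void mkdirsOnce(Path dir) {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch mkdirsOnce issues one call from the caller's point of view and can safely be repeated on the same path, whereas mkdirsRecursively performs a check plus a create per missing level and can fail if two writers build the same parent path at the same time.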
*Note the
[FileSystem|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html]
abstract class documentation says the following:*
The behaviour of the filesystem is [specified in the Hadoop documentation.|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html]
However,
the normative specification of the behavior of this class is actually HDFS:
{color:#de350b}if HDFS does not behave the way these Javadocs or the
specification in the Hadoop documentations define, assume that the
documentation is incorrect{color}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)