[ https://issues.apache.org/jira/browse/SPARK-16736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated SPARK-16736:
-----------------------------------
    Summary: remove redundant FileSystem status checks calls from Spark codebase  (was: remove redundant FileSystem.exists() calls from Spark codebase)

> remove redundant FileSystem status checks calls from Spark codebase
> -------------------------------------------------------------------
>
>                 Key: SPARK-16736
>                 URL: https://issues.apache.org/jira/browse/SPARK-16736
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> The Hadoop {{FileSystem.exists()}} and {{FileSystem.isDirectory()}} calls are
> wrappers around {{FileSystem.getFileStatus()}}, a call that puts load on an
> HDFS NameNode (NN) and is very slow against object stores.
> # If these calls are followed by any {{getFileStatus()}} call, they can be
> eliminated by merging the two lookups into one and moving the catch of
> {{FileNotFoundException}} out of the {{exists()}} call and into the Spark code.
> # Any sequence of exists + delete can be optimised by removing the exists
> check and relying on {{FileSystem.delete()}} to be a no-op if the destination
> path is not present. That's a tested requirement of all Hadoop compatible FS
> and object stores.
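
The two rewrites described in the issue can be sketched as follows. This is only an illustration under stated assumptions: {{CountingFs}} is a hypothetical in-memory stand-in written for this sketch, not the Hadoop {{FileSystem}} class, and its {{statusCalls}} counter merely models the round trips that each status lookup would cost against a NameNode or an object store. The helper names ({{lengthWithExistsCheck}}, {{lengthSingleLookup}}, {{deleteWithExistsCheck}}, {{deleteDirect}}) are likewise invented for the example.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

public class StatusCheckSketch {

    /** Hypothetical in-memory stand-in for a filesystem client; NOT the Hadoop API. */
    static class CountingFs {
        final Map<String, Long> files = new HashMap<>(); // path -> file length
        int statusCalls = 0; // each increment models one remote status lookup

        /** Returns the file length, or throws if the path is absent. */
        long getFileStatus(String path) throws FileNotFoundException {
            statusCalls++;
            Long length = files.get(path);
            if (length == null) {
                throw new FileNotFoundException(path);
            }
            return length;
        }

        /** Wrapper around getFileStatus(), as the issue describes for exists(). */
        boolean exists(String path) {
            try {
                getFileStatus(path);
                return true;
            } catch (FileNotFoundException e) {
                return false;
            }
        }

        /** Returns false, without failing, when the path is not present. */
        boolean delete(String path) {
            return files.remove(path) != null;
        }
    }

    /** Before: exists() followed by getFileStatus() costs two status lookups. */
    static long lengthWithExistsCheck(CountingFs fs, String path) {
        if (!fs.exists(path)) {
            return -1;
        }
        try {
            return fs.getFileStatus(path);
        } catch (FileNotFoundException e) {
            return -1; // path vanished between the two calls
        }
    }

    /** After: one lookup, with the FileNotFoundException handled by the caller. */
    static long lengthSingleLookup(CountingFs fs, String path) {
        try {
            return fs.getFileStatus(path);
        } catch (FileNotFoundException e) {
            return -1;
        }
    }

    /** Before: exists() guard in front of delete(). */
    static void deleteWithExistsCheck(CountingFs fs, String path) {
        if (fs.exists(path)) {
            fs.delete(path);
        }
    }

    /** After: rely on delete() being a no-op for a missing path. */
    static void deleteDirect(CountingFs fs, String path) {
        fs.delete(path);
    }

    public static void main(String[] args) {
        CountingFs before = new CountingFs();
        before.files.put("/data/part-0", 42L);
        lengthWithExistsCheck(before, "/data/part-0");
        deleteWithExistsCheck(before, "/data/part-0");

        CountingFs after = new CountingFs();
        after.files.put("/data/part-0", 42L);
        lengthSingleLookup(after, "/data/part-0");
        deleteDirect(after, "/data/part-0");

        System.out.println("status lookups before: " + before.statusCalls);
        System.out.println("status lookups after:  " + after.statusCalls);
    }
}
```

In real Spark code the same shape would apply to {{FileSystem.getFileStatus(Path)}} and {{FileSystem.delete(Path, boolean)}}; the single-lookup form also closes the window in which the path could disappear between the existence check and the status call.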