[ https://issues.apache.org/jira/browse/SPARK-16736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16736:
------------------------------------

    Assignee: Apache Spark

> remove redundant FileSystem status checks calls from Spark codebase
> -------------------------------------------------------------------
>
>                 Key: SPARK-16736
>                 URL: https://issues.apache.org/jira/browse/SPARK-16736
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Steve Loughran
>            Assignee: Apache Spark
>            Priority: Minor
>
> The Hadoop {{FileSystem.exists()}} and {{FileSystem.isDirectory()}} calls are
> wrappers around {{FileSystem.getFileStatus()}}, which puts load on the HDFS
> NameNode and is very, very slow against object stores.
> # If these calls are followed by any {{getFileStatus()}} calls, they can be
> eliminated by merging them into a single {{getFileStatus()}} call and moving
> the catching of {{FileNotFoundException}} out of {{exists()}} and into the
> Spark code.
> # Any sequence of {{exists()}} + {{delete()}} can be optimised by removing the
> {{exists()}} check and relying on {{FileSystem.delete()}} to be a no-op if the
> destination path is not present. That is a tested requirement of all
> Hadoop-compatible filesystems and object stores.
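The two rewrites proposed above can be sketched against a small in-memory model. This is not Hadoop or Spark code: `CountingStore` and its methods are hypothetical stand-ins that mimic only the semantics the issue relies on. `exists()` wraps `get_file_status()` and swallows the not-found error, and `delete()` of a missing path returns `False` instead of failing. Every remote operation (a NameNode RPC or an object-store request) is counted. Python's `FileNotFoundError` plays the role of Java's `FileNotFoundException`.

```python
from typing import Callable, Dict, Optional


class CountingStore:
    """Hypothetical in-memory stand-in for a Hadoop-style FileSystem.

    Models only the semantics the issue relies on and counts every
    remote operation (a NameNode RPC or an object-store request).
    """

    def __init__(self, files: Dict[str, bool]) -> None:
        self.files = dict(files)  # path -> is_directory
        self.ops = 0              # remote operations issued so far

    def get_file_status(self, path: str) -> Dict[str, object]:
        self.ops += 1
        if path not in self.files:
            raise FileNotFoundError(path)
        return {"path": path, "is_dir": self.files[path]}

    def exists(self, path: str) -> bool:
        # Like Hadoop's exists(): a wrapper that swallows the not-found error.
        try:
            self.get_file_status(path)
            return True
        except FileNotFoundError:
            return False

    def delete(self, path: str) -> bool:
        # Deleting a missing path is a no-op that returns False, not an error.
        self.ops += 1
        return self.files.pop(path, None) is not None


def status_before(fs: CountingStore, path: str) -> Optional[dict]:
    # exists() followed by get_file_status(): two probes of the same path.
    if fs.exists(path):
        return fs.get_file_status(path)
    return None


def status_after(fs: CountingStore, path: str) -> Optional[dict]:
    # One probe; the caller catches the not-found case itself.
    try:
        return fs.get_file_status(path)
    except FileNotFoundError:
        return None


def delete_before(fs: CountingStore, path: str) -> None:
    # Redundant existence check before the delete.
    if fs.exists(path):
        fs.delete(path)


def delete_after(fs: CountingStore, path: str) -> None:
    # No existence check: delete() already tolerates a missing path.
    fs.delete(path)


def ops_for(fn: Callable[[CountingStore, str], object], path: str) -> int:
    """Run fn against a fresh store holding one file; return the ops it issued."""
    fs = CountingStore({"/data/part-00000": False})
    fn(fs, path)
    return fs.ops


for fn in (status_before, status_after, delete_before, delete_after):
    print(fn.__name__, ops_for(fn, "/data/part-00000"), ops_for(fn, "/data/missing"))
```

For a path that exists, each "before" variant issues two remote operations and each "after" variant issues one. For a missing path, both variants issue one. In this model the merged forms never cost more and halve the cost in the common case where the path is present.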



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
