[ 
https://issues.apache.org/jira/browse/SPARK-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175748#comment-14175748
 ] 

Sean Owen commented on SPARK-3967:
----------------------------------

You guys should open PRs for these. I am also not sure it's necessary to
download the file into a temp directory and then move it. The move may end up
as a copy instead of a rename, and in fact does here, so the file does not
appear in the target dir atomically anyway. I'm also not sure the code here
cleans up the partially downloaded file in case of error, which could leave a
broken file in the target dir instead of just in a temp dir.
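
A minimal sketch of one way to address both concerns, written in Python rather than Spark's Scala and not taken from the attached patches: create the temporary file inside the target directory itself, so the final rename stays on one filesystem, and delete the partial file if anything fails. The names `fetch_atomically` and `write_content` are hypothetical.

```python
import os
import tempfile


def fetch_atomically(write_content, target_path):
    """Write to a temp file next to target_path, then rename it into place.

    write_content is a hypothetical callback that receives a binary file
    object and writes the downloaded bytes to it.
    """
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Creating the temp file in the target directory keeps the final rename
    # on a single filesystem, so it does not degrade into a copy.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix=".fetch-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            write_content(tmp_file)
        # os.replace renames the temp file over the target in one step.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Remove the partial download so no broken file is left behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this layout a failed download leaves the target directory as it was, and readers of the target path see either the old file or the complete new one.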

The change to not copy the file when identical looks sound; I bet you can avoid 
checking if it exists twice.
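
As a rough illustration of skipping the copy with a single existence check (a Python sketch, not the patch's code; `copy_unless_identical` is a hypothetical name), the comparison itself can double as the existence test. Overwriting a destination whose contents differ is an assumption made here for brevity; raising an error instead would be an equally valid policy.

```python
import filecmp
import shutil


def copy_unless_identical(source_path, dest_path):
    """Copy source_path to dest_path unless an identical file is already there.

    Returns True if a copy was made, False if it was skipped.
    """
    try:
        # shallow=False compares file contents, not just os.stat() metadata.
        if filecmp.cmp(source_path, dest_path, shallow=False):
            return False
    except FileNotFoundError:
        # dest_path does not exist yet: fall through to the copy.
        pass
    # Assumption: a differing destination is simply overwritten.
    shutil.copyfile(source_path, dest_path)
    return True
```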

> Spark applications fail in yarn-cluster mode when the directories configured 
> in yarn.nodemanager.local-dirs are located on different disks/partitions
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3967
>                 URL: https://issues.apache.org/jira/browse/SPARK-3967
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Christophe PRÉAUD
>         Attachments: spark-1.1.0-utils-fetch.patch, 
> spark-1.1.0-yarn_cluster_tmpdir.patch
>
>
> Spark applications fail from time to time in yarn-cluster mode (but not in 
> yarn-client mode) when yarn.nodemanager.local-dirs (Hadoop YARN config) is 
> set to a comma-separated list of directories which are located on different 
> disks/partitions.
> Steps to reproduce:
> 1. Set yarn.nodemanager.local-dirs (in yarn-site.xml) to a list of 
> directories located on different partitions (the more you set, the more 
> likely it will be to reproduce the bug):
> (...)
> <property>
>   <name>yarn.nodemanager.local-dirs</name>
>   <value>file:/d1/yarn/local/nm-local-dir,file:/d2/yarn/local/nm-local-dir,file:/d3/yarn/local/nm-local-dir,file:/d4/yarn/local/nm-local-dir,file:/d5/yarn/local/nm-local-dir,file:/d6/yarn/local/nm-local-dir,file:/d7/yarn/local/nm-local-dir</value>
> </property>
> (...)
> 2. Launch an application in yarn-cluster mode several times; it will fail
> (apparently at random) from time to time.
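
To check whether a given set of local directories actually spans several filesystems, a small Python sketch such as the following may help. It is not part of Spark or YARN; the function names are hypothetical, and it expects plain paths (the `file:` prefix used in yarn-site.xml would need to be stripped first). On POSIX systems, two directories that report different `st_dev` values are on different filesystems, so a rename between them fails with EXDEV and a move has to fall back to copy plus delete.

```python
import os


def group_dirs_by_device(directories):
    """Group directories by the device ID (st_dev) of their filesystem."""
    groups = {}
    for directory in directories:
        device_id = os.stat(directory).st_dev
        groups.setdefault(device_id, []).append(directory)
    return groups


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

If `group_dirs_by_device` returns more than one group for the configured directories, the setup matches the conditions described in the steps above.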



