ding created SPARK-5418:
---------------------------

Summary: Output directory for shuffle should consider the space left in each directory set in conf
Key: SPARK-5418
URL: https://issues.apache.org/jira/browse/SPARK-5418
Project: Spark
Issue Type: Bug
Components: Shuffle
Affects Versions: 1.2.0
Environment: Ubuntu, others should be similar
Reporter: ding
Priority: Minor
I set multiple directories in conf spark.local.dir as "scratch" space. One of them (e.g. /mnt/disk1) has 30G of space left, while the others (e.g. /mnt/disk2) have 100G. In the current version, Spark uses a hash to decide which directory is used for "scratch" space, so every directory has the same chance of being chosen. After hundreds of iterations of PageRank, a "No space left" exception is thrown and the driver crashes. This does not make sense, since there is still 70G+ of space left in the other directories. We should take the space left in each directory into account when deciding which directory should be the map output dir. I will send a PR for this.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
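The report contrasts hash-based directory selection with a choice that accounts for the space left in each directory. Below is a minimal Python sketch of the two policies. It is an illustration under stated assumptions, not Spark's code or the approach taken in the PR. All function names are hypothetical, and "pick a directory at random with probability proportional to its free space" is just one possible free-space-aware policy. The simulation counts space in whole GB units and uses the 30G/100G figures from the report.

```python
import hashlib
import random
from typing import Callable, Mapping, Sequence


def pick_dir_by_hash(filename: str, local_dirs: Sequence[str]) -> str:
    """Hash-then-modulo choice: each directory is equally likely, whatever its free space."""
    # Stable, non-negative hash of the file name (Python's built-in hash() is salted per process).
    h = int.from_bytes(hashlib.md5(filename.encode()).digest()[:4], "big")
    return local_dirs[h % len(local_dirs)]


def pick_dir_by_free_space(
    local_dirs: Sequence[str],
    free_space: Callable[[str], int],
    rng: random.Random,
) -> str:
    """Hypothetical policy: pick a directory with probability proportional to its free space."""
    weights = [max(free_space(d), 0) for d in local_dirs]
    if sum(weights) == 0:
        raise OSError("No space left in any local directory")
    return rng.choices(local_dirs, weights=weights, k=1)[0]


def simulate(pick: Callable[[str, Mapping[str, int]], str], capacities: Mapping[str, int]) -> int:
    """Write 1-unit blocks until a block lands on a full directory; return the units written."""
    free = dict(capacities)
    written = 0
    for i in range(sum(capacities.values()) + 1):
        try:
            d = pick(f"shuffle_{i}", free)
        except OSError:
            break  # every directory is full
        if free[d] == 0:
            break  # "No space left on device" although other directories still have room
        free[d] -= 1
        written += 1
    return written


if __name__ == "__main__":
    caps = {"/mnt/disk1": 30, "/mnt/disk2": 100}  # GB left, as in the report
    rng = random.Random(0)
    hashed = simulate(lambda name, free: pick_dir_by_hash(name, list(free)), caps)
    weighted = simulate(
        lambda name, free: pick_dir_by_free_space(list(free), free.__getitem__, rng), caps
    )
    print("hash-based:", hashed, "GB written before failure")
    print("free-space weighted:", weighted, "GB written before failure")  # 130
```

With the hash-based choice, each disk receives about half of the writes, so /mnt/disk1 fills after roughly 60G in total while /mnt/disk2 still has about 70G free, which matches the symptom in the report. The weighted choice re-reads free space on every call, so it only fails once both disks are full. On a real machine, `free_space` could be backed by `shutil.disk_usage(d).free`.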