Hi,

After upgrading from spark-1.0.0 to spark-1.1.0, my Spark applications fail most of the time (but not always) in yarn-cluster mode (but not in yarn-client mode).
Here is my configuration:
 * spark-1.1.0
 * hadoop-2.2.0

And the hadoop.tmp.dir definition in the Hadoop core-site.xml file (each directory is on its own partition):

  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/d1/yarn/local,file:/d2/yarn/local,file:/d3/yarn/local,file:/d4/yarn/local,file:/d5/yarn/local,file:/d6/yarn/local,file:/d7/yarn/local</value>
  </property>

After investigating, it turns out that the problem occurs when the executor fetches a jar file: the jar is downloaded to a temporary file, always under /d1/yarn/local (see the hadoop.tmp.dir definition above), and then moved to one of the temporary directories defined in hadoop.tmp.dir:
 * if that directory is the same as the one holding the temporary file (i.e. /d1/yarn/local), the application continues normally;
 * if it is a different one (e.g. /d2/yarn/local, /d3/yarn/local, ...), the task fails with the following error:

14/10/10 14:33:51 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 (TID 0)
java.io.FileNotFoundException: ./logReader-1.0.10.jar (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at com.google.common.io.Files$FileByteSink.openStream(Files.java:223)
        at com.google.common.io.Files$FileByteSink.openStream(Files.java:211)
        at com.google.common.io.ByteSource.copyTo(ByteSource.java:203)
        at com.google.common.io.Files.copy(Files.java:436)
        at com.google.common.io.Files.move(Files.java:651)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:440)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:325)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:323)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:323)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I have no idea why the move fails when the source and target files are not on the same partition. For the moment I have worked around the problem with the attached patch (i.e. I ensure that the temporary file and the moved file are always on the same partition).

Any thoughts about this problem?

Thanks!
Christophe.
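For what it's worth, here is a minimal, self-contained sketch (plain java.nio, not Spark's or Guava's actual code) of why a cross-partition move cannot be a simple rename: an atomic move only works when source and target share a file store, and otherwise the move has to degrade to copy + delete, which opens an output stream on the target and is exactly where a "Permission denied" like the one above would surface. The object name MoveDemo and the file names are mine, for illustration only.

```scala
import java.nio.file.{AtomicMoveNotSupportedException, Files, StandardCopyOption}

object MoveDemo {
  /** Moves a freshly written temp file into a target directory and returns
   *  which strategy was used. Mirrors, in spirit, what any move has to do:
   *  a rename only works when both paths are on the same partition. */
  def run(): String = {
    val src = Files.createTempFile("fetchFileTemp", null)
    Files.write(src, "payload".getBytes("UTF-8"))
    val targetDir = Files.createTempDirectory("fetch-demo")
    val target = targetDir.resolve("some-app.jar")
    val how =
      try {
        // Atomic rename: only possible within a single file store (partition).
        Files.move(src, target, StandardCopyOption.ATOMIC_MOVE)
        "rename"
      } catch {
        case _: AtomicMoveNotSupportedException =>
          // Cross-partition fallback: copy then delete. The copy opens an
          // output stream on the target file, which is where a permission
          // problem on the target directory would be thrown.
          Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING)
          Files.delete(src)
          "copy+delete"
      }
    assert(Files.exists(target))
    how
  }

  def main(args: Array[String]): Unit =
    println(s"move strategy used: ${run()}")
}
```

In my setup both branches can be hit, depending on which of the /d*/yarn/local directories the target lands in.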
--- core/src/main/scala/org/apache/spark/util/Utils.scala.orig	2014-09-03 08:00:33.000000000 +0200
+++ core/src/main/scala/org/apache/spark/util/Utils.scala	2014-10-10 17:51:59.000000000 +0200
@@ -349,8 +349,7 @@
    */
   def fetchFile(url: String, targetDir: File, conf: SparkConf, securityMgr: SecurityManager) {
     val filename = url.split("/").last
-    val tempDir = getLocalDir(conf)
-    val tempFile = File.createTempFile("fetchFileTemp", null, new File(tempDir))
+    val tempFile = File.createTempFile("fetchFileTemp", null, new File(targetDir.getAbsolutePath))
     val targetFile = new File(targetDir, filename)
     val uri = new URI(url)
     val fileOverwrite = conf.getBoolean("spark.files.overwrite", false)
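As an aside, the assumption behind this workaround (temp file and target on the same partition, so the move is a cheap rename) can be checked with java.nio's FileStore. A small sketch; samePartition is a hypothetical helper of my own, not part of Spark:

```scala
import java.nio.file.{Files, Path, Paths}

object SameStoreCheck {
  // Hypothetical helper (not in Spark): true when both paths live on the
  // same file store (partition), i.e. when a rename-based move can work.
  def samePartition(a: Path, b: Path): Boolean =
    Files.getFileStore(a).name == Files.getFileStore(b).name

  def main(args: Array[String]): Unit = {
    val tmp = Paths.get(System.getProperty("java.io.tmpdir"))
    // A path is trivially on the same partition as itself.
    println(samePartition(tmp, tmp))  // prints "true"
  }
}
```

On my nodes, /d1/yarn/local and /d2/yarn/local would fail such a check, since each directory is on its own partition.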