Sergey Zhemzhitsky created SPARK-21549: ------------------------------------------
Summary: Spark fails to abort job correctly in case of custom OutputFormat implementations Key: SPARK-21549 URL: https://issues.apache.org/jira/browse/SPARK-21549 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Environment: spark 2.2.0 scala 2.11 Reporter: Sergey Zhemzhitsky Priority: Critical Spark fails to abort job correctly in case of custom OutputFormat implementations. There are OutputFormat implementations which do not need to use *mapreduce.output.fileoutputformat.outputdir* standard hadoop property. [But spark reads this property from the configuration.|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L79] while setting up an OutputCommitter {code:javascript} val committer = FileCommitProtocol.instantiate( className = classOf[HadoopMapReduceCommitProtocol].getName, jobId = stageId.toString, outputPath = conf.value.get("mapreduce.output.fileoutputformat.outputdir"), isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol] committer.setupJob(jobContext) {code} In that case if job fails Spark executes [committer.abortJob|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L106] {code:javascript} committer.abortJob(jobContext) {code} ... and fails with the following exception {code} Can not create a Path from a null string java.lang.IllegalArgumentException: Can not create a Path from a null string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:123) at org.apache.hadoop.fs.Path.<init>(Path.java:135) at org.apache.hadoop.fs.Path.<init>(Path.java:89) at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58) at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.abortJob(HadoopMapReduceCommitProtocol.scala:141) at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:106) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:362) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org