[ https://issues.apache.org/jira/browse/SPARK-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Zhemzhitsky updated SPARK-21549:
---------------------------------------
    Priority: Blocker  (was: Critical)

> Spark fails to complete job correctly in case of OutputFormats which do not write into HDFS
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21549
>                 URL: https://issues.apache.org/jira/browse/SPARK-21549
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>         Environment: spark 2.2.0
>                      scala 2.11
>            Reporter: Sergey Zhemzhitsky
>            Priority: Blocker
>
> Spark fails to complete the job correctly in case of custom OutputFormat implementations.
> There are OutputFormat implementations which do not need the standard Hadoop property *mapreduce.output.fileoutputformat.outputdir*.
> But [Spark reads this property from the configuration|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L79] while setting up an OutputCommitter:
> {code:scala}
> val committer = FileCommitProtocol.instantiate(
>   className = classOf[HadoopMapReduceCommitProtocol].getName,
>   jobId = stageId.toString,
>   outputPath = conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
>   isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
> committer.setupJob(jobContext)
> {code}
> ... and then uses this property later on when [committing the job|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L132], [aborting the job|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L141], and [creating the task's temp path|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L95].
> In such cases, when the job completes, the following exception is thrown:
> {code}
> Can not create a Path from a null string
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:123)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:135)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:89)
>   at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58)
>   at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.abortJob(HadoopMapReduceCommitProtocol.scala:141)
>   at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:106)
>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)
>   ...
> {code}
> So it seems that all jobs using OutputFormats which do not write data into HDFS-compatible file systems are broken.
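For additional context, below is a minimal reproduction sketch of the scenario the issue describes. The names ExternalSinkOutputFormat and Repro are hypothetical, not from the original report: a new-API OutputFormat that delivers records to an external sink and therefore never sets mapreduce.output.fileoutputformat.outputdir. Saving through it on Spark 2.2.0 should fail with the exception quoted above, since the committer's output path resolves to null.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce._
import org.apache.spark.{SparkConf, SparkContext}

// A minimal new-API OutputFormat that targets an external sink (here the
// records are simply discarded), so it has no use for the
// mapreduce.output.fileoutputformat.outputdir property.
class ExternalSinkOutputFormat[K, V] extends OutputFormat[K, V] {
  override def getRecordWriter(context: TaskAttemptContext): RecordWriter[K, V] =
    new RecordWriter[K, V] {
      // A real implementation would push the record to an external system.
      override def write(key: K, value: V): Unit = ()
      override def close(context: TaskAttemptContext): Unit = ()
    }

  // Nothing to validate: there is no output directory.
  override def checkOutputSpecs(context: JobContext): Unit = ()

  override def getOutputCommitter(context: TaskAttemptContext): OutputCommitter =
    new OutputCommitter {
      override def setupJob(jobContext: JobContext): Unit = ()
      override def setupTask(taskContext: TaskAttemptContext): Unit = ()
      override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
      override def commitTask(taskContext: TaskAttemptContext): Unit = ()
      override def abortTask(taskContext: TaskAttemptContext): Unit = ()
    }
}

object Repro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("SPARK-21549").setMaster("local[2]"))

    val job = Job.getInstance(new Configuration(sc.hadoopConfiguration))
    job.setOutputKeyClass(classOf[java.lang.Long])
    job.setOutputValueClass(classOf[java.lang.Long])
    job.setOutputFormatClass(
      classOf[ExternalSinkOutputFormat[java.lang.Long, java.lang.Long]])

    // mapreduce.output.fileoutputformat.outputdir is never set, so on
    // Spark 2.2.0 this write fails with
    // "Can not create a Path from a null string".
    sc.parallelize(1L to 10L)
      .map(x => (x, x))
      .saveAsNewAPIHadoopDataset(job.getConfiguration)
  }
}
{code}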