[ https://issues.apache.org/jira/browse/SPARK-8079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-8079:
------------------------------
    Target Version/s: 1.4.1, 1.5.0  (was: 1.4.0, 1.4.1)

> NPE when HadoopFsRelation.prepareForWriteJob throws exception
> -------------------------------------------------------------
>
>                 Key: SPARK-8079
>                 URL: https://issues.apache.org/jira/browse/SPARK-8079
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>              Labels: backport-needed
>
> Take {{ParquetRelation2}} as an example: the following Spark shell code may cause an unexpected NPE:
> {code}
> import sqlContext._
> import sqlContext.implicits._
> range(1, 3).select($"id" as "a b").write.format("parquet").save("file:///tmp/foo")
> {code}
> Exceptions thrown:
> {noformat}
> import sqlContext._
> import sqlContext.implicits._
> range(1, 3).select($"id" as "a b").write.format("parquet").save("file:///tmp/foo")
>
> java.lang.RuntimeException: Attribute name "a b" contains invalid character(s) among " ,;{}() =". Please use alias to rename it.
>     at scala.sys.package$.error(package.scala:27)
>     at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$checkSpecialCharacters$2.apply(ParquetTypes.scala:414)
>     at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$checkSpecialCharacters$2.apply(ParquetTypes.scala:412)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at org.apache.spark.sql.parquet.ParquetTypesConverter$.checkSpecialCharacters(ParquetTypes.scala:412)
>     at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:423)
>     at org.apache.spark.sql.parquet.RowWriteSupport$.setSchema(ParquetTableSupport.scala:383)
>     at org.apache.spark.sql.parquet.ParquetRelation2.prepareJobForWrite(newParquet.scala:230)
>     ...
>
> java.lang.NullPointerException
>     at org.apache.spark.sql.sources.BaseWriterContainer.abortJob(commands.scala:372)
>     at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:137)
>     at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:114)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
>     ...
> {noformat}
> Note that the first {{RuntimeException}} is expected, while the following NPE is not.
> The reason for the NPE is that {{BaseWriterContainer.driverSideSetup()}} both calls {{relation.prepareForWriteJob()}} and initializes the {{OutputCommitter}} used for the subsequent write job. However, if the former throws an exception, the latter is never initialized, so an NPE is thrown when aborting the job because the {{OutputCommitter}} is still null.
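
For illustration only, here is a minimal, self-contained Scala sketch of the failure pattern described above. The class and member names ({{SketchWriterContainer}}, {{SketchCommitter}}, {{committer}}) are simplified stand-ins for the Spark internals referenced in the stack trace, not the real API; the sketch only shows why an unguarded abort dereferences a null committer when setup fails early, and how a null check (one possible, assumed remedy) would avoid the secondary NPE.

{code}
object NpeOnAbortSketch {
  // Stand-in for the OutputCommitter mentioned in the report; not the real Hadoop class.
  class SketchCommitter {
    def abortJob(): Unit = println("job aborted cleanly")
  }

  class SketchWriterContainer {
    // Remains null if driverSideSetup() throws before reaching the initialization step.
    private var committer: SketchCommitter = _

    def driverSideSetup(failEarly: Boolean): Unit = {
      // Step 1: prepare the write job; this is where a RuntimeException such as the
      // "invalid character in attribute name" error would surface.
      if (failEarly) throw new RuntimeException("prepareJobForWrite failed")
      // Step 2: only reached when step 1 succeeds, so the committer may never be set.
      committer = new SketchCommitter
    }

    // Unguarded abort, mirroring the reported behaviour: NPE when committer is still null.
    def abortJob(): Unit = committer.abortJob()

    // A guarded variant, assuming a simple null check is an acceptable fix.
    def abortJobGuarded(): Unit = if (committer != null) committer.abortJob()
  }

  def main(args: Array[String]): Unit = {
    val container = new SketchWriterContainer
    try container.driverSideSetup(failEarly = true)
    catch { case e: RuntimeException => println(s"expected failure: ${e.getMessage}") }

    container.abortJobGuarded() // no-op, no NPE
    try container.abortJob()    // reproduces the secondary NullPointerException
    catch { case _: NullPointerException => println("unguarded abort throws NPE") }
  }
}
{code}

Whether the actual patch guards the abort path or reorders setup so the committer is always initialized is not specified here; the sketch only demonstrates the two-step dependency that makes the secondary NPE possible.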