[ https://issues.apache.org/jira/browse/SPARK-4521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-4521:
------------------------------
    Description:
I think this is actually a bug in parquet, but it would be good to track it here as well. To reproduce:
{code}
jsonRDD(sparkContext.parallelize("""{"number of clusters": 1}"""::Nil)).saveAsParquetFile("test")
parquetFile("test").collect()
{code}
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8.0 (TID 13, localhost): java.lang.IllegalArgumentException: field ended by ';': expected ';' but got 'of' at line 1: optional int32 number of
    at parquet.schema.MessageTypeParser.check(MessageTypeParser.java:209)
    at parquet.schema.MessageTypeParser.addPrimitiveType(MessageTypeParser.java:182)
    at parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:108)
    at parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:96)
    at parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:89)
    at parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:79)
    at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:189)
    at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:135)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
{code}

  was:
I think this is actually a bug in parquet, but it would be good to track it here as well.
To reproduce:
{code}
jsonRDD(sparkContext.parallelize("""{"number of clusters": 1}"""::Nil)).saveAsParquetFile("test")
parquetFile("test").collect()
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8.0 (TID 13, localhost): java.lang.IllegalArgumentException: field ended by ';': expected ';' but got 'of' at line 1: optional int32 number of
    at parquet.schema.MessageTypeParser.check(MessageTypeParser.java:209)
    at parquet.schema.MessageTypeParser.addPrimitiveType(MessageTypeParser.java:182)
    at parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:108)
    at parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:96)
    at parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:89)
    at parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:79)
    at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:189)
    at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:135)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
{code}

> Parquet fails to read columns with spaces in the name
> -----------------------------------------------------
>
>                 Key: SPARK-4521
>                 URL: https://issues.apache.org/jira/browse/SPARK-4521
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Michael Armbrust
>
> I think this is actually a bug in parquet, but it would be good to track it
> here as well.
> To reproduce:
> {code}
> jsonRDD(sparkContext.parallelize("""{"number of clusters": 1}"""::Nil)).saveAsParquetFile("test")
> parquetFile("test").collect()
> {code}
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8.0 (TID 13, localhost): java.lang.IllegalArgumentException: field ended by ';': expected ';' but got 'of' at line 1: optional int32 number of
>     at parquet.schema.MessageTypeParser.check(MessageTypeParser.java:209)
>     at parquet.schema.MessageTypeParser.addPrimitiveType(MessageTypeParser.java:182)
>     at parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:108)
>     at parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:96)
>     at parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:89)
>     at parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:79)
>     at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:189)
>     at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:138)
>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:135)
>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
> {code}
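The error text ("expected ';' but got 'of'" while reading "optional int32 number of") suggests that the textual schema is split on whitespace, so a column name containing a space no longer reads as a single field name. The sketch below is a toy illustration of that reading and of one possible workaround, renaming columns before they are written. It is not Parquet's parser and not a Spark API: `FieldNames`, `sanitize`, and `parsesAsSingleField` are hypothetical names introduced here, and the three-token check is an assumption drawn only from the message above.

```scala
// Hypothetical helpers for illustration only; not part of Spark or Parquet.
object FieldNames {
  // Replace each run of whitespace with an underscore so the name is a
  // single token in a whitespace-delimited schema string.
  def sanitize(name: String): String =
    name.trim.replaceAll("\\s+", "_")

  // Toy check mirroring the reported failure (an assumption, not Parquet's
  // real logic): a field declaration should split into exactly three
  // tokens, "<repetition> <type> <name>".
  def parsesAsSingleField(name: String): Boolean =
    s"optional int32 $name".split("\\s+").length == 3
}

object Demo {
  def main(args: Array[String]): Unit = {
    val raw = "number of clusters"
    println(FieldNames.parsesAsSingleField(raw))                      // false
    println(FieldNames.parsesAsSingleField(FieldNames.sanitize(raw))) // true
  }
}
```

Applying something like `sanitize` to the JSON keys before calling `saveAsParquetFile` would avoid names with spaces reaching the schema string, at the cost of changing the column names that readers see.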