Hi all,

My team has the same issue. It looks like Spark SQL in Spark 1.3 cannot read Parquet files generated by Spark 1.1. That will cost us a lot of migration work when we upgrade to Spark 1.3.
Can anyone help? Thanks.

Wisely Chen

On Tue, Mar 10, 2015 at 5:06 PM, Pei-Lun Lee <pl...@appier.com> wrote:

> Hi,
>
> I found that if I try to read parquet file generated by spark 1.1.1 using
> 1.3.0-rc3 by default settings, I got this error:
>
> com.fasterxml.jackson.core.JsonParseException: Unrecognized token
> 'StructType': was expecting ('true', 'false' or 'null')
>  at [Source: StructType(List(StructField(a,IntegerType,false))); line: 1,
> column: 11]
>         at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
>         at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
>         at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2300)
>         at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1459)
>         at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:683)
>         at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3105)
>         at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3051)
>         at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161)
>         at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
>         at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
>         at org.apache.spark.sql.types.DataType$.fromJson(dataTypes.scala:41)
>         at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
>         at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
>
> this is how I save parquet file with 1.1.1:
>
> sql("select 1 as a").saveAsParquetFile("/tmp/foo")
>
> and this is the meta data of the 1.1.1 parquet file:
>
> creator:     parquet-mr version 1.4.3
> extra:       org.apache.spark.sql.parquet.row.metadata =
> StructType(List(StructField(a,IntegerType,false)))
>
> by comparison, this is 1.3.0 meta:
>
> creator:     parquet-mr version 1.6.0rc3
> extra:       org.apache.spark.sql.parquet.row.metadata =
> {"type":"struct","fields":[{"name":"a","type":"integer","nullable":t
> [more]...
>
> It looks like now ParquetRelation2 is used to load parquet file by default
> and it only recognizes JSON format schema but 1.1.1 schema was case class
> string format.
>
> Setting spark.sql.parquet.useDataSourceApi to false will fix it, but I
> don't know the differences.
> Is this considered a bug? We have a lot of parquet files from 1.1.1, should
> we disable data source api in order to read them if we want to upgrade to
> 1.3?
>
> Thanks,
> --
> Pei-Lun
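
For anyone else hitting this, the workaround mentioned in the quoted message (setting spark.sql.parquet.useDataSourceApi to false) could be applied from application code roughly as follows. This is only a sketch, assuming Spark 1.3 on the classpath, a local master, and the /tmp/foo path from the original example; the object name ReadLegacyParquet is made up for illustration, and I have not checked what else changes when the data source API is disabled.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver object; the name is for illustration only.
object ReadLegacyParquet {
  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust the master for a real cluster.
    val conf = new SparkConf()
      .setAppName("read-legacy-parquet")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Fall back to the older Parquet code path instead of ParquetRelation2,
    // as suggested in the quoted message.
    sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "false")

    // Path taken from the saveAsParquetFile example written with 1.1.1.
    val df = sqlContext.parquetFile("/tmp/foo")
    df.printSchema()

    sc.stop()
  }
}
```

The same key can presumably also be passed at submit time with --conf, which would avoid touching application code while the old files are still around.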