We are looking at the issue and will likely fix it for Spark 1.3.1.

On Thu, Mar 12, 2015 at 8:25 PM, giive chen <thegi...@gmail.com> wrote:

> Hi all
>
> My team has the same issue. It looks like Spark 1.3's SparkSQL cannot read
> parquet files generated by Spark 1.1. It will cost a lot of migration work
> when we want to upgrade to Spark 1.3.
>
> Can anyone help me?
>
> Thanks
>
> Wisely Chen
>
> On Tue, Mar 10, 2015 at 5:06 PM, Pei-Lun Lee <pl...@appier.com> wrote:
>
> > Hi,
> >
> > I found that if I try to read a parquet file generated by Spark 1.1.1
> > using 1.3.0-rc3 with default settings, I get this error:
> >
> > com.fasterxml.jackson.core.JsonParseException: Unrecognized token
> > 'StructType': was expecting ('true', 'false' or 'null')
> >  at [Source: StructType(List(StructField(a,IntegerType,false))); line: 1, column: 11]
> >  at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
> >  at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
> >  at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2300)
> >  at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1459)
> >  at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:683)
> >  at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3105)
> >  at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3051)
> >  at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161)
> >  at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
> >  at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
> >  at org.apache.spark.sql.types.DataType$.fromJson(dataTypes.scala:41)
> >  at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
> >  at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
> >
> > This is how I saved the parquet file with 1.1.1:
> >
> > sql("select 1 as a").saveAsParquetFile("/tmp/foo")
> >
> > and this is the metadata of the 1.1.1 parquet file:
> >
> > creator: parquet-mr version 1.4.3
> > extra:   org.apache.spark.sql.parquet.row.metadata =
> >          StructType(List(StructField(a,IntegerType,false)))
> >
> > By comparison, this is the 1.3.0 metadata:
> >
> > creator: parquet-mr version 1.6.0rc3
> > extra:   org.apache.spark.sql.parquet.row.metadata =
> >          {"type":"struct","fields":[{"name":"a","type":"integer","nullable":t [more]...
> >
> > It looks like ParquetRelation2 is now used to load parquet files by
> > default, and it only recognizes the JSON schema format, whereas the 1.1.1
> > schema was stored in case-class string format.
> >
> > Setting spark.sql.parquet.useDataSourceApi to false fixes it, but I don't
> > know what the differences are. Is this considered a bug? We have a lot of
> > parquet files from 1.1.1; should we disable the data source API in order
> > to read them if we want to upgrade to 1.3?
> >
> > Thanks,
> > --
> > Pei-Lun
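
For anyone hitting this thread later: the workaround quoted above can be applied from a Spark 1.3 shell roughly as follows. This is only a sketch of what Pei-Lun described, assuming the shell's usual `sqlContext` is in scope and reusing the example path `/tmp/foo` from the thread:

```scala
// Workaround sketch: turn off the new data source API so the legacy
// parquet code path (which understands the 1.1.1 case-class schema
// string) is used instead of ParquetRelation2.
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "false")

// The 1.1.1-era file should now load via the old reader:
val df = sqlContext.parquetFile("/tmp/foo")
df.printSchema()
```

The same flag can also be passed at startup, e.g. `--conf spark.sql.parquet.useDataSourceApi=false`, so existing jobs need no code change.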