Thanks!

On Sat, Mar 14, 2015 at 3:31 AM, Michael Armbrust <mich...@databricks.com> wrote:

> Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-6315
>
> On Thu, Mar 12, 2015 at 11:00 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
> > We are looking at the issue and will likely fix it for Spark 1.3.1.
> >
> > On Thu, Mar 12, 2015 at 8:25 PM, giive chen <thegi...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> My team has the same issue. It looks like Spark 1.3's Spark SQL cannot
> >> read parquet files generated by Spark 1.1. It will cost us a lot of
> >> migration work when we want to upgrade to Spark 1.3.
> >>
> >> Can anyone help me?
> >>
> >> Thanks
> >>
> >> Wisely Chen
> >>
> >> On Tue, Mar 10, 2015 at 5:06 PM, Pei-Lun Lee <pl...@appier.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I found that if I try to read a parquet file generated by Spark 1.1.1
> >> > using 1.3.0-rc3 with default settings, I get this error:
> >> >
> >> > com.fasterxml.jackson.core.JsonParseException: Unrecognized token
> >> > 'StructType': was expecting ('true', 'false' or 'null')
> >> >  at [Source: StructType(List(StructField(a,IntegerType,false))); line: 1, column: 11]
> >> >   at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
> >> >   at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
> >> >   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2300)
> >> >   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1459)
> >> >   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:683)
> >> >   at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3105)
> >> >   at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3051)
> >> >   at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161)
> >> >   at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
> >> >   at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
> >> >   at org.apache.spark.sql.types.DataType$.fromJson(dataTypes.scala:41)
> >> >   at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
> >> >   at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
> >> >
> >> > This is how I saved the parquet file with 1.1.1:
> >> >
> >> >   sql("select 1 as a").saveAsParquetFile("/tmp/foo")
> >> >
> >> > and this is the metadata of the 1.1.1 parquet file:
> >> >
> >> >   creator: parquet-mr version 1.4.3
> >> >   extra:   org.apache.spark.sql.parquet.row.metadata =
> >> >            StructType(List(StructField(a,IntegerType,false)))
> >> >
> >> > By comparison, this is the 1.3.0 metadata:
> >> >
> >> >   creator: parquet-mr version 1.6.0rc3
> >> >   extra:   org.apache.spark.sql.parquet.row.metadata =
> >> >            {"type":"struct","fields":[{"name":"a","type":"integer","nullable":t [more]...
> >> >
> >> > It looks like ParquetRelation2 is now used to load parquet files by
> >> > default, and it only recognizes the JSON schema format, while the
> >> > 1.1.1 schema was in case-class string format.
> >> >
> >> > Setting spark.sql.parquet.useDataSourceApi to false fixes it, but I
> >> > don't know what the differences are. Is this considered a bug? We
> >> > have a lot of parquet files from 1.1.1; should we disable the data
> >> > source API in order to read them if we want to upgrade to 1.3?
> >> >
> >> > Thanks,
> >> > --
> >> > Pei-Lun
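[Editor's note: for readers hitting the same error, the workaround named in the thread can be sketched in the Spark 1.3.x shell as below. This is an unverified sketch, not a confirmed fix: it assumes the spark-shell's default `sqlContext` and reuses the `/tmp/foo` path from Pei-Lun's example; the config key only exists in Spark 1.3.x.]

```scala
// Workaround sketch for Spark 1.3.x, per the thread above: disable the new
// data source API so the legacy Parquet code path, which understands the
// 1.1.x case-class schema strings, is used to read the old files.
sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "false")

// The file written by Spark 1.1.1 in the example above should now load:
val df = sqlContext.parquetFile("/tmp/foo")
df.collect()
```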