Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-15 Thread Pei-Lun Lee
Thanks!

On Sat, Mar 14, 2015 at 3:31 AM, Michael Armbrust mich...@databricks.com wrote:

 Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-6315

 On Thu, Mar 12, 2015 at 11:00 PM, Michael Armbrust mich...@databricks.com wrote:

  We are looking at the issue and will likely fix it for Spark 1.3.1.
 



Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-12 Thread giive chen
Hi all

My team has the same issue. It looks like Spark SQL in Spark 1.3 cannot read
Parquet files generated by Spark 1.1. This will cost us a lot of migration work
when we upgrade to Spark 1.3.

Can anyone help?


Thanks

Wisely Chen




SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-10 Thread Pei-Lun Lee
Hi,

I found that if I try to read a Parquet file generated by Spark 1.1.1 using
1.3.0-rc3 with default settings, I get this error:

com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'StructType': was expecting ('true', 'false' or 'null')
 at [Source: StructType(List(StructField(a,IntegerType,false))); line: 1, column: 11]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2300)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1459)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:683)
    at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3105)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3051)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161)
    at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
    at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
    at org.apache.spark.sql.types.DataType$.fromJson(dataTypes.scala:41)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)



This is how I saved the Parquet file with 1.1.1:

sql("select 1 as a").saveAsParquetFile("/tmp/foo")



And this is the metadata of the 1.1.1 Parquet file:

creator: parquet-mr version 1.4.3
extra:   org.apache.spark.sql.parquet.row.metadata =
StructType(List(StructField(a,IntegerType,false)))



By comparison, this is the 1.3.0 metadata:

creator: parquet-mr version 1.6.0rc3
extra:   org.apache.spark.sql.parquet.row.metadata =
{"type":"struct","fields":[{"name":"a","type":"integer","nullable":t
[more]...



It looks like ParquetRelation2 is now used to load Parquet files by default.
It only recognizes the JSON schema format, whereas the 1.1.1 schema was written
in the case class string format.
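
The mismatch can be reproduced without Spark. What follows is a minimal Python sketch, not Spark's actual code: it hands both kinds of schema string to a JSON parser, which is roughly what the stack trace suggests DataType.fromJson does. The helper name is_json_schema is hypothetical, and JSON_SCHEMA is a stand-in written in the 1.3.0 style, because the real value is truncated in the metadata dump above.

```python
import json

# Schema string stored by Spark 1.1.1, copied from the metadata dump above.
LEGACY_SCHEMA = "StructType(List(StructField(a,IntegerType,false)))"

# Hypothetical stand-in for a 1.3.0-style JSON schema. The real value is
# truncated in the dump above, so this is an assumption for illustration.
JSON_SCHEMA = '{"type":"struct","fields":[{"name":"a","type":"integer","nullable":true}]}'


def is_json_schema(schema_string):
    """Return True if the schema string parses as a JSON object."""
    try:
        return isinstance(json.loads(schema_string), dict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(is_json_schema(LEGACY_SCHEMA))  # False: the parser rejects 'StructType'
print(is_json_schema(JSON_SCHEMA))    # True
```

A reader that wanted to accept both formats could use a check like this to decide which parser to call, though whether Spark does so is not stated in this thread.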

Setting spark.sql.parquet.useDataSourceApi to false fixes it, but I
don't know what else that setting changes.
Is this considered a bug? We have a lot of Parquet files from 1.1.1. Should
we disable the data source API in order to read them if we want to upgrade to
1.3?
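
For reference, the workaround can be sketched as below. This is a sketch under assumptions rather than a recommended fix: it assumes sql_context is a Spark 1.3 SQLContext exposing setConf and parquetFile, and the helper name read_legacy_parquet is hypothetical. The configuration key is the one named in this message.

```python
def read_legacy_parquet(sql_context, path):
    """Read a Parquet file with the Parquet data source API disabled.

    `sql_context` is assumed to be a Spark 1.3 SQLContext.
    """
    # Switch back to the older Parquet read path. The setting is applied to
    # the whole context, so later reads on this context are affected as well.
    sql_context.setConf("spark.sql.parquet.useDataSourceApi", "false")
    return sql_context.parquetFile(path)
```

In a 1.3 PySpark shell this would be called as read_legacy_parquet(sqlContext, "/tmp/foo"). What else the flag changes is not covered by this sketch.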

Thanks,
--
Pei-Lun