Re: Parquet Array Support Broken?

2015-09-08 Thread Cheng Lian
Yeah, this is a typical Parquet interoperability issue due to unfortunate historical reasons. Hive (actually parquet-hive) gives the following schema for array<int>:

{code}
message m0 {
  optional group f (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
}
{code}

while Spark SQL gives
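(The archive truncates Cheng Lian's message here. Reconstructed from Spark 1.4's legacy Parquet writer behavior, and not the original text, the Spark SQL side of the comparison most likely differs only in the name of the innermost field, array instead of array_element:)

{code}
message m1 {
  optional group f (LIST) {
    repeated group bag {
      optional int32 array;
    }
  }
}
{code}

(That single field-name difference is enough for each reader to reject the other writer's files.)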

Re: Parquet Array Support Broken?

2015-09-07 Thread Alex Kozlov
Thank you - it works if the file is created in Spark.

On Mon, Sep 7, 2015 at 3:06 PM, Ruslan Dautkhanov wrote:
> Read the response from Cheng Lian on Aug/27th - it looks like the same problem.
>
> Workarounds:
> 1. write that parquet file in Spark;
> 2. upgrade to Spark 1.5.
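(A minimal spark-shell sketch of workaround 1, with hypothetical data and output path, not part of the thread:)

{code}
// Spark 1.4.1 spark-shell. Data, column names, and path are invented for
// illustration; the point is that files written by Spark's own Parquet
// writer read back cleanly in Spark.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._

// Build the array-typed data in Spark instead of reading the Hive-written file
val df = sc.parallelize(Seq((1, Seq(10, 20)), (2, Seq(30)))).toDF("id", "vals")

df.write.parquet("/tmp/stats_spark")                 // Spark's own layout
sqlContext.read.parquet("/tmp/stats_spark").show()   // reads back fine
{code}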

Re: Parquet Array Support Broken?

2015-09-07 Thread Alex Kozlov
No, it was created in Hive by CTAS, but any help is appreciated...

On Mon, Sep 7, 2015 at 2:51 PM, Ruslan Dautkhanov wrote:
> That parquet table wasn't created in Spark, was it?
>
> There was a recent discussion on this list that complex data types in
> Spark prior to 1.5 are often incompatible with Hive, for example, if I
> remember correctly.

Re: Parquet Array Support Broken?

2015-09-07 Thread Ruslan Dautkhanov
Read the response from Cheng Lian on Aug/27th - it looks like the same problem.

Workarounds:
1. write that parquet file in Spark;
2. upgrade to Spark 1.5.

--
Ruslan Dautkhanov

On Mon, Sep 7, 2015 at 3:52 PM, Alex Kozlov wrote:
> No, it was created in Hive by CTAS, but any help is appreciated...

Re: Parquet Array Support Broken?

2015-09-07 Thread Ruslan Dautkhanov
That parquet table wasn't created in Spark, was it?

There was a recent discussion on this list that complex data types in Spark prior to 1.5 are often incompatible with Hive, for example, if I remember correctly.

On Mon, Sep 7, 2015, 2:57 PM Alex Kozlov wrote:
> I am trying to read an (array typed) parquet file in spark-shell (Spark
> 1.4.1 with Hadoop 2.6):

Re: Parquet Array Support Broken?

2015-09-07 Thread Alex Kozlov
The same error if I do:

{code}
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val results = sqlContext.sql("SELECT * FROM stats")
{code}

but it does work from the Hive shell directly...

On Mon, Sep 7, 2015 at 1:56 PM, Alex Kozlov wrote:
> I am trying to read an (array typed) parquet file in spark-shell (Spark
> 1.4.1 with Hadoop 2.6):
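(One plausible explanation for the identical error, not stated in the thread: by default Spark converts metastore Parquet tables to its native Parquet reader rather than going through the Hive SerDe, so HiveContext hits the same schema mismatch. A sketch of a possible extra workaround, assuming spark.sql.hive.convertMetastoreParquet behaves as documented in 1.4:)

{code}
// Untested sketch: force Spark to read the table through Hive's SerDe
// instead of its native Parquet reader.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
val results = sqlContext.sql("SELECT * FROM stats")
results.show()
{code}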

Parquet Array Support Broken?

2015-09-07 Thread Alex Kozlov
I am trying to read an (array typed) parquet file in spark-shell (Spark 1.4.1 with Hadoop 2.6):

{code}
$ bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See
{code}
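(The transcript is truncated before the failing command. A hypothetical reconstruction of the kind of read being attempted, with an invented path:)

{code}
// Hypothetical; the actual command and stack trace are lost to the archive.
val df = sqlContext.read.parquet("/user/hive/warehouse/stats")
df.collect()  // expected to fail on Spark 1.4 for Hive-written array columns,
              // per Cheng Lian's diagnosis at the top of this thread
{code}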