Re: Parquet error reading data that contains array of structs

2015-04-29 Thread Cheng Lian
Thanks for the detailed information! Now I can confirm that this is a backwards-compatibility issue. The data written by parquet 1.6rc7 follows the standard LIST structure. However, Spark SQL still uses old parquet-avro style two-level structures, which causes the problem. Cheng On 4/27/15

Re: Parquet error reading data that contains array of structs

2015-04-27 Thread Jianshi Huang
FYI, Parquet schema output: message pig_schema { optional binary cust_id (UTF8); optional int32 part_num; optional group ip_list (LIST) { repeated group ip_t { optional binary ip (UTF8); } } optional group vid_list (LIST) { repeated group vid_t { optional binary

Re: Parquet error reading data that contains array of structs

2015-04-26 Thread Jianshi Huang
Hi Huai, I'm using Spark 1.3.1. You're right. The dataset is not generated by Spark. It's generated by Pig using Parquet 1.6.0rc7 jars. Let me see if I can send a testing dataset to you... Jianshi On Sat, Apr 25, 2015 at 2:22 AM, Yin Huai yh...@databricks.com wrote: oh, I missed that. It

Re: Parquet error reading data that contains array of structs

2015-04-26 Thread Cheng Lian
Had an offline discussion with Jianshi, the dataset was generated by Pig. Jianshi - Could you please attach the output of parquet-schema path-to-parquet-file? I guess this is a Parquet format backwards-compatibility issue. Parquet hadn't standardized representation of LIST and MAP until

Re: Parquet error reading data that contains array of structs

2015-04-26 Thread Cheng Lian
Had an offline discussion with Jianshi, the dataset was generated by Pig. Jianshi - Could you please attach the output of parquet-schema path-to-parquet-file? I guess this is a Parquet format backwards-compatibility issue. Parquet hadn't standardized representation of LIST and MAP until

Parquet error reading data that contains array of structs

2015-04-24 Thread Jianshi Huang
Hi, My data looks like this: +---++--+ | col_name | data_type | comment | +---++--+ | cust_id | string | | | part_num | int|

Re: Parquet error reading data that contains array of structs

2015-04-24 Thread Ted Yu
Yin: Fix Version of SPARK-4520 is not set. I assume it was fixed in 1.3.0 Cheers Fix Version On Fri, Apr 24, 2015 at 11:00 AM, Yin Huai yh...@databricks.com wrote: The exception looks like the one mentioned in https://issues.apache.org/jira/browse/SPARK-4520. What is the version of Spark?

Re: Parquet error reading data that contains array of structs

2015-04-24 Thread Yin Huai
oh, I missed that. It is fixed in 1.3.0. Also, Jianshi, the dataset was not generated by Spark SQL, right? On Fri, Apr 24, 2015 at 11:09 AM, Ted Yu yuzhih...@gmail.com wrote: Yin: Fix Version of SPARK-4520 is not set. I assume it was fixed in 1.3.0 Cheers Fix Version On Fri, Apr 24,

Re: Parquet error reading data that contains array of structs

2015-04-24 Thread Yin Huai
The exception looks like the one mentioned in https://issues.apache.org/jira/browse/SPARK-4520. What is the version of Spark? On Fri, Apr 24, 2015 at 2:40 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: Hi, My data looks like this: +---++--+ |