Thanks for the detailed information!
Now I can confirm that this is a backwards-compatibility issue. The data
written by Parquet 1.6.0rc7 follows the standard LIST structure, but
Spark SQL still uses the old parquet-avro-style two-level structure,
which causes the problem.
Cheng
On 4/27/15
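[For context on the two structures Cheng mentions, here is a sketch of the same list-of-int32 field under each convention. This is based on the parquet-format spec's LIST rules; names other than the "list" and "element" mandated by the standard form are illustrative.]

```
Legacy two-level layout (old parquet-avro / parquet-pig style):

  optional group my_list (LIST) {
    repeated int32 array;
  }

Standard three-level layout (parquet-format spec):

  optional group my_list (LIST) {
    repeated group list {
      optional int32 element;
    }
  }
```

A reader that only understands one of these layouts will misinterpret files written in the other, which is the mismatch discussed in this thread.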
FYI,
Parquet schema output:
message pig_schema {
  optional binary cust_id (UTF8);
  optional int32 part_num;
  optional group ip_list (LIST) {
    repeated group ip_t {
      optional binary ip (UTF8);
    }
  }
  optional group vid_list (LIST) {
    repeated group vid_t {
      optional binary
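[The schema above is the legacy two-level encoding: the repeated group ip_t sits directly inside the LIST-annotated group. Under the standardized layout, the same ip_list field would be written roughly as below; a sketch, since the standard form fixes the inner names to "list" and "element".]

```
optional group ip_list (LIST) {
  repeated group list {
    optional binary element (UTF8);
  }
}
```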
Hi Huai,
I'm using Spark 1.3.1.
You're right. The dataset is not generated by Spark. It's generated by Pig
using Parquet 1.6.0rc7 jars.
Let me see if I can send a testing dataset to you...
Jianshi
On Sat, Apr 25, 2015 at 2:22 AM, Yin Huai yh...@databricks.com wrote:
oh, I missed that. It is fixed in 1.3.0.
Had an offline discussion with Jianshi, the dataset was generated by Pig.
Jianshi - Could you please attach the output of parquet-schema
path-to-parquet-file? I guess this is a Parquet format
backwards-compatibility issue. Parquet hadn't standardized
representation of LIST and MAP until
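[The command Yin asks for here is the schema dump from parquet-tools; assuming it is installed and on the PATH, the invocation is roughly:]

```
# Dump the Parquet schema of a file (path is a placeholder);
# older parquet-mr releases also shipped a parquet-schema wrapper script.
parquet-tools schema path-to-parquet-file
```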
Hi,
My data looks like this:
+----------+-----------+---------+
| col_name | data_type | comment |
+----------+-----------+---------+
| cust_id  | string    |         |
| part_num | int       |
Yin:
Fix Version of SPARK-4520 is not set.
I assume it was fixed in 1.3.0
Cheers
On Fri, Apr 24, 2015 at 11:00 AM, Yin Huai yh...@databricks.com wrote:
The exception looks like the one mentioned in
https://issues.apache.org/jira/browse/SPARK-4520. What is the version of
Spark?
oh, I missed that. It is fixed in 1.3.0.
Also, Jianshi, the dataset was not generated by Spark SQL, right?
On Fri, Apr 24, 2015 at 11:09 AM, Ted Yu yuzhih...@gmail.com wrote:
Yin:
Fix Version of SPARK-4520 is not set.
I assume it was fixed in 1.3.0
Cheers
On Fri, Apr 24, 2015 at 11:00 AM, Yin Huai yh...@databricks.com wrote:
The exception looks like the one mentioned in
https://issues.apache.org/jira/browse/SPARK-4520. What is the version of
Spark?
On Fri, Apr 24, 2015 at 2:40 AM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
My data looks like this:
+----------+-----------+---------+
|