[jira] [Comment Edited] (SPARK-20297) Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala

2017-04-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965273#comment-15965273
 ] 

Hyukjin Kwon edited comment on SPARK-20297 at 4/12/17 2:24 AM:
---

Let me leave some pointers about related PRs - 
https://github.com/apache/spark/pull/8566 and 
https://github.com/apache/spark/pull/6617. cc [~lian cheng]


was (Author: hyukjin.kwon):
Let me leave some pointers about related PRs - 
https://github.com/apache/spark/pull/8566 and 
https://github.com/apache/spark/pull/6617. cc [~liancheng]

> Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala
> ---
>
> Key: SPARK-20297
> URL: https://issues.apache.org/jira/browse/SPARK-20297
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Mostafa Mokhtar
> Labels: integration
>
> While trying to load some data using Spark 2.1, I found that decimal(12,2) 
> columns stored in Parquet files written by Spark are not readable by Hive or Impala.
> Repro 
> {code}
> CREATE TABLE customer_acctbal(
>   c_acctbal decimal(12,2))
> STORED AS Parquet;
> insert into customer_acctbal values (7539.95);
> {code}
> Error from Hive
> {code}
> Failed with exception 
> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value 
> at 1 in block 0 in file 
> hdfs://server1:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-0-03d6e3bb-fe5e-4f20-87a4-88dec955dfcd.snappy.parquet
> Time taken: 0.122 seconds
> {code}
> Error from Impala
> {code}
> File 
> 'hdfs://server:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-0-32db4c61-fe67-4be2-9c16-b55c75c517a4.snappy.parquet'
>  has an incompatible Parquet schema for column 
> 'tpch_nested_3000_parquet.customer_acctbal.c_acctbal'. Column type: 
> DECIMAL(12,2), Parquet schema:
> optional int64 c_acctbal [i:0 d:1 r:0] (1 of 2 similar)
> {code}
> Table info 
> {code}
> hive> describe formatted customer_acctbal;
> OK
> # col_name              data_type               comment
> c_acctbal               decimal(12,2)
> # Detailed Table Information
> Database:               tpch_nested_3000_parquet
> Owner:                  mmokhtar
> CreateTime:             Mon Apr 10 17:47:24 PDT 2017
> LastAccessTime:         UNKNOWN
> Protect Mode:           None
> Retention:              0
> Location:               hdfs://server1.com:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal
> Table Type:             MANAGED_TABLE
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   numFiles                1
>   numRows                 0
>   rawDataSize             0
>   totalSize               120
>   transient_lastDdlTime   1491871644
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat:            org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Compressed:             No
> Num Buckets:            -1
> Bucket Columns:         []
> Sort Columns:           []
> Storage Desc Params:
>   serialization.format    1
> Time taken: 0.032 seconds, Fetched: 31 row(s)
> {code}
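
A note for readers following the Impala error above: the file stores c_acctbal as int64, and the message suggests Impala expected a different physical type for DECIMAL(12,2). The sketch below is an illustration only, not Spark's actual code. It assumes a writer that picks int32 for precision up to 9 and int64 for precision up to 18, unless a legacy mode is enabled, in which case it uses fixed_len_byte_array with the smallest width that fits. The function names and the `legacy` flag are hypothetical.

```python
from decimal import Decimal

# Assumed thresholds: largest precisions whose unscaled value always fits
# in a signed 32-bit or 64-bit integer.
MAX_INT_DIGITS = 9
MAX_LONG_DIGITS = 18


def min_bytes_for_precision(precision: int) -> int:
    """Smallest signed two's-complement width (in bytes) holding 10**precision - 1."""
    num_bytes = 1
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def physical_type(precision: int, legacy: bool) -> str:
    """Hypothetical writer rule: compact integers unless legacy mode is on."""
    if not legacy and precision <= MAX_INT_DIGITS:
        return "int32"
    if not legacy and precision <= MAX_LONG_DIGITS:
        return "int64"
    return f"fixed_len_byte_array({min_bytes_for_precision(precision)})"


def encode_fixed(value: Decimal, precision: int, scale: int) -> bytes:
    """Encode the unscaled value as big-endian two's complement.

    Digits beyond `scale` are truncated by int(); a real writer would
    validate or round instead.
    """
    unscaled = int(value.scaleb(scale))
    return unscaled.to_bytes(min_bytes_for_precision(precision), "big", signed=True)


if __name__ == "__main__":
    print(physical_type(12, legacy=False))
    print(physical_type(12, legacy=True))
    print(encode_fixed(Decimal("7539.95"), 12, 2).hex())
```

Under these assumptions, decimal(12,2) maps to int64 by default, which matches the int64 column reported in the Impala error, and to a 6-byte fixed_len_byte_array in legacy mode.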



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Comment Edited] (SPARK-20297) Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala

2017-04-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965273#comment-15965273
 ] 

Hyukjin Kwon edited comment on SPARK-20297 at 4/12/17 2:23 AM:
---

Let me leave some pointers about related PRs - 
https://github.com/apache/spark/pull/8566 and 
https://github.com/apache/spark/pull/6617. cc [~liancheng]


was (Author: hyukjin.kwon):
Let me leave some pointers about related PRs - 
https://github.com/apache/spark/pull/8566 and 
https://github.com/apache/spark/pull/8566. cc [~liancheng]


[jira] [Comment Edited] (SPARK-20297) Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala

2017-04-11 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965265#comment-15965265
 ] 

Hyukjin Kwon edited comment on SPARK-20297 at 4/12/17 2:21 AM:
---

Thank you so much for trying it out, [~mmokhtar]. Do you think this JIRA can be 
resolved?


was (Author: hyukjin.kwon):
Thank you so much for trying it out, [~mmokhtar]. Do you think this JIRA can be 
resolved?

To my knowledge, this option means following Parquet's specification rather 
than the current way used by Spark. So, if other implementations follow 
Parquet's specification, I guess this is the correct option for compatibility.

