[jira] [Updated] (SPARK-20297) Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala

2017-04-11 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-20297:
---------------------------------
Priority: Major  (was: Critical)

> Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala
> ---
>
> Key: SPARK-20297
> URL: https://issues.apache.org/jira/browse/SPARK-20297
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Mostafa Mokhtar
>  Labels: integration
>
> While trying to load data with Spark 2.1, I found that decimal(12,2)
> columns written to Parquet by Spark are not readable by Hive or Impala.
> Repro:
> {code}
> CREATE TABLE customer_acctbal(
>   c_acctbal decimal(12,2))
> STORED AS Parquet;
> insert into customer_acctbal values (7539.95);
> {code}
> Error from Hive
> {code}
> Failed with exception 
> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value 
> at 1 in block 0 in file 
> hdfs://server1:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-0-03d6e3bb-fe5e-4f20-87a4-88dec955dfcd.snappy.parquet
> Time taken: 0.122 seconds
> {code}
> Error from Impala
> {code}
> File 
> 'hdfs://server:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-0-32db4c61-fe67-4be2-9c16-b55c75c517a4.snappy.parquet'
>  has an incompatible Parquet schema for column 
> 'tpch_nested_3000_parquet.customer_acctbal.c_acctbal'. Column type: 
> DECIMAL(12,2), Parquet schema:
> optional int64 c_acctbal [i:0 d:1 r:0] (1 of 2 similar)
> {code}
> Table info 
> {code}
> hive> describe formatted customer_acctbal;
> OK
> # col_name  data_type   comment
> c_acctbal   decimal(12,2)
> # Detailed Table Information
> Database:   tpch_nested_3000_parquet
> Owner:  mmokhtar
> CreateTime: Mon Apr 10 17:47:24 PDT 2017
> LastAccessTime: UNKNOWN
> Protect Mode:   None
> Retention:  0
> Location:   
> hdfs://server1.com:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal
> Table Type: MANAGED_TABLE
> Table Parameters:
> COLUMN_STATS_ACCURATE   true
> numFiles    1
> numRows 0
> rawDataSize 0
> totalSize   120
> transient_lastDdlTime   1491871644
> # Storage Information
> SerDe Library:  
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat:
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat:   
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Compressed: No
> Num Buckets:-1
> Bucket Columns: []
> Sort Columns:   []
> Storage Desc Params:
> serialization.format    1
> Time taken: 0.032 seconds, Fetched: 31 row(s)
> {code}
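Editor's note: the Impala error above points at the root cause. The Parquet format permits a DECIMAL(p,s) column to be stored as INT32 or INT64 when the precision fits, and Spark 2.x uses that compact encoding by default, while the Hive and Impala readers of this era only accept the FIXED_LEN_BYTE_ARRAY layout for decimals. A small sketch (plain Python; the helper name is illustrative, not from any library) of the widths involved:

```python
def min_bytes_for_precision(precision: int) -> int:
    # Smallest n such that a signed n-byte two's-complement integer can
    # hold any unscaled decimal value with `precision` digits, i.e.
    # 10**precision - 1 <= 2**(8*n - 1) - 1. This is the byte width a
    # FIXED_LEN_BYTE_ARRAY decimal writer would use.
    n = 1
    while 10**precision - 1 > 2**(8 * n - 1) - 1:
        n += 1
    return n

# 12 decimal digits need only 6 bytes, comfortably within int64's 8,
# which is why Spark 2.x chooses a plain int64 column for DECIMAL(12,2).
print(min_bytes_for_precision(12))  # prints 6
```

So the file itself is valid per the Parquet spec; the incompatibility is in readers that hard-code the fixed-length layout.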



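A commonly cited workaround (worth verifying against your Spark version; the property exists in Spark 1.6 and later) is to have Spark write decimals in the legacy fixed-length layout that Hive and Impala expect, e.g. in spark-defaults.conf:

```
spark.sql.parquet.writeLegacyFormat  true
```

The same property can be set per session before writing. Note that already-written files are unaffected; the affected data must be rewritten with the setting enabled to become readable by Hive and Impala.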
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20297) Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala

2017-04-11 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-20297:
---------------------------------
Component/s: (was: Spark Core)
 SQL
