[ 
https://issues.apache.org/jira/browse/SPARK-20297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965265#comment-15965265
 ] 

Hyukjin Kwon edited comment on SPARK-20297 at 4/12/17 2:21 AM:
---------------------------------------------------------------

Thank you so much for trying this out, [~mmokhtar]. Do you think this JIRA can be resolved?


was (Author: hyukjin.kwon):
Thank you so much for trying this out, [~mmokhtar]. Do you think this JIRA can be resolved?

To my knowledge, this option makes Spark follow Parquet's specification rather than 
Spark's current behaviour. So, if other implementations follow Parquet's 
specification, I believe this is the correct option for compatibility.
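
The option is not named in this excerpt of the thread. Assuming it refers to Spark's {{spark.sql.parquet.writeLegacyFormat}} flag (the setting commonly discussed as a workaround for this class of Hive/Impala decimal-compatibility issue; an assumption, not confirmed by this comment), a minimal sketch of the workaround would be:

{code}
-- Assumption: the option under discussion is spark.sql.parquet.writeLegacyFormat.
-- When set to true, Spark writes decimal columns in the pre-2.x layout
-- (fixed_len_byte_array) that older Hive and Impala readers expect, instead of
-- the int32/int64 encodings it uses by default for small-precision decimals.
SET spark.sql.parquet.writeLegacyFormat=true;
{code}

Any table written after setting this flag in the same session would then use the legacy decimal layout; existing Parquet files are unaffected and would need to be rewritten.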

> Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala
> -----------------------------------------------------------------------
>
>                 Key: SPARK-20297
>                 URL: https://issues.apache.org/jira/browse/SPARK-20297
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Mostafa Mokhtar
>              Labels: integration
>
> While trying to load some data using Spark 2.1 I realized that decimal(12,2) 
> columns stored in Parquet written by Spark are not readable by Hive or Impala.
> Repro 
> {code}
> CREATE TABLE customer_acctbal(
>   c_acctbal decimal(12,2))
> STORED AS Parquet;
> insert into customer_acctbal values (7539.95);
> {code}
> Error from Hive
> {code}
> Failed with exception 
> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value 
> at 1 in block 0 in file 
> hdfs://server1:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-00000-03d6e3bb-fe5e-4f20-87a4-88dec955dfcd.snappy.parquet
> Time taken: 0.122 seconds
> {code}
> Error from Impala
> {code}
> File 
> 'hdfs://server:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-00000-32db4c61-fe67-4be2-9c16-b55c75c517a4.snappy.parquet'
>  has an incompatible Parquet schema for column 
> 'tpch_nested_3000_parquet.customer_acctbal.c_acctbal'. Column type: 
> DECIMAL(12,2), Parquet schema:
> optional int64 c_acctbal [i:0 d:1 r:0] (1 of 2 similar)
> {code}
> Table info 
> {code}
> hive> describe formatted customer_acctbal;
> OK
> # col_name              data_type               comment
> c_acctbal               decimal(12,2)
> # Detailed Table Information
> Database:               tpch_nested_3000_parquet
> Owner:                  mmokhtar
> CreateTime:             Mon Apr 10 17:47:24 PDT 2017
> LastAccessTime:         UNKNOWN
> Protect Mode:           None
> Retention:              0
> Location:               
> hdfs://server1.com:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal
> Table Type:             MANAGED_TABLE
> Table Parameters:
>         COLUMN_STATS_ACCURATE   true
>         numFiles                1
>         numRows                 0
>         rawDataSize             0
>         totalSize               120
>         transient_lastDdlTime   1491871644
> # Storage Information
> SerDe Library:          
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat:            
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat:           
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Compressed:             No
> Num Buckets:            -1
> Bucket Columns:         []
> Sort Columns:           []
> Storage Desc Params:
>         serialization.format    1
> Time taken: 0.032 seconds, Fetched: 31 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
