[ https://issues.apache.org/jira/browse/SPARK-20297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975763#comment-15975763 ]
Hyukjin Kwon commented on SPARK-20297: -------------------------------------- Yea, then it looks both ways are still vaild whether it is newer or older. Spark supports both ways when writing out via the option I described above. It looks other systems provided here are not able to read int-based decimals that complies Parquet specification. If the point to leave this open is to keep this discussion, we could leave this resolved or open a JIRA in Parquet side. I am resolving this. Please reopen this if I misunderstood. > Parquet Decimal(12,2) written by Spark is unreadable by Hive and Impala > ----------------------------------------------------------------------- > > Key: SPARK-20297 > URL: https://issues.apache.org/jira/browse/SPARK-20297 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.1.0 > Reporter: Mostafa Mokhtar > Labels: integration > > While trying to load some data using Spark 2.1 I realized that decimal(12,2) > columns stored in Parquet written by Spark are not readable by Hive or Impala. > Repro > {code} > CREATE TABLE customer_acctbal( > c_acctbal decimal(12,2)) > STORED AS Parquet; > insert into customer_acctbal values (7539.95); > {code} > Error from Hive > {code} > Failed with exception > java.io.IOException:parquet.io.ParquetDecodingException: Can not read value > at 1 in block 0 in file > hdfs://server1:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-00000-03d6e3bb-fe5e-4f20-87a4-88dec955dfcd.snappy.parquet > Time taken: 0.122 seconds > {code} > Error from Impala > {code} > File > 'hdfs://server:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal/part-00000-32db4c61-fe67-4be2-9c16-b55c75c517a4.snappy.parquet' > has an incompatible Parquet schema for column > 'tpch_nested_3000_parquet.customer_acctbal.c_acctbal'. Column type: > DECIMAL(12,2), Parquet schema: > optional int64 c_acctbal [i:0 d:1 r:0] (1 of 2 similar) > {code} > Table info > {code} > hive> describe formatted customer_acctbal; > OK > # col_name data_type comment > c_acctbal decimal(12,2) > # Detailed Table Information > Database: tpch_nested_3000_parquet > Owner: mmokhtar > CreateTime: Mon Apr 10 17:47:24 PDT 2017 > LastAccessTime: UNKNOWN > Protect Mode: None > Retention: 0 > Location: > hdfs://server1.com:8020/user/hive/warehouse/tpch_nested_3000_parquet.db/customer_acctbal > Table Type: MANAGED_TABLE > Table Parameters: > COLUMN_STATS_ACCURATE true > numFiles 1 > numRows 0 > rawDataSize 0 > totalSize 120 > transient_lastDdlTime 1491871644 > # Storage Information > SerDe Library: > org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe > InputFormat: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat > OutputFormat: > org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat > Compressed: No > Num Buckets: -1 > Bucket Columns: [] > Sort Columns: [] > Storage Desc Params: > serialization.format 1 > Time taken: 0.032 seconds, Fetched: 31 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org