[ 
https://issues.apache.org/jira/browse/IMPALA-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568796#comment-16568796
 ] 

ASF subversion and git services commented on IMPALA-5542:
---------------------------------------------------------

Commit 7917eac0ad52fbfa4f6e95046986950ea04af676 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=7917eac ]

IMPALA-5542: Impala cannot scan Parquet decimal stored as int64_t/int32_t

The Decimal type in Parquet is a logical type: the file
stores a physical (primitive) type, and a DECIMAL
annotation carrying precision and scale tells readers to
interpret the stored values as decimals.
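
A minimal sketch of what the annotation means in practice, assuming the
Parquet convention that a DECIMAL column stores an unscaled integer and
the logical value is unscaled * 10^(-scale). The function name is
hypothetical; this is an illustration, not Impala's reader code.

```python
from decimal import Decimal


def decimal_from_unscaled(unscaled: int, scale: int) -> Decimal:
    """Interpret a stored (unscaled) integer as a decimal value.

    Hypothetical helper: `unscaled` is the integer read from the
    physical column, `scale` comes from the DECIMAL annotation.
    """
    if scale < 0:
        raise ValueError("scale must be non-negative")
    # scaleb shifts the exponent by -scale, i.e. multiplies by 10**(-scale).
    return Decimal(unscaled).scaleb(-scale)
```

For example, decimal_from_unscaled(12345, 2) yields Decimal('123.45').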

The allowed physical types for decimals are INT32, INT64,
FIXED_LEN_BYTE_ARRAY (FIXED), and BYTE_ARRAY (BINARY).
Before this commit, Impala could only read decimals stored
as FIXED or BINARY.
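
The physical types differ in how the unscaled value is laid out. A
sketch, assuming PLAIN-encoded values: INT32/INT64 are little-endian
two's complement integers, while the byte-array forms hold a
big-endian two's complement integer, as the Parquet spec requires for
decimals. The function name and the type labels are hypothetical, and
for BINARY the input is assumed to be the value bytes with any length
prefix already stripped.

```python
def unscaled_from_bytes(raw: bytes, physical_type: str) -> int:
    """Decode the unscaled decimal integer from one stored value.

    Hypothetical helper, not Impala's implementation.
    """
    if physical_type in ("INT32", "INT64"):
        expected = 4 if physical_type == "INT32" else 8
        if len(raw) != expected:
            raise ValueError(f"{physical_type} needs {expected} bytes")
        # Plain integers are stored little-endian.
        return int.from_bytes(raw, byteorder="little", signed=True)
    if physical_type in ("FIXED", "BINARY"):
        if not raw:
            raise ValueError("empty decimal value")
        # Byte-array decimals are big-endian two's complement.
        return int.from_bytes(raw, byteorder="big", signed=True)
    raise ValueError(f"not a valid decimal physical type: {physical_type}")
```

The same unscaled value therefore has a different byte layout
depending on the physical type, which is why a reader needs a
separate decode path for the integer-backed columns.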

Spark decided to write decimals as INT32 or INT64 when
their precision allows it:
  (1 <= precision <= 9)   ==> INT32
  (10 <= precision <= 18) ==> INT64
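
The thresholds 9 and 18 follow from the integer widths: the largest
unscaled value at precision p is 10^p - 1, and it has to fit in a
signed 32-bit or 64-bit integer. A sketch of that reasoning; the
function name is hypothetical, and returning None for larger
precisions only means "no integer type fits", it does not say which
byte-array type a writer would pick.

```python
from typing import Optional


def int_physical_type_for_precision(precision: int) -> Optional[str]:
    """Return the narrowest integer physical type that can hold any
    unscaled value of the given decimal precision, or None."""
    if precision < 1:
        raise ValueError("precision must be >= 1")
    max_unscaled = 10 ** precision - 1  # e.g. 999 for precision 3
    if max_unscaled <= 2 ** 31 - 1:
        return "INT32"
    if max_unscaled <= 2 ** 63 - 1:
        return "INT64"
    return None  # needs a byte-array representation
```

Evaluating this for precisions 1 through 18 reproduces the two
ranges listed above.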

I updated our column readers to accept INT32 and INT64
as valid physical types for decimals.

Testing:
* extended parquet-plain-test.cc
* added Parquet files generated by Spark 2.3.1
  and updated test_scanners.py

Change-Id: Ib8c41bfc7c1664bdba5099d3893dc8dbe4304794
Reviewed-on: http://gerrit.cloudera.org:8080/11000
Reviewed-by: Zoltan Borok-Nagy <borokna...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Impala cannot scan Parquet decimal stored as int64_t/int32_t
> ------------------------------------------------------------
>
>                 Key: IMPALA-5542
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5542
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: parquet
>
> This is supported according to the Parquet spec 
> (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#decimal)
>  but wasn't widely used. For some reason Spark decided to start writing this 
> out as the default (see SPARK-20297) so we will likely start seeing this at 
> some point.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

