[jira] [Commented] (IMPALA-7087) Impala is unable to read Parquet decimal columns with lower precision/scale than table metadata

ASF subversion and git services (Jira) Wed, 14 Jul 2021 06:12:07 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380586#comment-17380586
 ]


ASF subversion and git services commented on IMPALA-7087:
---------------------------------------------------------

Commit 9d46255739f94c53d686f670ee7b5981db59c148 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9d46255 ]

IMPALA-7087, IMPALA-8131: Read decimals from Parquet files with different 
precision/scale

IMPALA-7087 is about reading Parquet decimal columns with lower
precision/scale than table metadata.
IMPALA-8131 is about reading Parquet decimal columns with higher
scale than table metadata.

Both are resolved by this patch. It reuses some parts from an
earlier change request from Sahil Takiar:
https://gerrit.cloudera.org/#/c/12163/

A new utility class has been introduced, ParquetDataConverter which does
the data conversion. It also helps to decide whether data conversion
is needed or not.

NULL values are returned in case of overflows. This behavior is
consistent with Hive.

Parquet column stats reader is also updated to convert the decimal
values. The stats reader is used to evaluate min/max conjuncts. It
works well because later we also evaluate the conjuncts on the
converted values anyway.

The status of different filterings:
 * dictionary filtering: disabled for columns that need conversion
 * runtime bloom filters: work on the converted values
 * runtime min/max filters: work on the converted values

This patch also enables schema evolution of decimal columns of Iceberg
tables.

Testing:
 * added e2e tests

Change-Id: Icefa7e545ca9f7df1741a2d1225375ecf54434da
Reviewed-on: http://gerrit.cloudera.org:8080/17678
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringho...@cloudera.com>


> Impala is unable to read Parquet decimal columns with lower precision/scale 
> than table metadata
> -----------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-7087
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7087
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: decimal, parquet, ramp-up
>         Attachments: binary_decimal_precision_and_scale_widening.parquet
>
>
> This is similar to IMPALA-2515, except relates to a different precision/scale 
> in the file metadata rather than just a mismatch in the bytes used to store 
> the data. In a lot of cases we should be able to convert the decimal type on 
> the fly to the higher-precision type.
> {noformat}
> ERROR: File '/hdfs/path/000000_0_x_2' column 'alterd_decimal' has an invalid 
> type length. Expecting: 11 len in file: 8
> {noformat}
> It would be convenient to allow reading parquet files where the 
> precision/scale in the file can be converted to the precision/scale in the 
> table metadata without loss of precision.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Commented] (IMPALA-7087) Impala is unable to read Parquet decimal columns with lower precision/scale than table metadata

Reply via email to