[ https://issues.apache.org/jira/browse/IMPALA-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380586#comment-17380586 ]
ASF subversion and git services commented on IMPALA-7087:
---------------------------------------------------------

Commit 9d46255739f94c53d686f670ee7b5981db59c148 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9d46255 ]

IMPALA-7087, IMPALA-8131: Read decimals from Parquet files with different precision/scale

IMPALA-7087 covers reading Parquet decimal columns with lower precision/scale than the table metadata; IMPALA-8131 covers reading Parquet decimal columns with higher scale than the table metadata. Both are resolved by this patch. It reuses parts of an earlier change request from Sahil Takiar: https://gerrit.cloudera.org/#/c/12163/

A new utility class, ParquetDataConverter, performs the data conversion and decides whether conversion is needed at all. NULL values are returned on overflow; this behavior is consistent with Hive.

The Parquet column stats reader is also updated to convert decimal values. The stats reader is used to evaluate min/max conjuncts; this works because the conjuncts are later evaluated on the converted values anyway.

Status of the different filterings:
* dictionary filtering: disabled for columns that need conversion
* runtime bloom filters: work on the converted values
* runtime min/max filters: work on the converted values

This patch also enables schema evolution of decimal columns of Iceberg tables.
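The conversion semantics described above (rescale to the table's scale, return NULL when the value no longer fits the table's precision) can be sketched in Python. This is a simplified illustration, not Impala's C++ implementation; the function name and the rounding mode are assumptions for the example.

```python
from decimal import Decimal, ROUND_HALF_UP


def convert_decimal(value, target_precision, target_scale):
    """Hypothetical sketch of converting a decimal read from a Parquet
    file to the table's decimal(precision, scale) type.

    Returns None (i.e. NULL) on overflow, matching the behavior the
    patch describes; the rounding mode here is an assumption.
    """
    # Rescale to the target scale. This may round when the file's scale
    # is higher than the table's scale (the IMPALA-8131 case).
    scaled = value.quantize(Decimal(1).scaleb(-target_scale),
                            rounding=ROUND_HALF_UP)
    # Overflow check: the rescaled value must fit in target_precision
    # total digits, otherwise return NULL.
    if len(scaled.as_tuple().digits) > target_precision:
        return None
    return scaled
```

For example, widening decimal(4,1) data into a decimal(6,2) column succeeds, while a value such as 123.45 cannot be stored as decimal(4,2) and becomes NULL.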
Testing:
* added e2e tests

Change-Id: Icefa7e545ca9f7df1741a2d1225375ecf54434da
Reviewed-on: http://gerrit.cloudera.org:8080/17678
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringho...@cloudera.com>

> Impala is unable to read Parquet decimal columns with lower precision/scale
> than table metadata
> -----------------------------------------------------------------------------------------------
>
> Key: IMPALA-7087
> URL: https://issues.apache.org/jira/browse/IMPALA-7087
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: decimal, parquet, ramp-up
> Attachments: binary_decimal_precision_and_scale_widening.parquet
>
> This is similar to IMPALA-2515, except that it concerns a different precision/scale in the file metadata rather than just a mismatch in the bytes used to store the data. In many cases we should be able to convert the decimal type on the fly to the higher-precision type.
> {noformat}
> ERROR: File '/hdfs/path/000000_0_x_2' column 'alterd_decimal' has an invalid type length. Expecting: 11 len in file: 8
> {noformat}
> It would be convenient to allow reading Parquet files where the precision/scale in the file can be converted to the precision/scale in the table metadata without loss of precision.
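The "convertible without loss of precision" condition mentioned in the issue reduces to a check on digit counts: the table type must allow at least as many fractional digits and at least as many integer digits as the file type. A hypothetical helper (not Impala code) illustrating the check:

```python
def is_lossless_widening(file_precision, file_scale,
                         table_precision, table_scale):
    """Return True if every decimal(file_precision, file_scale) value
    fits into decimal(table_precision, table_scale) without rounding
    or overflow. Illustrative sketch; names are hypothetical."""
    # Fractional digits must not shrink, and the number of integer
    # digits (precision - scale) must not shrink either.
    return (table_scale >= file_scale and
            table_precision - table_scale >= file_precision - file_scale)
```

For instance, a file column written as decimal(8,2) can be read into a decimal(11,2) table column losslessly, while the reverse direction cannot.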