Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17678 )
Change subject: IMPALA-7087, IMPALA-8131: Read decimals from Parquet files with different precision/scale ...................................................................... Patch Set 2: Code-Review+1 (8 comments) lgtm, only have small comments http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG@25 PS2, Line 25: Parquet column stats reader is also updated to convert the decimal : values. Can you mention how the different filters handle this conversion? e.g.: dictionary filter: disabled for all column that need conversion bloom filter: not yet supported for Decimal min-max filter: supported, because the conversion preservers the ordering of values, so if min<=val<=max then round(min)<=round(val)<=round(max) will be also true + stats that are converted to NULL are ignored These may seem self-evident, but it took some time to convince myself that we handle all these cases correctly :) http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc File be/src/exec/parquet/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@24 PS2, Line 24: #include "exec/parquet/parquet-data-converter.h" : : #include "common/names.h" > The order is not in ascending order. common/names.h is usually an exception for this, I am not sure why http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@353 PS2, Line 353: // No need for an extra buffer, we can do the conversion in-place. Can you mention why we can handle this correctly? See my comment in the commit message. http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h File be/src/exec/parquet/parquet-data-converter.h: http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h@32 PS2, Line 32: ParquetDataConverter Thanks for moving this logic out for parquet-column-readers.cc! http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test File testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test: http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@7 PS2, Line 7: 333 Can you change to a value that will be also rounded up at some scale, e.g. 23.633? This would allow a test for like the one I mention at line 99 but for max_value http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@60 PS2, Line 60: ALTER TABLE test_dec CHANGE COLUMN d d DECIMAL(32, 8); This ALTER TABLE is not needed. http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@77 PS2, Line 77: ALTER TABLE test_dec CHANGE COLUMN d d DECIMAL(10, 2); This ALTER TABLE is not needed. http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@99 PS2, Line 99: -23.7 Can you add a test where search for -23.7 specifically? This would cover the case when the value we look for is lower than the min_value in the stats, so it would ensure that the stats are rounded correctly. -- To view, visit http://gerrit.cloudera.org:8080/17678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icefa7e545ca9f7df1741a2d1225375ecf54434da Gerrit-Change-Number: 17678 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Comment-Date: Mon, 12 Jul 2021 23:38:02 +0000 Gerrit-HasComments: Yes