Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17678 )

Change subject: IMPALA-7087, IMPALA-8131: Read decimals from Parquet files with 
different precision/scale
......................................................................


Patch Set 2: Code-Review+1

(8 comments)

lgtm, only have small comments

http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG@25
PS2, Line 25: Parquet column stats reader is also updated to convert the decimal
            : values.
Can you mention how the different filters handle this conversion? e.g.:
dictionary filter: disabled for all column that need conversion
bloom filter: not yet supported for Decimal
min-max filter: supported, because the conversion preservers the ordering of 
values, so if min<=val<=max then round(min)<=round(val)<=round(max) will be 
also true + stats that are converted to NULL are ignored

These may seem self-evident, but it took some time to convince myself that we 
handle all these cases correctly :)


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@24
PS2, Line 24: #include "exec/parquet/parquet-data-converter.h"
            :
            : #include "common/names.h"
> The order is not in ascending order.
common/names.h is usually an exception for this, I am not sure why


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@353
PS2, Line 353:   // No need for an extra buffer, we can do the conversion 
in-place.
Can you mention why we can handle this correctly? See my comment in the commit 
message.


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h
File be/src/exec/parquet/parquet-data-converter.h:

http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h@32
PS2, Line 32: ParquetDataConverter
Thanks for moving this logic out for parquet-column-readers.cc!


http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test:

http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@7
PS2, Line 7: 333
Can you change to a value that will be also rounded up at some scale, e.g. 
23.633? This would allow a test for like the one I mention at line 99 but for 
max_value


http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@60
PS2, Line 60: ALTER TABLE test_dec CHANGE COLUMN d d DECIMAL(32, 8);
This ALTER TABLE is not needed.


http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@77
PS2, Line 77: ALTER TABLE test_dec CHANGE COLUMN d d DECIMAL(10, 2);
This ALTER TABLE is not needed.


http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@99
PS2, Line 99: -23.7
Can you add a test where search for -23.7 specifically? This would cover the 
case when the value we look for is lower than the min_value in the stats, so it 
would ensure that the stats are rounded correctly.



-- 
To view, visit http://gerrit.cloudera.org:8080/17678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icefa7e545ca9f7df1741a2d1225375ecf54434da
Gerrit-Change-Number: 17678
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Jul 2021 23:38:02 +0000
Gerrit-HasComments: Yes

Reply via email to