Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17678 )
Change subject: IMPALA-7087, IMPALA-8131: Read decimals from Parquet files with different precision/scale ...................................................................... Patch Set 3: (19 comments) Thanks for all the comments! http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG@18 PS2, Line 18: A > nit. Put the sentence in passive voice? Done http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG@22 PS2, Line 22: NU > nit. Same as above. Done http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG@25 PS2, Line 25: Parquet column stats reader is also updated to convert the decimal : values. > Can you mention how the different filters handle this conversion? e.g.: Bloom filters work for decimals. I extended the commit message. http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-readers.cc@740 PS2, Line 740: // The value or the conversion is invalid but execution should continue - set the > nit. 'The value or the conversion' Done http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.h File be/src/exec/parquet/parquet-column-stats.h: http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.h@351 PS2, Line 351: /// Decodes decimal value into slot. Does conversion if needed. > nit. May need to add a comment. Done http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.h@352 PS2, Line 352: template <typename DecimalType> : bool DecodeDecimal(const std: > nit. It seems that they can appear in one line (instead of two)? Done http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc File be/src/exec/parquet/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@24 PS2, Line 24: #include "exec/parquet/parquet-data-converter.h" : : #include "common/names.h" > common/names.h is usually an exception for this, I am not sure why 'common/names.h' has using declarations, i.e. it has side-effects. E.g. imagine the following scenario: #include "common/names.h" // has dozens of using #include "some-other-header.h" // Now it compiles with the above usings http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@348 PS2, Line 348: bool ret = ColumnStats<DecimalType>::DecodePlainValue(stat_value, slot, > This line can be moved between line 351 and 352 when the converter is neede Done http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@352 PS2, Line 352: // Let's convert the decimal value to > nit. LIKELY()? Done http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@353 PS2, Line 353: // filters and min/max conjuncts against the converted values as later we'd also > Can you mention why we can handle this correctly? See my comment in the com Added comment. http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h File be/src/exec/parquet/parquet-data-converter.h: http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h@49 PS2, Line 49: const ParquetTimestampD > nit. const and &? Done http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h@179 PS2, Line 179: /*round=*/true > nit. Just wonder why the round flag is set to true here. I think both truncating and rounding would be valid based on the SQL standard, but I chose rounding to follow Hive's behavior. http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exprs/decimal-operators-ir.cc File be/src/exprs/decimal-operators-ir.cc: http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exprs/decimal-operators-ir.cc@119 PS2, Line 119: /*round=*/false > nit. I wonder if there is a situation in which 'round' should be set to tru I chose to preserve the old behavior to be on the safe side. I didn't put too much thinking into it, probably rounding would be more rational. Once this patch goes in I'll create a new Jira ticket to re-evaluate this behavior. We can make it tunable via a query option, but maybe it's better to choose a side in this case. http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/runtime/decimal-value.inline.h@144 PS2, Line 144: { > nit. It seems this IF can be removed. Done http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test File testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test: http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@7 PS2, Line 7: 633 > Can you change to a value that will be also rounded up at some scale, e.g. Done http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@58 PS2, Line 58: ---- > nit. Suggest to add an INSERT query inserting a value like 100000000.9999 t I added it a bit later. I cannot add it here because in the subsequent tests we lower the precision to 10. And we still don't allow to read a file that has decimal with higher precision than table metadata. Though maybe we could also allow it in this patch, and return NULL values in case of overflows. http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@60 PS2, Line 60: ---- RESULTS > This ALTER TABLE is not needed. Done http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@77 PS2, Line 77: 23.63 > This ALTER TABLE is not needed. Done http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@99 PS2, Line 99: DECIM > Can you add a test where search for -23.7 specifically? This would cover th Done. Also added a search for 24. -- To view, visit http://gerrit.cloudera.org:8080/17678 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icefa7e545ca9f7df1741a2d1225375ecf54434da Gerrit-Change-Number: 17678 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 13 Jul 2021 14:24:37 +0000 Gerrit-HasComments: Yes