Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17678 )

Change subject: IMPALA-7087, IMPALA-8131: Read decimals from Parquet files with 
different precision/scale
......................................................................


Patch Set 3:

(19 comments)

Thanks for all the comments!

http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG@18
PS2, Line 18: A
> nit. Put the sentence in passive voice?
Done


http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG@22
PS2, Line 22: NU
> nit. Same as above.
Done


http://gerrit.cloudera.org:8080/#/c/17678/2//COMMIT_MSG@25
PS2, Line 25: Parquet column stats reader is also updated to convert the decimal
            : values.
> Can you mention how the different filters handle this conversion? e.g.:
Bloom filters work for decimals. I extended the commit message.


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-readers.cc
File be/src/exec/parquet/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-readers.cc@740
PS2, Line 740:       // The value or the conversion is invalid but execution 
should continue - set the
> nit. 'The value or the conversion'
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.h
File be/src/exec/parquet/parquet-column-stats.h:

http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.h@351
PS2, Line 351:   /// Decodes decimal value into slot. Does conversion if needed.
> nit. May need to add a comment.
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.h@352
PS2, Line 352: template <typename DecimalType>
             :   bool DecodeDecimal(const std:
> nit. It seems that they can appear in one line (instead of two)?
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@24
PS2, Line 24: #include "exec/parquet/parquet-data-converter.h"
            :
            : #include "common/names.h"
> common/names.h is usually an exception for this, I am not sure why
'common/names.h' has using declarations, i.e. it has side-effects. E.g. imagine 
the following scenario:

 #include "common/names.h"      // has dozens of using
 #include "some-other-header.h" // Now it compiles with the above usings


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@348
PS2, Line 348: bool ret = 
ColumnStats<DecimalType>::DecodePlainValue(stat_value, slot,
> This line can be moved between line 351 and 352 when the converter is neede
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@352
PS2, Line 352: // Let's convert the decimal value to
> nit. LIKELY()?
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-column-stats.cc@353
PS2, Line 353:   // filters and min/max conjuncts against the converted values 
as later we'd also
> Can you mention why we can handle this correctly? See my comment in the com
Added comment.


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h
File be/src/exec/parquet/parquet-data-converter.h:

http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h@49
PS2, Line 49: const ParquetTimestampD
> nit. const and &?
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exec/parquet/parquet-data-converter.h@179
PS2, Line 179: /*round=*/true
> nit. Just wonder why the round flag is set to true here.
I think both truncating and rounding would be valid based on the SQL standard, 
but I chose rounding to follow Hive's behavior.


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exprs/decimal-operators-ir.cc
File be/src/exprs/decimal-operators-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/exprs/decimal-operators-ir.cc@119
PS2, Line 119: /*round=*/false
> nit. I wonder if there is a situation in which 'round' should be set to tru
I chose to preserve the old behavior to be on the safe side.
I didn't put too much thinking into it, probably rounding would be more 
rational.

Once this patch goes in I'll create a new Jira ticket to re-evaluate this 
behavior. We can make it tunable via a query option, but maybe it's better to 
choose a side in this case.


http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/17678/2/be/src/runtime/decimal-value.inline.h@144
PS2, Line 144: {
> nit. It seems this IF can be removed.
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test:

http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@7
PS2, Line 7: 633
> Can you change to a value that will be also rounded up at some scale, e.g.
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@58
PS2, Line 58: ----
> nit. Suggest to add an INSERT query inserting a value like 100000000.9999 t
I added it a bit later. I cannot add it here because in the subsequent tests we 
lower the precision to 10. And we still don't allow to read a file that has 
decimal with higher precision than table metadata.

Though maybe we could also allow it in this patch, and return NULL values in 
case of overflows.


http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@60
PS2, Line 60: ---- RESULTS
> This ALTER TABLE is not needed.
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@77
PS2, Line 77: 23.63
> This ALTER TABLE is not needed.
Done


http://gerrit.cloudera.org:8080/#/c/17678/2/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-altering.test@99
PS2, Line 99: DECIM
> Can you add a test where search for -23.7 specifically? This would cover th
Done. Also added a search for 24.



--
To view, visit http://gerrit.cloudera.org:8080/17678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icefa7e545ca9f7df1741a2d1225375ecf54434da
Gerrit-Change-Number: 17678
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Jul 2021 14:24:37 +0000
Gerrit-HasComments: Yes

Reply via email to