Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 )
Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early ...................................................................... Patch Set 8: (11 comments) I think this is a nice improvement. Looking forward to the performance results! BTW, Sorry for leaving lots of code style comments.. http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/codegen/gen_ir_descriptions.py File be/src/codegen/gen_ir_descriptions.py: http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/codegen/gen_ir_descriptions.py@235 PS8, Line 235: ["BOOL_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala16BoolMinMaxFilter10AlwaysTrueEv"], : ["TINYINT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala19TinyIntMinMaxFilter10AlwaysTrueEv"], : ["SMALLINT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala20SmallIntMinMaxFilter10AlwaysTrueEv"], : ["INT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala15IntMinMaxFilter10AlwaysTrueEv"], : ["BIGINT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala18BigIntMinMaxFilter10AlwaysTrueEv"], : ["FLOAT_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala17FloatMinMaxFilter10AlwaysTrueEv"], : ["DOUBLE_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala18DoubleMinMaxFilter10AlwaysTrueEv"], : ["STRING_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala18StringMinMaxFilter10AlwaysTrueEv"], : ["TIMESTAMP_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala21TimestampMinMaxFilter10AlwaysTrueEv"], : ["DATE_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala16DateMinMaxFilter10AlwaysTrueEv"], : ["DECIMAL_MIN_MAX_FILTER_ALWAYSTRUE", "_ZNK6impala19DecimalMinMaxFilter10AlwaysTrueEv"], Just curious, what errors will we encounter if we don't make the AlwaysTrue() method virtual and use _ZNK6impala12MinMaxFilter10AlwaysTrueEv directly? http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/filter-context.cc@281 PS8, Line 281: nit: redundant blank line http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/filter-context.cc@512 PS8, Line 512: if (computed_ratio) { : *computed_ratio = ratio; : } nit: our code style prefers collapsing simple if-statement to one line if (computed_ratio) *computed_ratio = ratio; http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@329 PS8, Line 329: DetermineUsefulnessForMinmaxFilters(); Should we codegen FilterContext::ShouldRejectFilterBasedOnColumnStats() if we want to call this for each batch (e.g. eliminate the switch branch in it in codegen)? http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@335 PS8, Line 335: if (filter_ctxs_.size() == 0) { : return; : } nit: our code style prefer collapsing simple if-statement to one line if (filter_ctxs_.size() == 0) return; http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@342 PS8, Line 342: if (ctx.local_min_max_filter->AlwaysTrue()) { : continue; : } nit: our code style prefer collapsing simple if-statement to one line if (ctx.local_min_max_filter->AlwaysTrue()) continue; http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@349 PS8, Line 349: auto nit: should we use "const auto&" instead? http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@351 PS8, Line 351: nit: our code style uses 4 spaces indention here http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@355 PS8, Line 355: if (min_ratio > ratio) { : min_ratio = ratio; : } nit: our code style prefer collapsing simple if-statement to one line if (min_ratio > ratio) min_ratio = ratio; http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/exec/partitioned-hash-join-builder.cc@992 PS8, Line 992: auto nit: should we use "const auto&" instead? http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/runtime/runtime-filter-ir.cc File be/src/runtime/runtime-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/17295/8/be/src/runtime/runtime-filter-ir.cc@35 PS8, Line 35: && !filter->AlwaysTrue() Should we move this into LIKELY()? -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 8 Gerrit-Owner: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Wed, 21 Apr 2021 02:58:04 +0000 Gerrit-HasComments: Yes