Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15403 )
Change subject: IMPALA-6505: Min-Max predicate push down in ORC scanner ...................................................................... Patch Set 4: (21 comments) Added tests and revoled the comments. Let me know if I missed any. http://gerrit.cloudera.org:8080/#/c/15403/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15403/1//COMMIT_MSG@19 PS1, Line 19: - Run scanner tests on ORC files. > Some ideas for more tests: Done. Before ORC-961, we can only check "RowsRead" in the profile. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.h@180 PS2, Line 180: o whe > nit. would search_argument_pool sounds better? We have such names in other places meaning MemPool for all permanent allocations. I feel like we can keep this name.. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.h@316 PS2, Line 316: r* slot); > nit: can it be const? We need to use ScalarExprEvaluator::GetValue() which is unfortunately not declared as const.. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.h@317 PS2, Line 317: > nit: we usually put output parameters at the end of the parameter list Done http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@888 PS1, Line 888: int tuple_idx = 0; : while (!scan_node_- > There shouldn't be too much problem if the table is written and read using Let's skip pushing down predicates on CHAR/VARCHAR since there are also other issues like IMPALA-1652 and HIVE-24040. I filed IMPALA-10882 to continue this work. http://gerrit.cloudera.org:8080/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@959 PS1, Line 959: *predicate_type = orc::PredicateDataType::TIMESTAMP; : if (!val) return orc::Literal(predicate_type); : const TimestampValue* tv = reinterpret_cast<const TimestampValue*>(val); : int64_t nanos; > Same as above, wait for ORC-612? Done http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@631 PS2, Line 631: int num_row_to_commit = TransferScratchTuples(row_batch); > Can this be removed? Done http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@833 PS2, Line 833: row_id < capacity && !o > nit. Please add a comment: T is the intended ORC primitive type and U is Im Done http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@834 PS2, Line 834: nullptr); > nit. maybe renamed as GetORCPrimitiveLiteral. Done http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@836 PS2, Line 836: TURN_ > UNLIKELY()? I think this is not in the hot path. But it's harmless to add it. Done. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@874 PS2, Line 874: t i > UNLIKELY()? Done http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@888 PS2, Line 888: int tuple_idx = 0; > Then should we add a DCHECK? Done http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@895 PS2, Line 895: t nu > UNLIKELY()? Done http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@901 PS2, Line 901: m_row > UNLIKELY()? Done http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@905 PS2, Line 905: e (num_ > need to check NULL Nice catch! http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@917 PS2, Line 917: : : template<typename T, typena > These three local variables can be relocated to each of the relevant cases. Done http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@941 PS2, Line 941: *predicate_type = orc::PredicateDataType::LONG; > I wonder if we should return Status for this method instead of ORC literal Nice catch! For failure cases we all return NULL literals. I added a check in the caller. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/exec/hdfs-orc-scanner.cc@980 PS2, Line 980: // The date should be valid at this point. : ignore_result(dv->ToDaysSinceEpoch(&value)); : return orc::Literal(*predicate_type, value); : } : case TYPE_STRING: : case TYPE_VARCHAR:{ : *predicate_type = orc::PredicateDataType::STRING; : if (!val) return orc::Literal(predicate_type); : const StringValue* sv = reinterpret_cast<StringValue*>(val); : return orc::Lite > I think for a min/max filter F, we need to formulate the following two push Will address this in IMPALA-10878. http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/15403/2/be/src/service/query-options.h@50 PS2, Line 50: ORC_READ_STATISTICS > ORC_PUSHDOWN_PREDICATES probably sounds better for this patch. Maybe there There is an existing query option PARQUET_READ_STATISTICS. Maybe it's ok to keep this name? http://gerrit.cloudera.org:8080/#/c/15403/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/15403/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@543 PS1, Line 543: statsPred.analyzeNoThrow(analyzer); > ORC lib supports pushing down EQ predicates. I think we don't need to trans Will do this in IMPALA-10873 http://gerrit.cloudera.org:8080/#/c/15403/2/tests/query_test/test_nested_types.py File tests/query_test/test_nested_types.py: http://gerrit.cloudera.org:8080/#/c/15403/2/tests/query_test/test_nested_types.py@165 PS2, Line 165: @SkipIfABFS.hive : @SkipIfADLS.hive > Should we change the reason here? Like "this test is specific to Parquet, f Done -- To view, visit http://gerrit.cloudera.org:8080/15403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I136622413db21e0941d238ab6aeea901a6464845 Gerrit-Change-Number: 15403 Gerrit-PatchSet: 4 Gerrit-Owner: Norbert Luksa <norbert.lu...@cloudera.com> Gerrit-Reviewer: Anonymous Coward (520) Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 25 Aug 2021 11:50:53 +0000 Gerrit-HasComments: Yes