Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/24062 )

Change subject: IMPALA-14116: Skip NULL in an IN-list against a column of an ORC table
......................................................................

IMPALA-14116: Skip NULL in an IN-list against a column of an ORC table

This patch fixes a bug introduced in IMPALA-6505 that was later exposed by IMPALA-10873.

IMPALA-6505 allowed us to push min/max predicates against columns of an ORC table to the scan node. Given a supported column type, to prune out rows that do not satisfy a predicate, Impala has to provide the corresponding function in the ORC library with an instance of the literal and the type of the predicate. The type of the literal has to match the type of the predicate. Otherwise, the ORC library throws an exception before scanning the ORC table.

However, during the execution of an HdfsOrcScanner, when there was a null literal in a predicate, Impala would provide a literal whose type did not match the type of the predicate for date, string, and decimal columns. This is because we passed the constructor of orc::Literal a pointer to an orc::PredicateDataType instead of an orc::PredicateDataType when instantiating an orc::Literal of these data types. As a result, we actually created a Boolean orc::Literal that did not match the respective predicate type (i.e., date, string, or decimal).

This issue was dormant because with IMPALA-6505 we only pushed down binary predicates to the scan nodes of ORC tables, and Impala's front-end did not push down the null literal in a binary predicate in such a case. The issue was later exposed by IMPALA-10873, which started pushing IN-list predicates to the ORC scanner without filtering the null literal out of those predicates in the front-end.

To fix this issue, the patch makes the front-end not push down the null literal in the IN-list predicates against columns of ORC tables.
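
The type mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the code from hdfs-orc-scanner.cc: `PredicateDataType`, `Literal`, `MakeNullLiteralBuggy`, and `MakeNullLiteralFixed` are hypothetical stand-ins, and the sketch assumes the literal class offers both a constructor taking a predicate type (typed NULL) and a constructor taking a bool. Under that assumption, a pointer argument cannot match the predicate-type constructor, but it converts implicitly to bool, so the Boolean constructor is selected without a compile error.

```cpp
#include <cassert>

// Hypothetical stand-in for orc::PredicateDataType (not the real declaration).
enum class PredicateDataType { LONG, FLOAT, STRING, DATE, DECIMAL, TIMESTAMP, BOOLEAN };

// Hypothetical stand-in for orc::Literal, reduced to the two constructors
// that matter for this illustration.
class Literal {
 public:
  // Builds a typed NULL literal.
  Literal(PredicateDataType type) : type_(type), is_null_(true) {}
  // Builds a non-NULL Boolean literal; the value itself is not stored here.
  Literal(bool /*value*/) : type_(PredicateDataType::BOOLEAN), is_null_(false) {}

  PredicateDataType type() const { return type_; }
  bool is_null() const { return is_null_; }

 private:
  PredicateDataType type_;
  bool is_null_;
};

// Buggy pattern: the argument is a pointer. A pointer does not convert to the
// enum, but it does convert implicitly to bool, so Literal(bool) is chosen.
Literal MakeNullLiteralBuggy(const PredicateDataType* type) {
  return Literal(type);
}

// Corrected pattern: dereference the pointer so Literal(PredicateDataType)
// is chosen and the literal carries the intended type.
Literal MakeNullLiteralFixed(const PredicateDataType* type) {
  return Literal(*type);
}
```

Calling `MakeNullLiteralBuggy` with a pointer to `PredicateDataType::DATE` yields a non-NULL Boolean literal in this sketch, while `MakeNullLiteralFixed` yields a NULL literal of type DATE.
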
This patch also corrects how we instantiate an orc::Literal in HdfsOrcScanner.

Testing:
 - Added an end-to-end test to verify Impala could correctly return the result when there is NULL in an IN-list predicate against date, string, and decimal columns of an ORC table.

Change-Id: Id62a631e5aa97132afbe0b184d427ad6bc1a4ad0
Reviewed-on: http://gerrit.cloudera.org:8080/24062
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/orc/hdfs-orc-scanner.cc
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/null_in_inlist.test
M tests/query_test/test_scanners.py
4 files changed, 90 insertions(+), 7 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/24062
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id62a631e5aa97132afbe0b184d427ad6bc1a4ad0
Gerrit-Change-Number: 24062
Gerrit-PatchSet: 12
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
