Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/24001 )

Change subject: IMPALA-14737: Push down LIKE predicates to Iceberg
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/24001/6/fe/src/main/java/org/apache/impala/common/IcebergPredicateConverter.java
File fe/src/main/java/org/apache/impala/common/IcebergPredicateConverter.java:

http://gerrit.cloudera.org:8080/#/c/24001/6/fe/src/main/java/org/apache/impala/common/IcebergPredicateConverter.java@169
PS6, Line 169:       // Check if this is a wildcard
             :       if (c == '%' || c == '_') {
             :         // Count preceding backslashes to see if it's escaped
             :         int backslashCount = 0;
             :         int j = i - 1;
             :         while (j >= 0 && pattern.charAt(j) == '\\') {
             :           backslashCount++;
             :           j--;
             :         }
             :
             :         // If odd number of backslashes, wildcard is escaped = literal content
             :         if (backslashCount % 2 == 1) {
             :           return true;
             :         }
This could be moved to a helper method as it is also used by 
findFirstUnescapedWildcard().
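
A minimal sketch of such a helper, assuming a hypothetical class name 
(LikeEscapes) and method name (isEscaped) that are not part of the patch. The 
findFirstUnescapedWildcard() body below is only an illustration of how both 
call sites could share the backslash-counting logic, not the code in the 
patch set:

```java
// Hypothetical helper class; names and structure are illustrative only.
final class LikeEscapes {
  private LikeEscapes() {}

  /**
   * Returns true if the character at 'index' is preceded by an odd number
   * of consecutive backslashes, i.e. it is escaped.
   */
  static boolean isEscaped(String pattern, int index) {
    int backslashCount = 0;
    for (int j = index - 1; j >= 0 && pattern.charAt(j) == '\\'; j--) {
      backslashCount++;
    }
    return backslashCount % 2 == 1;
  }

  /**
   * Returns the index of the first unescaped '%' or '_', or -1 if the
   * pattern contains no unescaped wildcard.
   */
  static int findFirstUnescapedWildcard(String pattern) {
    for (int i = 0; i < pattern.length(); i++) {
      char c = pattern.charAt(i);
      if ((c == '%' || c == '_') && !isEscaped(pattern, i)) {
        return i;
      }
    }
    return -1;
  }
}
```

With this shape, the check at line 169 would reduce to a single 
isEscaped(pattern, i) call.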


http://gerrit.cloudera.org:8080/#/c/24001/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
File testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test:

http://gerrit.cloudera.org:8080/#/c/24001/4/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@122
PS4, Line 122: select * from iceberg_partitioned where action like "d%" and event_time < "2022-01-01" and id < 10
> The expectation is correct that action LIKE 'd%d' shouldn't be pushed down
In the end, I think we want both:
* push down LIKE 'd%' to Iceberg
* keep LIKE 'd%d' in predicates

This way, Iceberg can prune partitions and files for us. Then the executors 
only need to evaluate 'd%d' on the surviving data files.
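
To illustrate the idea, here is a rough sketch that splits a LIKE pattern into 
a literal prefix that could be pushed down and a flag telling whether the 
original predicate still has to be evaluated. The class LikePrefixSplit and 
its members are hypothetical, and the escape handling (backslash escapes the 
next character) is an assumption, not taken from the patch:

```java
// Hypothetical sketch; not the converter code from the patch.
final class LikePrefixSplit {
  /** Unescaped literal text before the first unescaped wildcard. */
  final String prefix;
  /** True unless the pattern is exactly the prefix followed by a single '%'. */
  final boolean needsResidual;

  private LikePrefixSplit(String prefix, boolean needsResidual) {
    this.prefix = prefix;
    this.needsResidual = needsResidual;
  }

  static LikePrefixSplit of(String pattern) {
    StringBuilder prefix = new StringBuilder();
    int i = 0;
    while (i < pattern.length()) {
      char c = pattern.charAt(i);
      if (c == '\\' && i + 1 < pattern.length()) {
        // Assumption: a backslash makes the next character a literal.
        prefix.append(pattern.charAt(i + 1));
        i += 2;
      } else if (c == '%' || c == '_') {
        break;  // first unescaped wildcard ends the literal prefix
      } else {
        prefix.append(c);
        i++;
      }
    }
    // Only 'prefix%' is fully covered by a prefix match; everything else
    // ('d%d', 'd_', or a pattern without wildcards) keeps the original LIKE.
    boolean onlyTrailingPercent = pattern.substring(i).equals("%");
    return new LikePrefixSplit(prefix.toString(), !onlyTrailingPercent);
  }

  /** An empty prefix gives nothing useful to push down. */
  boolean canPushDown() {
    return !prefix.isEmpty();
  }
}
```

For 'd%' this yields prefix "d" with no residual; for 'd%d' it yields the same 
prefix "d" but keeps the original predicate for the executors.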

But I'm fine with splitting this ticket into two commits:
1: handle the LIKE 'prefix%' / 'prefix_' cases

(this is basically implemented by the current patch set)

2: handle the LIKE 'prefix%suffix' cases, which push down LIKE 'prefix%' to 
Iceberg but still evaluate LIKE 'prefix%suffix' on the surviving rows.

Step 2 needs modifications here: 
https://github.com/apache/impala/blob/0f53e31363dddad918c5f5cf103697b4624d9ede/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java#L1006-1018

The converter will need to signal that the original predicate was altered, 
probably by introducing a ConverterResult class.
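
One possible shape for this, as a sketch only. ConverterResult is the name 
suggested above; its fields, the factory methods, the generic type parameter 
(a stand-in for the Iceberg expression type) and the ResidualPlanner helper 
are all assumptions and do not reflect the existing IcebergScanPlanner code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical result type: the converted expression plus a flag saying
// whether it is equivalent to the original predicate or only a weaker form.
final class ConverterResult<T> {
  private final T expression;
  private final boolean exact;

  private ConverterResult(T expression, boolean exact) {
    this.expression = expression;
    this.exact = exact;
  }

  /** The pushed-down expression is equivalent to the original predicate. */
  static <T> ConverterResult<T> exact(T expression) {
    return new ConverterResult<>(expression, true);
  }

  /** The pushed-down expression is weaker, e.g. 'd%' derived from 'd%d'. */
  static <T> ConverterResult<T> weakened(T expression) {
    return new ConverterResult<>(expression, false);
  }

  T expression() { return expression; }
  boolean isExact() { return exact; }
}

// Hypothetical planner-side use: decide which predicates must still be
// evaluated on the rows that survive Iceberg pruning.
final class ResidualPlanner {
  private ResidualPlanner() {}

  /** A null value means the predicate could not be converted at all. */
  static <T> List<String> predicatesToKeep(Map<String, ConverterResult<T>> converted) {
    List<String> keep = new ArrayList<>();
    for (Map.Entry<String, ConverterResult<T>> e : converted.entrySet()) {
      ConverterResult<T> result = e.getValue();
      if (result == null || !result.isExact()) {
        keep.add(e.getKey());
      }
    }
    return keep;
  }
}
```

Whether exactly converted predicates can really be dropped depends on the 
existing planner rules; the sketch only shows how the flag could be consumed.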


http://gerrit.cloudera.org:8080/#/c/24001/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-like-pushdown.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-like-pushdown.test:

http://gerrit.cloudera.org:8080/#/c/24001/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-like-pushdown.test@330
PS6, Line 330: (should only match 'download' which has 'd' at both ends)
We might need a new table for this, because 'download' is the only value in 
iceberg_partitioned that starts with 'd', so this test cannot tell 'd%d' 
apart from 'd%'.



--
To view, visit http://gerrit.cloudera.org:8080/24001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I548834126540bcc8d22efc872c2571293b8b7ec4
Gerrit-Change-Number: 24001
Gerrit-PatchSet: 6
Gerrit-Owner: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Arnab Karmakar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 24 Feb 2026 11:05:55 +0000
Gerrit-HasComments: Yes
