This is an automated email from the ASF dual-hosted git repository.

airborne pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 252c8f4c52d [fix](fe) Fix MATCH crash on alias slots and push down as 
virtual column (#61584)
252c8f4c52d is described below

commit 252c8f4c52d2065ad77972e8a9d1ec5277085c72
Author: Jack <[email protected]>
AuthorDate: Thu Mar 26 14:38:36 2026 +0800

    [fix](fe) Fix MATCH crash on alias slots and push down as virtual column 
(#61584)
    
    ### What problem does this PR solve?
    
    Issue Number: close #xxx
    
    Related PR: #61635
    
    Problem Summary:
    
    When MATCH expressions reference alias slots that have lost column
    metadata
    (e.g., `CAST(variant_col['subkey'] AS VARCHAR) AS fn`), and the MATCH is
    in
    a predicate that cannot be pushed below a join (due to OR with
    join-dependent
    conditions like EXISTS mark or LEFT JOIN null checks),
    ExpressionTranslator's
    visitMatch() throws "SlotReference in Match failed to get Column".
    
    Root cause: `Alias.toSlot()` only preserves originalColumn/originalTable
    when
    its child is a direct SlotReference. When wrapped in Cast/ElementAt, all
    metadata is lost. Combined with OR preventing filter pushdown, the MATCH
    is
    stuck at the join layer referencing a metadata-less slot.
    
    Reproducer:
    ```sql
    WITH contacts AS (
      SELECT objectId, CAST(overflowProperties['string_8'] AS VARCHAR) AS 
firstName
      FROM objects_small WHERE portalId = 865815822
    ),
    lists AS (
      SELECT objectId FROM lists_v2 WHERE portalId = 865815822
    )
    SELECT o.objectId
    FROM contacts o LEFT JOIN lists l ON o.objectId = l.objectId
    WHERE firstName MATCH_ANY 'john' OR l.objectId IS NOT NULL;
    -- ERROR: SlotReference in Match failed to get Column
    ```
    
    This PR fixes the issue with two changes:
    
    1. **Graceful fallback in visitMatch()**: When the slot has lost
    column/table
    metadata, fall back to `invertedIndex = null` instead of throwing. The
    BE
       evaluates MATCH correctly via slow-path expression evaluation, or the
       virtual column mechanism (below) provides fast-path index evaluation.
    
    2. **New rewrite rule PushDownMatchPredicateAsVirtualColumn**: Extracts
    MATCH
    from join/filter predicates, traces the alias slot back through the
    Project
    to find the original column expression, and creates a virtual column on
    OlapScan. The BE evaluates the virtual column via inverted index using
       fast_execute(), and the join layer references the boolean result.
    
    Plan transformation:
    ```
    Before:
      Filter(fn MATCH_ANY 'john' OR l.objectId IS NOT NULL)  ← throws error or 
slow path
        └── Join → Project[CAST(col) as fn] → OlapScan
    
    After:
      Filter(__match_vc OR l.objectId IS NOT NULL)  ← boolean reference, no 
error
        └── Join → Project[fn, __match_vc] → 
OlapScan[virtualColumns=[(CAST(col) MATCH_ANY 'john')]]
                                                        ↑ inverted index fast 
path
    ```
    
    ### Before / After Examples
    
    #### 1. MATCH on CAST(variant[...]) alias + OR + JOIN — previously
    failed, now works with index acceleration
    
    ```sql
    -- ❌ Before: FE error "SlotReference in Match failed to get Column"
    -- ✅ After: MATCH pushed down as virtual column, inverted index fast path
    
    -- Scenario A: CTE + LEFT JOIN + OR
    WITH objects AS (
      SELECT objectId, CAST(properties['string_8'] AS VARCHAR) AS firstName
      FROM objects_v3
    )
    SELECT o.objectId
    FROM objects o LEFT JOIN lists l ON o.objectId = l.objectId
    WHERE firstName MATCH_ANY 'john' OR l.objectId IS NOT NULL;
    
    -- Scenario B: Subquery + LEFT JOIN + OR
    SELECT o.objectId
    FROM (SELECT objectId, CAST(props['name'] AS VARCHAR) AS name FROM t1) o
    LEFT JOIN t2 l ON o.objectId = l.objectId
    WHERE name MATCH_ALL 'smith' OR l.objectId IS NOT NULL;
    
    -- Scenario C: Multiple MATCHes in different OR branches
    -- Both pushed down as __DORIS_VIRTUAL_COL__1 and __DORIS_VIRTUAL_COL__2
    WHERE (CAST(props['first'] AS VARCHAR) MATCH_ALL 'john' AND list_condition)
       OR (other_list_condition AND CAST(props['last'] AS VARCHAR) MATCH_ALL 
'patel');
    ```
    
    #### 2. MATCH on the right side of a JOIN — previously not supported,
    now works
    
    ```sql
    -- ❌ Before: MATCH on right side of JOIN never triggered pushdown, fell 
back to slow path
    -- ✅ After: Both left and right sides supported equally
    
    -- RIGHT JOIN: main table on right side
    SELECT *
    FROM lists l RIGHT JOIN (
      SELECT objectId, CAST(props['name'] AS VARCHAR) AS fn FROM objects_v3
    ) o ON l.objectId = o.objectId
    WHERE fn MATCH_ANY 'hello' OR l.listId IS NOT NULL;
    
    -- INNER JOIN: optimizer may place Project→OlapScan on either side
    SELECT *
    FROM t1 JOIN (
      SELECT id, CAST(data['title'] AS VARCHAR) AS title FROM t2
    ) sub ON t1.id = sub.id
    WHERE title MATCH_ALL 'keyword' OR t1.status = 1;
    ```
    
    #### 3. Real-world business query — previously failed entirely, now
    works end-to-end
    
    ```sql
    -- ❌ Before: entire query fails
    -- ✅ After: executes correctly, total_count accurate
    
    -- Pagination + count + CTE referenced multiple times
    WITH results AS (
      SELECT o.objectId, CAST(o.properties['first'] AS VARCHAR) AS firstName,
             agg.listIds
      FROM objects o
      LEFT JOIN (SELECT objectId, array_agg(listId) AS listIds
                 FROM lists GROUP BY objectId) agg
        ON o.objectId = agg.objectId
      WHERE (CAST(o.properties['first'] AS VARCHAR) MATCH_ALL 'john'
             AND array_size(array_intersect(agg.listIds, array(456))) = 1)
         OR (array_size(array_intersect(agg.listIds, array(123))) = 1
             AND CAST(o.properties['last'] AS VARCHAR) MATCH_ALL 'patel')
    )
    SELECT * FROM (
      SELECT NULL AS total_count, objectId FROM results LIMIT 10
      UNION ALL
      SELECT count(*), NULL FROM results
    ) t;
    ```
---
 .../glue/translator/ExpressionTranslator.java      | 35 ++++++++++++++--------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java
 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java
index f1638487258..f21eb6b195b 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java
@@ -114,6 +114,8 @@ import org.apache.doris.thrift.TFunctionBinaryType;
 
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
 
 import java.util.ArrayDeque;
 import java.util.ArrayList;
@@ -131,6 +133,8 @@ public class ExpressionTranslator extends 
DefaultExpressionVisitor<Expr, PlanTra
 
     public static ExpressionTranslator INSTANCE = new ExpressionTranslator();
 
+    private static final Logger LOG = 
LogManager.getLogger(ExpressionTranslator.class);
+
     /**
      * The entry function of ExpressionTranslator.
      *
@@ -219,20 +223,25 @@ public class ExpressionTranslator extends 
DefaultExpressionVisitor<Expr, PlanTra
                         .orElseThrow(() -> new AnalysisException(
                                     "No SlotReference found in Match, SQL is " 
+ match.toSql()));
 
-        Column column = slot.getOriginalColumn()
-                        .orElseThrow(() -> new AnalysisException(
-                                    "SlotReference in Match failed to get 
Column, SQL is " + match.toSql()));
-
-        OlapTable olapTbl = getOlapTableDirectly(slot);
-        if (olapTbl == null) {
-            throw new AnalysisException("SlotReference in Match failed to get 
OlapTable, SQL is " + match.toSql());
-        }
-
+        // Try to resolve inverted index metadata. When the slot has lost its 
original
+        // column/table reference (e.g., after CTE inlining or join projection 
remapping),
+        // we gracefully fall back to invertedIndex = null. The BE can still 
evaluate MATCH
+        // correctly without inverted index (slow path), or the 
PushDownProject /
+        // PushDownMatchProjectionAsVirtualColumn rules may have already 
pushed the expression
+        // down for storage-level index evaluation (fast path).
+        Index invertedIndex = null;
         String analyzer = match.getAnalyzer().orElse(null);
-        Index invertedIndex = olapTbl.getInvertedIndex(column, 
slot.getSubPath(), analyzer);
-        if (analyzer != null && invertedIndex == null) {
-            throw new AnalysisException("No inverted index found for analyzer 
'" + analyzer
-                    + "' on column " + column.getName());
+        Column column = slot.getOriginalColumn().orElse(null);
+        OlapTable olapTbl = getOlapTableDirectly(slot);
+        if (column != null && olapTbl != null) {
+            invertedIndex = olapTbl.getInvertedIndex(column, 
slot.getSubPath(), analyzer);
+            if (analyzer != null && invertedIndex == null) {
+                throw new AnalysisException("No inverted index found for 
analyzer '" + analyzer
+                        + "' on column " + column.getName());
+            }
+        } else if (analyzer != null) {
+            LOG.warn("MATCH with analyzer '{}' on slot '{}' lost column 
metadata, "
+                    + "falling back without inverted index", analyzer, 
slot.getName());
         }
 
         MatchPredicate.Operator op = match.op();


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to