This is an automated email from the ASF dual-hosted git repository.
airborne pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 252c8f4c52d [fix](fe) Fix MATCH crash on alias slots and push down as
virtual column (#61584)
252c8f4c52d is described below
commit 252c8f4c52d2065ad77972e8a9d1ec5277085c72
Author: Jack <[email protected]>
AuthorDate: Thu Mar 26 14:38:36 2026 +0800
[fix](fe) Fix MATCH crash on alias slots and push down as virtual column
(#61584)
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #61635
Problem Summary:
When MATCH expressions reference alias slots that have lost column
metadata
(e.g., `CAST(variant_col['subkey'] AS VARCHAR) AS fn`), and the MATCH is
in
a predicate that cannot be pushed below a join (due to OR with
join-dependent
conditions like EXISTS mark or LEFT JOIN null checks),
ExpressionTranslator's
visitMatch() throws "SlotReference in Match failed to get Column".
Root cause: `Alias.toSlot()` only preserves originalColumn/originalTable
when
its child is a direct SlotReference. When wrapped in Cast/ElementAt, all
metadata is lost. Combined with OR preventing filter pushdown, the MATCH
is
stuck at the join layer referencing a metadata-less slot.
Reproducer:
```sql
WITH contacts AS (
SELECT objectId, CAST(overflowProperties['string_8'] AS VARCHAR) AS
firstName
FROM objects_small WHERE portalId = 865815822
),
lists AS (
SELECT objectId FROM lists_v2 WHERE portalId = 865815822
)
SELECT o.objectId
FROM contacts o LEFT JOIN lists l ON o.objectId = l.objectId
WHERE firstName MATCH_ANY 'john' OR l.objectId IS NOT NULL;
-- ERROR: SlotReference in Match failed to get Column
```
This PR fixes the issue with two changes:
1. **Graceful fallback in visitMatch()**: When the slot has lost
column/table
metadata, fall back to `invertedIndex = null` instead of throwing. The
BE
evaluates MATCH correctly via slow-path expression evaluation, or the
virtual column mechanism (below) provides fast-path index evaluation.
2. **New rewrite rule PushDownMatchPredicateAsVirtualColumn**: Extracts
MATCH
from join/filter predicates, traces the alias slot back through the
Project
to find the original column expression, and creates a virtual column on
OlapScan. The BE evaluates the virtual column via inverted index using
fast_execute(), and the join layer references the boolean result.
Plan transformation:
```
Before:
Filter(fn MATCH_ANY 'john' OR l.objectId IS NOT NULL) ← throws error or
slow path
└── Join → Project[CAST(col) as fn] → OlapScan
After:
Filter(__match_vc OR l.objectId IS NOT NULL) ← boolean reference, no
error
└── Join → Project[fn, __match_vc] →
OlapScan[virtualColumns=[(CAST(col) MATCH_ANY 'john')]]
↑ inverted index fast
path
```
### Before / After Examples
#### 1. MATCH on CAST(variant[...]) alias + OR + JOIN — previously
failed, now works with index acceleration
```sql
-- ❌ Before: FE error "SlotReference in Match failed to get Column"
-- ✅ After: MATCH pushed down as virtual column, inverted index fast path
-- Scenario A: CTE + LEFT JOIN + OR
WITH objects AS (
SELECT objectId, CAST(properties['string_8'] AS VARCHAR) AS firstName
FROM objects_v3
)
SELECT o.objectId
FROM objects o LEFT JOIN lists l ON o.objectId = l.objectId
WHERE firstName MATCH_ANY 'john' OR l.objectId IS NOT NULL;
-- Scenario B: Subquery + LEFT JOIN + OR
SELECT o.objectId
FROM (SELECT objectId, CAST(props['name'] AS VARCHAR) AS name FROM t1) o
LEFT JOIN t2 l ON o.objectId = l.objectId
WHERE name MATCH_ALL 'smith' OR l.objectId IS NOT NULL;
-- Scenario C: Multiple MATCHes in different OR branches
-- Both pushed down as __DORIS_VIRTUAL_COL__1 and __DORIS_VIRTUAL_COL__2
WHERE (CAST(props['first'] AS VARCHAR) MATCH_ALL 'john' AND list_condition)
OR (other_list_condition AND CAST(props['last'] AS VARCHAR) MATCH_ALL
'patel');
```
#### 2. MATCH on the right side of a JOIN — previously not supported,
now works
```sql
-- ❌ Before: MATCH on right side of JOIN never triggered pushdown, fell
back to slow path
-- ✅ After: Both left and right sides supported equally
-- RIGHT JOIN: main table on right side
SELECT *
FROM lists l RIGHT JOIN (
SELECT objectId, CAST(props['name'] AS VARCHAR) AS fn FROM objects_v3
) o ON l.objectId = o.objectId
WHERE fn MATCH_ANY 'hello' OR l.listId IS NOT NULL;
-- INNER JOIN: optimizer may place Project→OlapScan on either side
SELECT *
FROM t1 JOIN (
SELECT id, CAST(data['title'] AS VARCHAR) AS title FROM t2
) sub ON t1.id = sub.id
WHERE title MATCH_ALL 'keyword' OR t1.status = 1;
```
#### 3. Real-world business query — previously failed entirely, now
works end-to-end
```sql
-- ❌ Before: entire query fails
-- ✅ After: executes correctly, total_count accurate
-- Pagination + count + CTE referenced multiple times
WITH results AS (
SELECT o.objectId, CAST(o.properties['first'] AS VARCHAR) AS firstName,
agg.listIds
FROM objects o
LEFT JOIN (SELECT objectId, array_agg(listId) AS listIds
FROM lists GROUP BY objectId) agg
ON o.objectId = agg.objectId
WHERE (CAST(o.properties['first'] AS VARCHAR) MATCH_ALL 'john'
AND array_size(array_intersect(agg.listIds, array(456))) = 1)
OR (array_size(array_intersect(agg.listIds, array(123))) = 1
AND CAST(o.properties['last'] AS VARCHAR) MATCH_ALL 'patel')
)
SELECT * FROM (
SELECT NULL AS total_count, objectId FROM results LIMIT 10
UNION ALL
SELECT count(*), NULL FROM results
) t;
```
---
.../glue/translator/ExpressionTranslator.java | 35 ++++++++++++++--------
1 file changed, 22 insertions(+), 13 deletions(-)
diff --git
a/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java
b/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java
index f1638487258..f21eb6b195b 100644
---
a/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java
+++
b/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/ExpressionTranslator.java
@@ -114,6 +114,8 @@ import org.apache.doris.thrift.TFunctionBinaryType;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Lists;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
import java.util.ArrayDeque;
import java.util.ArrayList;
@@ -131,6 +133,8 @@ public class ExpressionTranslator extends
DefaultExpressionVisitor<Expr, PlanTra
public static ExpressionTranslator INSTANCE = new ExpressionTranslator();
+ private static final Logger LOG =
LogManager.getLogger(ExpressionTranslator.class);
+
/**
* The entry function of ExpressionTranslator.
*
@@ -219,20 +223,25 @@ public class ExpressionTranslator extends
DefaultExpressionVisitor<Expr, PlanTra
.orElseThrow(() -> new AnalysisException(
"No SlotReference found in Match, SQL is "
+ match.toSql()));
- Column column = slot.getOriginalColumn()
- .orElseThrow(() -> new AnalysisException(
- "SlotReference in Match failed to get
Column, SQL is " + match.toSql()));
-
- OlapTable olapTbl = getOlapTableDirectly(slot);
- if (olapTbl == null) {
- throw new AnalysisException("SlotReference in Match failed to get
OlapTable, SQL is " + match.toSql());
- }
-
+ // Try to resolve inverted index metadata. When the slot has lost its
original
+ // column/table reference (e.g., after CTE inlining or join projection
remapping),
+ // we gracefully fall back to invertedIndex = null. The BE can still
evaluate MATCH
+ // correctly without inverted index (slow path), or the
PushDownProject /
+ // PushDownMatchProjectionAsVirtualColumn rules may have already
pushed the expression
+ // down for storage-level index evaluation (fast path).
+ Index invertedIndex = null;
String analyzer = match.getAnalyzer().orElse(null);
- Index invertedIndex = olapTbl.getInvertedIndex(column,
slot.getSubPath(), analyzer);
- if (analyzer != null && invertedIndex == null) {
- throw new AnalysisException("No inverted index found for analyzer
'" + analyzer
- + "' on column " + column.getName());
+ Column column = slot.getOriginalColumn().orElse(null);
+ OlapTable olapTbl = getOlapTableDirectly(slot);
+ if (column != null && olapTbl != null) {
+ invertedIndex = olapTbl.getInvertedIndex(column,
slot.getSubPath(), analyzer);
+ if (analyzer != null && invertedIndex == null) {
+ throw new AnalysisException("No inverted index found for
analyzer '" + analyzer
+ + "' on column " + column.getName());
+ }
+ } else if (analyzer != null) {
+ LOG.warn("MATCH with analyzer '{}' on slot '{}' lost column
metadata, "
+ + "falling back without inverted index", analyzer,
slot.getName());
}
MatchPredicate.Operator op = match.op();
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]