This is an automated email from the ASF dual-hosted git repository.

huajianlan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new 0e669b97c6b [opt](nereids) optimize normalize window (#54947)
0e669b97c6b is described below

commit 0e669b97c6b70158ae75147a525c1f8ee5849df2
Author: 924060929 <[email protected]>
AuthorDate: Wed Aug 20 10:20:40 2025 +0800

    [opt](nereids) optimize normalize window (#54947)
    
    close #54577
    
    optimize normalize window, only push down the expression's used slots to
    bottom projects, which the expression not contains WindowFunction, so we
    can push down more filter through window.
    
    for this sql:
    ```sql
      select
        SUBSTR(orderdate,1,10) AS dt,
        ROW_NUMBER() OVER(PARTITION BY orderdate ORDER BY orderid DESC) AS rn
      from lineorders
      having dt = '2025-01-01'
    ```
    we not push down the `dt` slot under LogicalWindow, but push down
    [orderdate, orderid]
    to the bottom projects, because if we push down `dt`, the plan tree will
    be:
    ```
                LogicalFilter(substr(dt#3, 1, 10) = '2025-01-01')
                                        |
         LogicalWindow(rowNumber(partition by orderdate#2, order by orderid#1))
                                        |
      LogicalProject(orderid#1, orderdate#2, substr(orderdate#1, 1, 10) as dt#3)
    ```
    
    and can not push down filter by `PushDownFilterThroughWindow`, causing
    inefficiency,
    because dt#3 in LogicalFilter not contains in the partition key in
    LogicalWindow: [orderdate#2].
    
    
    so we only push down orderdate in the LogicalFilter, not push down `dt`:
    
    ```
         LogicalFilter(substr(orderdate#2, 1, 10) = '2025-01-01')
                                  |
         LogicalWindow(rowNumber(partition by orderdate#2, order by orderid#1))
                                  |
                LogicalProject(orderid#1, orderdate#2)
    ```
    
    and then, `PushDownFilterThroughWindow` found the LogicalFilter's
    `orderdate#2` contains
    in the LogicalWindow's partition key: [orderdate#2], and can push down
    filter to:
    
    ```
      LogicalWindow(rowNumber(partition by orderdate#2, order by orderid#1))
                                  |
                LogicalProject(orderid#1, orderdate#2)
                                 |
        LogicalFilter(substr(orderdate#2, 1, 10) = '2025-01-01')
    ```
---
 .../ExtractAndNormalizeWindowExpression.java       |  39 +++++++++++++
 .../rules/analysis/FillUpMissingSlotsTest.java     |  13 ++---
 .../rewrite/PushDownFilterThroughWindowTest.java   |  64 ++++++++++++++++++++-
 .../nereids_p0/cte/test_cte_filter_pushdown.out    | Bin 1358 -> 1357 bytes
 .../cte/test_cte_filter_pushdown.out               | Bin 1358 -> 1357 bytes
 .../tpch/push_filter_window_eqset.out              | Bin 418 -> 362 bytes
 .../tpcds_sf100/noStatsRfPrune/query51.out         | Bin 2673 -> 2637 bytes
 .../tpcds_sf100/no_stats_shape/query51.out         | Bin 2673 -> 2637 bytes
 .../shape_check/tpcds_sf100/rf_prune/query51.out   | Bin 2673 -> 2637 bytes
 .../data/shape_check/tpcds_sf100/shape/query51.out | Bin 2673 -> 2637 bytes
 .../data/shape_check/tpcds_sf1000/hint/query51.out | Bin 2791 -> 2755 bytes
 .../shape_check/tpcds_sf1000/shape/query51.out     | Bin 2673 -> 2637 bytes
 .../shape_check/tpcds_sf10t_orc/shape/query51.out  | Bin 2673 -> 2637 bytes
 13 files changed, 107 insertions(+), 9 deletions(-)

diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/ExtractAndNormalizeWindowExpression.java
 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/ExtractAndNormalizeWindowExpression.java
index 02165709564..e96cf3e105c 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/ExtractAndNormalizeWindowExpression.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/ExtractAndNormalizeWindowExpression.java
@@ -180,6 +180,45 @@ public class ExtractAndNormalizeWindowExpression extends 
OneRewriteRuleFactory i
                             inputSlots.stream()
                     ).distinct();
                 }
+
+                // for this sql:
+                //   select
+                //     SUBSTR(orderdate,1,10) AS dt,
+                //     ROW_NUMBER() OVER(PARTITION BY orderdate ORDER BY 
orderid DESC) AS rn
+                //   from lineorders
+                //   having dt = '2025-01-01'
+                //
+                // we not push down the `dt` slot under LogicalWindow, but 
push down [orderdate, orderid]
+                // to the bottom projects, because if we push down `dt`, the 
plan tree will be:
+                //
+                //             LogicalFilter(substr(dt#3, 1, 10) = 
'2025-01-01')
+                //                                     |
+                //      LogicalWindow(rowNumber(partition by orderdate#2, 
order by orderid#1))
+                //                                     |
+                //   LogicalProject(orderid#1, orderdate#2, 
substr(orderdate#1, 1, 10) as dt#3)
+                //
+                // and can not push down filter by 
`PushDownFilterThroughWindow`, causing inefficiency,
+                // because dt#3 in LogicalFilter not contains in the partition 
key in LogicalWindow: [orderdate#2].
+                //
+                // so we only push down orderdate in the LogicalFilter, not 
push down `dt`:
+                //
+                //      LogicalFilter(substr(orderdate#2, 1, 10) = 
'2025-01-01')
+                //                               |
+                //      LogicalWindow(rowNumber(partition by orderdate#2, 
order by orderid#1))
+                //                               |
+                //             LogicalProject(orderid#1, orderdate#2)
+                //
+                // and then, `PushDownFilterThroughWindow` found the 
LogicalFilter's `orderdate#2` contains
+                // in the LogicalWindow's partition key: [orderdate#2], and 
can push down filter to:
+                //
+                //   LogicalWindow(rowNumber(partition by orderdate#2, order 
by orderid#1))
+                //                               |
+                //             LogicalProject(orderid#1, orderdate#2)
+                //                              |
+                //     LogicalFilter(substr(orderdate#2, 1, 10) = '2025-01-01')
+                if (expression instanceof Alias) {
+                    return expression.getInputSlots().stream();
+                }
                 return ImmutableList.of(expression).stream();
             })
             .collect(ImmutableSet.toImmutableSet());
diff --git 
a/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/analysis/FillUpMissingSlotsTest.java
 
b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/analysis/FillUpMissingSlotsTest.java
index 4fcbfbfde3a..b329eed2753 100644
--- 
a/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/analysis/FillUpMissingSlotsTest.java
+++ 
b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/analysis/FillUpMissingSlotsTest.java
@@ -647,8 +647,7 @@ public class FillUpMissingSlotsTest extends 
AnalyzeCheckTestBase implements Memo
                     logicalProject(
                         logicalFilter(
                             logicalWindow(
-                                logicalProject(
-                                    logicalAggregate(logicalEmptyRelation())))
+                                    logicalAggregate(logicalEmptyRelation()))
                         ).when(filter -> 
filter.toString().contains("predicates=(rank() OVER(ORDER BY year asc null 
first)#5 > 1)"))
                     )
                 )
@@ -660,11 +659,9 @@ public class FillUpMissingSlotsTest extends 
AnalyzeCheckTestBase implements Memo
                     logicalProject(
                         logicalFilter(
                             logicalWindow(
-                                logicalProject(
-                                    logicalFilter(
-                                        
logicalAggregate(logicalEmptyRelation())
-                                    ).when(filter -> 
filter.toString().contains("predicates=(total#5 > 100)"))
-                                )
+                                logicalFilter(
+                                    logicalAggregate(logicalEmptyRelation())
+                                ).when(filter -> 
filter.toString().contains("predicates=(total#5 > 100)"))
                             )
                         ).when(filter -> 
filter.toString().contains("predicates=(row_number() OVER(ORDER BY year asc 
null first)#6 > 1)"))
                     )
@@ -692,7 +689,7 @@ public class FillUpMissingSlotsTest extends 
AnalyzeCheckTestBase implements Memo
                         logicalProject(
                             logicalFilter(
                                 logicalWindow(
-                                    
logicalProject(logicalAggregate(logicalEmptyRelation())))
+                                    logicalAggregate(logicalEmptyRelation()))
                             ).when(filter -> 
filter.toString().contains("predicates=(row_number() OVER(ORDER BY year asc 
null first)#5 > 1)"))
                         )
                     )
diff --git 
a/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PushDownFilterThroughWindowTest.java
 
b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PushDownFilterThroughWindowTest.java
index 6edcd01a8ea..3c367550cfe 100644
--- 
a/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PushDownFilterThroughWindowTest.java
+++ 
b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PushDownFilterThroughWindowTest.java
@@ -17,6 +17,7 @@
 
 package org.apache.doris.nereids.rules.rewrite;
 
+import org.apache.doris.nereids.rules.RuleType;
 import org.apache.doris.nereids.trees.expressions.Alias;
 import org.apache.doris.nereids.trees.expressions.EqualTo;
 import org.apache.doris.nereids.trees.expressions.Expression;
@@ -35,6 +36,7 @@ import org.apache.doris.nereids.util.MemoTestUtils;
 import org.apache.doris.nereids.util.PlanChecker;
 import org.apache.doris.nereids.util.PlanConstructor;
 import org.apache.doris.qe.ConnectContext;
+import org.apache.doris.utframe.TestWithFeService;
 
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Lists;
@@ -42,7 +44,7 @@ import org.junit.jupiter.api.Test;
 
 import java.util.List;
 
-class PushDownFilterThroughWindowTest implements MemoPatternMatchSupported {
+class PushDownFilterThroughWindowTest extends TestWithFeService implements 
MemoPatternMatchSupported {
     private final LogicalOlapScan scan = new 
LogicalOlapScan(StatementScopeIdGenerator.newRelationId(),
             PlanConstructor.student,
             ImmutableList.of(""));
@@ -89,4 +91,64 @@ class PushDownFilterThroughWindowTest implements 
MemoPatternMatchSupported {
                         )
                 );
     }
+
+    @Test
+    public void testPushDownFilter() throws Exception {
+        String db = "test";
+        createDatabase(db);
+        useDatabase(db);
+        createTable("CREATE TABLE lineorders (\n"
+                + "orderdate varchar(100) NOT NULL,\n"
+                + "orderid int NOT NULL,\n"
+                + "country_id int NOT NULL,\n"
+                + "vender_id int NOT NULL,\n"
+                + "ordernum int NOT NULL,\n"
+                + "ordemoney int NOT NULL\n"
+                + ") ENGINE=OLAP\n"
+                + "DUPLICATE KEY(orderdate, orderid, country_id)\n"
+                + "COMMENT 'OLAP'\n"
+                + "PARTITION BY LIST(orderdate)\n"
+                + "(PARTITION p1992 VALUES IN (\"0-2020\"),\n"
+                + "PARTITION p1993 VALUES IN (\"0-2021\"),\n"
+                + "PARTITION p1994 VALUES IN (\"0-2022\"),\n"
+                + "PARTITION p1995 VALUES IN (\"0-2023\"),\n"
+                + "PARTITION p1996 VALUES IN (\"0-2024\"),\n"
+                + "PARTITION p1997 VALUES IN (\"0-2025\"))\n"
+                + "DISTRIBUTED BY HASH(orderid) BUCKETS 48\n"
+                + "PROPERTIES (\n"
+                + "\"replication_allocation\" = \"tag.location.default: 1\"\n"
+                + ")");
+
+        connectContext.getSessionVariable()
+                .setDisableNereidsRules(
+                        RuleType.OLAP_SCAN_PARTITION_PRUNE.name() + "," + 
RuleType.PRUNE_EMPTY_PARTITION.name());
+
+        PlanChecker.from(connectContext)
+                .analyze("select * from ( \n"
+                        + "  select \n"
+                        + "    orderid,\n"
+                        + "    orderdate,\n"
+                        + "    country_id,\n"
+                        + "    ordernum,\n"
+                        + "    ordemoney,\n"
+                        + "    SUBSTR(lineorders.orderdate,3,4) AS dt,\n"
+                        + "    ROW_NUMBER() OVER(PARTITION BY 
lineorders.orderid,lineorders.orderdate ORDER BY lineorders.country_id DESC) AS 
rn\n"
+                        + "  from lineorders\n"
+                        + ") a \n"
+                        + "where SUBSTR(a.dt, 1, 4) = SUBSTR(curdate(), 1, 4)")
+                .rewrite()
+                .matchesFromRoot(
+                        logicalResultSink(
+                                logicalProject(
+                                        logicalWindow(
+                                                logicalProject(
+                                                        logicalFilter(
+                                                                
logicalOlapScan()
+                                                        )
+                                                )
+                                        )
+                                )
+                        )
+                );
+    }
 }
diff --git a/regression-test/data/nereids_p0/cte/test_cte_filter_pushdown.out 
b/regression-test/data/nereids_p0/cte/test_cte_filter_pushdown.out
index 8d8c9733055..2cc65ed8374 100644
Binary files a/regression-test/data/nereids_p0/cte/test_cte_filter_pushdown.out 
and b/regression-test/data/nereids_p0/cte/test_cte_filter_pushdown.out differ
diff --git 
a/regression-test/data/nereids_rules_p0/cte/test_cte_filter_pushdown.out 
b/regression-test/data/nereids_rules_p0/cte/test_cte_filter_pushdown.out
index 8d8c9733055..2cc65ed8374 100644
Binary files 
a/regression-test/data/nereids_rules_p0/cte/test_cte_filter_pushdown.out and 
b/regression-test/data/nereids_rules_p0/cte/test_cte_filter_pushdown.out differ
diff --git 
a/regression-test/data/nereids_tpch_p0/tpch/push_filter_window_eqset.out 
b/regression-test/data/nereids_tpch_p0/tpch/push_filter_window_eqset.out
index 9d50006b8e1..bb38206fc48 100644
Binary files 
a/regression-test/data/nereids_tpch_p0/tpch/push_filter_window_eqset.out and 
b/regression-test/data/nereids_tpch_p0/tpch/push_filter_window_eqset.out differ
diff --git 
a/regression-test/data/shape_check/tpcds_sf100/noStatsRfPrune/query51.out 
b/regression-test/data/shape_check/tpcds_sf100/noStatsRfPrune/query51.out
index 470fabc0f31..cec684574ed 100644
Binary files 
a/regression-test/data/shape_check/tpcds_sf100/noStatsRfPrune/query51.out and 
b/regression-test/data/shape_check/tpcds_sf100/noStatsRfPrune/query51.out differ
diff --git 
a/regression-test/data/shape_check/tpcds_sf100/no_stats_shape/query51.out 
b/regression-test/data/shape_check/tpcds_sf100/no_stats_shape/query51.out
index 470fabc0f31..cec684574ed 100644
Binary files 
a/regression-test/data/shape_check/tpcds_sf100/no_stats_shape/query51.out and 
b/regression-test/data/shape_check/tpcds_sf100/no_stats_shape/query51.out differ
diff --git a/regression-test/data/shape_check/tpcds_sf100/rf_prune/query51.out 
b/regression-test/data/shape_check/tpcds_sf100/rf_prune/query51.out
index 470fabc0f31..cec684574ed 100644
Binary files 
a/regression-test/data/shape_check/tpcds_sf100/rf_prune/query51.out and 
b/regression-test/data/shape_check/tpcds_sf100/rf_prune/query51.out differ
diff --git a/regression-test/data/shape_check/tpcds_sf100/shape/query51.out 
b/regression-test/data/shape_check/tpcds_sf100/shape/query51.out
index 470fabc0f31..cec684574ed 100644
Binary files a/regression-test/data/shape_check/tpcds_sf100/shape/query51.out 
and b/regression-test/data/shape_check/tpcds_sf100/shape/query51.out differ
diff --git a/regression-test/data/shape_check/tpcds_sf1000/hint/query51.out 
b/regression-test/data/shape_check/tpcds_sf1000/hint/query51.out
index 631f4d82516..cf7ca83ca9c 100644
Binary files a/regression-test/data/shape_check/tpcds_sf1000/hint/query51.out 
and b/regression-test/data/shape_check/tpcds_sf1000/hint/query51.out differ
diff --git a/regression-test/data/shape_check/tpcds_sf1000/shape/query51.out 
b/regression-test/data/shape_check/tpcds_sf1000/shape/query51.out
index 6c22d2df308..98b43bfdfc1 100644
Binary files a/regression-test/data/shape_check/tpcds_sf1000/shape/query51.out 
and b/regression-test/data/shape_check/tpcds_sf1000/shape/query51.out differ
diff --git a/regression-test/data/shape_check/tpcds_sf10t_orc/shape/query51.out 
b/regression-test/data/shape_check/tpcds_sf10t_orc/shape/query51.out
index 5635b5dbabf..5d049b5a6bb 100644
Binary files 
a/regression-test/data/shape_check/tpcds_sf10t_orc/shape/query51.out and 
b/regression-test/data/shape_check/tpcds_sf10t_orc/shape/query51.out differ


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to