[PR] feat: support runtime filter push down to Table AM [cloudberry]

via GitHub Thu, 21 Aug 2025 21:45:32 -0700


gongxun0928 opened a new pull request, #1324:
URL: https://github.com/apache/cloudberry/pull/1324


   When a seq scan begins, the scan flags of the table AM are checked to
determine whether the runtime filter is pushed down.
   
   When the runtime filter is pushed down to the PAX AM, PAX converts the
min/max scan keys in the runtime filter into PFTNode and performs min/max
filtering.
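
To make the min/max filtering step concrete, here is a minimal sketch in Python. It is not the PAX implementation: `GroupStats`, `build_groups`, `runtime_filter_range`, and `surviving_groups` are hypothetical names, and the group size of 131072 rows is an assumption chosen only because it matches the row count reported for the filtered scan in the example in this description. The sketch derives a [min, max] range from the hash-side join keys and skips every row group whose stored min/max statistics cannot overlap that range.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class GroupStats:
    """Min/max statistics for one group of rows (hypothetical layout)."""
    row_count: int
    min_value: int
    max_value: int


def build_groups(total_rows: int, group_size: int) -> List[GroupStats]:
    """Split the values 1..total_rows, stored in insertion order, into groups."""
    groups = []
    for start in range(1, total_rows + 1, group_size):
        end = min(start + group_size - 1, total_rows)
        groups.append(GroupStats(end - start + 1, start, end))
    return groups


def runtime_filter_range(build_keys: Iterable[int]) -> Tuple[int, int]:
    """Derive the min/max bounds from the (non-empty) hash-side join keys."""
    keys = list(build_keys)
    return min(keys), max(keys)


def surviving_groups(groups: List[GroupStats], lo: int, hi: int) -> List[GroupStats]:
    """Keep only groups whose [min, max] interval overlaps [lo, hi]."""
    return [g for g in groups if g.max_value >= lo and g.min_value <= hi]


# t1.c2 holds 1..10000000; the group size is an assumption, not a PAX constant.
groups = build_groups(10_000_000, 131_072)
# t2.c2 holds the values 1..4, each repeated 8 times (32 rows in total).
lo, hi = runtime_filter_range([1, 2, 3, 4] * 8)
kept = surviving_groups(groups, lo, hi)
rows_scanned = sum(g.row_count for g in kept)
print(len(groups), len(kept), rows_scanned)  # prints "77 1 131072"
```

Under these assumptions only the first of the 77 groups overlaps the range [1, 4], so the scan returns 131072 rows to the join instead of 10000000. Min/max pruning works at group granularity, which is why the scan still emits rows that the hash join later discards.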
   
   ```
   CREATE TABLE t1(c1 int, c2 int, c3 int, c4 int, c5 int) using pax with(minmax_columns='c1,c2');
   insert into t1 select i,i,i,i,i from generate_series(1,10000000) i;
   analyze t1;
   
   CREATE TABLE t2(c1 int, c2 int, c3 int, c4 int, c5 int) with (appendonly=true, orientation=column) distributed REPLICATED;
   INSERT INTO t2 VALUES (1,1,1,1,1), (2,2,2,2,2), (3,3,3,3,3), (4,4,4,4,4);
   INSERT INTO t2 select * FROM t2;
   INSERT INTO t2 select * FROM t2;
   INSERT INTO t2 select * FROM t2;
   
   set gp_enable_runtime_filter_pushdown to off;
   EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
   SELECT t1.c3 FROM t1, t2 WHERE t1.c2 = t2.c2;
                                       QUERY PLAN
   ----------------------------------------------------------------------------------
    Gather Motion 1:1  (slice1; segments: 1) (actual rows=32 loops=1)
      ->  Hash Join (actual rows=32 loops=1)
            Hash Cond: (t1.c2 = t2.c2)
            Extra Text: Hash chain length 8.0 avg, 8 max, using 4 of 524288 buckets.
            ->  Seq Scan on t1 (actual rows=10000000 loops=1)
            ->  Hash (actual rows=32 loops=1)
                  Buckets: 524288  Batches: 1  Memory Usage: 4098kB
                  ->  Seq Scan on t2 (actual rows=32 loops=1)
    Optimizer: GPORCA
   (9 rows)
   
   Time: 1576.548 ms (00:01.577)
   
   gpadmin=# set gp_enable_runtime_filter_pushdown to on;
   SET
   Time: 0.362 ms
   gpadmin=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
   SELECT t1.c3 FROM t1, t2 WHERE t1.c2 = t2.c2;
                                       QUERY PLAN
   ----------------------------------------------------------------------------------
    Gather Motion 1:1  (slice1; segments: 1) (actual rows=32 loops=1)
      ->  Hash Join (actual rows=32 loops=1)
            Hash Cond: (t1.c2 = t2.c2)
            Extra Text: Hash chain length 8.0 avg, 8 max, using 4 of 524288 buckets.
            ->  Seq Scan on t1 (actual rows=131072 loops=1)
            ->  Hash (actual rows=32 loops=1)
                  Buckets: 524288  Batches: 1  Memory Usage: 4098kB
                  ->  Seq Scan on t2 (actual rows=32 loops=1)
    Optimizer: GPORCA
   (9 rows)
   
   Time: 38.471 ms
   ```
   
   

