avamingli opened a new issue, #1301:
URL: https://github.com/apache/cloudberry/issues/1301
### Apache Cloudberry version
main
### What happened
While debug TPC-DS with partitioned tables, I found that when executing
queries on partitioned tables, the planner sometimes generates redundant Motion
nodes even when distribution keys are logically compatible.
The issue leads to suboptimal query plans with avoidable data redistribution
overhead.
### What you think should happen instead
_No response_
### How to reproduce
```sql
CREATE TABLE t1 (id varchar(32), date date) DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
(START (date '2016-01-01') INCLUSIVE END (date '2016-01-04') EXCLUSIVE EVERY
(INTERVAL '1 day'));
CREATE TABLE t2 (id varchar(32)) DISTRIBUTED BY (id);
analyze t1;
analyze t2;
\d+ t1;
Partitioned table "public.t1"
Column | Type | Collation | Nullable | Default | Storage
| Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | character varying(32) | | | | extended
| |
date | date | | | | plain
| |
Partition key: RANGE (date)
Partitions: t1_1_prt_1 FOR VALUES FROM ('01-01-2016') TO ('01-02-2016'),
t1_1_prt_2 FOR VALUES FROM ('01-02-2016') TO ('01-03-2016'),
t1_1_prt_3 FOR VALUES FROM ('01-03-2016') TO ('01-04-2016')
Distributed by: (id)
\d+ t2;
Table "public.t2"
Column | Type | Collation | Nullable | Default | Storage
| Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id | character varying(32) | | | | extended
| |
Distributed by: (id)
EXPLAIN(COSTS OFF) SELECT COUNT(*) FROM t1 JOIN t2 USING(id);
QUERY PLAN
----------------------------------------------------------------
Aggregate
-> Gather Motion 3:1 (slice1; segments: 3)
-> Hash Join
Hash Cond: ((t1.id)::text = (t2.id)::text)
-> Dynamic Seq Scan on t1
Number of partitions to scan: 3 (out of 3)
-> Hash
-> Seq Scan on t2
Optimizer: GPORCA
(9 rows)
set optimizer=off;
SET
EXPLAIN(COSTS OFF) SELECT COUNT(*) FROM t1 JOIN t2 USING(id);
QUERY PLAN
------------------------------------------------------------------
Aggregate
-> Gather Motion 3:1 (slice1; segments: 3)
-> Hash Join
Hash Cond: ((t1.id)::text = (t2.id)::text)
-> Redistribute Motion 3:3 (slice2; segments: 3)
Hash Key: t1.id
-> Append
-> Seq Scan on t1_1_prt_1 t1_1
-> Seq Scan on t1_1_prt_2 t1_2
-> Seq Scan on t1_1_prt_3 t1_3
-> Hash
-> Seq Scan on t2
Optimizer: Postgres query optimizer
(13 rows)
```
### Operating System
all
### Anything else
ORCA works well, it's planner's bug.
### Are you willing to submit PR?
- [x] Yes, I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/cloudberry/blob/main/CODE_OF_CONDUCT.md).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]