On Tue, Jan 27, 2026 at 11:42 PM Matheus Alcantara
<[email protected]> wrote:
>
> Although this make sense to me I see difference in row estimation using
> your v2 patch for the following example:
>
> CREATE COLLATION case_insensitive (provider = icu, locale =
> '@colStrength=secondary', deterministic = false);
> CREATE DOMAIN d_txt1 AS text collate case_insensitive;
> CREATE TABLE t1 (a int, b int, c text) PARTITION BY LIST(c);
> CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES IN ('0000', '0003', '0004',
> '0010');
> CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES IN ('0001', '0005', '0002',
> '0009');
> CREATE TABLE t1_p3 PARTITION OF t1 FOR VALUES IN ('0006', '0007', '0008',
> '0011');
> INSERT INTO t1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0,
> 599, 2) i;
> ANALYZE t1;
>
> CREATE TABLE t2 (a int, b int, c d_txt1) PARTITION BY LIST(c);
> CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES IN ('0000', '0003', '0004',
> '0010');
> CREATE TABLE t2_p2 PARTITION OF t2 FOR VALUES IN ('0001', '0005', '0002',
> '0009');
> CREATE TABLE t2_p3 PARTITION OF t2 FOR VALUES IN ('0006', '0007', '0008',
> '0011');
> INSERT INTO t2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0,
> 599, 3) i;
> ANALYZE t2;
>
> SET enable_partitionwise_join TO true;
>
> V1 patch:
>
> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE
> t1.c = t2.c GROUP BY t1.c, t2.c $$);
> estimated | actual
> -----------+--------
> 12 | 12
>
> V2 patch:
>
> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE
> t1.c = t2.c GROUP BY t1.c, t2.c $$);
> estimated | actual
> -----------+--------
> 144 | 12
>
> I've also tried to make the partitions of t1 and t2 as foreign tables
> and I got the same row estimation difference.
>
> I'm just wondering if we are missing something?
>
Hi.
in function process_equivalence, we have:
/*
* Ensure both input expressions expose the desired collation (their types
* should be OK already); see comments for canonicalize_ec_expression.
*/
item1 = canonicalize_ec_expression(item1,
exprType((Node *) item1),
collation);
item2 = canonicalize_ec_expression(item2,
exprType((Node *) item2),
collation);
Let's simplify the test case.
CREATE COLLATION case_insensitive (provider = icu, locale =
'@colStrength=secondary', deterministic = false);
CREATE DOMAIN d_txt1 AS text collate case_insensitive;
CREATE TABLE t3 (a int, b int, c text);
INSERT INTO t3 SELECT i % 12, i, to_char(i/50, 'FM0000') FROM
generate_series(0, 599, 2) i;
ANALYZE t3;
CREATE TABLE t4 (a int, b int, c d_txt1);
INSERT INTO t4 SELECT i % 10, i, to_char(i/50, 'FM0000') FROM
generate_series(0, 599, 3) i;
ANALYZE t4;
EXPLAIN SELECT FROM t3, t4 WHERE t3.c = t4.c GROUP BY t3.c, t4.c;
The ``WHERE t3.c = t4.c `` after the function process_equivalence, it will
produce 2 RELABELTYPE node in EquivalenceClass->ec_members->em_expr and your v1
uncondition strip these two RELABELTYPE, exprs_known_equal will retrun true,
therefore for numdistinct it will think "GROUP BY t3.c, t4.c" is the same as
""GROUP BY t3.c".
However if we not unconditionly strip RELABELTYPE, exprs_known_equal will return
false, therefore "GROUP BY t3.c, t4.c", "t3.c", "t4.c" is not identical, so it
multiply these two distinct numbers. IMHO, That's the reason for
estimate number 144 versus 12.
Please also see the comments in canonicalize_ec_expression.
Actually, I think the comments in canonicalize_ec_expression discourage strip
RelabelType nodes when RelabelType->resultcollid differs from the collation of
RelabelType->arg.
--
jian
https://www.enterprisedb.com/