On Jan 5, 2026, at 23:04, Tom Lane <[email protected]> wrote:

> Yeah, there might be some gold to be mined there. IIRC we already
> have some ability for LIMIT to tell an immediately-below Sort that
> only a limited number of rows will be retrieved. Could be useful
> if ForeignScan could be in on that.
I’m surprised the limit part of the plan isn’t there already. Is that something
that needs to be tweaked in the paths passed to the FDW?
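(For context, the mechanism Tom describes is visible on any local table:
the Limit node passes its bound down to the Sort node, which then reports
"Sort Method: top-N heapsort" rather than doing a full sort. Table name
here is hypothetical:)

    -- On a plain local table the bounded sort shows up in the plan:
    EXPLAIN (ANALYZE) SELECT c1 FROM local_t ORDER BY c1 LIMIT 1;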
> The first attached results are from current HEAD; the second are
> after applying your patch. What I find interesting is that
> with use_remote_estimate enabled, it always goes for shipping the
> MIN as-is. Your patch changes the behavior with use_remote_estimate
> disabled, and we can see that the reason is that that mode estimates
> the query cost at 100.24 with the minmax optimization and 146.78 when
> shipping the aggregate. I've not dug into where we are getting
> those numbers without use_remote_estimate, but perhaps that could
> use refinement.
Odd. They should be the same, no?
> On the other hand, there is not that much difference
> between the actual runtimes, so perhaps it doesn't matter for
> postgres_fdw.
Maybe not with a loopback connection, but some of these queries send all 10000
values over the network, no? That could be a lot of additional latency in some
situations. ISTM that the code should either (remote SQL for each option is
sketched below):
1. Push down the `LIMIT 1`; or
2. Push down the min/max and let the optimization happen remotely
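Roughly, the remote SQL for the two options would be (my sketch, modeled on
the Remote SQL shown in the plans below):

    -- Option 1: ship the minmax rewrite together with its limit:
    SELECT c1 FROM "S 1"."T 3" ORDER BY c1 ASC NULLS LAST LIMIT 1;

    -- Option 2: ship the aggregate and let the remote planner optimize it:
    SELECT min(c1) FROM "S 1"."T 3";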
> I can't speak to your results with ClickHouse, since I have no idea
> how the cost estimates are derived for that.
ClickHouse is column-oriented, and all the columns are sorted, so min/max
optimizations are completely different and never require an index scan or table
scan: they just read the first or last value in the relevant column. This is
why I started looking into this in the first place.
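(A minimal ClickHouse sketch, with a hypothetical table: a MergeTree table
stores each part's rows sorted by the ORDER BY key, so the extremes of the
key column sit at the ends of each part and no scan is needed.)

    CREATE TABLE t3 (c1 Int32, c2 Int32)
    ENGINE = MergeTree
    ORDER BY c1;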
But your finding of the difference between use_remote_estimate true and
false helped me figure out the right lever to pull. pg_clickhouse has some
remnants of use_remote_estimate from forking postgres_fdw back in 2019;
they don’t do anything, but they led me to a function that sets more-or-less
hard-coded estimates. In `estimate_path_cost_size()` I changed:
    p_total_cost = 5.0 + coef;

to

    p_total_cost = 0.1 + coef;
And that makes the min and max push down. Of course this impacts *all*
pg_clickhouse queries, but that’s mainly what we want for analytics-oriented
workloads.
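(For comparison, stock postgres_fdw exposes this same lever as server
options rather than a code change; I haven’t checked whether pg_clickhouse
still honors them. Server name and values here are illustrative:)

    ALTER SERVER clickhouse_svr
        OPTIONS (ADD fdw_startup_cost '0.1', ADD fdw_tuple_cost '0.01');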
Anyway, some comments on plans generated by your testing:
> explain (analyze, verbose) select min(c1) from "S 1"."T 3";
> explain (analyze, verbose) select min(c1) from "S 1"."T 3";
> explain (analyze, verbose) select min(c1) from "S 1"."T 3";
Index-only scan, 1 row (though it estimates 10000 rows, for some reason).
> explain (analyze, verbose) select min(c2) from "S 1"."T 3";
> explain (analyze, verbose) select min(c2) from "S 1"."T 3";
> explain (analyze, verbose) select min(c2) from "S 1"."T 3";
Table scan, 10000 rows, as expected.
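(Presumably the asymmetry is just that c1 is indexed and c2 is not, i.e.
the test setup has something like the following, though I haven’t checked:)

    CREATE INDEX ON "S 1"."T 3" (c1);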
> ALTER FOREIGN TABLE ft4 OPTIONS (use_remote_estimate 'false');
>
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c1) from ft4;
The Foreign Scan here confuses me:
   ->  Foreign Scan on public.ft4  (cost=100.00..761.35 rows=2925 width=4) (actual time=0.144..0.144 rows=1.00 loops=1)
         Output: ft4.c1
         Remote SQL: SELECT c1 FROM "S 1"."T 3" ORDER BY c1 ASC NULLS LAST
How does it result in only one row? Does that mean it gets all of them back
from the seq scan but only reads the first? Are they still delivered over the
network and buffered? Seems like it would be good to have `LIMIT 1` pushed down.
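(If I remember the postgres_fdw wire behavior correctly, the answer is that
the scan runs through a cursor and rows are fetched in batches of
fetch_size, 100 by default, so only the first batch crosses the wire before
the Limit node stops pulling. Cursor name illustrative:)

    DECLARE c1 CURSOR FOR
        SELECT c1 FROM "S 1"."T 3" ORDER BY c1 ASC NULLS LAST;
    FETCH 100 FROM c1;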
> explain (analyze, verbose) select min(c2) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
Same confusion about the foreign scan plan:
   ->  Foreign Scan on public.ft4  (cost=100.00..761.35 rows=2925 width=4) (actual time=0.831..0.831 rows=1.00 loops=1)
         Output: ft4.c2
         Remote SQL: SELECT c2 FROM "S 1"."T 3" ORDER BY c2 ASC NULLS LAST
Note also that its execution time is 0.900 ms, which implies a full table
scan on the remote. That makes sense; `LIMIT 1` would reduce the network
overhead, but not the remote execution time.
> ALTER FOREIGN TABLE ft4 OPTIONS (SET use_remote_estimate 'true');
>
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c1) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
> explain (analyze, verbose) select min(c2) from ft4;
Interesting that this configuration pushes down the min/max; of course the
same optimizations then happen remotely as they would locally, but now we
know only one row is returned, right?

My patch doesn’t change this case much, because the remote optimization
still takes place; it’s just that it never returns more than a single row.
Setting `LIMIT` pushdown aside, I guess the question is: how should an FDW
author properly tweak cost estimates to encourage or discourage the min/max
optimization?
Best,
David