On Tue, Jan 27, 2026 at 1:35 PM Corey Huinker <[email protected]> wrote: >> >> > I'm not sure we can actually do that. The functions that compute the >> > statistics are all based off of row samples, not already computed >> > statistics. I don't think we can synthesize a rowsample from the imported >> > statistics, at least not accurately. If I'm misunderstanding what you're >> > suggesting, please correct me. >> >> I am comparing the calculation of statistics to the calculation of >> aggregates. We have code to compute aggregates on a partitioned table >> from the partial aggregates computed from the individual partitions. >> (Even though I am mentioning the partitioned table, the technique can >> be used for an inheritance hierarchy.) Similarly if we could come up >> >> with a representation of partial statistics, we could get partial >> statistics computed for the children (and the parent in >> non-partitioned inheritance). Use the partial statistics to compute >> the statistics for the parent without the need to synthesize row >> samples from the children. I haven't looked at all the kinds of >> statistics to see whether this is feasible. > > > We're limited to the existing data from pg_class and the security-barrier > view pg_stats. We also have pg_stats_ext and pg_stats_ext_exprs as well, but > those are for extended stats objects, which aren't useful to us in this > context. > > I've been a part of some research into the feasibility of caching the row > samples fetched, allowing the planner to generate on-the-fly statistics for > OLAP queries. If we ever got that functionality, we'd need a means of > exposing those row samples externally, and even then we'd have to wait for > the remote postgresql server to have that feature. So for now we are limited > to what pg_class and pg_stats tell us.
The way this is implemented, it will favour the usecases where foreign tables are not child tables. That leaves out the sharding use case which I believe is also a significant usecase. I think we need to think, how can we make that usecase benefit from this optimization. In case of sharding the foreign tables will be child tables pointing to very large tables on the remote side or at least the parent table they are part of will have very large data spread across multiple remotes. Not being able to use statistics available on the remote side seems a major limitation. But I don't have a better solution than to think of supporting some kind of partial statistics. -- Best Wishes, Ashutosh Bapat
