On Fri, Apr 3, 2026 at 2:14 AM Corey Huinker <[email protected]> wrote: >> >> The problem is not limited to this special case. Consider cases when >> 1) the remote table that has many rows are heavily updated after it >> got analyzed, and then 2) postgres_fdw imports its stats before it >> gets re-analyzed. The stats postgres_fdw imports would be stale, >> causing plan degradation. I don't think we should enable this feature >> by default until we guarantee stats freshness in some way. > > So it seems like we have the following configurations desired by at least > somebody: > > 0. Row Sampling Only > 1. Fetch stats and fall back to row sampling. > 2. Always analyze remote table (assuming it is a table that can hold stats), > then fetch stats, and fall back if necessary. > 3. Fetch stats, and if that turned up 0 attribute stats try an analyze, then > try to refetch and if it still fails go to row sampling. > > With the following interpretation of reltuples = 0: > > a. The table is definitively empty, stop. > b. The table is missing stats and running an analyze is cheap (assuming > remote analysis is even enabled) > c. if remote version >= 14 then a else b > > I'm of the opinion that 3c is the best configuration for most tables, and you > have advocated for 1a without an analyze option and 2a with one. Option 2 > seems a bit heavy handed to me, but I could see checking the remote > pg_stat_all_tables and making an analyze/no-analyze judgement call based on > that, perhaps call that analyze_stale_vacuum_interval or something like that. > That could be a neat feature for v20, and so whatever default we choose for > fetch_stats, I ask that we choose values that keep our options open for all > 4x3 configurations enumerated above.
Before discussing this, let's wrap up the work for v19. Here is a new version of the patch. Changes are: * As above, I think we should disable this feature by default for now, so I modified the patch as such. * You are defining a remote query per version at the top of the file that is used in fetch_attstats, but I think that that reduces readability, making maintenance hard, so I modified that function to build the query dynamically in it, as I proposed before. I also modified that function to use PQsendQuery, not PQsendQueryParams, as in fetch_relstats, for efficiency. * I moved the code for opening/closing the connection from postgresImportStatistics to fetch_remote_statistics, as the connection is only used in the latter function. I also removed useless cleanup processing in that function. * I added regression tests. * I fixed a minor bug I noticed while adding those tests. * I did some cleanup and editorialization including re-ordering functions in logical order. One thing I'm wondering is: we should rename ImportStatistics to ImportForeignStatistics, for consistency with ImportForeignSchema? Also, we should rename fetch_stats to import_stats, for consistency with eg, import_default? Anyway I think the patch is now in good shape, except comments/docs, which I will work on next. Best regards, Etsuro Fujita
v19-Add-remote-statistics-fetching-to-postgres_fdw.patch
Description: Binary data
