> > The problem is not limited to this special case. Consider cases when > 1) the remote table that has many rows are heavily updated after it > got analyzed, and then 2) postgres_fdw imports its stats before it > gets re-analyzed. The stats postgres_fdw imports would be stale, > causing plan degradation. I don't think we should enable this feature > by default until we guarantee stats freshness in some way.
So it seems like we have the following configurations desired by at least somebody: 0. Row Sampling Only 1. Fetch stats and fall back to row sampling. 2. Always analyze remote table (assuming it is a table that can hold stats), then fetch stats, and fall back if necessary. 3. Fetch stats, and if that turned up 0 attribute stats try an analyze, then try to refetch and if it still fails go to row sampling. With the following interpretation of reltuples = 0: a. The table is definitively empty, stop. b. The table is missing stats and running an analyze is cheap (assuming remote analysis is even enabled) c. if remote version >= 14 then a else b I'm of the opinion that 3c is the best configuration for most tables, and you have advocated for 1a without an analyze option and 2a with one. Option 2 seems a bit heavy handed to me, but I could see checking the remote pg_stat_all_tables and making an analyze/no-analyze judgement call based on that, perhaps call that analyze_stale_vacuum_interval or something like that. That could be a neat feature for v20, and so whatever default we choose for fetch_stats, I ask that we choose values that keep our options open for all 4x3 configurations enumerated above.
