>
> The problem is not limited to this special case.  Consider cases when
> 1) the remote table that has many rows are heavily updated after it
> got analyzed, and then 2) postgres_fdw imports its stats before it
> gets re-analyzed.  The stats postgres_fdw imports would be stale,
> causing plan degradation.  I don't think we should enable this feature
> by default until we guarantee stats freshness in some way.


So it seems like we have the following configurations desired by at least
somebody:

0. Row Sampling Only
1. Fetch stats and fall back to row sampling.
2. Always analyze remote table (assuming it is a table that can hold
stats), then fetch stats, and fall back if necessary.
3. Fetch stats, and if that turned up 0 attribute stats try an analyze,
then try to refetch and if it still fails go to row sampling.

With the following interpretation of reltuples = 0:

a. The table is definitively empty, stop.
b. The table is missing stats and running an analyze is cheap (assuming
remote analysis is even enabled)
c. if remote version >= 14 then a else b

I'm of the opinion that 3c is the best configuration for most tables, and
you have advocated for 1a without an analyze option and 2a with one. Option
2 seems a bit heavy handed to me, but I could see checking the remote
pg_stat_all_tables and making an analyze/no-analyze judgement call based on
that, perhaps call that analyze_stale_vacuum_interval or something like
that. That could be a neat feature for v20, and so whatever default we
choose for fetch_stats, I ask that we choose values that keep our options
open for all 4x3 configurations enumerated above.

Reply via email to