On Fri, Apr 3, 2026 at 2:14 AM Corey Huinker <[email protected]> wrote:
>>
>> The problem is not limited to this special case.  Consider cases when
>> 1) the remote table that has many rows are heavily updated after it
>> got analyzed, and then 2) postgres_fdw imports its stats before it
>> gets re-analyzed.  The stats postgres_fdw imports would be stale,
>> causing plan degradation.  I don't think we should enable this feature
>> by default until we guarantee stats freshness in some way.
>
> So it seems like we have the following configurations desired by at least 
> somebody:
>
> 0. Row Sampling Only
> 1. Fetch stats and fall back to row sampling.
> 2. Always analyze remote table (assuming it is a table that can hold stats), 
> then fetch stats, and fall back if necessary.
> 3. Fetch stats, and if that turned up 0 attribute stats try an analyze, then 
> try to refetch and if it still fails go to row sampling.
>
> With the following interpretation of reltuples = 0:
>
> a. The table is definitively empty, stop.
> b. The table is missing stats and running an analyze is cheap (assuming 
> remote analysis is even enabled)
> c. if remote version >= 14 then a else b
>
> I'm of the opinion that 3c is the best configuration for most tables, and you 
> have advocated for 1a without an analyze option and 2a with one. Option 2 
> seems a bit heavy handed to me, but I could see checking the remote 
> pg_stat_all_tables and making an analyze/no-analyze judgement call based on 
> that, perhaps call that analyze_stale_vacuum_interval or something like that. 
> That could be a neat feature for v20, and so whatever default we choose for 
> fetch_stats, I ask that we choose values that keep our options open for all 
> 4x3 configurations enumerated above.

Before discussing this, let's wrap up the work for v19.  Here is a new
version of the patch.  Changes are:

* As above, I think we should disable this feature by default for now,
so I modified the patch as such.

* You are defining a remote query per version at the top of the file
that is used in fetch_attstats, but I think that that reduces
readability, making maintenance hard, so I modified that function to
build the query dynamically in it, as I proposed before.  I also
modified that function to use PQsendQuery, not PQsendQueryParams, as
in fetch_relstats, for efficiency.

* I moved the code for opening/closing the connection from
postgresImportStatistics to fetch_remote_statistics, as the connection
is only used in the latter function.  I also removed useless cleanup
processing in that function.

* I added regression tests.

* I fixed a minor bug I noticed while adding those tests.

* I did some cleanup and editorialization including re-ordering
functions in logical order.

One thing I'm wondering is: we should rename ImportStatistics to
ImportForeignStatistics, for consistency with ImportForeignSchema?
Also, we should rename fetch_stats to import_stats, for consistency
with eg, import_default?

Anyway I think the patch is now in good shape, except comments/docs,
which I will work on next.

Best regards,
Etsuro Fujita

Attachment: v19-Add-remote-statistics-fetching-to-postgres_fdw.patch
Description: Binary data

Reply via email to