On Wed Jan 7, 2026 at 3:04 AM -03, Corey Huinker wrote:
>>
>> > I agree, if there is no fallback, then the default should be false. When
>> I was initially brainstorming this patch, Nathan Bossart had suggested
>> making it the default because 1) that would be an automatic benefit to
>> users and 2) the cost for attempting to import stats was small in
>> comparison to a table stample, so it was worth the attempt. I still want
>> users to get that automatic benefit, but if there is no fallback to
>> sampling then the default only makes sense as false.
>>
>> I think that the FDW API that I proposed could actually allow us to
>> fall back to sampling, by modifying StatisticsAreImportable so that it
>> also checks if 1) there are statistics on the remote server and 2) the
>> data is fresh enough, and if so, returns true; otherwise, returns
>> false; in the latter case we could fall back to sampling.  And if we
>> modified it as such, I think we could change the default to true.
>> (Checking #2 is necessary to avoid importing stale data, which would
>> degrade plan quality.)
>
>
> I've given this some more thought.
>
> First, we'd have to add the va_cols param to StatisticsAreImportable, which
> isn't itself terrible.
>
> Then, we'd have to determine that there are stats available for every
> mapped column (filtered by va_cols, if any). But the only way to do that is
> to query the pg_stats view on the remote end, and if we have done that,
> then we've already fetched the stats. Yes, we could avoid selecting the
> actual statistical values, and that would save some network bandwidth at
> the cost of having to do the query again with stats. So I don't really see
> the savings.
>
> Also, the pg_stats view is our security-barrier black box into statistics,
> and it gives no insight into how recently those stats were acquired. We
> could poke pg_stat_all_tables, assuming we can even query it, and then make
> a value judgement on the value of (CURRENT_TIMESTAMP -
> GREATEST(last_analyze, last_autoanalyze), but that value is highly
> subjective.
>
> I suppose we could move all of the statistics fetching
> into StatisticsAreImportable, And carry those values forward if they are
> satisfactory. That would leave ImportStatistics() with little to do other
> than form up the calls to pg_restore_*_stats(), but those could still fail,
> and at that point we'd have no way to fall back to sampling and analysis.
>
> I really want to make sampling fallback possible.
>
> Anyway, here's v8, incorporating the documentation feedback and Matheus's
> notes.

Thanks for the new version.

+static bool
+postgresStatisticsAreImportable(Relation relation)
+...
+
+       /*
+        * Server-level options can be overridden by table-level options, so 
check
+        * server-level first.
+        */
+       foreach(lc, server->options)
+       {
+               DefElem    *def = (DefElem *) lfirst(lc);
+
+               if (strcmp(def->defname, "fetch_stats") == 0)
+               {
+                       fetch_stats = defGetBoolean(def);
+                       break;
+               }
+       }
+
+       foreach(lc, table->options)
+       {
+               DefElem    *def = (DefElem *) lfirst(lc);
+
+               if (strcmp(def->defname, "fetch_stats") == 0)
+               {
+                       fetch_stats = defGetBoolean(def);
+                       break;
+               }
+       }
+
I don't think that it's good to make StatisticsAreImportable() routine
check if fetch_stats is enabled on foreign server/table options because
if so, every fdw implementation would need this same block of code and
also fdw implementations may forget or bypass these options which I
don't think that it would be a desired behavior. What about move this
check to analyze_rel()? Perhaps create a function that just check if the
fetch_stats is enabled.

If the above statement make sense, it seems to me that
StatisticsAreImportable() may not be needed at all. 

I think that we could make analyze_rel() check if fetch_stats is enable
on the foreign server/table and then call ImportStatistics() which could
return true or false. If it returns true it means that the statistics
was imported successfully, otherwise if it returns false we could
fallback to table sampling as we already do today. ImportStatistics
could return false if the foreign server don't have statistics for the
requested table, even  after running ANALYZE if remote_analyze is true.

Is that make sense? Any thoughts?

--
Matheus Alcantara
EDB: https://www.enterprisedb.com



Reply via email to