> > * I proposed the StatisticsAreImportable callback function, but the > ImportStatistics callback function is now allowed to fall back to > analyzing, so I don't think StatisticsAreImportable has much value > anymore. I'll withdraw my proposal. Attached is an updated patch > removing it, in which 0001 is merged into 0002 for ease of review > (0001 can't be tested without 0002). >
Thanks. > * I modified analyze.c to start progress monitor a bit earlier to > monitor the work of ImportStatistics (and AnalyzeForeignTable) as part > of initialization. > > * I noticed this odd behavior: > > create table t1 (id int, data text); > create foreign table ft1 (id int, data text) server loopback options > (table_name 't1'); > > alter foreign table ft1 options (add fetch_stats 'false'); > analyze ft1 (ctid); > ERROR: column "ctid" of relation "ft1" does not exist > > Looks good, but: > > alter foreign table ft1 options (set fetch_stats 'true'); > analyze ft1 (ctid); > NOTICE: remote table public.t1 has no statistics to import > HINT: Falling back to ANALYZE with regular row sampling. > ERROR: column "ctid" of relation "ft1" does not exist > > postgres_fdw is doing the remote access without any need to do so, > which isn't good. > > The cause is that the patches fail to validate the va_cols list given > to analyze_rel before importing stats. To fix, I modified analyze.c > to do so before calling ImportStatistics. I think this is also useful > when analyzing inheritance trees, as it avoids validating the list > repeatedly in that case. > I see this change and I like it. > * I separated a bit of code in examine_attribute that checks whether > the given column is analyzable into a new extern function so that FDWs > can use it. > Interesting. I wouldn't have dared make a change that affected regular analyze, but it makes sense. postgres_fdw side: > > * In fetch_remote_statistics, if we get reltuples=0 for v14 or later, > I think we should update only the relation stats with that info, and > avoid resorting to analyzing, for efficiency, as I proposed before. I > modified that function (and import_fetched_statistics) that way. > This will miss out on the case where the remote table did get analyzed once, when empty, but now isn't empty. I realize that shouldn't happen very often, but the cost of rowsampling a table that is empty is very low. > The root cause is that passing NULL to pg_restore_attribute_stats > leaves the stats unchanged. To fix, I modified > import_fetched_statistics to do pg_clear_attribute_stats before the > restore function. > +1 > * The matching logic in match_attrmap seems correct to me, but I think > it's still complicated than necessary: you are doing a merge join of > the RemoteAttributeMapping array and the attstats set by placing the > latter in the outer side, but it makes it simpler to do the join the > opposite way (ie, the former in the outer side), like the attached, > which doesn't need an inner for-loop anymore, as the inner side (ie, > the attstats set) consists of *distinct* columns. > The new match_attrmap seems reasonable. > * I moved table_has_extended_stats to extended_stats.c and renamed it > to match other functions in it so that other FDWs too can use it. > +1 > * The 0002 patch is using different levels (LOG, NOTICE, and WARNING) > for logging in postgresImportStatisitcs, but I think it's appropriate > to use elevel or WARNING for it. So I modified the patch to use > WARNING. I modified the wording as well to match analyze.c. > +1 > * I added at the top/end of postgresImportStatisitcs log messages that > show the start/end of importing stats. > +1 > That's it. > I see that remote_analyze didn't make it as a part of this patch. Is that something you'd repackaged as a follow-on patch, or are you just done with it?
