> The way this is implemented, it will favour the usecases where foreign
> tables are not child tables.

It is true that this feature does not benefit the recursive do_analyze_rel() case. But it does help when those same tables are analyzed directly.

> That leaves out the sharding use case
> which I believe is also a significant usecase. I think we need to
> think, how can we make that usecase benefit from this optimization.

I agree that we should find a way to do that, but this patch handles the other case, and it doesn't prevent us from later teaching postgresAnalyzeForeignTable() to cache the rowsample locally for later use. postgresImportStatistics() could then weigh the relative benefits of using that locally cached sample versus the already-formed remote statistics. Even in that case, I'm guessing that the remote table's stats will be based on a larger, and therefore better, sample than the one we are able to pull across the wire and cache locally, so the remotely computed statistics would be better.

> Not being able to use statistics available on the remote side seems a
> major limitation. But I don't have a better solution than to think of
> supporting some kind of partial statistics.

I'm not against trying to fetch and cache rowsamples, or to cache some partially aggregated results of a rowsample, but this patch does not cover that. This patch should, at least in theory, reduce the number of table samples pulled across the wire by 50%, and that seems worthwhile.
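
To make the cached-sample versus remote-statistics trade-off mentioned above a bit more concrete, here is a rough sketch of the kind of choice postgresImportStatistics() might make if a locally cached rowsample existed. None of this is in the patch: the enum, the struct, the function name, and the "larger sample wins" rule are all hypothetical and only illustrate the guess that the remote stats are usually built from the bigger sample.

```c
#include <stdbool.h>

/* Hypothetical: where statistics for a foreign table could come from. */
typedef enum StatsSource
{
    STATS_SOURCE_NONE,          /* nothing usable; fall back to sampling */
    STATS_SOURCE_REMOTE,        /* import the remote server's statistics */
    STATS_SOURCE_CACHED_SAMPLE  /* recompute from a locally cached rowsample */
} StatsSource;

/* Hypothetical summary of what is available; not a real PostgreSQL struct. */
typedef struct StatsCandidates
{
    bool    remote_stats_available;
    double  remote_sample_rows;     /* rows behind the remote statistics */
    bool    cached_sample_available;
    double  cached_sample_rows;     /* rows in the locally cached sample */
} StatsCandidates;

/*
 * Pick a source, assuming (as a simplification) that whichever side was
 * built from more sampled rows gives the better statistics.  Ties go to
 * the remote statistics, since those need no local recomputation.
 */
static StatsSource
choose_stats_source(const StatsCandidates *c)
{
    if (c->remote_stats_available && c->cached_sample_available)
        return (c->remote_sample_rows >= c->cached_sample_rows)
            ? STATS_SOURCE_REMOTE
            : STATS_SOURCE_CACHED_SAMPLE;
    if (c->remote_stats_available)
        return STATS_SOURCE_REMOTE;
    if (c->cached_sample_available)
        return STATS_SOURCE_CACHED_SAMPLE;
    return STATS_SOURCE_NONE;
}
```

If the guess about sample sizes holds, the first branch would pick the remote statistics almost every time, in which case the cached sample would mostly matter when remote statistics are missing.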
