>
>
> The way this is implemented, it will favour the usecases where foreign
> tables are not child tables.


It is true that this feature does not benefit the recursive
do_analyze_rel() case. But it does help when those same tables are analyzed
directly.


> That leaves out the sharding use case
> which I believe is also a significant usecase. I think we need to
> think, how can we make that usecase benefit from this optimization.


I agree that we should find a way to do that, but this handles the other
case, and doesn't prevent us from later teaching
postgresAnalyzeForeignTable() to cache the rowsample locally for later
use, so that postgresImportStatistics() could then weigh the relative
benefits of using that locally cached sample vs the already formed remote
statistics. Even in that case, I'm guessing that the remote table's stats
will be based on a larger and therefore better sample than the one
we are able to pull across the wire and cache locally, so the remotely
computed statistics would be better.

> Not being able to use statistics available on the remote side seems a
> major limitation. But I don't have a better solution than to think of
> supporting some kind of partial statistics.


I'm not against trying to fetch and cache rowsamples, or cache some
partially aggregated results of a rowsample, but this patch does not cover
that. This patch should, at least in theory, reduce the number of table
samples pulled across the wire by 50% and that seems worthwhile.
