On Mon, Oct 26, 2020 at 9:44 PM Nikolay Samokhvalov <samokhva...@gmail.com>
wrote:

> On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Monday, October 26, 2020, Nikolay Samokhvalov <samokhva...@gmail.com>
>> wrote:
>>>
>>> This triggers a question, though – should ANALYZE be automated in,
>>> say, pg_restore as well?
>>>
>>
>> Independent concern.
>>
>
> It's the same class of issue – after we create some objects, we lack
> statistics and want to automate collecting them. If the approach is
> automated in one case, it should be automated in the others, for
> consistency.
>

I don't see a need to force consistency between something that will affect,
at most, one table, and something that will affect an entire database or
cluster.  The other material difference is that the previous state of a
restore is "nothing" while in the create/reindex cases we are going from one
live, populated state to another.

I do observe that while the create/reindex ANALYZE would run automatically
during the restore on object creation, there would be no data present, so it
would be close to a no-op in practice.


>
>>> And another question: how should ANALYZE be run? If it's under the
>>> user's control, there is an option to use vacuumdb --analyze and benefit
>>> from using -j to parallelize the work (and, in some cases, benefit from
>>> using --analyze-in-stages). If we had ANALYZE as a part of building indexes
>>> on expressions, should it be parallelized to the same extent as index
>>> creation (controlled by max_parallel_maintenance_workers)?
>>>
>>
>> None of that seems relevant here.  The only relevant parameter I see is
>> what to specify for “table_and_columns”.
>>
>
> I'm not sure I follow.
>

Describe how parallelism within the session that is auto-analyzing is
supposed to work.  vacuumdb opens up multiple connections, which shouldn't
happen here.

I suppose having the auto-analyze run three times with different targets
would work, but I doubt that is a win.  I may just be underestimating how
long an ANALYZE on an extremely large table with a high statistics target
takes.
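
For reference, that three-pass scheme is what --analyze-in-stages does;
roughly (a sketch, the SQL vacuumdb actually issues differs in detail):

    $ vacuumdb --analyze-in-stages mydb
    # roughly equivalent to running, stage by stage:
    #   SET default_statistics_target = 1;  ANALYZE;  -- stage 1: minimal stats, fast
    #   SET default_statistics_target = 10; ANALYZE;  -- stage 2
    #   RESET default_statistics_target;    ANALYZE;  -- stage 3: full statistics target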

David J.
