On Tue, 3 Dec 2019 at 11:55, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Tue, Dec 3, 2019 at 12:56 AM Masahiko Sawada
> <masahiko.saw...@2ndquadrant.com> wrote:
>>
>> On Sun, 1 Dec 2019 at 18:31, Sergei Kornilov <s...@zsrv.org> wrote:
>> >
>> > Hi
>> >
>> > > I think I got your point. Your proposal is that it's more efficient if
>> > > we make the leader process vacuum the index that can be processed only
>> > > the leader process (i.e. indexes not supporting parallel index vacuum)
>> > > while workers are processing indexes supporting parallel index vacuum,
>> > > right? That way, we can process indexes in parallel as much as
>> > > possible.
>> >
>> > Right
>> >
>> > > So maybe we can call vacuum_or_cleanup_skipped_indexes first
>> > > and then call vacuum_or_cleanup_indexes_worker. But I'm not sure that
>> > > there are parallel-safe remaining indexes after the leader finished
>> > > vacuum_or_cleanup_indexes_worker, as described on your proposal.
>> >
>> > I meant that after processing missing indexes (not supporting parallel
>> > index vacuum), the leader can start processing indexes that support the
>> > parallel index vacuum, along with parallel workers.
>> > Exactly call vacuum_or_cleanup_skipped_indexes after start parallel
>> > workers but before vacuum_or_cleanup_indexes_worker or something with
>> > similar effect.
>> > If we have 0 missed indexes - parallel vacuum will run as in current
>> > implementation, with leader participation.
>>
>> I think your idea might not work well in some cases.
>
> Good point. I am also not sure whether it is a good idea to make the
> suggested change, but I think adding a comment on those lines is not a bad
> idea which I have done in the attached patch.
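For the record, here is roughly the leader-side ordering I understand Sergei
to be suggesting. The function below is only an illustrative sketch: the
function names are taken from the thread, but the parameter lists are
simplified guesses rather than the actual patch code.

static void
lazy_parallel_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
							 int nindexes, IndexBulkDeleteResult **stats,
							 LVParallelState *lps, int nworkers)
{
	/* Launch workers; they start on the parallel-safe indexes right away. */
	LaunchParallelWorkers(lps->pcxt, nworkers);

	/*
	 * While the workers are busy, the leader first processes the indexes
	 * that do not support parallel index vacuum ...
	 */
	vacuum_or_cleanup_skipped_indexes(vacrelstats, Irel, nindexes, stats, lps);

	/* ... and only then joins the workers on the parallel-safe indexes. */
	vacuum_or_cleanup_indexes_worker(vacrelstats, Irel, nindexes, stats, lps);

	/* Wait for all index vacuuming to finish before continuing. */
	WaitForParallelWorkersToFinish(lps->pcxt);
}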
Thank you for updating the patch!

> I have made some other changes as well.
> 1.
> + if (VacuumSharedCostBalance != NULL)
>   {
> - double msec;
> + int nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);
> +
> + /* At least count itself */
> + Assert(nworkers >= 1);
> +
> + /* Update the shared cost balance value atomically */
> + while (true)
> + {
> + uint32 shared_balance;
> + uint32 new_balance;
> + uint32 local_balance;
> +
> + msec = 0;
> +
> + /* compute new balance by adding the local value */
> + shared_balance = pg_atomic_read_u32(VacuumSharedCostBalance);
> + new_balance = shared_balance + VacuumCostBalance;
> + /* also compute the total local balance */
> + local_balance = VacuumCostBalanceLocal + VacuumCostBalance;
> +
> + if ((new_balance >= VacuumCostLimit) &&
> +     (local_balance > 0.5 * (VacuumCostLimit / nworkers)))
> + {
> + /* compute sleep time based on the local cost balance */
> + msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> + new_balance = shared_balance - VacuumCostBalanceLocal;
> + VacuumCostBalanceLocal = 0;
> + }
> +
> + if (pg_atomic_compare_exchange_u32(VacuumSharedCostBalance,
> +                                    &shared_balance,
> +                                    new_balance))
> + {
> + /* Updated successfully, break */
> + break;
> + }
> + }
> +
> + VacuumCostBalanceLocal += VacuumCostBalance;
>
> I see multiple problems with this code. (a) if the VacuumSharedCostBalance is
> changed by the time of compare and exchange, then the next iteration might
> not compute the correct values as you might have reset VacuumCostBalanceLocal
> by that time. (b) In code line, new_balance = shared_balance -
> VacuumCostBalanceLocal, you need to use new_balance instead of
> shared_balance, otherwise, it won't account for the balance of the latest
> cycle. (c) In code line, msec = VacuumCostDelay * VacuumCostBalanceLocal /
> VacuumCostLimit;, I think you need to use local_balance for reasons similar
> to (b). (d) I think we can write this code with a lesser number of variables.

In your code, I think that if two workers enter compute_parallel_delay() at the same time, both of them add their local balance to VacuumSharedCostBalance and both end up sleeping because both see the total reach VacuumCostLimit, whereas only one of them should sleep in that case.

> I have fixed all these problems and used a slightly different way to compute
> the parallel delay. See compute_parallel_delay() in the attached delta patch.
>
> 2.
> + /* Setup the shared cost-based vacuum delay and launch workers*/
> + if (nworkers > 0)
> + {
> + /*
> + * Reset the local value so that we compute cost balance during
> + * parallel index vacuuming.
> + */
> + VacuumCostBalance = 0;
> + VacuumCostBalanceLocal = 0;
> +
> + LaunchParallelWorkers(lps->pcxt, nworkers);
> +
> + /* Enable shared costing iff we process indexes in parallel. */
> + if (lps->pcxt->nworkers_launched > 0)
> + {
> + /* Enable shared cost balance */
> + VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
> + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
> +
> + /*
> + * Set up shared cost balance and the number of active workers for
> + * vacuum delay.
> + */
> + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
> + pg_atomic_write_u32(VacuumActiveNWorkers, 0);
>
> This code has issues. We can't initialize
> VacuumSharedCostBalance/VacuumActiveNWorkers after launching workers as by
> that time some other worker would have changed its value. This has been
> reported offlist by Mahendra and I have fixed it.
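For clarity, here is a minimal sketch of how I understand the accounting should work once (a)-(d) are addressed. This is only my sketch using the variable names from the hunk above (plus the patch's VacuumCostBalanceLocal), not the actual compute_parallel_delay() from your delta patch:

/*
 * Sketch only -- not the code from the delta patch.  Assumes the patch's
 * globals: VacuumSharedCostBalance and VacuumActiveNWorkers point into the
 * shared memory area, and VacuumCostBalanceLocal tracks this process's own
 * contribution since it last slept.
 */
static double
compute_parallel_delay(void)
{
	double		msec = 0;
	uint32		shared_balance;
	int			nworkers = pg_atomic_read_u32(VacuumActiveNWorkers);

	/* At least count itself */
	Assert(nworkers >= 1);

	/* Add the cost accumulated since the last call to the shared balance */
	shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
											 VacuumCostBalance);

	/* ... and remember it as part of this worker's own contribution */
	VacuumCostBalanceLocal += VacuumCostBalance;

	if (shared_balance >= VacuumCostLimit &&
		VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers))
	{
		/* Sleep in proportion to this worker's contribution, per (c) */
		msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;

		/* Subtract only this worker's contribution from the total, per (b) */
		pg_atomic_sub_fetch_u32(VacuumSharedCostBalance,
								VacuumCostBalanceLocal);
		VacuumCostBalanceLocal = 0;
	}

	/* The local balance has been folded into the shared counter */
	VacuumCostBalance = 0;

	return msec;
}

Using pg_atomic_add_fetch_u32()/pg_atomic_sub_fetch_u32() instead of the compare-and-exchange loop also sidesteps (a), since nothing local is reset before the shared update has taken effect. It does not close the window I mentioned above where two workers can both decide to sleep, though.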
> 3. Changed the name of functions which were too long and I think new names
> are more meaningful. If you don't agree with these changes, then we can
> discuss it.
>
> 4. Changed the order of parameters in many functions to match with existing
> code.
>
> 5. Refactored the code at a few places so that it can be easy to follow.
>
> 6. Added/Edited many comments and other cosmetic changes.
>
> You can find all these changes in v35-0003-Code-review-amit.patch.

I've confirmed these changes and they look good to me.

> Few other things, I would like you to consider.
> 1. I think disable_parallel_leader_participation related code can be
> extracted into a separate patch as it is mainly a debug/test aid. You can
> also fix the problem reported by Mahendra in that context.

Agreed. I'll create a separate patch for disable_parallel_leader_participation.

> 2. I think if we can somehow disallow very small indexes to use parallel
> workers, then it will be better. Can we use min_parallel_index_scan_size
> to decide whether a particular index can participate in a parallel vacuum?

I think it's a good idea, but I'm concerned that the default value of min_parallel_index_scan_size, 512kB, is too small for the purpose of parallel vacuum. Given that people who want to use parallel vacuum are likely to have big tables, the only indexes that could be skipped with the default value would probably be BRIN indexes. Also, I guess the default value is small because min_parallel_index_scan_size is compared against the number of blocks that will actually be scanned during an index scan, not the whole index. In parallel vacuum, on the other hand, we would compare it against the whole index size, because index vacuuming is always a full scan. So I'm also concerned that users would get confused about what a reasonable setting is.

As another idea, how about using min_parallel_table_scan_size instead? That is, we would not use parallel vacuum on a table smaller than that value. I think this idea has already been proposed once in this thread, but now I think it's also a good idea.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services