Hello,
On 2017-10-19 19:46, Andres Freund wrote:
On 2017-10-19 14:36:56 +0300, Sokolov Yura wrote:
> > + init_local_spin_delay(&delayStatus);
>
> The way you moved this around has the disadvantage that we now do this -
> a number of writes - even in the very common case where the lwlock can
> be acquired directly.
Excuse me, I don't quite understand.
Are you objecting to init_local_spin_delay being placed here?
Yes.
I could place it just before perform_spin_delay, under `if (!spin_inited)`,
if you think that is absolutely necessary. Placing it anywhere else would
complicate the code.
Or are you objecting to the setting of `mask` and `add`?
That seems right.
In both cases, I think the simpler version should be accepted first. It acts
as the definition of the algorithm, and it already gives a measurable
improvement.
Well, in scalability. I'm less sure about uncontended performance.
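To make the `if (!spin_inited)` variant concrete, here is a rough sketch
(not the actual patch: the loop is simplified, the wait-list queueing and
failure paths are omitted, and the function/variable names just follow the
patch under discussion, with spin_inited being the new flag):

static bool
LWLockAttemptLockOrQueue(LWLock *lock, uint32 mask, uint32 add)
{
    SpinDelayStatus delayStatus;
    bool        spin_inited = false;
    uint32      old_state = pg_atomic_read_u32(&lock->state);

    for (;;)
    {
        if ((old_state & mask) == 0)
        {
            /* common fast path: lock looks free, try to take it directly */
            if (pg_atomic_compare_exchange_u32(&lock->state, &old_state,
                                               old_state + add))
                return true;    /* acquired; delayStatus never touched */
        }
        else
        {
            /* contended: only now pay for the init_local_spin_delay() writes */
            if (!spin_inited)
            {
                init_local_spin_delay(&delayStatus);
                spin_inited = true;
            }
            perform_spin_delay(&delayStatus);
            old_state = pg_atomic_read_u32(&lock->state);
        }
    }
}

So the uncontended case would not do the extra writes at all.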
> > + * We intentionally do not call finish_spin_delay here, because the loop
> > + * above usually finished by queuing into the wait list on contention, and
> > + * doesn't reach spins_per_delay thereby doesn't sleep inside of
> > + * perform_spin_delay. Also, different LWLocks has very different
> > + * contention pattern, and it is wrong to update spin-lock statistic based
> > + * on LWLock contention.
> > + */
>
> Huh? This seems entirely unconvincing. Without adjusting this here we'll
> just spin the same way every iteration. Especially for the case where
> somebody else holds LW_FLAG_LOCKED that's quite bad.
LWLocks are very different. Some of them are always short-term
(BufferLock), others are always locked for a long time.
That seems not particularly relevant. The same is true for spinlocks.
The relevant question isn't how long the lwlock is held, it's how long
LW_FLAG_LOCKED is held - and that should only depend on contention
(i.e. bus speed, amount of times put into sleep while holding the lock,
etc), not on how long the lock is held.
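For context, finish_spin_delay adjusts the shared spins_per_delay roughly
like this (simplified from memory from s_lock.c; field names and constants
may differ slightly):

void
finish_spin_delay(SpinDelayStatus *status)
{
    if (status->cur_delay == 0)
    {
        /* we never had to sleep: spinning is paying off, allow more of it */
        if (spins_per_delay < MAX_SPINS_PER_DELAY)
            spins_per_delay = Min(spins_per_delay + 100, MAX_SPINS_PER_DELAY);
    }
    else
    {
        /* we did have to sleep: reduce the spin budget a little */
        if (spins_per_delay > MIN_SPINS_PER_DELAY)
            spins_per_delay = Max(spins_per_delay - 1, MIN_SPINS_PER_DELAY);
    }
}

Skipping the call means the loop above never feeds back into that estimate
(Andres' point); my worry is the opposite direction, that LWLock contention
would then skew the statistic used by real spinlocks.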
I've tried to place this delay into the lock itself (it has 2 free bytes),
but that attempt performed worse.
That seems unsurprising - there's a *lot* of locks, and we'd have to
tune all of them. Additionally there's a bunch of platforms where we do
*not* have free bytes (consider the embedding in BufferTag).
Now I understand that delays should be stored in an array indexed by
tranche. But I have no time to test this idea. And I doubt it would give
radically better results (i.e. > 5%), so I think it is better to accept
the patch in this form, and then experiment with per-tranche delay.
I don't think tranches have any decent predictive power here.
Look at "Make acquiring LWLock to look more like spinlock".
For the first `skip_wait_list` iterations there is no attempt to queue
ourselves into the wait list. That gives most of the improvement. (Without
it there is merely no degradation at a high number of clients, but a small
decline at a low number of clients, because the current algorithm is also a
little `spinny`: it attempts to acquire the lock again after queueing itself
into the wait list.)
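Roughly, the idea looks like this (a simplified sketch, not the exact
patch; LWLockAttemptLock and LWLockQueueSelf are the existing static
helpers in lwlock.c, and the re-check/semaphore sleep after queueing is
omitted):

static void
lwlock_acquire_sketch(LWLock *lock, LWLockMode mode, int skip_wait_list)
{
    SpinDelayStatus delayStatus;
    int         iter;

    init_local_spin_delay(&delayStatus);

    for (iter = 0;; iter++)
    {
        /* LWLockAttemptLock returns false if we got the lock */
        if (!LWLockAttemptLock(lock, mode))
            return;             /* acquired without touching the wait list */

        if (iter < skip_wait_list)
        {
            /* behave like a spinlock: back off a little and retry */
            perform_spin_delay(&delayStatus);
            continue;
        }

        /* spun long enough: fall back to queueing onto the wait list */
        LWLockQueueSelf(lock, mode);
        /* ... then re-check and sleep on the semaphore, as LWLockAcquire does */
        break;
    }
}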
`skip_wait_list` depends on the tranche very much. And a per-tranche
`skip_wait_list` should be calculated based on the ability to acquire the
lock without any sleep, i.e. without either queueing ourselves into the wait
list or falling into `pg_usleep` inside `perform_spin_delay`. I suppose it
is obvious that `spins_per_delay` has to be proportional to `skip_wait_list`
(it should be at least greater than `skip_wait_list`). Therefore
`spins_per_delay` should also be runtime-dependent on the lock's tranche.
Am I wrong?
Without that per-tranche auto-tuning, I think it is better not to touch
`spins_per_delay` inside `LWLockAttemptLockOrQueue`.
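To be explicit about what per-tranche auto-tuning would mean, something like
the following (entirely hypothetical; none of these names exist in the patch
or in core, and the adaptation rule is only for illustration):

typedef struct LWLockTrancheDelay
{
    int         skip_wait_list;    /* iterations before queueing on the wait list */
    int         spins_per_delay;   /* spin budget fed to perform_spin_delay */
} LWLockTrancheDelay;

/* one slot per built-in tranche; user-defined tranches would need more */
static LWLockTrancheDelay tranche_delay[LWTRANCHE_FIRST_USER_DEFINED];

/* after each acquisition, adapt the settings of the lock's tranche */
static void
tranche_delay_update(int tranche, bool acquired_without_sleep)
{
    LWLockTrancheDelay *td = &tranche_delay[tranche];

    if (acquired_without_sleep)
        td->skip_wait_list = Min(td->skip_wait_list + 1, 1000);
    else
        td->skip_wait_list = Max(td->skip_wait_list - 1, 0);

    /* keep the spin budget at least proportional to skip_wait_list */
    td->spins_per_delay = Max(td->spins_per_delay, td->skip_wait_list + 1);
}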
With regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company