Good day Robert, Jim, and everyone.

On 2017-06-08 00:06, Jim Van Fleet wrote:
> Robert Haas <robertmh...@gmail.com> wrote on 06/07/2017 12:12:02 PM:

>>> OK -- would love the feedback and any suggestions on how to
>>> mitigate the low end problems.

>> Did you intend to attach a patch?

> Yes I do -- tomorrow or Thursday -- needs a little cleaning up ...

>>> Sokolov Yura has a patch which, to me, looks good for pgbench rw
>>> performance.  Does not do so well with hammerdb (about the same as
>>> base) on single socket and two socket.

>> Any idea why?  I think we will have to understand *why* certain
>> things help in some situations and not others, not just *that* they
>> do, in order to come up with a good solution to this problem.

My patch improves acquiring a contended/blocking LWLock on NUMA because:

a. the patched procedure generates far fewer writes, mostly because
   taking the wait-list lock is unified with acquiring the lock itself.
   Access to modified memory is very expensive on NUMA, so fewer writes
   mean less wasted time.
b. it spins several times on lock->state, attempting to acquire the
   lock, before it starts trying to queue itself on the wait list (see
   the sketch below).  This spinning is the real source of the speedup;
   without it, the patch merely removes the degradation under
   contention.
   I don't know why spinning doesn't improve single-socket performance,
   though :-) Probably because the algorithmic overhead (system calls,
   putting a process to sleep and waking it) is not too expensive until
   NUMA is involved.
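To make (a) and (b) concrete, here is a minimal self-contained sketch
of the idea in C11 atomics.  It is not the patch itself: the state
layout, the names, and the spin count are all invented here, and the
real code works on PostgreSQL's LWLock through the pg_atomic_* API.

#include <stdatomic.h>
#include <stdint.h>

/* hypothetical state layout: one bit for EXCLUSIVE, low bits count SHARED */
#define LW_VAL_EXCLUSIVE    (1u << 24)
#define LW_LOCK_MASK        ((1u << 25) - 1)    /* any holder at all */
#define SPINS_BEFORE_QUEUE  100                 /* tuning knob, invented */

typedef struct
{
    _Atomic uint32_t state;    /* lock value and wait-list lock in one word */
} sketch_lwlock;

/* slow path, elided: queue on the wait list and sleep until woken */
extern void queue_self_and_sleep(sketch_lwlock *lock);

static void
sketch_acquire_exclusive(sketch_lwlock *lock)
{
    for (;;)
    {
        /*
         * Spin on lock->state first.  Each attempt is a read plus at
         * most one CAS on a single cache line; the wait list is not
         * written at all unless every spin fails.  On NUMA, avoiding
         * those extra writes to shared memory is where time is saved.
         */
        for (int i = 0; i < SPINS_BEFORE_QUEUE; i++)
        {
            uint32_t old = atomic_load_explicit(&lock->state,
                                                memory_order_relaxed);

            if ((old & LW_LOCK_MASK) == 0 &&
                atomic_compare_exchange_weak_explicit(
                        &lock->state, &old, old | LW_VAL_EXCLUSIVE,
                        memory_order_acquire, memory_order_relaxed))
                return;     /* acquired without touching the wait list */
        }

        /*
         * Give up and queue.  In the patch the wait-list lock is not a
         * separate spinlock but a bit in this same state word, so even
         * queueing is a CAS on the same line rather than writes to a
         * second contended location.
         */
        queue_self_and_sleep(lock);
    }
}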

> Looking at the data now -- the LWLockAcquire philosophy is different
> -- at first glance I would have guessed "about the same" as the base,
> but I cannot yet explain why we have super pgbench rw performance and
> "the same" hammerdb performance.

My patch improves only blocking contention, i.e. when a lot of
EXCLUSIVE locks are involved.  pgbench rw generates a lot of write
traffic, so there is much contention on and waiting for the
WALInsertLocks (in XLogInsertRecord, and waited for in XLogFlush),
WALWriteLock (in XLogFlush), and CLogControlLock (in
TransactionIdSetTreeStatus).

The case where SHARED locks are much more common than EXCLUSIVE ones
is not affected by the patch, because SHARED is then acquired on the
fast path in both the original and the patched version.
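For illustration, that fast path amounts to roughly the following
(reusing the sketch types from above; again a simplification, not the
actual code in lwlock.c):

/* Readers just bump a counter in the low bits of state as long as
 * nobody holds or claims EXCLUSIVE; the wait list is never touched. */
static bool
sketch_acquire_shared_fastpath(sketch_lwlock *lock)
{
    uint32_t old = atomic_load_explicit(&lock->state, memory_order_relaxed);

    while ((old & LW_VAL_EXCLUSIVE) == 0)
    {
        if (atomic_compare_exchange_weak_explicit(
                &lock->state, &old, old + 1,
                memory_order_acquire, memory_order_relaxed))
            return true;    /* fast path: one CAS, no queueing */
        /* a failed CAS reloads 'old'; retry while there is no writer */
    }
    return false;           /* writer involved: fall back to the slow path */
}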

So it looks like hammerdb doesn't produce much EXCLUSIVE contention on
LWLocks, which is why the patch doesn't improve it.

Splitting ProcArrayLock helps with acquiring a SHARED lock on NUMA in
the absence of EXCLUSIVE lockers for the same reason my patch improves
acquiring a blocking lock: fewer writes to the same memory.  Since each
process writes to only one part of the split ProcArrayLock, there are
far fewer writes to any single part, so a SHARED acquisition pays less
for access to modified memory on NUMA.  A toy illustration follows.
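Here is a toy striped reader-writer lock in plain pthreads that shows
what the split buys.  It is purely hypothetical (the stripe count, the
names, and the use of pthread_rwlock_t are mine, not the actual
proposal): readers take only their own stripe, writers take all of
them.

#include <pthread.h>

#define NUM_STRIPES 16      /* hypothetical stripe count */

typedef struct
{
    _Alignas(64) pthread_rwlock_t lock;    /* one stripe per cache line */
} lock_stripe;

static lock_stripe striped_lock[NUM_STRIPES];

static void
stripes_init(void)
{
    for (int i = 0; i < NUM_STRIPES; i++)
        pthread_rwlock_init(&striped_lock[i].lock, NULL);
}

/* A reader (think GetSnapshotData) touches only its own stripe, so its
 * lock writes stay on one cache line instead of every reader hammering
 * a single shared lock word. */
static void
acquire_shared(int my_id)
{
    pthread_rwlock_rdlock(&striped_lock[my_id % NUM_STRIPES].lock);
}

static void
release_shared(int my_id)
{
    pthread_rwlock_unlock(&striped_lock[my_id % NUM_STRIPES].lock);
}

/* An exclusive acquirer (think end-of-transaction updates) must take
 * every stripe, so EXCLUSIVE becomes more expensive -- the split only
 * pays off when SHARED dominates. */
static void
acquire_exclusive(void)
{
    for (int i = 0; i < NUM_STRIPES; i++)
        pthread_rwlock_wrlock(&striped_lock[i].lock);
}

static void
release_exclusive(void)
{
    for (int i = NUM_STRIPES - 1; i >= 0; i--)
        pthread_rwlock_unlock(&striped_lock[i].lock);
}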

Probably I'm mistaken somewhere.




--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

