On 02/12/2014 04:04 PM, Heikki Linnakangas wrote:
On 02/12/2014 10:50 PM, Andres Freund wrote:
On February 12, 2014 9:33:38 PM CET, Tom Lane <t...@sss.pgh.pa.us> wrote:
Andres Freund <and...@2ndquadrant.com> writes:
On 2014-02-12 14:39:37 -0500, Andrew Dunstan wrote:
On investigation I found that a number of processes were locked waiting for
one wedged process to end its transaction, which never happened (this
transaction should normally take milliseconds). oprofile revealed that
postgres was spending 87% of its time in s_lock(), and strace on the wedged
process revealed that it was in a tight loop constantly calling select(). It
did not respond to a SIGTERM.

That's a deficiency of the gin fastupdate cache: a) it bases its size on
work_mem, which usually makes it *far* too big; b) it doesn't perform the
cleanup in one go if it can get a suitable lock, but does independent
locking for each entry. That usually leads to absolutely horrific
performance under concurrency.
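A minimal sketch of point b), assuming a plain pthread mutex as a stand-in for whatever lock actually protects the pending list. The names `PendingList`, `drain_per_entry` and `drain_batch` are hypothetical and are not PostgreSQL's GIN code; the sketch only counts lock acquisitions, to show that draining n entries one at a time costs n+1 round trips on a (possibly contended) lock, while the batch version takes it once.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the pending list: a fixed array of entries. */
typedef struct {
    pthread_mutex_t lock;
    int entries[1024];
    size_t n;
    size_t lock_acquisitions; /* how many times the lock has been taken */
} PendingList;

static void pending_init(PendingList *pl)
{
    pthread_mutex_init(&pl->lock, NULL);
    pl->n = 0;
    pl->lock_acquisitions = 0;
}

/* Append one entry (roughly what a fastupdate insert does). */
static void pending_append(PendingList *pl, int v)
{
    pthread_mutex_lock(&pl->lock);
    pl->lock_acquisitions++;
    if (pl->n < sizeof pl->entries / sizeof pl->entries[0])
        pl->entries[pl->n++] = v;
    pthread_mutex_unlock(&pl->lock);
}

/* Per-entry locking: one lock round trip for every item moved,
 * plus a final one to discover the list is empty. */
static size_t drain_per_entry(PendingList *pl, long *sink)
{
    size_t moved = 0;
    for (;;) {
        pthread_mutex_lock(&pl->lock);
        pl->lock_acquisitions++;
        if (pl->n == 0) {
            pthread_mutex_unlock(&pl->lock);
            break;
        }
        *sink += pl->entries[--pl->n]; /* "move" one entry */
        pthread_mutex_unlock(&pl->lock);
        moved++;
    }
    return moved;
}

/* Batch: take the lock once and move everything in one go. */
static size_t drain_batch(PendingList *pl, long *sink)
{
    pthread_mutex_lock(&pl->lock);
    pl->lock_acquisitions++;
    size_t moved = pl->n;
    for (size_t i = 0; i < pl->n; i++)
        *sink += pl->entries[i];
    pl->n = 0;
    pthread_mutex_unlock(&pl->lock);
    return moved;
}
```

With dozens of backends each running the per-entry loop against the same lock, every one of those round trips is a chance to spin in the lock and back off, which is the kind of pattern that can look stuck without strictly being stuck.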

I'm not sure that what Andrew is describing can fairly be called a
concurrent-performance problem.  It sounds closer to a stuck lock.
Are you sure you've diagnosed it correctly?

No. But I've seen similar backtraces several times where it wasn't actually stuck, just livelocked. I'm just on my mobile right now, but AFAIR Andrew described a loop involving lots of semaphores and spinlocks; that shouldn't be the case if it were actually stuck. If there are dozens of processes waiting on the same lock, each cleaning up a large number of items one by one, it's not surprising if it's dramatically slow.

Perhaps we should use a lock to enforce that only one process tries to clean up the pending list at a time.
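One way to read that suggestion, sketched with pthreads under stated assumptions: the names `SharedPending`, `try_cleanup` and `pending_insert` are hypothetical and this is not how PostgreSQL implements it. A separate cleanup lock is taken with `pthread_mutex_trylock()`, so a backend that finds the list over its limit either becomes the single cleaner or skips the cleanup and returns, instead of queueing behind whoever is already cleaning.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: a count of pending entries plus two locks. */
typedef struct {
    pthread_mutex_t list_lock;    /* short critical sections: append / reset */
    pthread_mutex_t cleanup_lock; /* held by at most one cleaner at a time */
    size_t n;                     /* entries currently in the pending list */
    size_t limit;                 /* cleanup threshold */
} SharedPending;

/* Try to become the single cleaner. Returns true if this caller cleaned. */
static bool try_cleanup(SharedPending *sp)
{
    /* Non-blocking: if another process is already cleaning, skip
     * rather than wait behind it. */
    if (pthread_mutex_trylock(&sp->cleanup_lock) != 0)
        return false;

    pthread_mutex_lock(&sp->list_lock);
    sp->n = 0; /* stand-in for moving all entries into the main index */
    pthread_mutex_unlock(&sp->list_lock);

    pthread_mutex_unlock(&sp->cleanup_lock);
    return true;
}

/* Append one entry; if the list is now over the limit, attempt cleanup. */
static bool pending_insert(SharedPending *sp)
{
    bool over_limit;

    pthread_mutex_lock(&sp->list_lock);
    sp->n++;
    over_limit = sp->n >= sp->limit;
    pthread_mutex_unlock(&sp->list_lock);

    return over_limit && try_cleanup(sp);
}
```

In this sketch the cleanup lock is never waited on, so inserts only serialize on the short `list_lock` sections, not behind a running cleanup. The trade-off is that a backend which skips cleanup leaves its entry in the list until some cleaner picks it up.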


Is that going to serialize all these inserts?

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
