On 25 Oct 2014, at 4:31, Jim Nasby <jim.na...@bluetreble.com> wrote:
> Please don't top-post.
>
> On 10/24/14, 3:40 AM, Borodin Vladimir wrote:
>> I have taken some backtraces (they are attached to this message) of two processes with this command:
>>
>> pid=17981; while true; do date; gdb -batch -e back /usr/pgsql-9.4/bin/postgres $pid; echo; echo; echo; echo; sleep 0.1; done
>>
>> Process 17981 was holding the lock for a long time - http://pastie.org/9671931.
>> And process 13886 was waiting for the lock (at a different time and blocked by a different process, actually, but I don't think that is really important) - http://pastie.org/9671939.
>>
>> As I can see, 17981 is actually waiting for the LWLock on BufFreelistLock in the StrategyGetBuffer function (freelist.c:134) while holding an exclusive lock on the relation. I will try to increase NUM_BUFFER_PARTITIONS (on a read-only load it also gave us some performance boost) and will write the result in this thread.
>
> BufFreelistLock becomes very contended when shared buffers are under a lot of pressure.
>
> Here's what I believe is happening:
>
> If RelationGetBufferForTuple() decides it needs to extend, this happens:
>
>     LockRelationForExtension(relation, ExclusiveLock);
>     buffer = ReadBufferBI(relation, P_NEW, bistate);
>
> Assuming bistate is false (I didn't check the bulk case), ReadBufferBI() ends up at ReadBuffer_common(), which calls BufferAlloc(). In the normal case, BufferAlloc() won't find the necessary buffer, so it will call StrategyGetBuffer(), which will end up taking the freelist lock. Currently the free list is normally empty, which means we now need to run the clock sweep to find a victim buffer. The clock sweep will keep running until it finds a buffer that is not pinned and has usage_count = 0. If shared buffers are under heavy pressure, you can have a huge number of them with usage_count = 5; with 100GB of shared buffers and an 8K BLCKSZ, you could have to check buffers *52 million* times (assuming you finally find a victim at the start of the 5th loop).
>
> Keep in mind that's all happening while you're holding both the extension lock *and the freelist lock*, which basically means no one else in the entire system can allocate a new buffer.

I'll try the same workload with the recent patch from Andres Freund [0].
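To get a feeling for the scale of that sweep, here is a toy sketch I put together. It is my own simplification, not the actual freelist.c/bufmgr.c code: there are no buffer header spinlocks, no freelist, and no "everything is pinned" check, and FakeBufferDesc, clock_sweep() and the numbers in main() are just stand-ins for the 100GB / 8K example above.

/*
 * Toy model of the clock sweep, NOT the real freelist.c code: no buffer
 * header spinlocks, no freelist, no "everything is pinned" check.
 */
#include <stdio.h>
#include <stdlib.h>

#define BLCKSZ 8192

typedef struct
{
    int refcount;       /* pins held by backends; pinned buffers are skipped */
    int usage_count;    /* bumped on access, decremented by the sweep */
} FakeBufferDesc;

static FakeBufferDesc *buffers;
static long long nbuffers;
static long long next_victim = 0;    /* the "clock hand" */

/* Find an unpinned buffer with usage_count == 0, counting inspections. */
static long long
clock_sweep(long long *inspected)
{
    *inspected = 0;
    for (;;)
    {
        long long       cur = next_victim;
        FakeBufferDesc *buf = &buffers[cur];

        next_victim = (next_victim + 1) % nbuffers;
        (*inspected)++;

        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return cur;          /* victim found */
            buf->usage_count--;      /* it survives until the next lap */
        }
    }
}

int
main(void)
{
    nbuffers = 100LL * 1024 * 1024 * 1024 / BLCKSZ;    /* 13,107,200 buffers */
    buffers = calloc(nbuffers, sizeof(FakeBufferDesc));
    if (buffers == NULL)
        return 1;

    /*
     * Every buffer is unpinned but warm; starting at usage_count = 4 makes
     * the victim show up at the start of the 5th lap, matching the estimate
     * above.
     */
    for (long long i = 0; i < nbuffers; i++)
        buffers[i].usage_count = 4;

    long long inspected;
    long long victim = clock_sweep(&inspected);

    /* Prints: victim 0 found after 52428801 inspections (~52 million). */
    printf("victim %lld found after %lld inspections (~%lld million)\n",
           victim, inspected, inspected / 1000000);
    return 0;
}

With every buffer unpinned but still warm, the hand makes four full laps of 13,107,200 buffers before the first one reaches usage_count = 0, which is where the ~52 million figure comes from - and all of that happens under the freelist lock and the extension lock.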
> This is one reason why a large shared_buffers setting is usually counter-productive. Experience with older versions is that setting it higher than about 8GB is more likely to hurt than to help. Newer versions are probably better, but I think you'll be hard-pressed to find a workload where 100GB makes sense. It might if your entire database fits in shared_buffers (though even then there are probably a number of O(n) or worse operations that will hurt you), but if your database is > shared_buffers you're probably in trouble.
>
> I suggest cutting shared_buffers *way* down. Old-school advice for this machine would be 8GB (since 25% of 128GB would be too big). You might be able to do better than 8GB, but I recommend not even trying unless you've got a good way to test your performance.
>
> If you can test performance and find an optimal setting for shared_buffers, please do share your test data and findings. :)

Of course, it works well with shared_buffers <= 8GB. But we have seen that on a read-only load, with a data set that fits in RAM and <= 8GB of shared_buffers, we hit the BufFreelistLock LWLock while moving pages between shared buffers and the page cache. Increasing shared_buffers to the size of the data set improves performance up to 2.5x on this read-only load. So we started testing a configuration with huge shared_buffers under a write load, and that is why I started this thread.

Since StrategyGetBuffer() no longer uses the BufFreelistLock LWLock [1], I'll also re-run the tests with a read-only load and small shared_buffers.

[0] http://git.postgresql.org/pg/commitdiff/d72731a70450b5e7084991b9caa15cb58a2820df
[1] http://git.postgresql.org/pg/commitdiff/1dcfb8da09c47d2a7502d1dfab06c8be4b6cf323

> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Vladimir