On Thu, Jun 27, 2013 at 9:01 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> Contention-wise I agree. What I have seen is that we have a huge
> amount of cacheline bouncing around the buffer header spinlocks.
How did you measure that?

> I have previously added some ad-hoc instrumentation that printed the
> number of buffers that were required (by other backends) during a
> bgwriter cycle and the number of buffers that the buffer manager
> could actually write out.

I think you can see how many are needed from buffers_alloc. No?

> I don't think I actually found any workload where the bgwriter wrote
> out a relevant percentage of the necessary pages.

Check.

> Problems with the current code:
>
> * It doesn't manipulate the usage_count and never does anything to
>   used pages. Which means it will just about never find a victim
>   buffer in a busy database.

Right. I was thinking that was part of this patch, but it isn't. I
think we should definitely add that. In other words, the background
writer's job should be to run the clock sweep and add buffers to the
free list. I think we should also split the lock: a spinlock for the
freelist, and an lwlock for the clock sweep.

> * It is by far not aggressive enough; it touches only a few buffers
>   ahead of the clock sweep.

Check. Fixing this might be a separate patch, but then again maybe
not. The changes we're talking about here provide a natural feedback
mechanism: if we observe that the freelist is empty (or shorter than
some length, like 32 buffers?), set the background writer's latch,
because we know it's not keeping up.

> * It does not advance the clock sweep, so the individual backends
>   will touch the same buffers again and transfer all the buffer
>   spinlock cachelines around.

Yes, I think that should be fixed as part of this patch too. It's
obviously connected to the point about usage counts.

> * Its adaptation logic is so slow that it takes several minutes to
>   adapt.

Yeah. I don't know if fixing that will fall naturally out of these
other changes or not, but I think it's a second-order concern in any
event.

> There's another thing we could do to noticeably improve the
> scalability of buffer acquisition. Currently we do a huge amount of
> work under the freelist lock.
> In StrategyGetBuffer:
>
>     LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
>     ...
>     // check freelist, will usually be empty
>     ...
>     for (;;)
>     {
>         buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];
>
>         ++StrategyControl->nextVictimBuffer;
>
>         LockBufHdr(buf);
>         if (buf->refcount == 0)
>         {
>             if (buf->usage_count > 0)
>             {
>                 buf->usage_count--;
>             }
>             else
>             {
>                 /* Found a usable buffer */
>                 if (strategy != NULL)
>                     AddBufferToRing(strategy, buf);
>                 return buf;
>             }
>         }
>         UnlockBufHdr(buf);
>     }
>
> So, we perform the entire clock sweep, until we find a single buffer
> we can use, inside a *global* lock. At times we need to iterate over
> all of shared_buffers BM_MAX_USAGE_COUNT (5) times until we have
> pushed down all the usage counts far enough (if the database is busy
> it can take even longer...).
> In a busy database, where usually all the usage counts are high, the
> next backend will touch a lot of those buffers again, which causes
> massive cache eviction & bouncing.
>
> It seems far more sensible to only protect the clock sweep's
> nextVictimBuffer with a spinlock. With some care all the rest can
> happen without any global interlock.

That's a lot more spinlock acquire/release cycles, but it might work
out to a win anyway. Or it might lead to the system suffering a
horrible spinlock-induced death spiral on eviction-heavy workloads.
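To make the shape of that concrete, here's a rough sketch of what a
split-lock StrategyGetBuffer could look like - emphatically not the
actual patch. freelist_lck, victim_lck, numFreeBuffers, and
FREELIST_LOW_WATER are names I'm inventing for illustration; the rest
assumes the usual bufmgr internals (buf_internals.h, spin.h), and
error and dirty-page handling are omitted:

/* Hypothetical low-water mark at which we wake the bgwriter. */
#define FREELIST_LOW_WATER 32

static volatile BufferDesc *
StrategyGetBuffer_sketch(BufferAccessStrategy strategy)
{
    volatile BufferDesc *buf;

    /* Fast path: pop a buffer the bgwriter already put on the freelist. */
    SpinLockAcquire(&StrategyControl->freelist_lck);
    while (StrategyControl->firstFreeBuffer >= 0)
    {
        buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
        StrategyControl->firstFreeBuffer = buf->freeNext;

        /* Freelist draining faster than it fills?  Wake the bgwriter. */
        if (--StrategyControl->numFreeBuffers < FREELIST_LOW_WATER)
            SetLatch(StrategyControl->bgwriterLatch);
        SpinLockRelease(&StrategyControl->freelist_lck);

        /* The buffer may have been touched since it was freed; recheck. */
        LockBufHdr(buf);
        if (buf->refcount == 0 && buf->usage_count == 0)
        {
            if (strategy != NULL)
                AddBufferToRing(strategy, buf);
            return buf;         /* returned with the header lock held */
        }
        UnlockBufHdr(buf);
        SpinLockAcquire(&StrategyControl->freelist_lck);
    }
    SpinLockRelease(&StrategyControl->freelist_lck);

    /* Slow path: clock sweep.  Only the hand advance is under a lock. */
    for (;;)
    {
        int     victim;

        SpinLockAcquire(&StrategyControl->victim_lck);
        victim = StrategyControl->nextVictimBuffer;
        if (++StrategyControl->nextVictimBuffer >= NBuffers)
            StrategyControl->nextVictimBuffer = 0;
        SpinLockRelease(&StrategyControl->victim_lck);

        /* All per-buffer work happens outside any global lock. */
        buf = &BufferDescriptors[victim];
        LockBufHdr(buf);
        if (buf->refcount == 0)
        {
            if (buf->usage_count > 0)
                buf->usage_count--;
            else
            {
                /* Found a usable buffer */
                if (strategy != NULL)
                    AddBufferToRing(strategy, buf);
                return buf;
            }
        }
        UnlockBufHdr(buf);
    }
}

The point of the fast path is that a backend touches exactly one
shared cacheline (the freelist head) in the common case, instead of
sweeping over everyone else's buffer headers.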
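And the bgwriter side might look roughly like the following: run the
same clock sweep (thereby advancing the hand and aging usage counts
for everyone) and push clean, unpinned, zero-usage-count buffers onto
the freelist until it reaches some target length. Again just a
sketch; StrategyPushFreeBuffer and FREELIST_TARGET are made-up names,
the numFreeBuffers read is deliberately racy, and the real thing would
write out dirty buffers rather than skip them:

/* Hypothetical target freelist length for the bgwriter to maintain. */
#define FREELIST_TARGET 128

static void
BgWriterFillFreelist_sketch(void)
{
    /* Racy read of numFreeBuffers; good enough for a heuristic. */
    while (StrategyControl->numFreeBuffers < FREELIST_TARGET)
    {
        volatile BufferDesc *buf;
        int     victim;

        /* Advance the shared sweep hand, same as the backends do. */
        SpinLockAcquire(&StrategyControl->victim_lck);
        victim = StrategyControl->nextVictimBuffer;
        if (++StrategyControl->nextVictimBuffer >= NBuffers)
            StrategyControl->nextVictimBuffer = 0;
        SpinLockRelease(&StrategyControl->victim_lck);

        buf = &BufferDescriptors[victim];
        LockBufHdr(buf);
        if (buf->refcount == 0)
        {
            if (buf->usage_count > 0)
                buf->usage_count--;             /* age the page */
            else if (!(buf->flags & BM_DIRTY))
                StrategyPushFreeBuffer(buf);    /* takes freelist_lck */
            /* else: queue the dirty buffer for writing, not shown */
        }
        UnlockBufHdr(buf);
    }
}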
> I think even after fixing this - which we definitely should do -
> having a sensible/more aggressive bgwriter moving pages onto the
> freelist makes sense, because then backends don't need to deal with
> dirty pages.

Or with scanning to find evictable pages.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company