Re: [HACKERS] Re: We have got a serious problem with pg_clog/WAL synchronization

Kenneth Marshall Fri, 13 Aug 2004 10:05:44 -0700

> "Min Xu (Hsu)" <[EMAIL PROTECTED]> writes:
> > It seems to me this is an interesting phenomenon of interaction between
> > frequent transaction commits and infrequent system checkpoints. A
> > potential alternative to adding a new shared lock to the frequent
> > commit operation is to let the infrequent checkpoint operation take
> > more overhead. I suppose acquiring/releasing an extra lock for each
> > commit would incur extra performance overhead, even when the lock is
> > not contended. On the other hand, letting the checkpoint operation
> > acquire some existing locks (exclusively), to effectively prevent
> > committing transactions from interfering with the checkpoint process,
> > might be a better solution since it incurs higher overhead only when
> > necessary.
> 
> Unfortunately, there isn't any pre-existing lock that will serve.
> A transaction that is between XLogInsert'ing its COMMIT record and
> updating the shared pg_clog data area does not hold any lock that
> could be used to prevent a checkpoint from starting.  (Or it didn't
> until yesterday's patch, anyway.)
> 
> I looked briefly at reorganizing the existing code so that we'd do the
> COMMIT XLogInsert while we're holding lock on the shared pg_clog data,
> which would solve the problem without adding any new lock acquisition.
> But this seemed extremely messy to do.  Also it would be optimizing
> transaction commit at the cost of pessimizing other uses of pg_clog,
> which might have to wait longer to get at the shared data.  Adding the
> new lock has the advantage that we can be sure it's not blocking
> anything we don't want it to block.
> 
> Thanks for thinking about the problem though ...
> 
>    regards, tom lane
>


One problem with a high-traffic LWLock is that acquiring it requires a
write to shared memory, whether the lock is taken in shared or exclusive
mode. On the increasingly prevalent SMP machines, each such write
invalidates the cache line containing the lock in the other CPUs' caches,
and the consequent reload carries its inherent delay. Would it be possible
to use a latch + version number in this case, so that everything except
the checkpoint performs only a read-only action on the latch? This should
eliminate the cache-line shenanigans on SMP machines.
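
To make the idea concrete, here is a minimal sketch of a latch + version
number in C11, in the style of a sequence lock. It is only an illustration
under stated assumptions, not PostgreSQL code: the type VersionLatch and
the vl_* functions are hypothetical names, a single writer (the checkpoint)
is assumed, and the reader is assumed to be able to retry its work when the
check fails. Whether a transaction commit can be restructured to retry in
that way is a separate question that the sketch does not address.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: a version counter that is odd while the single
 * writer (here, the checkpoint) is inside its critical section. */
typedef struct {
    atomic_uint_fast32_t version;
} VersionLatch;

static void vl_init(VersionLatch *l)
{
    atomic_init(&l->version, 0);
}

/* Writer side: the only operations that store to the shared cache line.
 * Assumes at most one writer at a time. */
static void vl_write_begin(VersionLatch *l)
{
    atomic_fetch_add(&l->version, 1);   /* even -> odd: writer active */
}

static void vl_write_end(VersionLatch *l)
{
    atomic_fetch_add(&l->version, 1);   /* odd -> even: writer done */
}

/* Reader side: loads only, so the cache line can stay shared among CPUs.
 * Returns the version observed before the reader starts its work. */
static uint_fast32_t vl_read_begin(VersionLatch *l)
{
    return atomic_load(&l->version);
}

/* True if no writer was active when the reader began and none has run
 * since; on false the reader must discard its work and retry. */
static bool vl_read_ok(VersionLatch *l, uint_fast32_t seen)
{
    return (seen & 1) == 0 && atomic_load(&l->version) == seen;
}
```

A reader would call vl_read_begin(), do its work, and then call
vl_read_ok() with the saved version, repeating the whole sequence if the
check fails. The frequent path issues no stores to the latch; the cost is
moved to the infrequent writer and to the occasional reader retry.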

Ken Marshall 

