On Jan 16, 2011, at 4:37 PM, Kevin Grittner wrote:
> Robert Haas wrote:
> 
>> a quick-and-dirty attempt to limit the amount of I/O caused by hint
>> bits. I'm still very interested in knowing what people think about
>> that.
> 
> I found the elimination of the response-time spike promising.  I
> don't think I've seen enough data yet to feel comfortable endorsing
> it, though.  I guess the question in my head is: how much of the
> lingering performance hit was due to having to go to clog and how
> much was due to competition with the deferred writes?  If much of it
> is due to repeated recalculation of visibility based on clog info, I
> think there would need to be some way to limit how many times that
> happened before the hint bits were saved.

What if we sped up the case where hint bits aren't set? Has anyone collected 
data on the actual pain points of checking visibility when hint bits aren't 
set? How about when setting hint bits is intentionally delayed? I wish we had 
some more infrastructure around the XIDCACHE counters; having that info 
available for people's general workloads might be extremely valuable. Even if I 
were to compile with it turned on, it seems the only way to get at it is via 
stderr, which is very hard to deal with.

Lacking performance data (and for my own education), I've spent the past few 
hours studying HeapTupleSatisfiesNow(). If I'm understanding it correctly, the 
three critical functions from a performance standpoint are 
TransactionIdIsCurrentTransactionId, TransactionIdIsInProgress and 
TransactionIdDidCommit. Note that all 3 can potentially be called twice; once 
to check xmin and once to check xmax.

ISTM TransactionIdIsCurrentTransactionId is missing a shortcut: shouldn't we be 
able to immediately return false if the XID we're checking is older than some 
value, like global xmin? Maybe it's only worth checking that case if we hit a 
subtransaction, but if the check is faster than one or two loops through the 
binary search... I would think this at least warrants a one-XID cache a la 
cachedFetchXidStatus (though it would need to be a different cache...). Another 
issue is that TransactionIdIsInProgress will call this function as well, unless 
it skips out because the transaction is < RecentXmin.
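
To make the shortcut idea concrete, here is a rough standalone sketch. It is 
not the real TransactionIdIsCurrentTransactionId: XactStateSketch, horizon, 
own_xids and slow_path_calls are all made-up names, and the "horizon" is 
simply assumed to be some XID (global xmin, say) that none of our own XIDs can 
precede. It shows an early exit for old XIDs, a one-entry cache, and only then 
the binary search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Circular "a is older than b" test on 32-bit XIDs; special XIDs ignored. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical stand-in for the backend's own transaction state. */
typedef struct
{
    TransactionId horizon;          /* assumed: none of our XIDs is older */
    const TransactionId *own_xids;  /* top-level + subxact XIDs, sorted */
    size_t      n_own_xids;
    TransactionId cached_xid;       /* one-entry cache of the last slow lookup */
    bool        cached_result;
    bool        cache_valid;        /* must be cleared when own_xids changes */
    unsigned    slow_path_calls;    /* illustration only: counts searches */
} XactStateSketch;

static bool
xid_is_current_sketch(XactStateSketch *st, TransactionId xid)
{
    size_t      lo = 0;
    size_t      hi = st->n_own_xids;
    bool        found = false;

    /* Shortcut 1: an XID older than the horizon cannot be one of ours. */
    if (xid_precedes(xid, st->horizon))
        return false;

    /* Shortcut 2: same XID as last time, reuse the answer. */
    if (st->cache_valid && st->cached_xid == xid)
        return st->cached_result;

    /* Slow path: binary search over our own XIDs. */
    st->slow_path_calls++;
    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (st->own_xids[mid] == xid)
        {
            found = true;
            break;
        }
        if (xid_precedes(st->own_xids[mid], xid))
            lo = mid + 1;
        else
            hi = mid;
    }

    st->cached_xid = xid;
    st->cached_result = found;
    st->cache_valid = true;
    return found;
}
```

Whether the extra comparison pays for itself is exactly the kind of thing that 
would need measuring; the sketch only shows where the checks would sit.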

TransactionIdIsInProgress does a fair amount of easy checking already... the 
biggest thing is that if it's less than RecentXmin we bounce out immediately. 
If we can't bounce out immediately though, this routine gets pretty expensive 
unless the XID is currently running and is top-level. It's worse if there are 
subxacts and can be horribly bad if any subxact caches have overflowed. Note 
that if anything has overflowed, then we end up going to clog and possibly 
pg_subtrans.
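
As a way of picturing those cost tiers, here is a simplified model of the 
lookup, not the actual TransactionIdIsInProgress code. BackendEntry, 
LookupPath and classify_in_progress_lookup are invented for illustration, and 
the model leaves out locking and everything else the real routine does. It 
only reports which path a given XID would take.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Circular "a is older than b" test on 32-bit XIDs; special XIDs ignored. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical per-backend entry: top-level XID plus cached subxact XIDs. */
typedef struct
{
    TransactionId xid;              /* top-level XID, 0 if none assigned */
    const TransactionId *subxids;   /* cached subtransaction XIDs */
    size_t      n_subxids;
    bool        overflowed;         /* subxact cache is incomplete */
} BackendEntry;

typedef enum
{
    PATH_BELOW_RECENT_XMIN,         /* cheapest: bounce out immediately */
    PATH_TOPLEVEL_MATCH,            /* running top-level transaction */
    PATH_SUBXACT_MATCH,             /* found in some subxact cache */
    PATH_NOT_RUNNING,               /* scanned everything, nothing overflowed */
    PATH_NEEDS_CLOG                 /* overflow seen: clog/pg_subtrans needed */
} LookupPath;

static LookupPath
classify_in_progress_lookup(TransactionId xid, TransactionId recent_xmin,
                            const BackendEntry *backends, size_t n_backends)
{
    bool        any_overflowed = false;
    size_t      i;
    size_t      j;

    if (xid_precedes(xid, recent_xmin))
        return PATH_BELOW_RECENT_XMIN;

    for (i = 0; i < n_backends; i++)
    {
        if (backends[i].xid == xid)
            return PATH_TOPLEVEL_MATCH;

        for (j = 0; j < backends[i].n_subxids; j++)
        {
            if (backends[i].subxids[j] == xid)
                return PATH_SUBXACT_MATCH;
        }

        if (backends[i].overflowed)
            any_overflowed = true;
    }

    /* No match in memory: only an overflowed cache forces the slow lookups. */
    return any_overflowed ? PATH_NEEDS_CLOG : PATH_NOT_RUNNING;
}
```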

Finally, TransactionIdDidCommit hits clog.

So the degenerate cases seem to be:

- Really old XIDs. These suck because there's a good chance we'll have to read 
from clog.
- XIDs > RecentXmin that are not currently running top-level transactions. The 
pain here increases with subtransaction use.

For the second case, if we can ensure that RecentXmin is not very old then 
there's generally a smaller chance that TransactionIdIsInProgress has to do a 
lot of work. My experience is that most systems that have a high transaction 
rate don't end up with a lot of long-running transactions. Storing a list of 
the X oldest transactions would allow us to keep RecentXmin closer to the most 
recent XID.

For the first case, we should be able to create a more optimized clog lookup 
method that works for older XIDs. If we restrict this to XIDs that are older 
than GlobalXmin then we can simplify things because we don't have to worry 
about transactions that are in-progress. We also don't need to differentiate 
between subtransactions and their parents (though, we obviously need to figure 
out whether a subtransaction is considered to be committed or not). Because 
we're restricting this to XIDs that we know we can determine the state of, we 
only need to store a maximum of 1 bit per XID. That's already half the size of 
clog. But because we don't have to build this list on the fly (we don't need 
to update it on every commit/abort as long as we know the range of XIDs that 
are stored), we don't have to support random writes. That means we can use a 
structure that's more complex to maintain than a simple bitmap. Or maybe we 
stick with a bitmap but compress it.
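
A minimal sketch of the uncompressed variant, assuming the committed/aborted 
status of every XID in the range is already known when the bitmap is built. 
CommitBitmap and the commit_bitmap_* functions are hypothetical names, not 
anything that exists in the tree. The structure is built once in bulk and 
never updated, so there is deliberately no function to set a single bit 
afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* Read-only bitmap over a fixed XID range: 1 = committed, 0 = aborted. */
typedef struct
{
    TransactionId first_xid;    /* first XID covered by the bitmap */
    uint32_t    n_xids;         /* number of consecutive XIDs covered */
    uint8_t    *bits;           /* one bit per XID */
} CommitBitmap;

/* Bytes needed at 1 bit per XID (clog needs twice this at 2 bits per XID). */
static size_t
commit_bitmap_bytes(uint32_t n_xids)
{
    return ((size_t) n_xids + 7) / 8;
}

/* Build the bitmap in one pass from already-known outcomes. */
static CommitBitmap *
commit_bitmap_build(TransactionId first_xid, const bool *committed,
                    uint32_t n_xids)
{
    size_t      nbytes = commit_bitmap_bytes(n_xids);
    CommitBitmap *bm = malloc(sizeof(*bm));
    uint32_t    i;

    if (bm == NULL)
        return NULL;
    bm->bits = calloc(nbytes ? nbytes : 1, 1);
    if (bm->bits == NULL)
    {
        free(bm);
        return NULL;
    }
    bm->first_xid = first_xid;
    bm->n_xids = n_xids;
    for (i = 0; i < n_xids; i++)
    {
        if (committed[i])
            bm->bits[i / 8] |= (uint8_t) (1u << (i % 8));
    }
    return bm;
}

/* Unsigned subtraction keeps the range check safe across XID wraparound. */
static bool
commit_bitmap_covers(const CommitBitmap *bm, TransactionId xid)
{
    return (uint32_t) (xid - bm->first_xid) < bm->n_xids;
}

/* Caller must first check commit_bitmap_covers(). */
static bool
commit_bitmap_did_commit(const CommitBitmap *bm, TransactionId xid)
{
    uint32_t    off = xid - bm->first_xid;

    assert(commit_bitmap_covers(bm, xid));
    return (bm->bits[off / 8] >> (off % 8)) & 1;
}

static void
commit_bitmap_free(CommitBitmap *bm)
{
    free(bm->bits);
    free(bm);
}
```

At one bit per XID, a range of a million XIDs takes about 122 kB, versus about 
244 kB at clog's two bits per XID. A compressed or run-length form could 
replace the bits array without changing the lookup interface.
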
--
Jim C. Nasby, Database Architect                   j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net
