On Thu, Jan 5, 2012 at 6:15 AM, Florian Pflug <f...@phlo.org> wrote:
> On 64-bit machines at least, we could simply mmap() the stable parts of the
> CLOG into the backend address space, and access it without any locking at all.
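For concreteness, a minimal POSIX-only sketch of the quoted idea: map a
stable (no-longer-written) CLOG segment read-only and decode one
transaction's two status bits. The two-bits-per-xid layout matches CLOG's
on-disk format, but the segment path, the standalone main(), and the error
handling are illustrative only, not actual PostgreSQL code.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CLOG_BITS_PER_XACT 2    /* CLOG stores two status bits per xid */
#define CLOG_XACTS_PER_BYTE 4

int
main(void)
{
    const char *path = "pg_clog/0000";  /* hypothetical stable segment */
    unsigned int xid = 123456;          /* an xid within this segment */
    int fd;
    struct stat st;
    const unsigned char *clog;

    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    if (fstat(fd, &st) < 0)
    {
        perror("fstat");
        return 1;
    }

    /*
     * A read-only shared mapping: every backend can read it with no
     * locking at all, at the cost of one mmap() call per backend.  The
     * call can fail, e.g. if there are too many mappings already.
     */
    clog = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (clog == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    /* Pull out the two status bits for this xid. */
    unsigned int byteno = xid / CLOG_XACTS_PER_BYTE;
    unsigned int bshift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
    unsigned int status = (clog[byteno] >> bshift) & 0x03;

    printf("xid %u has status bits %u\n", xid, status);

    munmap((void *) clog, st.st_size);
    close(fd);
    return 0;
}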
True. I think this could be done, but it would take some fairly careful
thought and testing, because (1) we don't currently use mmap() anywhere
else in the backend AFAIK, so we might run into portability issues
(think: Windows) and perhaps unexpected failure modes (e.g. mmap() fails
because there are too many mappings already); and (2) it's not
completely guaranteed to be a win. Sure, you save on locking, but now
you are doing an mmap() call in every backend instead of just one
read() into shared memory. If concurrency isn't a problem, that might
be more expensive on net. Or maybe not, but I'm kind of inclined to
steer clear of this whole area, at least for 9.2.

So far, the only test results I have support the notion that we run
into trouble when NUM_CPUS > NUM_CLOG_BUFFERS, and people have to wait
for a buffer before they can even start their I/Os. That can be fixed
with a pretty modest amount of reengineering. I'm sure there is a
second-order effect from the cost of the repeated I/Os per se, which a
backend-private cache of one form or another might well help with, but
it may not be very big. Test results are welcome, of course.

> I believe that we could also compress the stable part by 50% if we use
> one bit instead of two per txid. AFAIK, we need two bits because we
>
> a) distinguish between transactions which were ABORTED and those which
> never completed (due to e.g. a backend crash), and
>
> b) mark transactions as SUBCOMMITTED to achieve atomic commits,
>
> neither of which is strictly necessary for the stable parts of the clog.

Well, if we're going to do compression at all, I'm inclined to think
that we should compress by more than a factor of two. Jim Nasby's
numbers (the worst we've seen so far) show that 18% of 1k blocks of
XIDs were all commits. Presumably, if we reduced the chunk size to,
say, 8 transactions, that percentage would go up, and even that would
be enough to get 16x compression rather than 2x: eight transactions at
two bits each collapse into a single all-committed bit. Of course,
keeping the uncompressed CLOG files then becomes required rather than
optional, but that's OK.

What bothers me about compressing by only 2x is that the act of
compressing is not free. You have to read all the chunks and then write
out new chunks, and the compressed and uncompressed copies then compete
with each other for cache space. Who is to say that we're not better
off just reading the uncompressed data at that point? At least then we
have only one copy of it.

> Note that we could still keep the uncompressed CLOG around for
> debugging purposes - the additional compressed version would require
> only 2^32/8 bytes = 512 MB in the worst case, which people who're
> serious about performance can very probably spare.

I don't think it'd be even that much, because we only ever use half the
XID space at a time, and often probably much less: the default value of
vacuum_freeze_table_age is only 150 million transactions. (Half the XID
space is 2^31 XIDs, or 256 MB at one bit apiece; 150 million XIDs would
need under 20 MB.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
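For concreteness, a minimal sketch of the 16x scheme described above:
one summary bit per chunk of 8 XIDs, with a fallback to the regular
two-bit CLOG for any chunk that is not all-commits. The names here
(xid_did_commit, slow_clog_lookup, the summary array) are invented for
illustration and are not PostgreSQL APIs.

#include <stdint.h>
#include <stdio.h>

#define XIDS_PER_CHUNK 8    /* 8 xids * 2 bits -> 1 summary bit = 16x */

/* Stand-in for the regular uncompressed CLOG lookup (invented name). */
static int
slow_clog_lookup(uint32_t xid)
{
    (void) xid;
    return 0;               /* pretend: this xid did not commit */
}

/*
 * One summary bit per chunk: a set bit means "every xid in the chunk
 * committed"; a clear bit means "mixed chunk, consult the uncompressed
 * CLOG".  Returns 1 if the xid committed, 0 otherwise.
 */
static int
xid_did_commit(const uint8_t *summary, uint32_t xid)
{
    uint32_t chunk = xid / XIDS_PER_CHUNK;

    if (summary[chunk / 8] & (1u << (chunk % 8)))
        return 1;                   /* whole chunk is commits */
    return slow_clog_lookup(xid);   /* fall back to full CLOG */
}

int
main(void)
{
    uint8_t summary[1] = { 0x01 };  /* chunk 0 (xids 0-7) is all-commits */

    printf("xid 5: %d\n", xid_did_commit(summary, 5));   /* 1, via summary */
    printf("xid 12: %d\n", xid_did_commit(summary, 12)); /* 0, via fallback */
    return 0;
}

A clear summary bit carries no information by itself, which is why the
uncompressed CLOG has to stay around for mixed chunks; that matches the
point above that keeping the uncompressed files becomes required rather
than optional.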