On Tue, Apr 21, 2015 at 11:04 AM, Bruce Momjian <br...@momjian.us> wrote:
> Yes, it might be too much optimization to try to get the checkpoint to
> flush all those pages sequentially, but I was thinking of our current
> behavior where, after an update of all rows, we effectively write out
> the entire table because we have dirtied every page.  I guess with later
> prune-based writes, we aren't really writing all the pages, since the
> pages with prunable content are distributed more or less randomly.  I
> guess I was just wondering what value there is in your write-then-skip
> idea vs. just writing the first X% of pages we find.  Your idea
> certainly spreads out the pruning, and doesn't require knowing the size
> of the table, though I thought that information was easily determined.
>
> One thing to consider is how we handle pruning for index scans that hit
> multiple heap pages.  Do we still write X% of the pages in the table,
> or X% of the heap pages we actually access via SELECT?  With the
> write-then-skip approach, we would prune X% of the pages we access,
> while with the first-X% approach, we would probably prune all of them,
> as we would not be accessing most of the table.  I don't think we can
> do the first X% of pages with the percentage based on the number of
> pages accessed, as we have no way to know how many heap pages we will
> access from the index.  (We would know for bitmap scans, but that
> complexity doesn't seem worth it.)  That would argue, for consistency
> between sequential and index-based heap access, that your approach is
> best.
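To make the comparison concrete, here's a rough sketch of the per-page
decision each policy makes.  All of these names are invented for
illustration; this is not code from any actual patch:

    /*
     * Rough sketch only -- hypothetical names, not from any real patch.
     * Both functions answer "may we prune/dirty this page?" so that
     * roughly X% (= percent) of pages get written.
     */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t BlockNumber;

    /*
     * "First X%": keep pruning until we have dirtied X% of the table's
     * pages, then stop.  Needs the relation size up front.
     */
    static bool
    prune_first_x_percent(BlockNumber pages_pruned, BlockNumber rel_size,
                          int percent)
    {
        return pages_pruned < (uint64_t) rel_size * percent / 100;
    }

    /*
     * "Write-then-skip": prune a run of pages, skip enough pages to
     * keep the overall rate near X%, then prune another run.  Needs
     * only a count of the pages this scan has actually visited.
     */
    static bool
    prune_write_then_skip(BlockNumber pages_visited, int run_length,
                          int percent)
    {
        BlockNumber cycle = (BlockNumber) run_length * 100 / percent;

        return pages_visited % cycle < (BlockNumber) run_length;
    }

The structural difference shows up in the signatures: first-X% needs
the relation size, while write-then-skip works purely from a count of
pages visited, which is why it naturally limits itself to X% of the
pages an index scan actually touches.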
I actually implemented something like this for setting hint bits a few
years ago:

http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com

At least in the later versions, the patch writes a certain number of
hinted pages, then skips writing a run of pages, then writes another
run of hinted pages.

The basic problem is that, after the fsync queue compaction patch went
in, the benefits on my tests were pretty modest.  Writing out lots of
dirty pages does cost something: before the fsync queue compaction
work, the initial scan of an unhinted table took about 6x the normal
time on the machine I tested on, but afterward it was only about 1.5x.
Blunting that remaining spike just wasn't exciting enough.

It strikes me that it would be better to have an integrated strategy
for this problem.  It doesn't make sense to have one strategy for
deciding whether to set hint bits and a separate strategy for deciding
whether to HOT-prune.  And if we do decide to set hint bits and
HOT-prune, it might be smart to also mark the page all-visible, if it
is all-visible and we're not about to update it.  I believe we're
losing a lot of performance on OLTP workloads by re-dirtying the same
pages over and over again.  We've probably all hit cases where there
is an obvious loss of performance because of this sort of thing, but
I'm starting to think it's hurting us in a lot of less-obvious ways
as well.
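To sketch what such an integrated strategy might decide on each page
visit -- again with names invented purely for illustration, none of
them real PostgreSQL APIs:

    /*
     * Hypothetical sketch.  The idea: when we visit a page and are
     * allowed to dirty it, make all the cheap maintenance decisions
     * at once instead of one mechanism at a time.
     */
    #include <stdbool.h>

    typedef struct PageMaintPlan
    {
        bool set_hint_bits;    /* stamp commit/abort status on tuples */
        bool hot_prune;        /* reclaim space from dead HOT chains */
        bool mark_all_visible; /* set page-level all-visible state */
    } PageMaintPlan;

    static PageMaintPlan
    plan_page_maintenance(bool has_unhinted_tuples,
                          bool has_prunable_tuples,
                          bool all_tuples_visible,
                          bool about_to_update,
                          bool within_write_budget)
    {
        PageMaintPlan plan = {false, false, false};

        /*
         * One skip/write policy gates all three kinds of cleanup, so
         * the page is dirtied at most once per visit rather than once
         * per mechanism.
         */
        if (!within_write_budget)
            return plan;

        plan.set_hint_bits = has_unhinted_tuples;
        plan.hot_prune = has_prunable_tuples;

        /* Marking all-visible is wasted work if we're about to update. */
        plan.mark_all_visible = all_tuples_visible && !about_to_update;

        return plan;
    }

The point is just that a single write-budget check gates all three
kinds of cleanup, so a visited page gets dirtied once, not repeatedly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company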