On Wed, Jan 15, 2014 at 10:53 AM, Mel Gorman <mgor...@suse.de> wrote:
> I realise that now and sorry for the noise.
>
> I later read the parts of the thread that covered the strict ordering
> requirements and in a summary mail I split the requirements in two. In one,
> there are dirty sticky pages that the kernel should not writeback unless
> it has no other option or fsync is called. This may be suitable for large
> temporary files that Postgres does not necessarily want to hit the platter
> but also does not have strict ordering requirements for. The second is have
> pages that are strictly kept dirty until the application syncs them. An
> unbounded number of these pages would blow up but maybe bounds could be
> placed on it. There are no solid conclusions on that part yet.

I think that the bottom line is that we're not likely to make massive
changes to the way that we do block caching now. Even if some other
scheme could work much better on Linux (and so far I'm unconvinced
that any of the proposals made here would in fact work much better),
we aim to be portable to Windows as well as other UNIX-like systems
(BSD, Solaris, etc.). So using completely Linux-specific technology
in an overhaul of our block cache seems to me to have no future.

On the other hand, giving the kernel hints about what we're doing that
would enable it to be smarter seems to me to have a lot of potential.
Ideas so far mentioned include:

- Hint that we're going to do an fsync on file X at time Y, so that
  the kernel can schedule the write-out to complete right around that
  time.
- Hint that a block is a good candidate for reclaim without actually
  purging it if there's no memory pressure.
- Hint that a page we modify in our cache should be dropped from the
  kernel cache.
- Hint that a page we write back to the operating system should be
  dropped from the kernel cache after the I/O completes.

It's hard to say which of these ideas will work well without testing
them, and the overhead of the extra system calls might be significant
in some of those cases, but it seems a promising line of inquiry.

And the idea of being able to do an 8kB atomic write with OS support
so that we don't have to save full page images in our write-ahead log
to cover the "torn page" scenario seems very intriguing indeed. If
that worked well, it would be a *big* deal for us.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers