On Tue, Dec 10, 2013 at 9:22 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> Communicating more with the kernel (through posix_fadvise, fallocate,
>> aio, iovec, etc...) would probably be good, but it does expose more
>> kernel issues. posix_fadvise, for instance, is a double-edged sword
>> ATM. I do believe, however, that exposing those issues and prompting a
>> fix is far preferable to silently working around them.
>
>
> Getting the kernel to improve those things so PostgreSQL can be changed to
> use them more aggressively seems almost hopeless to me. PostgreSQL would
> have to be coded to take advantage of the improved versions, while defending
> itself from the pre-improved versions. And my understanding is that
> different distributions of Linux cherry-pick changes to the kernel back and
> forth into their code, so just looking at the kernel version number without
> also looking at the distribution doesn't tell us much about whether we
> have the improved feature or not. Or am I misinformed about that?
>
> If we can point out to the kernel hackers things that would be
> absolute improvements, where PostgreSQL and everything else just magically
> start working better if that improvement makes it in, that is great. But if
> both systems have to be changed in sync to derive any benefit, how do we
> coordinate that?
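For readers less familiar with the posix_fadvise pattern referred to here, the following is a minimal sketch of the usual "hint, then read" use, assuming a POSIX system and an 8 kB block size. BLCKSZ is defined locally and read_blocks_with_prefetch is a hypothetical helper for illustration, not PostgreSQL code. POSIX_FADV_WILLNEED only asks the kernel to start fetching the range; it returns without waiting, which is what makes it usable as a cheap form of asynchronous I/O.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192 /* assumed block size (PostgreSQL's default) */

/*
 * Hypothetical helper: hint blocks [first, first + count) to the kernel with
 * POSIX_FADV_WILLNEED, then read them. Returns the number of bytes read,
 * or -1 on a read error.
 */
static long read_blocks_with_prefetch(int fd, long first, long count)
{
    assert(first >= 0 && count > 0);
    off_t start = (off_t)first * BLCKSZ;

    /* posix_fadvise() returns an error number instead of setting errno.
       A failed hint is not fatal: the reads below still work, unassisted. */
    int rc = posix_fadvise(fd, start, (off_t)count * BLCKSZ, POSIX_FADV_WILLNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    char buf[BLCKSZ];
    long total = 0;
    for (long i = 0; i < count; i++) {
        ssize_t n = pread(fd, buf, sizeof buf, start + (off_t)i * BLCKSZ);
        if (n < 0)
            return -1;
        total += n;
        if (n < BLCKSZ)
            break; /* short read on a regular file: end of file reached */
    }
    return total;
}
```

In a real scan the hint would be issued some distance ahead of the block currently being read, so the kernel has time to complete the I/O before it is needed.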
Well, posix_fadvise is one such thing. It's a cheap form of AIO used by quite a few programs that want I/O performance. In its current form it is sub-optimal; the fix is rather simple, it just needs a lot of testing. But my report on LKML[0] spurred little actual work, so it's possible this kind of thing will need patches attached.

On Tue, Dec 10, 2013 at 9:34 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-12-04 05:39:23 -0200, Claudio Freire wrote:
>> Problem is, Postgres relies on a working kernel cache for checkpoints.
>> Checkpoint logic would have to be heavily reworked to account for an
>> impaired kernel cache.
>
> I don't think checkpoints are the critical problem with that, they are
> nicely in the background and we could easily add sorting.

Problem is, with DirectIO, they won't be so background. Currently, checkpoints assume there's a background process catching all I/O requests, sorting them, and flushing them as optimally as possible. This makes the checkpoint's slow-paced write pattern benign, since the writes get scheduled opportunistically by the kernel.

If you use DirectIO, however, each write will pretty much physically move the head of rotating media (at least once it reaches the front of the queue), delaying all other pending I/O requests. That's hardly background behavior. A few blocks per second like that can pretty much kill sequential scans (I've seen that effect happen with fadvise).

[0] https://lkml.org/lkml/2012/11/9/353

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
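The "we could easily add sorting" idea can be sketched as ordering the checkpoint's dirty pages by file and block number before writing them, so each file is written in ascending block order. This is only an illustration under assumptions: DirtyPage, its fields, and sort_checkpoint_writes are hypothetical names, not PostgreSQL's buffer manager API. Sorting roughly reproduces the reordering the kernel would otherwise apply to buffered writes; it does not by itself make DirectIO writes any less disruptive to concurrent reads.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical identifier for a dirty page awaiting a checkpoint write. */
typedef struct {
    unsigned int relfile; /* which relation file the page belongs to */
    unsigned int block;   /* block number within that file */
} DirtyPage;

/* qsort comparator: order by file first, then by block within the file. */
static int cmp_dirty_page(const void *a, const void *b)
{
    const DirtyPage *x = a, *y = b;
    if (x->relfile != y->relfile)
        return x->relfile < y->relfile ? -1 : 1;
    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    return 0;
}

/* Sort pending writes in place so they can be issued sequentially per file. */
static void sort_checkpoint_writes(DirtyPage *pages, size_t n)
{
    assert(pages != NULL || n == 0);
    qsort(pages, n, sizeof *pages, cmp_dirty_page);
}
```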