On Mon, Jan 13, 2014 at 9:12 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> For one, postgres doesn't use mmap for files (and can't without major
> new interfaces). Frequently mmap()/madvise()/munmap()ing 8kb chunks has
> horrible consequences for performance/scalability - very quickly you
> contend on locks in the kernel.
I may as well dump this in this thread. We've discussed this in person a few times, including at least once with Ted Ts'o when he visited Dublin last year.

The fundamental conflict is that the kernel has a better understanding of the hardware and of the other software sharing the same resources, while Postgres has a better understanding of its own access patterns. We need either to add interfaces so Postgres can teach the kernel what it needs to know about those access patterns, or to add interfaces so Postgres can find out what it needs to know about the hardware context.

The more ambitious and interesting direction is to let Postgres tell the kernel what it needs to know to manage everything. To do that we would need the ability to control when pages are flushed out. This is absolutely necessary to maintain consistency: Postgres would need to be able to mark pages as unflushable until some future point in time when the journal (WAL) is flushed. We discussed various ways that interface could work, but it would be tricky to keep its overhead low enough to be workable.

The less exciting, more conservative option would be to add kernel interfaces to teach Postgres about things like RAID geometry. Then Postgres could use direct I/O and decide how to prefetch based on the RAID geometry, how much I/O bandwidth and IOPS are available, etc. Reimplementing I/O schedulers and all the rest of the work the kernel provides inside Postgres just seems like something outside our competency, and something none of us is really excited about doing.

--
greg