On Wed, Dec 4, 2013 at 4:28 AM, Tatsuo Ishii <is...@postgresql.org> wrote:
>>> Can we avoid the Linux kernel problem by simply increasing our shared
>>> buffer size, say up to 80% of memory?
>> It will be swap more easier.
>
> Is that the case? If the system has not enough memory, the kernel
> buffer will be used for other purpose, and the kernel cache will not
> work very well anyway. In my understanding, the problem is, even if
> there's enough memory, the kernel's cache does not work as expected.
The problem is that Postgres relies on a working kernel cache for checkpoints. Checkpoint logic would have to be heavily reworked to account for an impaired kernel cache.

Really, there's no difference between fixing the I/O problems in the kernel(s) and fixing them in Postgres. The only difference is that in the kernel(s), everyone profits, and you've got a huge head start.

Communicating more with the kernel (through posix_fadvise, fallocate, aio, iovec, etc.) would probably be good, but it does expose more kernel issues. posix_fadvise, for instance, is a double-edged sword at the moment. I do believe, however, that exposing those issues and prompting a fix is far preferable to silently working around them.
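To make the posix_fadvise idea a bit more concrete, here is a rough sketch of what "communicating more with the kernel" could look like. It is written in Python for brevity and is not taken from the Postgres tree; the helper names advise() and write_then_drop() are made up for illustration. It only assumes the standard os.posix_fadvise wrapper, which exists on Unix-like platforms. Keep in mind that the advice is just a hint, and what the kernel does with it can differ between kernels and versions.

```python
import os
import tempfile


def advise(fd, advice_name, offset=0, length=0):
    """Pass an access-pattern hint to the kernel, if the platform supports it.

    Returns True if the hint was issued, False if posix_fadvise (or the
    requested advice constant) is unavailable or the call was rejected.
    advise() is a hypothetical helper, not part of any real codebase.
    """
    fadvise = getattr(os, "posix_fadvise", None)
    advice = getattr(os, advice_name, None)
    if fadvise is None or advice is None:
        return False
    try:
        fadvise(fd, offset, length, advice)  # length 0 means "to end of file"
    except OSError:
        return False
    return True


def write_then_drop(path, payload):
    """Write payload, force it to disk, then hint that its cached pages
    are no longer needed (hypothetical helper for illustration)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # sync first, so the pages are clean before the hint
        return advise(fd, "POSIX_FADV_DONTNEED")
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        hinted = write_then_drop(os.path.join(tmp, "segment"), b"\0" * 8192)
        print("hint issued" if hinted else "posix_fadvise not available")
```

The fallback path is deliberate: the hint is optional, so the caller behaves the same whether or not the kernel accepts it, and only the caching behaviour may differ.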