James Bottomley <james.bottom...@hansenpartnership.com> writes:
> The current mechanism for coherency between a userspace cache and the
> in-kernel page cache is mmap ... that's the only way you get the same
> page in both currently.

Right.

> glibc used to have an implementation of read/write in terms of mmap, so
> it should be possible to insert it into your current implementation
> without a major rewrite.  The problem I think this brings you is
> uncontrolled writeback: you don't want dirty pages to go to disk until
> you issue a write()

Exactly.

> I think we could fix this with another madvise():
> something like MADV_WILLUPDATE telling the page cache we expect to alter
> the pages again, so don't be aggressive about cleaning them.

"Don't be aggressive" isn't good enough.  The prohibition on early write
has to be absolute, because writing a dirty page before we've done
whatever else we need to do results in a corrupt database.  It has to
be treated like a write barrier.

> The problem is we can't give you absolute control of when pages are
> written back because that interface can be used to DoS the system: once
> we get too many dirty uncleanable pages, we'll thrash looking for memory
> and the system will livelock.

Understood, but that makes this direction a dead end.  We can't use
it if the kernel might decide to write anyway.
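
For reference, the mmap coherency discussed above, and the limit on
controlling writeback, can be sketched roughly as follows. This is only an
illustration, assuming Linux and its unified page cache; the helper name
mmap_store_visible_to_read and the scratch file path passed to it are
hypothetical and not taken from any existing code.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).  Returns 1 if a store made
 * through a MAP_SHARED mapping is visible to pread() on the same file
 * with no msync() in between, 0 if it is not, and -1 on error.
 * The scratch file at 'path' is created, truncated, and removed.
 */
int mmap_store_visible_to_read(const char *path)
{
    static const char payload[] = "tuple v2";
    const size_t len = 4096;
    char buf[sizeof payload];
    int result = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, (off_t) len) == 0) {
        char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (page != MAP_FAILED) {
            /* Dirty the page through the mapping. */
            memcpy(page, payload, sizeof payload);

            /* read()/pread() are served from the same page cache page. */
            if (pread(fd, buf, sizeof buf, 0) == (ssize_t) sizeof buf)
                result = (memcmp(buf, payload, sizeof payload) == 0);

            /*
             * msync() can force writeback to happen now.  Nothing in
             * this interface can postpone it: the kernel may already
             * have written the dirty page on its own before this call.
             */
            if (msync(page, len, MS_SYNC) != 0)
                result = -1;

            munmap(page, len);
        }
    }

    close(fd);
    unlink(path);
    return result;
}
```

Calling the helper with a path on a local filesystem is expected to
return 1 on Linux.  The point of the sketch is the asymmetry: the
application can request an earlier write with msync(), but has no call
that guarantees a later one.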

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)