On 29 March 2018 at 13:06, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> On Thu, Mar 29, 2018 at 6:00 PM, Justin Pryzby <pry...@telsasoft.com> wrote:
>> The retries are the source of the problem; the first fsync() can return EIO,
>> and also *clears the error*, causing a 2nd fsync (of the same data) to
>> return success.
>
> What I'm failing to grok here is how that error flag even matters,
> whether it's a single bit or a counter as described in that patch. If
> writeback failed, *the page is still dirty*. So all future calls to
> fsync() need to try to flush it again, and (presumably) fail
> again (unless it happens to succeed this time around).

You'd think so. But it doesn't appear to work that way. You can see for
yourself with the "error" device-mapper target mapped over part of a volume.

I wrote a test case here:

https://github.com/ringerc/scrapcode/blob/master/testcases/fsync-error-clear.c

I don't pretend the kernel behaviour is sane. And it's possible I've made an
error in my analysis. But since I've observed this in the wild, and seen it
in a test case, I strongly suspect that what I've described is just what's
happening, brain-dead or not.

Presumably the kernel marks the page clean when it dispatches it to the I/O
subsystem, and doesn't dirty it again on I/O error? I haven't dug that deep
on the kernel side. See the stackoverflow post for details on what I found
in kernel code analysis.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services