Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

Thomas Munro Tue, 03 Apr 2018 19:46:11 -0700

On Wed, Apr 4, 2018 at 2:14 PM, Bruce Momjian <[email protected]> wrote:
> On Tue, Apr  3, 2018 at 10:05:19PM -0400, Bruce Momjian wrote:
>> On Wed, Apr  4, 2018 at 01:54:50PM +1200, Thomas Munro wrote:
>> > I believe there were some problems of that nature (with various
>> > twists, based on other concurrent activity and possibly different
>> > fds), and those problems were fixed by the errseq_t system developed
>> > by Jeff Layton in Linux 4.13.  Call that "bug #1".
>>
>> So all our non-cutting-edge Linux systems are vulnerable and there is no
>> workaround Postgres can implement?  Wow.
>
> Uh, are you sure it fixes our use-case?  From the email description it
> sounded like it only reported fsync errors for every open file
> descriptor at the time of the failure, but the checkpoint process might
> open the file _after_ the failure and try to fsync a write that happened
> _before_ the failure.


I'm not sure of anything.  I can see that it's designed to report
errors since the last fsync() of the *file* (presumably via any fd),
which sounds like the desired behaviour:

https://github.com/torvalds/linux/blob/master/mm/filemap.c#L682

 * When userland calls fsync (or something like nfsd does the equivalent), we
 * want to report any writeback errors that occurred since the last fsync (or
 * since the file was opened if there haven't been any).

But I'm not sure what the lifetime of the passed-in "file" and, more
importantly, "file->f_wb_err" is.  Specifically, what happens to it if
no one has the file open at all between operations?  The struct file
is reference counted (see fs/file_table.c), but I don't know enough
about it to comment.
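
To make the sequence in question concrete, here is a minimal sketch in C.
The function name write_close_reopen_fsync() is made up for illustration
and is not PostgreSQL code.  One fd writes and is closed (roughly what a
backend does), and later a fresh fd is opened only to call fsync()
(roughly what the checkpointer does).  On a healthy filesystem this just
returns 0; it does not provoke a writeback failure by itself.  Whether an
error that occurs in the marked window is reported to the second fd is
exactly the part I can't answer.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from PostgreSQL): write through one fd, close it,
 * then fsync through a newly opened fd.  Returns 0 on success, otherwise the
 * errno value of the first call that failed.
 */
int
write_close_reopen_fsync(const char *path, const char *data)
{
    size_t  len = strlen(data);
    ssize_t n;
    int     err = 0;
    int     fd;

    /* "Backend" step: dirty the page cache, then drop the only open fd. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return errno;
    n = write(fd, data, len);
    if (n < 0 || (size_t) n != len)
    {
        err = (n < 0) ? errno : EIO;    /* treat a short write as an I/O error */
        close(fd);
        return err;
    }
    if (close(fd) < 0)
        return errno;

    /*
     * Window: nobody holds the file open.  A writeback failure here is the
     * case being discussed; this sketch cannot trigger one on its own.
     */

    /* "Checkpointer" step: a new fd that never saw the write() above. */
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;
    if (fsync(fd) < 0)
        err = errno;                    /* e.g. EIO if the kernel reports it */
    if (close(fd) < 0 && err == 0)
        err = errno;
    return err;
}
```

To actually observe the behaviour you would need to run this against a
device that can be made to fail writes between the two steps, and compare
kernels before and after 4.13.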

-- 
Thomas Munro
http://www.enterprisedb.com
