Assuming that Erlang doesn't lie about the return status, we'd throw
an error on a broken fsync, which would kill the couch_db_updater. In
the case of delayed_commits we'd lose the last delayed-commit interval
of writes, just as with any other error.

That's based on these two lines:

https://github.com/apache/couchdb-couch/blob/master/src/couch_file.erl#L207
https://github.com/apache/couchdb-couch/blob/master/src/couch_file.erl#L312

In both places we assert that the return value is ok.

A quick skim of unix_efile.c shows that it's passing the return value
of fsync to check_error, which sets an errno if there was an error. So
assuming that EINTR doesn't somehow crazily get mutated into an ok
atom response, we're fine.

https://github.com/erlang/otp/blob/master/erts/emulator/drivers/unix/unix_efile.c#L478-L482
https://github.com/erlang/otp/blob/master/erts/emulator/drivers/unix/unix_efile.c#L94-L102
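To make the reasoning above a bit more concrete, here's a minimal C
sketch of the pattern as I read it. It is not the code in unix_efile.c:
the names sync_result, check_error_sketch and sync_fd_sketch are
hypothetical, made up for illustration. The point it shows is that
fsync's return goes through an error check, and only a non-negative
return is reported as success, so an interrupted fsync (-1 with errno
set to EINTR) can only come back as an error, never as ok.

```c
/* Needed for fsync() under a strict -std=c99/c11 build. */
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <unistd.h>

/* Hypothetical result record (not the real driver struct): 0 means
   "ok", anything else is the errno reported back to the caller,
   which the Erlang side would see as {error, Reason}. */
typedef struct {
    int posix_errno;
} sync_result;

/* Hypothetical analogue of check_error(): a negative return from a
   syscall is a failure and its errno is recorded; only a
   non-negative return is reported as success. */
int check_error_sketch(int result, sync_result *out) {
    if (result < 0) {
        out->posix_errno = errno; /* EINTR, EIO, EBADF, ... all land here */
        return 0;                 /* failure */
    }
    out->posix_errno = 0;
    return 1;                     /* success, i.e. the ok atom */
}

/* Hypothetical fsync wrapper with no EINTR retry loop: an interrupted
   fsync is reported as an error rather than silently as success. */
int sync_fd_sketch(int fd, sync_result *out) {
    return check_error_sketch(fsync(fd), out);
}
```

For example, sync_fd_sketch(-1, &r) fails with r.posix_errno == EBADF,
and faking an interrupted call (errno = EINTR, then
check_error_sketch(-1, &r)) likewise fails with EINTR recorded. Since
couch_file asserts on ok, either case would crash the caller rather
than pass silently.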

On Thu, May 21, 2015 at 3:10 PM, Jan Lehnardt <[email protected]> wrote:
>
>> On 21 May 2015, at 21:40, Alexander Shorin <[email protected]> wrote:
>>
>> I think it's worth cross-posting to the erlang-questions@ ML. Would you?
>
> If we don’t get any further here, sure :) I just don’t want to make
> a fool of myself should this have a simple answer, and I feel more
> comfortable in this particular crowd, with the CoC and all :)
>
> Best
> Jan
> --
>
>> --
>> ,,,^..^,,,
>>
>>
>> On Thu, May 21, 2015 at 10:23 PM, Jan Lehnardt <[email protected]> wrote:
>>> Hi all,
>>>
>>> I stumbled across https://ldpreload.com/blog/signalfd-is-useless and 
>>> wondered how this squares against our use of fsync().
>>>
>>> A quick glance at 
>>> https://github.com/erlang/otp/blob/master/erts/emulator/drivers/unix/unix_efile.c
>>> reveals that EINTR is handled in multiple places, but only in the
>>> read/write/sendfile functions, not in fsync. I also tried to trace the 
>>> calling code of efile_fsync() (or efile_fdatasync()), but I got lost pretty 
>>> quickly in some dtrace macro indirections, so I don’t know if there is any 
>>> retry logic higher up.
>>>
>>> I’m not experienced enough here to make a call, but does that mean that we 
>>> have a possible scenario where EINTR interrupts an fsync call, after which a 
>>> crash (machine or CouchDB) leaves part of a database not fsynced? Or would 
>>> the failing fsync bubble up to the corresponding, say, PUT request handler? 
>>> How about with delayed_commits=true, is the possible data-loss window then 
>>> 2 seconds rather than the documented 1s?
>>>
>>> Can anyone shed any light on this?
>>>
>>> Best
>>> Jan
>>> --
>>>
>>>
>>>
>
> --
> Professional Support for Apache CouchDB:
> http://www.neighbourhood.ie/couchdb-support/
>