On Thu, Dec 8, 2022 at 8:42 AM Manuel Holtgrewe <zyklenf...@gmail.com> wrote:
>
> Hi Charles,
>
> as far as I know, CephFS implements POSIX semantics. That is, if the CephFS 
> server cluster dies for whatever reason, then this will translate into I/O 
> errors. This is the same as if your NFS server dies, or if you run the program 
> locally on a workstation/laptop and the machine loses power. POSIX file 
> systems guarantee that data is persisted on the storage after a file is closed

Actually the "commit on close" is *entirely* an NFS-ism and is not
part of POSIX. If you expect a closed file to be flushed to disk
anywhere else (including CephFS), you will be disappointed. You need
to use fsync/fdatasync/sync/syncfs.
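For example, an untested sketch (the path below is just a placeholder for a
file on a CephFS mount):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char buf[] = "checkpoint data\n";
    int fd = open("/mnt/cephfs/checkpoint.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
        perror("write");      /* data may only have reached the client/page cache */
        return 1;
    }
    if (fsync(fd) != 0) {     /* THIS is the durability point, not close() */
        perror("fsync");
        return 1;
    }
    if (close(fd) != 0) {
        perror("close");
        return 1;
    }
    return 0;
}

fdatasync() is usually enough if you only care about the file contents rather
than metadata; sync()/syncfs() flush more than one file at a time.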
-Greg

> or fsync() is called. Otherwise, the data may still be "in flight", e.g., in 
> the OS I/O cache or even the runtime library's cache.
>
> This is not a bug but a feature, as it improves performance: when appending 
> small bits to a file, the HDD head does not have to move every time something 
> is written, and a full 4 kB block does not have to be rewritten on an SSD for 
> every small append.
>
> POSIX semantics go even further, enforcing certain guarantees when files are 
> written from multiple clients. Recently, something called "lazy I/O" has been 
> introduced [1] in CephFS, which allows some of these guarantees to be relaxed 
> explicitly to improve performance; see the sketch below.
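>
> As a rough, untested sketch of what that looks like with the libcephfs C API 
> (see [1] for the authoritative interface; the function signatures and the 
> /shared/output.dat path here are from memory / made up and may differ):
>
> #include <fcntl.h>
> #include <cephfs/libcephfs.h>
>
> int main(void)
> {
>     struct ceph_mount_info *cmount;
>     if (ceph_create(&cmount, NULL) < 0)    /* NULL = default client id */
>         return 1;
>     ceph_conf_read_file(cmount, NULL);     /* read the default ceph.conf */
>     if (ceph_mount(cmount, "/") < 0)
>         return 1;
>
>     int fd = ceph_open(cmount, "/shared/output.dat", O_CREAT | O_WRONLY, 0644);
>     if (fd < 0)
>         return 1;
>
>     /* Opt this fd out of the usual multi-client coherency guarantees. */
>     if (ceph_lazyio(cmount, fd, 1) < 0)
>         return 1;
>
>     const char buf[] = "partial result\n";
>     ceph_write(cmount, fd, buf, sizeof(buf) - 1, 0);
>
>     /* With LazyIO the application is responsible for making its writes
>      * visible and durable; [1] describes propagate/synchronize semantics,
>      * a plain fsync-style flush is shown here for simplicity. */
>     ceph_fsync(cmount, fd, 0);
>     ceph_close(cmount, fd);
>
>     ceph_unmount(cmount);
>     ceph_release(cmount);
>     return 0;
> }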
>
> I don't think there is even a ceph mount setting that allows you to configure 
> local cache behaviour the way you can for NFS. With NFS, I have seen setups 
> where two clients saw two different versions of the same -- closed -- file, 
> because one client had written to the file and the change was not yet 
> reflected on the second client. To the best of my knowledge, this will not 
> happen with CephFS.
>
> I'd be happy to be corrected if I'm wrong. ;-)
>
> Best wishes,
> Manuel
>
> [1] https://docs.ceph.com/en/latest/cephfs/lazyio/
>
> On Thu, Dec 8, 2022 at 5:09 PM Charles Hedrick <hedr...@rutgers.edu> wrote:
>>
>> Thanks. I'm evaluating CephFS for a computer science department. We have 
>> users who run week-long AI training jobs. They use standard packages, which 
>> they probably don't want to modify. At the moment we use NFS. It uses 
>> synchronous I/O, so if something goes wrong, the users' jobs pause until we 
>> reboot, and then continue. However, there's an obvious performance penalty 
>> for this.
>> ________________________________
>> From: Gregory Farnum <gfar...@redhat.com>
>> Sent: Thursday, December 8, 2022 2:08 AM
>> To: Dhairya Parmar <dpar...@redhat.com>
>> Cc: Charles Hedrick <hedr...@rutgers.edu>; ceph-users@ceph.io 
>> <ceph-users@ceph.io>
>> Subject: Re: [ceph-users] Re: what happens if a server crashes with cephfs?
>>
>> More generally, as Manuel noted, you can (and should!) make use of fsync et 
>> al. for data safety. At the application layer, Ceph's async operations are no 
>> different from the way data you send to a hard drive can sit around in 
>> volatile caches until a consistency point like fsync is invoked.
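>>
>> As a rough, untested illustration (not Ceph-specific): errors from deferred 
>> write-back typically surface at fsync() or sometimes close(), so checking 
>> those return values is what keeps a failure from being silent:
>>
>> #include <errno.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <unistd.h>
>>
>> /* Flush and close fd, reporting any deferred write-back error. */
>> int flush_and_close(int fd)
>> {
>>     int rc = 0;
>>     if (fsync(fd) != 0) {    /* deferred I/O errors usually show up here... */
>>         fprintf(stderr, "fsync: %s\n", strerror(errno));
>>         rc = -1;
>>     }
>>     if (close(fd) != 0) {    /* ...or occasionally here */
>>         fprintf(stderr, "close: %s\n", strerror(errno));
>>         rc = -1;
>>     }
>>     return rc;
>> }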
>> -Greg
>>
>> On Wed, Dec 7, 2022 at 10:02 PM Dhairya Parmar <dpar...@redhat.com> wrote:
>> Hi Charles,
>>
>> There are many scenarios in which a write/close operation can fail, but 
>> failures/errors are generally logged (normally every time) to help debug the 
>> case. So there are no silent failures as such, unless you have hit a very 
>> rare bug.
>> - Dhairya
>>
>>
>> On Wed, Dec 7, 2022 at 11:38 PM Charles Hedrick <hedr...@rutgers.edu> wrote:
>>
>> > I believe CephFS uses asynchronous operations for some of its operations.
>> > That means the server acknowledges before data has been written to stable
>> > storage. Does that mean there are failure scenarios where a write or close
>> > will return an error? Or fail silently?
>> >

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
