Hi Dave,

On Tue, Sep 21, 2021 at 1:20 PM David Schulz <dsch...@ucalgary.ca> wrote:
>
> Hi Everyone,
>
> For a couple of weeks I've been battling file corruption in CephFS. It
> happens when a writer on one node writes a line and calls sync, as is
> typical with logging, while the same file is being read from another
> client.
>
> The cluster is Nautilus 14.2.9 and the clients are all kernel clients
> mounting the filesystem with the CentOS 8.4 kernel
> 4.18.0-305.10.2.el8_4.x86_64.  BlueStore OSDs and erasure coding are
> both used.  The cluster was upgraded from Mimic (the first installed
> version) at some point.
>
> Here is a little python3 program that triggers the issue:
>
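> import os
> import time
>
> # Append mode: every write goes to the current end of the file.
> fh = open("test.log", "a")
>
> while True:
>     start = time.time()
>     fh.write("test2\n")
>     end = time.time()
>     fh.flush()  # hand Python's buffer to the kernel (this is not fsync)
>     junk = os.getpid()  # result is unused
>     fh.write(f"took {(end - start)}\n")
>     fh.flush()
>     time.sleep(1)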
>
> If I run this on one client and repeatedly run "wc -l" on the file from
> a different client, I see two different failures: sometimes NUL bytes
> get scribbled into the file and the next line of output is appended
> after them, and other times the file gets truncated.
>
> I did update from 14.2.2 to 14.2.9 (I had a clone of the 14.2.9 repo
> on hand).  I read the release notes and there did seem to be some
> related fixes between 14.2.2 and 14.2.9 but nothing after 14.2.9.
>
> I can't seem to find any references to a problem like this anywhere.
> Does anyone have any ideas?

You're probably hitting this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1996680

Try upgrading your kernel.
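
To check whether a newer kernel makes the symptoms go away, something
like the following reader-side check could stand in for the repeated
"wc -l" on the second client. It is only a sketch: scan() and watch()
are names made up here, and treating any NUL byte or any shrink in file
size as suspicious is an assumption that fits an append-only text log
like test.log.

```python
import time


def scan(path, last_size=0):
    """Read the whole file once; report size, line count, NULs, shrinkage."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "size": len(data),
        "lines": data.count(b"\n"),
        "nul_bytes": data.count(b"\x00"),
        # An append-only log should never be smaller than on the last pass.
        "truncated": len(data) < last_size,
    }


def watch(path, rounds=60, interval=1.0):
    """Poll the file and print a line whenever a pass looks wrong."""
    last_size = 0
    for _ in range(rounds):
        result = scan(path, last_size)
        if result["nul_bytes"] or result["truncated"]:
            print(f"{time.strftime('%H:%M:%S')} suspicious: {result}")
        last_size = result["size"]
        time.sleep(interval)
```

Running watch("test.log") on the reading client while the writer loop
runs on the other one should stay silent if the file is intact.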

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
