[ceph-users] Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

2023-11-30 Thread Sebastian Knust

Hi Patrick,

On 30.11.23 03:58, Patrick Donnelly wrote:


I've not yet fully reviewed the logs but it seems there is a bug in
the detection logic which causes a spurious abort. This does not
appear to be actually new damage.


We are accessing the metadata (read-only) daily. The issue only popped 
up after updating to 17.2.7. Of course, this does not mean that the 
damage was not there before, only that it was not detected.


Are you using postgres?

Not on top of CephFS, no. We do use postgres on some RBD volumes.



If you can share details about your snapshot
workflow and general workloads that would be helpful (privately if
desired).


Our CephFS root looks like this:
/archive
/homes
/no-snapshot
/other-snapshot
/scratch

We are running snapshots on /homes and /other-snapshot with the same 
schedule. We mount the filesystem with a kernel client on one of the 
Ceph hosts (not running the MDS) and mkdir / rmdir snapshots as needed 
(a rough sketch of the rotation follows below):
- daily between 06:00 and 19:45 UTC (inclusive): create a snapshot every 
15 minutes; one hour later, delete it unless it is an hourly (xx:00) one
- daily on the full hour: create a snapshot and delete the snapshot from 
24 hours earlier, unless it is the midnight one
- daily at midnight: delete the snapshot from 14 days ago unless that 
day was a Sunday
- every Sunday at midnight: delete the snapshot from 8 weeks ago
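
For illustration, the quarter-hourly part of that rotation boils down to 
roughly the following (a minimal sketch run from cron on the client host; 
the mount point /mnt/cephfs and the timestamp-style snapshot names are 
placeholders, not our actual naming scheme):

  #!/bin/bash
  # Create a new snapshot, then drop the one taken an hour ago unless it
  # was taken on the full hour (those are kept by the hourly/daily rules).
  now=$(date -u +%Y-%m-%d_%H%M)                   # e.g. 2023-11-30_0615
  old=$(date -u -d '1 hour ago' +%Y-%m-%d_%H%M)   # candidate for deletion
  for dir in /mnt/cephfs/homes /mnt/cephfs/other-snapshot; do
      mkdir "$dir/.snap/$now"                     # mkdir inside .snap creates a CephFS snapshot
      if [[ ${old: -2} != 00 ]]; then             # keep the hourly (xx:00) snapshots
          rmdir "$dir/.snap/$old"                 # rmdir inside .snap removes one
      fi
  done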

The workload is two main Samba servers (one of them only sharing a 
subdirectory which is generally not accessed through the other). Client 
access to those servers is limited to 1 Gbit/s each. Until Tuesday, we 
also had a mail server with Dovecot running on top of CephFS; it was 
migrated to an RBD volume on Tuesday because we had some issues with 
hanging access to some files / directories (interestingly, only in the 
main tree; access in snapshots worked without issue). Additionally, we 
have a Nextcloud instance with ~200 active users storing data in CephFS, 
as well as some other kernel clients with little / sporadic traffic: 
some running Samba, some NFS, some interactive SSH / x2go servers with 
direct user access, and some specialised web applications (notably OMERO).


We run daily incremental backups of most of the CephFS content with 
Bareos, running on a dedicated server that has the whole CephFS tree 
mounted read-only. For most data a full backup is performed every two 
months, for some data only every six months. The affected area lies in 
the "every six months" portion of the file system tree.



Two weeks ago we deleted a folder structure of about 6 TB with an 
average file size in the range of 1 GB. The structure was under 
/other-snapshot as well. This led to severe load on the MDS, especially 
starting at midnight. In conjunction with the Ubuntu kernel mounts, we 
also had issues with unreleased capabilities preventing read access to 
the /other-snapshot part.
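
As a side note for anyone chasing similar capability problems: a generic 
way to see which clients are holding caps (assuming a single active rank, 
mds.cephfs:0 in our case) is along these lines:

  ceph health detail                 # shows "clients failing to respond to capability release" warnings
  ceph tell mds.cephfs:0 session ls  # per-client sessions, including how many caps each client holds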


To combat these lingering problems, we deleted all snapshots in 
/other-snapshot, which left half a dozen PGs stuck in the snaptrim state 
(and a few hundred in snaptrim_wait). Updating from 17.2.6 to 17.2.7 
resolved that quickly: the affected PGs became unstuck and the whole 
cluster was back to active+clean a few hours later.
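
For anyone running into the same snaptrim backlog: the trimming progress 
can be watched with standard commands along these lines (nothing 
cluster-specific here):

  ceph status                  # PG state summary, including snaptrim / snaptrim_wait counts
  ceph pg ls snaptrim          # PGs currently trimming snapshots
  ceph pg ls snaptrim_wait     # PGs still queued for trimming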






For now, I'll hold off on running first-damage.py to try to remove the 
affected files / inodes. Ultimately, however, this seems to me the most 
sensible solution, at least with regard to cluster downtime.


Please give me another day to review, then feel free to use 
first-damage.py to clean up. If you see new damage, please upload the 
logs.

We are in no hurry and will probably run first-damage.py sometime next 
week. I will report new damage if it comes in.


Cheers
Sebastian

--
Dr. Sebastian Knust  | Bielefeld University
IT Administrator | Faculty of Physics
Office: D2-110   | Universitätsstr. 25
Phone: +49 521 106 5234  | 33615 Bielefeld
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

2023-11-29 Thread Patrick Donnelly
Hi Sebastian,

On Wed, Nov 29, 2023 at 3:11 PM Sebastian Knust
 wrote:
>
> Hello Patrick,
>
> On 27.11.23 19:05, Patrick Donnelly wrote:
> >
> > I would **really** love to see the debug logs from the MDS. Please
> > upload them using ceph-post-file [1]. If you can reliably reproduce,
> > turn on more debugging:
> >
> >> ceph config set mds debug_mds 20
> >> ceph config set mds debug_ms 1
> >
> > [1] https://docs.ceph.com/en/reef/man/8/ceph-post-file/
> >
>
> Uploaded debug log and core dump, see ceph-post-file:
> 02f78445-7136-44c9-a362-410de37a0b7d
> Unfortunately, we cannot easily shut down normal access to the cluster
> for these tests, so there is quite some clutter in the logs. The logs
> show three crashes, the last one with core dumping enabled (ulimits set
> to unlimited).
>
> A note on reproducibility: To recreate the crash, reading the contents
> of the file prior to removal seems necessary. Simply calling stat on the
> file and then performing the removal also yields an Input/output error
> but does not crash the MDS.
>
> Interestingly, the MDS_DAMAGE flag is reset on restart of the MDS and
> only comes back once the files in question are accessed (stat call is
> sufficient).

I've not yet fully reviewed the logs but it seems there is a bug in
the detection logic which causes a spurious abort. This does not
appear to be actually new damage.

Are you using postgres? If you can share details about your snapshot
workflow and general workloads that would be helpful (privately if
desired).

> For now, I'll hold off on running first-damage.py to try to remove the
> affected files / inodes. Ultimately, however, this seems to me the most
> sensible solution, at least with regard to cluster downtime.

Please give me another day to review, then feel free to use 
first-damage.py to clean up. If you see new damage, please upload the 
logs.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

2023-11-29 Thread Sebastian Knust

Hello Patrick,

On 27.11.23 19:05, Patrick Donnelly wrote:


I would **really** love to see the debug logs from the MDS. Please
upload them using ceph-post-file [1]. If you can reliably reproduce,
turn on more debugging:


ceph config set mds debug_mds 20
ceph config set mds debug_ms 1


[1] https://docs.ceph.com/en/reef/man/8/ceph-post-file/



Uploaded debug log and core dump, see ceph-post-file: 
02f78445-7136-44c9-a362-410de37a0b7d
Unfortunately, we cannot easily shut down normal access to the cluster 
for these tests, so there is quite some clutter in the logs. The logs 
show three crashes, the last one with core dumping enabled (ulimits set 
to unlimited).


A note on reproducibility: To recreate the crash, reading the contents 
of the file prior to removal seems necessary. Simply calling stat on the 
file and then performing the removal also yields an Input/output error 
but does not crash the MDS.
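
Condensed, the sequence that reliably triggers the abort looks like this 
(the path is just a stand-in for any file listed by `ceph tell 
mds.cephfs:0 damage ls`):

  stat /mnt/cephfs/path/to/damaged-file   # first call: Input/output error
  stat /mnt/cephfs/path/to/damaged-file   # second call: succeeds as if undamaged
  cat  /mnt/cephfs/path/to/damaged-file   # content reads back correctly
  rm   /mnt/cephfs/path/to/damaged-file   # Input/output error, and the active MDS aborts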


Interestingly, the MDS_DAMAGE flag is reset on restart of the MDS and 
only comes back once the files in question are accessed (stat call is 
sufficient).



For now, I'll hold off on running first-damage.py to try to remove the 
affected files / inodes. Ultimately, however, this seems to me the most 
sensible solution, at least with regard to cluster downtime.


Cheers
Sebastian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

2023-11-27 Thread Patrick Donnelly
Hello Sebastian,

On Fri, Nov 24, 2023 at 8:49 AM Sebastian Knust
 wrote:
>
> Hi,
>
> After updating from 17.2.6 to 17.2.7 with cephadm, our cluster went into
> MDS_DAMAGE state. We had some prior issues with faulty kernel clients
> not releasing capabilities, therefore the update might just be a
> coincidence.
>
> `ceph tell mds.cephfs:0 damage ls` lists 56 affected files all with
> these general details:
>
> {
>  "damage_type": "dentry",
>  "id": 123456,
>  "ino": 1234567890,
>  "frag": "*",
>  "dname": "some-filename.ext",
>  "snap_id": "head",
>  "path": "/full/path/to/file"
> }
>
> The behaviour upon trying to access file information in the (Kernel
> mounted) filesystem is a bit inconsistent. Generally, the first `stat`
> call seems to result in "Input/output error", the next call provides all
> `stat` data as expected from an undamaged file. The file can be read
> with `cat` with full and correct content (verified with backup) once the
> stat call succeeds.
>
> Scrubbing the affected subdirectories with `ceph tell mds.cephfs:0 scrub
> start /path/to/dir/ recursive,repair,force` does not fix the issue.
>
> Trying to delete the file results in an "Input/output error". If the
> stat calls beforehand succeeded, this also crashes the active MDS with
> these messages in the system journal:
> > Nov 24 14:21:15 iceph-18.servernet ceph-mds[1946861]: 
> > mds.0.cache.den(0x10012271195 DisplaySettings.json) newly corrupt dentry to 
> > be committed: [dentry 
> > #0x1/homes/huser/d3data/transfer/hortkrass/FLIMSIM/2023-04-12-irf-characterization/2-qwp-no-extra-filter-pc-off-tirf-94-tirf-cursor/DisplaySettings.json
> >  [1000275c4a0,head] auth (dversion lock) pv=0 v=225 ino=0x10012271197 
> > state=1073741824 | inodepin=1 0x56413e1e2780]
> > Nov 24 14:21:15 iceph-18.servernet ceph-mds[1946861]: log_channel(cluster) 
> > log [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry 
> > #0x1/homes/huser/d3data/transfer/hortkrass/FLIMSIM/2023-04-12-irf-characterization/2-qwp-no-extra-filter-pc-off-tirf-94-tirf-cursor/DisplaySettings.json
> >  [1000275c4a0,head] auth (dversion lock) pv=0 v=225 ino=0x10012271197 
> > state=1073741824 | inodepin=1 0x56413e1e2780]
> > Nov 24 14:21:15 iceph-18.servernet 
> > ceph-eafd0514-3644-11eb-bc6a-3cecef2330fa-mds-cephfs-iceph-18-ujfqnd[1946838]:
> >  2023-11-24T13:21:15.654+ 7f3fdcde0700 -1 mds.0.cache.den(0x10012271195 
> > DisplaySettings.json) newly corrupt dentry to be committed: [dentry 
> > #0x1/homes/huser/d3data/transfer/hortkrass/FLIMSIM/2023-04-12-irf-characterization/2-qwp-no-extra-filter-pc-off-tirf-94-tirf-cursor/DisplaySettings.json
> >  [1000275c4a0,head] auth (dversion lock) pv=0 v=225 ino=0x1001>
> > Nov 24 14:21:15 iceph-18.servernet 
> > ceph-eafd0514-3644-11eb-bc6a-3cecef2330fa-mds-cephfs-iceph-18-ujfqnd[1946838]:
> >  2023-11-24T13:21:15.654+ 7f3fdcde0700 -1 log_channel(cluster) log 
> > [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry 
> > #0x1/homes/huser/d3data/transfer/hortkrass/FLIMSIM/2023-04-12-irf-characterization/2-qwp-no-extra-filter-pc-off-tirf-94-tirf-cursor/DisplaySettings.json
> >  [1000275c4a0,head] auth (dversion lock) pv=0 v=225 ino=0x10012>
> > Nov 24 14:21:15 iceph-18.servernet 
> > ceph-eafd0514-3644-11eb-bc6a-3cecef2330fa-mds-cephfs-iceph-18-ujfqnd[1946838]:
> >  
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDSRank.cc:
> >  In function 'void MDSRank::abort(std::string_view)' thread 7f3fdcde0700 
> > time 2023-11-24T13:21:15.655088+
> > Nov 24 14:21:15 iceph-18.servernet ceph-mds[1946861]: 
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDSRank.cc:
> >  In function 'void MDSRank::abort(std::string_view)' thread 7f3fdcde0700 
> > time 2023-11-24T13:21:15.655088+
> >   
> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/mds/MDSRank.cc:
> >  937: ceph_abort_msg("abort() called")
> >
> >ceph version 17.2.7 
> > (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
> >1: 
> > (ceph::__ceph_abort(char const*, int, char const*, 
> > std::__cxx11::basic_string, 
> > std::allocator > const&)+0xd7) [0x7f3fe5a1cb03]
> >2: 
> > (MDSRank::abort(std::basic_string_view 
> > >)+0x7d) [0x5640f2e6fa2d]
> >3: 
> > (CDentry::check_corruption(bool)+0x740) 

[ceph-users] Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

2023-11-24 Thread Dan van der Ster
Hi Sebastian,

You can find some more discussion and fixes for this type of fs
corruption here:
https://www.spinics.net/lists/ceph-users/msg76952.html

--
Dan van der Ster
CTO

Clyso GmbH
p: +49 89 215252722 | a: Vancouver, Canada
w: https://clyso.com | e: dan.vanders...@clyso.com

We are hiring: https://www.clyso.com/jobs/

On Fri, Nov 24, 2023 at 5:48 AM Sebastian Knust
 wrote:
>
> Hi,
>
> After updating from 17.2.6 to 17.2.7 with cephadm, our cluster went into
> MDS_DAMAGE state. We had some prior issues with faulty kernel clients
> not releasing capabilities, therefore the update might just be a
> coincidence.
>
> `ceph tell mds.cephfs:0 damage ls` lists 56 affected files all with
> these general details:
>
> {
>  "damage_type": "dentry",
>  "id": 123456,
>  "ino": 1234567890,
>  "frag": "*",
>  "dname": "some-filename.ext",
>  "snap_id": "head",
>  "path": "/full/path/to/file"
> }
>
> The behaviour upon trying to access file information in the (Kernel
> mounted) filesystem is a bit inconsistent. Generally, the first `stat`
> call seems to result in "Input/output error", the next call provides all
> `stat` data as expected from an undamaged file. The file can be read
> with `cat` with full and correct content (verified with backup) once the
> stat call succeeds.
>
> Scrubbing the affected subdirectories with `ceph tell mds.cephfs:0 scrub
> start /path/to/dir/ recursive,repair,force` does not fix the issue.