We have a Ceph cluster built entirely on NVMe drives.

 

Very recently our overall OSD write latency increased pretty dramatically and 
our overall throughput has really decreased.
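
For anyone who wants numbers: the jump shows up in the standard per-OSD 
counters. A quick sketch of how we can watch it (osd.46 is just an example 
id, taken from the log below):

  # cluster-wide commit/apply latency per OSD
  ceph osd perf

  # write-op latency counters for a single OSD
  ceph tell osd.46 perf dump | grep -A 3 '"op_w_latency"'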

 

One thing that seems to correlate with the start of this problem is the ERROR 
line below from the logs. All of our OSD nodes are producing these log lines now.

 

Can anyone tell me what this might be telling us? Any and all help is greatly 
appreciated.

 

Mar 31 23:21:56 ceph1d03 
ceph-8797e570-96be-11ed-b022-506b4b7d76e1-osd-46[12898]: debug 
2024-04-01T03:21:56.953+0000 7effbba51700  0 <cls> 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/cls/fifo/cls_fifo.cc:112:
 ERROR: int 
rados::cls::fifo::{anonymous}::read_part_header(cls_method_context_t, 
rados::cls::fifo::part_header*): failed decoding part header
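
For context while digging: the path in that line points at src/cls/fifo, the 
FIFO object class, which I believe RGW uses for its logs on Pacific, so my 
guess is that an object which should carry a FIFO part header is failing to 
decode. A sketch of how one might inspect the suspect objects (the pool name 
here is an assumption based on default RGW naming; adjust for your setup):

  # list objects in the RGW log pool (pool name assumes defaults)
  rados -p default.rgw.log ls | head -n 20

  # check a suspect object's size/mtime for signs of truncation
  rados -p default.rgw.log stat <object-name>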

 

-- 

Mark Selby

Sr Linux Administrator, The Voleon Group

mse...@voleon.com 

 

