Hi,

After waiting 4 days, we decided to delete the pool before the snaptrim had 
finished. Now we have a bigger issue: many OSDs have started to flap, and 2 of 
them cannot even restart afterwards.

I ran a bluestore fsck on the OSDs that will not start, and the output has many 
messages like this:

2021-05-17 18:37:07.176203 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d0778000~4000
2021-05-17 18:37:07.176204 7f416d20bec0 10 freelist enumerate_next 0x482d0784000~4000
2021-05-17 18:37:07.176204 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d0784000~4000
2021-05-17 18:37:07.176205 7f416d20bec0 10 freelist enumerate_next 0x482d078c000~c000
2021-05-17 18:37:07.176206 7f416d20bec0 10 stupidalloc 0x0x564e4e804f50 init_add_free 0x482d078c000~c000
[root@hk-cephosd-2002 ~]# tail -f /tmp/ceph-osd-44-fsck.log
2021-05-17 18:39:16.466967 7f416d20bec0 20 bluefs _read_random read buffered 0x2cd6e8f~ed6 of 1:0x372e0700000+4200000
2021-05-17 18:39:16.467154 7f416d20bec0 20 bluefs _read_random got 3798
2021-05-17 18:39:16.467179 7f416d20bec0 10 bluefs _read_random h 0x564e4e658500 0x24d6e35~ee2 from file(ino 216551 size 0x43a382d mtime 2021-05-17 13:21:19.839668 bdev 1 allocated 4400000 extents [1:0x35bc7c00000+4400000])
2021-05-17 18:39:16.467186 7f416d20bec0 20 bluefs _read_random read buffered 0x24d6e35~ee2 of 1:0x35bc7c00000+4400000
2021-05-17 18:39:16.467409 7f416d20bec0 20 bluefs _read_random got 3810

and

uh oh, missing shared_blob
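
To get a rough idea of how much of the fsck log is debug noise and how many 
"missing shared_blob" messages there are, something like the Python sketch 
below could be used. It is only an illustration: the function name 
summarize_fsck_log is made up, and the line layout (date, time, thread id, 
debug level, subsystem) is assumed from the excerpts above, not taken from any 
Ceph documentation.

```python
import re
from collections import Counter

# Assumed line layout, based only on the log excerpts in this mail:
#   <date> <time> <thread id> <debug level> <subsystem> <message...>
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ [0-9a-f]+ +(-?\d+) (\S+)"
)


def summarize_fsck_log(lines):
    """Count log lines per subsystem and collect 'missing shared_blob' lines.

    `lines` is any iterable of strings, e.g. an open file object.
    Returns (Counter keyed by subsystem, list of matching lines).
    """
    per_subsystem = Counter()
    missing = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Plain substring match, because the exact prefix of this message
        # in the real log is not known here.
        if "missing shared_blob" in line:
            missing.append(line)
        match = LINE_RE.match(line)
        if match:
            per_subsystem[match.group(2)] += 1
    return per_subsystem, missing
```

It could be called on the log file from above, for example 
`summarize_fsck_log(open("/tmp/ceph-osd-44-fsck.log", errors="replace"))`, and 
then the counter and the length of the second list printed.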

I've set buffered_io back to false, because whenever I restarted the OSDs I 
always had to wait for the degraded PGs to be fixed.
Many of the SSDs are at 100% utilization at the moment, and I don't really know 
what to do to stop the process and bring back the 2 SSDs :/

Some paste: https://justpaste.it/9bj3a

Some metrics (each column is one server, 3 servers in total):
How hard it is hitting the SSDs: https://i.ibb.co/x3xm0Rj/ssds.png
IOWAIT is very high due to SSD utilization: https://i.ibb.co/683TR9y/iowait.png
Capacity seems to be coming back: https://i.ibb.co/mz4Lq2r/space.png

Thank you for the help.
