Good Morning,
I have an odd situation where a pg is listed as inconsistent, but rados is 
struggling to tell me anything about it:

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 requests are blocked > 32 sec; 1 osds have slow requests; 1 scrub errors
pg 22.1611 is active+clean+inconsistent, acting [294,1080,970,324,722,70,949,874,943,606,518]
1 scrub errors
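
For anyone scripting against this, here is a minimal sketch of pulling the inconsistent pg and its acting set out of `ceph health detail` text. It assumes each pg line sits on a single line (mail wrapping undone) and follows the wording shown above; the function name `inconsistent_pgs` and the regex are my own illustration, not part of any Ceph tooling.

```python
import re

# Matches lines such as:
#   pg 22.1611 is active+clean+inconsistent, acting [294,1080,...]
# Assumption: the wording matches the health output quoted above.
PG_LINE = re.compile(r"^pg (\S+) is (\S+), acting \[([\d,\s]*)\]", re.MULTILINE)


def inconsistent_pgs(health_detail):
    """Map each inconsistent pg id to its acting set (list of osd ids)."""
    result = {}
    for pgid, state, acting in PG_LINE.findall(health_detail):
        # The state is a '+'-joined list of flags, e.g. active+clean+inconsistent.
        if "inconsistent" in state.split("+"):
            result[pgid] = [int(o) for o in acting.split(",") if o.strip()]
    return result


sample = (
    "HEALTH_ERR 1 pgs inconsistent; 1 requests are blocked > 32 sec; "
    "1 osds have slow requests; 1 scrub errors\n"
    "pg 22.1611 is active+clean+inconsistent, acting "
    "[294,1080,970,324,722,70,949,874,943,606,518]\n"
    "1 scrub errors\n"
)

pgs = inconsistent_pgs(sample)
print(pgs)
# osd 497 (the one with the read error) is no longer in the acting set.
print(497 in pgs["22.1611"])
```

Checking membership like this makes it easy to confirm whether a suspect osd is still part of the pg's acting set.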

# rados list-inconsistent-pg .us-smr.rgw.buckets
["22.1611"]

# rados list-inconsistent-obj 22.1611
[]error 2: (2) No such file or directory

A little background: I got into this state when the inconsistent pg popped up 
in ceph -s. At that point list-inconsistent-obj still returned data, and I used 
it to find which osd was causing the problem:

{
                "osd": 497,
                "missing": false,
                "read_error": true,
                "data_digest_mismatch": false,
                "omap_digest_mismatch": false,
                "size_mismatch": false,
                "size": 599488
            },
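
As a rough sketch of how I picked the bad shard out of that output: the snippet below filters shard records for any error flag that is set. The flag names come from the record quoted above; the surrounding layout (a list of objects, each with a "shards" array, possibly wrapped in an "inconsistents" key) and the healthy shard in the sample are assumptions on my part, and may differ between Ceph releases.

```python
import json

# Boolean flags that appear in the shard record quoted above.
ERROR_FLAGS = (
    "missing",
    "read_error",
    "data_digest_mismatch",
    "omap_digest_mismatch",
    "size_mismatch",
)


def shards_with_errors(report_json):
    """Return (osd, [flags set]) for every shard with at least one flag set."""
    report = json.loads(report_json)
    # Assumption: the top level is either a plain list of objects or a
    # dict that carries the list under "inconsistents".
    objects = report.get("inconsistents", []) if isinstance(report, dict) else report
    bad = []
    for obj in objects:
        for shard in obj.get("shards", []):
            flags = [f for f in ERROR_FLAGS if shard.get(f)]
            if flags:
                bad.append((shard["osd"], flags))
    return bad


# Hypothetical sample: the osd 497 record is from my output; the osd 294
# record is made up to show a healthy shard next to it.
sample = json.dumps([
    {
        "shards": [
            {"osd": 294, "missing": False, "read_error": False,
             "data_digest_mismatch": False, "omap_digest_mismatch": False,
             "size_mismatch": False, "size": 599488},
            {"osd": 497, "missing": False, "read_error": True,
             "data_digest_mismatch": False, "omap_digest_mismatch": False,
             "size_mismatch": False, "size": 599488},
        ]
    }
])

print(shards_with_errors(sample))
```

An empty report such as `[]` simply yields an empty list.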


Because it was a read error, I checked the SMART stats for that osd's disk and, 
sure enough, it had some uncorrected read errors. To stop it from causing more 
problems, I stopped the daemon and let ceph recover from the other osds. The 
cluster has now finished rebalancing but remains in HEALTH_ERR, as it still 
thinks this pg is inconsistent.

ceph pg query output is here: https://hastebin.com/mamesokexa.cpp

Thanks,
Aaron