On 19. okt. 2016 13:00, Ronny Aasen wrote:
On 06. okt. 2016 13:41, Ronny Aasen wrote:
hello

I have a few OSDs in my cluster that are regularly crashing.

[snip]


Of course, having 3 OSDs dying regularly is not good for my health, so I
have set noout to avoid heavy recoveries.

Googling this error message gives exactly 1 hit:
https://github.com/ceph/ceph/pull/6946

where it says: "the shard must be removed so it can be reconstructed".
But with my 3 OSDs failing, I am not certain which of them contains the
broken shard (or perhaps all 3 of them?).

I am a bit reluctant to delete on all 3. I have 4+2 erasure coding
(erasure size 6, min_size 4), so finding out which one is bad would be
nice.

I hope someone has an idea how to proceed.

kind regards
Ronny Aasen

I again have this problem with crashing OSDs. A more detailed log is at
the tail of this mail.

Does anyone have any suggestions on how I can identify which shard
needs to be removed to allow the EC pool to recover?

And more importantly, how can I stop the OSDs from crashing?


kind regards
Ronny Aasen


Answering my own question for googleability.

Using this one-liner:

for dir in $(find /var/lib/ceph/osd/ceph-* -maxdepth 2 -type d -name '5.26*' | sort | uniq) ; do find "$dir" -name '*3a3938238e1f29.00000000002d80ca*' -type f -ls ; done

I got a list of all shards of the problematic object.
One of the shards had size 0 but was otherwise readable without any I/O errors. I guess this explains the inconsistent size, but it does not explain why Ceph decides it is better to crash 3 OSDs rather than move a 0-byte file into a "LOST+FOUND"-style directory structure, or just delete it, since it will not have any useful data anyway.
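
For reference, this is roughly the kind of check I mean: the same loop as above, but printing each shard's size and doing a full read to catch I/O errors. Adjust the pg name ('5.26') and the object pattern to your own case.

for dir in $(find /var/lib/ceph/osd/ceph-* -maxdepth 2 -type d -name '5.26*' | sort | uniq) ; do
    find "$dir" -name '*3a3938238e1f29.00000000002d80ca*' -type f | while read -r f ; do
        # shard size; the suspect shard showed up as 0 bytes
        stat -c '%s bytes  %n' "$f"
        # full read of the shard; a failing disk would throw an I/O error here
        dd if="$f" of=/dev/null bs=4M 2>&1 | tail -n 1
    done
done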

Deleting this file (actually an mv to /tmp) allowed the 3 broken OSDs to start, and they have been running for >24h now, while usually they would crash within 10 minutes. Yay!
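
A minimal sketch of that step, assuming systemd-managed OSDs; the osd id is only an example value, and the pg/object pattern must match your own case:

OSD=11        # example value: the osd that holds the zero-size shard
systemctl stop ceph-osd@$OSD
# move the broken shard out of the pg directory instead of deleting it outright
for dir in $(find /var/lib/ceph/osd/ceph-$OSD -maxdepth 2 -type d -name '5.26*') ; do
    find "$dir" -name '*3a3938238e1f29.00000000002d80ca*' -type f -size 0 -exec mv -v {} /tmp/ \;
done
systemctl start ceph-osd@$OSD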

Generally you need to check _all_ shards of the given pg, not just the 3 crashing OSDs. This is what confused me, since I only focused on the crashing OSDs.

I used the one-liner to check every OSD for the pg, since due to backfilling the pg was spread all over the place, and I could run it from ansible to reduce the tedious work (see the sketch below).
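
For example as an ad-hoc run; the inventory group name "osds" is just a placeholder for whatever group holds your OSD hosts, and the command is the same one-liner with the $'s escaped so the remote shell expands them:

ansible osds -m shell -a "for dir in \$(find /var/lib/ceph/osd/ceph-* -maxdepth 2 -type d -name '5.26*' | sort | uniq) ; do find \$dir -name '*3a3938238e1f29.00000000002d80ca*' -type f -ls ; done"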

Also, it would be convenient to be able to manually mark a broken/inconsistent pg "inactive", instead of having it crash 3 OSDs and take lots of other pgs down with them. One could set the pg inactive while troubleshooting and unset it when done, without OSDs crashing and all the high-load rebalancing that follows.

Also, I ran a find for 0-size files on that pg and there are multiple other such files. Is a 0-byte rbd_data file in a pg a normal occurrence, or can I have more similar problems in the future due to the other 0-size files?
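
That check was roughly the same loop as before, just dropping the object name and adding -size 0, so it lists every zero-byte file in any local copy of pg 5.26:

for dir in $(find /var/lib/ceph/osd/ceph-* -maxdepth 2 -type d -name '5.26*' | sort | uniq) ; do find "$dir" -type f -size 0 -ls ; done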


kind regards
Ronny Aasen


