Re: [ceph-users] offending shards are crashing osd's

2016-10-19 Thread Ronny Aasen

On 06. okt. 2016 13:41, Ronny Aasen wrote:

hello

I have a few OSDs in my cluster that are regularly crashing.


[snip]



Of course, having 3 OSDs dying regularly is not good for my health, so I
have set noout to avoid heavy recoveries.

Googling this error message gives exactly one hit:
https://github.com/ceph/ceph/pull/6946

where it says: "the shard must be removed so it can be reconstructed",
but with my 3 OSDs failing, I am not certain which of them contain the
broken shard (or perhaps all 3 of them?).

I am a bit reluctant to delete on all 3. I have 4+2 erasure coding
(erasure size 6, min_size 4), so finding out which one is bad would be
nice.

I hope someone has an idea of how to proceed.

kind regards
Ronny Aasen


I again have this problem with crashing OSDs. A more detailed log is at
the tail of this mail.


Does anyone have any suggestions on how I can identify which shard
needs to be removed to allow the EC pool to recover?
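
For reference, the OSD sets currently mapped to the pg can be listed like
this (a quick sketch; 5.26 is the pg from the query at the tail of this
mail):

  ceph pg map 5.26       # prints the up and acting OSD sets for the pg
  ceph pg 5.26 query     # full JSON, including per-shard entries like 27(0), 109(1)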


And, more importantly, how can I stop the OSDs from crashing?


kind regards
Ronny Aasen





-- query of pg in question --
# ceph pg 5.26 query
{
 "state": "active+undersized+degraded+remapped+wait_backfill",
 "snap_trimq": "[]",
 "epoch": 138744,
 "up": [
 27,
 109,
 2147483647,
 2147483647,
 62,
 75
 ],
 "acting": [
 2147483647,
 2147483647,
 32,
 107,
 62,
 38
 ],
 "backfill_targets": [
 "27(0)",
 "75(5)",
 "109(1)"
 ],
 "actingbackfill": [
 "27(0)",
 "32(2)",
 "38(5)",
 "62(4)",
 "75(5)",
 "107(3)",
 "109(1)"
 ],
 "info": {
 "pgid": "5.26s2",
 "last_update": "84093'35622",
 "last_complete": "84093'35622",
 "log_tail": "82361'32622",
 "last_user_version": 0,
 "last_backfill": "MAX",
 "purged_snaps": "[1~7]",
 "history": {
 "epoch_created": 61149,
 "last_epoch_started": 138692,
 "last_epoch_clean": 136567,
 "last_epoch_split": 0,
 "same_up_since": 138691,
 "same_interval_since": 138691,
 "same_primary_since": 138691,
 "last_scrub": "84093'35622",
 "last_scrub_stamp": "2016-10-18 06:18:28.253508",
 "last_deep_scrub": "84093'35622",
 "last_deep_scrub_stamp": "2016-10-14 05:33:56.701167",
 "last_clean_scrub_stamp": "2016-10-14 05:33:56.701167"
 },
 "stats": {
 "version": "84093'35622",
 "reported_seq": "210475",
 "reported_epoch": "138730",
 "state": "active+undersized+degraded+remapped+wait_backfill",
 "last_fresh": "2016-10-19 12:40:32.982617",
 "last_change": "2016-10-19 12:03:29.377914",
 "last_active": "2016-10-19 12:40:32.982617",
 "last_peered": "2016-10-19 12:40:32.982617",
 "last_clean": "2016-07-19 12:03:54.814292",
 "last_became_active": "0.00",
 "last_became_peered": "0.00",
 "last_unstale": "2016-10-19 12:40:32.982617",
 "last_undegraded": "2016-10-19 12:02:03.030755",
 "last_fullsized": "2016-10-19 12:02:03.030755",
 "mapping_epoch": 138627,
 "log_start": "82361'32622",
 "ondisk_log_start": "82361'32622",
 "created": 61149,
 "last_epoch_clean": 136567,
 "parent": "0.0",
 "parent_split_bits": 0,
 "last_scrub": "84093'35622",
 "last_scrub_stamp": "2016-10-18 06:18:28.253508",
 "last_deep_scrub": "84093'35622",
 "last_deep_scrub_stamp": "2016-10-14 05:33:56.701167",
 "last_clean_scrub_stamp": "2016-10-14 05:33:56.701167",
 "log_size": 3000,
 "ondisk_log_size": 3000,
 "stats_invalid": "0",
 "stat_sum": {
 "num_bytes": 99736657920,
 "num_objects": 12026,
 "num_object_clones": 0,
 "num_object_copies": 84182,
 "num_objects_missing_on_primary": 0,
 "num_objects_degraded": 24052,
 "num_objects_misplaced": 90583,
 "num_objects_unfound": 0,
 "num_objects_dirty": 12026,
 "num_whiteouts": 0,
 "num_read": 86122,
 "num_read_kb": 9446184,
 "num_write": 35622,
 "num_write_kb": 182277312,
 "num_scrub_errors": 0,
 "num_shallow_scrub_errors": 0,
 "num_deep_scrub_errors": 0,
 "num_objects_recovered": 0,
 "num_bytes_recovered": 0,
 "num_keys_recovered": 0,
 "num_objects_omap": 0,
 "num_objects_hit_set_archive": 0,
 [... remainder of 'ceph pg query' output truncated ...]
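
The value 2147483647 in the up/acting sets above is the placeholder Ceph
prints when no OSD is mapped to that shard slot. To pull the list of
shard-holding OSDs out of the query without reading the whole dump,
something like this works (just a convenience sketch, assuming jq is
installed):

  ceph pg 5.26 query | jq '.up, .acting, .actingbackfill'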

Re: [ceph-users] offending shards are crashing osd's

2016-10-21 Thread Ronny Aasen

On 19. okt. 2016 13:00, Ronny Aasen wrote:

On 06. okt. 2016 13:41, Ronny Aasen wrote:

hello

I have a few OSDs in my cluster that are regularly crashing.


[snip]



Of course, having 3 OSDs dying regularly is not good for my health, so I
have set noout to avoid heavy recoveries.

Googling this error message gives exactly one hit:
https://github.com/ceph/ceph/pull/6946

where it says: "the shard must be removed so it can be reconstructed",
but with my 3 OSDs failing, I am not certain which of them contain the
broken shard (or perhaps all 3 of them?).

I am a bit reluctant to delete on all 3. I have 4+2 erasure coding
(erasure size 6, min_size 4), so finding out which one is bad would be
nice.

I hope someone has an idea of how to proceed.

kind regards
Ronny Aasen


I again have this problem with crashing OSDs. A more detailed log is at
the tail of this mail.

Does anyone have any suggestions on how I can identify which shard
needs to be removed to allow the EC pool to recover?

And, more importantly, how can I stop the OSDs from crashing?


kind regards
Ronny Aasen



Answering my own question for googleability.

Using this one-liner:

# find every shard directory of pg 5.26 on this host, then list all
# on-disk files belonging to the problematic object
for dir in $(find /var/lib/ceph/osd/ceph-* -maxdepth 2 -type d -name '5.26*' | sort -u) ; do
    find "$dir" -name '*3a3938238e1f29.002d80ca*' -type f -ls
done


I got a list of all shards of the problematic object.
One of the shards had size 0 but was otherwise readable without any I/O
errors. I guess this explains the inconsistent size, but it does not
explain why Ceph decides it is better to crash 3 OSDs rather than move
a 0-byte file into a "LOST+FOUND"-style directory structure.

Or just delete it, since it will not have any useful data anyway.

Deleting this file (moving it to /tmp) allowed the 3 broken OSDs to start,
and they have been running for >24h now, whereas they usually crashed
within 10 minutes. Yay!
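
For anyone finding this thread later: instead of moving the file around by
hand, the same thing should be doable with ceph-objectstore-tool while the
OSD is stopped. Roughly like this (an untested sketch; the osd id, journal
path and object id are only examples, take them from your own --op list
output):

  # stop the OSD that holds the bad shard (older sysvinit setups use
  # "/etc/init.d/ceph stop osd.32" instead)
  systemctl stop ceph-osd@32

  # list the objects in that pg shard and note the exact object id (JSON)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-32 \
      --journal-path /var/lib/ceph/osd/ceph-32/journal \
      --pgid 5.26s2 --op list | grep 3a3938238e1f29.002d80ca

  # remove only the broken object from this one shard
  # (consider taking a backup with --op export first)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-32 \
      --journal-path /var/lib/ceph/osd/ceph-32/journal \
      --pgid 5.26s2 '<object-id-json-from-the-list-output>' remove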


Generally you need to check _all_ shards of the given pg, not just the 3
crashing ones. This is what confused me, since I only focused on the
crashing OSDs.


I used the one-liner that checks the OSDs for the pg since, due to
backfilling, the pg was spread all over the place, and I could run it
from Ansible to reduce the tedious work.
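
Something along these lines as an Ansible ad-hoc command does the trick
(the "ceph-osds" inventory group name is just an example, use whatever
group your OSD hosts are in):

  ansible ceph-osds -m shell -a "find /var/lib/ceph/osd/ceph-* -maxdepth 2 -type d -name '5.26*' -exec find {} -name '*3a3938238e1f29.002d80ca*' -type f -ls \;"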


Also, it would be convenient to be able to mark a broken/inconsistent pg
manually "inactive", instead of crashing 3 OSDs and taking lots of other
pgs down with them. One could set the pg inactive while troubleshooting
and unset it when done, without having OSDs crash and all the following
high-load rebalancing.


Also, I ran a find for 0-size files on that pg and there are multiple
other such files. Is a 0-byte rbd_data file in a pg a normal occurrence,
or could I have more similar problems in the future due to the other
0-size files?
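
(For reference, such a find looks roughly like this, assuming the usual
FileStore current/<pgid>_head layout; adjust the path pattern as needed:)

  find /var/lib/ceph/osd/ceph-*/current/5.26*_head -type f -size 0 -ls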



kind regards
Ronny Aasen


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com