If I could get them started, I could flush-evict the cache, but that isn't looking likely.
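For reference, the flush-evict sequence I mean would be roughly the following -- a sketch, and it assumes the OSDs backing the cache pool will start at all:

    # stop promoting new objects into the cache tier
    ceph osd tier cache-mode cephfs_cache forward --yes-i-really-mean-it
    # flush dirty objects down to the base tier, then evict the clean copies
    rados -p cephfs_cache cache-flush-evict-all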
On Fri, Jan 26, 2018 at 8:33 AM David Turner <drakonst...@gmail.com> wrote:

> I wouldn't be shocked if they were out of space, but `ceph osd df` only
> showed them as 45% full when I was first diagnosing this. Now they are
> showing completely full with the same command. I'm thinking the cache tier
> behavior might have changed in Luminous, because I was keeping my cache
> completely empty before with target_max_objects of 0, which flushed things
> out consistently after my minimum flush age. I noticed it wasn't keeping
> up with the flushing as well as it had in Jewel, but didn't think too much
> of it. Anyway, that's something I can tinker with after the pools are back
> up and running.
>
> If they are full and on BlueStore, what can I do to clean them up? I
> assume that I need to keep the metadata pool intact, but I don't need to
> maintain any data in the cache pool. I have a copy of everything written
> in the 24 hours prior to this incident, and nothing is modified after
> it is in CephFS.
>
> On Fri, Jan 26, 2018 at 8:23 AM Nick Fisk <n...@fisk.me.uk> wrote:
>
>> I can see this in the logs:
>>
>> 2018-01-25 06:05:56.292124 7f37fa6ea700 -1 log_channel(cluster) log [ERR]
>> : full status failsafe engaged, dropping updates, now 101% full
>>
>> 2018-01-25 06:05:56.325404 7f3803f9c700 -1
>> bluestore(/var/lib/ceph/osd/ceph-9) _do_alloc_write failed to reserve 0x4000
>>
>> 2018-01-25 06:05:56.325434 7f3803f9c700 -1
>> bluestore(/var/lib/ceph/osd/ceph-9) _do_write _do_alloc_write failed with
>> (28) No space left on device
>>
>> 2018-01-25 06:05:56.325462 7f3803f9c700 -1
>> bluestore(/var/lib/ceph/osd/ceph-9) _txc_add_transaction error (28) No
>> space left on device not handled on operation 10 (op 0, counting from 0)
>>
>> Are they out of space, or is something mis-reporting?
>>
>> Nick
>>
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>> Of David Turner
>> Sent: 26 January 2018 13:03
>> To: ceph-users <ceph-users@lists.ceph.com>
>> Subject: [ceph-users] BlueStore.cc: 9363: FAILED assert(0 == "unexpected error")
>>
>> http://tracker.ceph.com/issues/22796
>>
>> I was curious if anyone here had any ideas or experience with this
>> problem. I created the tracker for this yesterday when I woke up to find
>> all 3 of my SSD OSDs not running and unable to start due to this segfault.
>> These OSDs are in my small home cluster and hold the cephfs_cache and
>> cephfs_metadata pools.
>>
>> To recap: I upgraded from 10.2.10 to 12.2.2, successfully swapped out my
>> 9 OSDs to BlueStore, reconfigured my crush rules to utilize OSD classes,
>> failed to remove the CephFS cache tier due to
>> http://tracker.ceph.com/issues/22754, created these 3 SSD OSDs, and
>> updated the cephfs_cache and cephfs_metadata pools to use the
>> replicated_ssd crush rule... fast-forward two days of this working great
>> to me waking up with all 3 of them crashed and unable to start. There is
>> an OSD log with debug bluestore = 5 attached to the tracker linked at the
>> top of this email.
>>
>> My CephFS is completely down while these 2 pools are inaccessible. The
>> OSDs themselves are intact if I need to move the data out manually to the
>> HDDs or something. Any help is appreciated.
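As for Nick's question about whether they're genuinely out of space: in Luminous the full thresholds live in the OSDMap, so they can be inspected and, carefully, raised. A sketch only -- 0.97 is an example value, not a recommendation, and none of this helps unless the OSDs will start in the first place:

    # per-OSD utilization, plus the cluster-wide ratios stored in the OSDMap
    ceph osd df
    ceph osd dump | grep full_ratio
    # raising full_ratio slightly can let a nearly-full (but startable) OSD
    # accept flushes and deletes again; use with care and revert afterwards
    ceph osd set-full-ratio 0.97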