I looked up the procedure to rebuild the metadata pool for CephFS and it
looks quite doable. The added complication in my case is the cache tier.
I was curious whether it's possible to create a new CephFS with an
existing data pool and rebuild the metadata pool from it. That would let
me get rid of the cache tier by removing the FS layer on top of it and
then recreating the FS with the same data pool.

Any thoughts? I don't have a good test cluster to create a new FS on and
try these things against.
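
For reference, the rough sequence I have in mind, based on the CephFS
disaster recovery docs, is below. The pool and FS names are my own and I
haven't been able to test any of it, so treat it as a sketch rather than
a verified procedure:

    # tear down the old FS layer (with the MDS stopped), keeping the data pool
    ceph fs rm cephfs --yes-i-really-mean-it

    # create a fresh metadata pool and a new FS on the existing data pool
    # (--force is needed because the data pool already contains objects)
    ceph osd pool create cephfs_metadata_new 64
    ceph fs new cephfs cephfs_metadata_new cephfs_data --force

    # rebuild metadata from the objects in the data pool
    cephfs-data-scan init
    cephfs-data-scan scan_extents cephfs_data
    cephfs-data-scan scan_inodes cephfs_data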

On Fri, Jan 26, 2018, 8:44 AM David Turner <drakonst...@gmail.com> wrote:

> I also just got my new 480GB SSDs, if they could be used to move the
> PGs to.  Thank you for your help.
>
> On Fri, Jan 26, 2018 at 8:33 AM David Turner <drakonst...@gmail.com>
> wrote:
>
>> If I could get it started, I could flush-evict the cache, but that
>> doesn't seem likely.
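>>
>> For reference, the flush-evict I have in mind is the standard rados
>> command (assuming the cache pool is named cephfs_cache; a sketch, since
>> I can't run it with the OSDs down):
>>
>>   # flush dirty objects to the backing pool, then evict the clean copies
>>   rados -p cephfs_cache cache-flush-evict-all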
>>
>> On Fri, Jan 26, 2018 at 8:33 AM David Turner <drakonst...@gmail.com>
>> wrote:
>>
>>> I wouldn't be shocked if they were out of space, but `ceph osd df`
>>> only showed them as 45% full when I was first diagnosing this.  Now
>>> the same command shows them as completely full.  I'm thinking the
>>> cache tier behavior might have changed in Luminous, because I was
>>> keeping my cache completely empty before with target_max_objects set
>>> to 0, which flushed things out consistently once they passed my
>>> minimum flush age.  I noticed it wasn't keeping up with the flushing
>>> as well as it had in Jewel, but didn't think too much of it.  Anyway,
>>> that's something I can tinker with after the pools are back up and
>>> running.
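>>>
>>> For reference, the cache settings I was running were along these lines
>>> (the pool name and age values here are from memory and only
>>> illustrative):
>>>
>>>   # keep the cache effectively empty: flush/evict anything that has
>>>   # passed the minimum ages
>>>   ceph osd pool set cephfs_cache target_max_objects 0
>>>   ceph osd pool set cephfs_cache cache_min_flush_age 60
>>>   ceph osd pool set cephfs_cache cache_min_evict_age 60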
>>>
>>> If they are full and on Bluestore, what can I do to clean them up?  I
>>> assume that I need to keep the metadata pool intact, but I don't need
>>> to maintain any data in the cache pool.  I have a copy of everything
>>> written in the 24 hours before this incident, and nothing is modified
>>> after it is in cephfs.
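>>>
>>> In case it helps with suggestions: my understanding is that per-PG
>>> data on a stopped OSD can be exported and removed with
>>> ceph-objectstore-tool, so something like this is what I'd consider for
>>> dropping the cache pool PGs while keeping the metadata ones (the pgid
>>> below is a placeholder, and I haven't tried this against Bluestore):
>>>
>>>   # list the PGs held by the stopped OSD
>>>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 --op list-pgs
>>>
>>>   # export a cache-pool PG for safekeeping, then remove it
>>>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
>>>       --pgid 2.1a --op export --file /root/2.1a.export
>>>   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
>>>       --pgid 2.1a --op remove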
>>>
>>> On Fri, Jan 26, 2018 at 8:23 AM Nick Fisk <n...@fisk.me.uk> wrote:
>>>
>>>> I can see this in the logs:
>>>>
>>>> 2018-01-25 06:05:56.292124 7f37fa6ea700 -1 log_channel(cluster) log
>>>> [ERR] : full status failsafe engaged, dropping updates, now 101% full
>>>>
>>>> 2018-01-25 06:05:56.325404 7f3803f9c700 -1
>>>> bluestore(/var/lib/ceph/osd/ceph-9) _do_alloc_write failed to reserve 0x4000
>>>>
>>>> 2018-01-25 06:05:56.325434 7f3803f9c700 -1
>>>> bluestore(/var/lib/ceph/osd/ceph-9) _do_write _do_alloc_write failed with
>>>> (28) No space left on device
>>>>
>>>> 2018-01-25 06:05:56.325462 7f3803f9c700 -1
>>>> bluestore(/var/lib/ceph/osd/ceph-9) _txc_add_transaction error (28) No
>>>> space left on device not handled on operation 10 (op 0, counting from 0)
>>>>
>>>> Are they out of space, or is something mis-reporting?
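>>>>
>>>> If you want to rule out mis-reporting, an offline Bluestore fsck
>>>> should walk the allocations on the stopped OSD (a suggestion rather
>>>> than something I've verified against this particular failure):
>>>>
>>>>   ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-9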
>>>>
>>>> Nick
>>>>
>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>>>> Of David Turner
>>>> Sent: 26 January 2018 13:03
>>>> To: ceph-users <ceph-users@lists.ceph.com>
>>>> Subject: [ceph-users] BlueStore.cc: 9363: FAILED assert(0 ==
>>>> "unexpected error")
>>>>
>>>> http://tracker.ceph.com/issues/22796
>>>>
>>>> I was curious if anyone here had ideas or experience with this
>>>> problem.  I created the tracker issue yesterday after waking up to
>>>> find all 3 of my SSD OSDs down and unable to start due to this failed
>>>> assert.  These OSDs are in my small home cluster and hold the
>>>> cephfs_cache and cephfs_metadata pools.
>>>>
>>>> To recap: I upgraded from 10.2.10 to 12.2.2, successfully migrated my
>>>> 9 OSDs to Bluestore, reconfigured my crush rules to use OSD device
>>>> classes, failed to remove the CephFS cache tier due to
>>>> http://tracker.ceph.com/issues/22754, created these 3 SSD OSDs, and
>>>> updated the cephfs_cache and cephfs_metadata pools to use the
>>>> replicated_ssd crush rule.  Fast forward two days of this working
>>>> great, and I woke up to all 3 of them crashed and unable to start.  An
>>>> OSD log with debug bluestore = 5 is attached to the tracker linked at
>>>> the top of this email.
>>>>
>>>> My CephFS is completely down while these 2 pools are inaccessible.
>>>> The OSDs themselves are intact if I need to move the data out
>>>> manually to the HDDs or something.  Any help is appreciated.
>>>>
>>>