On 27/09/17 12:32, John Spray wrote:
> On Wed, Sep 27, 2017 at 12:15 PM, Richard Hesketh
> <richard.hesk...@rd.bbc.co.uk> wrote:
>> As the subject says... any ceph fs administrative command I try to run hangs 
>> forever and kills monitors in the background - sometimes they come back, on 
>> a couple of occasions I had to manually stop/restart a suffering mon. Trying 
>> to load the filesystem tab in the ceph-mgr dashboard dumps an error and can 
>> also kill a monitor. However, clients can mount the filesystem and 
>> read/write data without issue.
>>
>> Relevant excerpt from logs on an affected monitor, just trying to run 'ceph 
>> fs ls':
>>
>> 2017-09-26 13:20:50.716087 7fc85fdd9700  0 mon.vm-ds-01@0(leader) e19 
>> handle_command mon_command({"prefix": "fs ls"} v 0) v1
>> 2017-09-26 13:20:50.727612 7fc85fdd9700  0 log_channel(audit) log [DBG] : 
>> from='client.? 10.10.10.1:0/2771553898' entity='client.admin' 
>> cmd=[{"prefix": "fs ls"}]: dispatch
>> 2017-09-26 13:20:50.950373 7fc85fdd9700 -1 
>> /build/ceph-12.2.0/src/osd/OSDMap.h: In function 'const string& 
>> OSDMap::get_pool_name(int64_t) const' thread 7fc85fdd9700 time 2017-09-26 
>> 13:20:50.727676
>> /build/ceph-12.2.0/src/osd/OSDMap.h: 1176: FAILED assert(i != 
>> pool_name.end())
>>
>>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>> const*)+0x102) [0x55a8ca0bb642]
>>  2: (()+0x48165f) [0x55a8c9f4165f]
>>  3: 
>> (MDSMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0x1d18) 
>> [0x55a8ca047688]
>>  4: (MDSMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x2a8) 
>> [0x55a8ca048008]
>>  5: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x700) 
>> [0x55a8c9f9d1b0]
>>  6: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1f93) 
>> [0x55a8c9e63193]
>>  7: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xa0e) 
>> [0x55a8c9e6a52e]
>>  8: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55a8c9e6b57b]
>>  9: (Monitor::ms_dispatch(Message*)+0x23) [0x55a8c9e9a053]
>>  10: (DispatchQueue::entry()+0xf4a) [0x55a8ca3b5f7a]
>>  11: (DispatchQueue::DispatchThread::entry()+0xd) [0x55a8ca16bc1d]
>>  12: (()+0x76ba) [0x7fc86b3ac6ba]
>>  13: (clone()+0x6d) [0x7fc869bd63dd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
>> interpret this.
>>
>> I'm running Luminous. The cluster and FS have been in service since Hammer 
>> and have default data/metadata pool names. I discovered the issue after 
>> attempting to enable directory sharding.
> 
> Well that's not good...
> 
> The assertion is because your FSMap is referring to a pool that
> apparently no longer exists in the OSDMap.  This should be impossible
> in current Ceph (we forbid removing pools if they're in use), but
> could perhaps have been caused in an earlier version of Ceph when it
> was possible to remove a pool even if CephFS was referring to it?
> 
> Alternatively, perhaps something more severe is going on that's
> causing your mons to see a wrong/inconsistent view of the world.  Has
> the cluster ever been through any traumatic disaster recovery type
> activity involving hand-editing any of the cluster maps?  What
> intermediate versions has it passed through on the way from Hammer to
> Luminous?
> 
> Opened a ticket here: http://tracker.ceph.com/issues/21568
> 
> John
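
First, to pin down which pool the FSMap is pointing at that the OSDMap no 
longer knows about, I'll try comparing the pool IDs the filesystem references 
against the pools that actually exist. Something like this sketch should do it 
(assuming 'ceph fs dump' sidesteps the assert because it prints pool IDs 
rather than resolving them to names, which I haven't verified):

  # pool IDs referenced by the filesystem map
  ceph fs dump | grep -E 'data_pools|metadata_pool'
  # pool IDs and names that actually exist in the OSDMap
  ceph osd dump | grep '^pool '

If an ID from the first command doesn't show up in the second, that would fit 
your theory about a stale pool reference.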

I've reviewed my notes (i.e. I've grepped my IRC logs); I actually inherited 
this cluster from a colleague who left shortly after I joined, so unfortunately 
there is some of its history I cannot fill in.

It turns out the cluster actually predates Firefly. Looking at the dates, my 
suspicion is that it went Emperor -> Firefly -> Giant -> Hammer. I inherited it 
at Hammer and took it Hammer -> Infernalis -> Jewel -> Luminous myself. I know 
I made sure to do the tmap_upgrade step on CephFS, but I can't remember whether 
I did it at Infernalis or Jewel.

Infernalis was a tricky upgrade: the first attempt was aborted after the first 
set of upgraded OSDs didn't come back up (they had to be removed, downgraded 
and re-added), and after a successful second attempt, setting sortbitwise as 
the documentation suggested caused everything to break and slowly degrade until 
it was unset and the cluster recovered. There was never any disaster recovery 
involving mucking around with the pools while I've been administering the 
cluster, but unfortunately I cannot speak for its pre-Hammer history. The only 
pools I have removed were ones I created temporarily for testing CRUSH rules 
and benchmarking.

I have hand-edited the CRUSH map (extract, decompile, modify, recompile, 
inject) at times, because I found it more convenient than the CLI tools for 
creating new CRUSH rules, but I have never touched the OSD map.
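
For clarity, by hand-editing I mean the usual round trip, roughly (the 
filenames here are just placeholders):

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # edit crush.txt by hand to add/adjust rules
  crushtool -c crush.txt -o crush.new
  ceph osd setcrushmap -i crush.new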

Why would CephFS have been referring to other pools in the first place?

Rich
