Hi

I've just started an upgrade of a test cluster from 16.2.6 -> 16.2.7 and immediately hit a problem.

The cluster started as octopus, and has upgraded through to 16.2.6 without any trouble. It is a conventional deployment on Debian 10, NOT using cephadm. All was clean before the upgrade. It contains nodes as follows:
- Node 1: MON, MGR, MDS, RGW
- Node 2: MON, MGR, MDS, RGW
- Node 3: MON
- Node 4-6: OSDs

In the absence of any specific upgrade instructions for 16.2.7, I upgraded Node 1 and rebooted. The MON on that host now refuses to start, failing the following assertion:

2021-12-09T14:56:40.098+00:00 xxxxtstmon01 ceph-mon[960]: /build/ceph-16.2.7/src/mds/FSMap.cc: In function 'void FSMap::sanity(bool) const' thread 7f2d309085c0 time 2021-12-09T14:56:40.098395+0000
2021-12-09T14:56:40.098+00:00 xxxxtstmon01 ceph-mon[960]: /build/ceph-16.2.7/src/mds/FSMap.cc: 868: FAILED ceph_assert(info.compat.writeable(fs->mds_map.compat))
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x7f2d3222423c]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  2: /usr/lib/ceph/libceph-common.so.2(+0x277414) [0x7f2d32224414]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  3: (FSMap::sanity(bool) const+0x2a8) [0x7f2d327331c8]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  4: (MDSMonitor::update_from_paxos(bool*)+0x396) [0x55a32fe6b546]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  5: (PaxosService::refresh(bool*)+0x10a) [0x55a32fd960ca]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  6: (Monitor::refresh_from_paxos(bool*)+0x17c) [0x55a32fc54bec]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  7: (Monitor::init_paxos()+0xfc) [0x55a32fc54e9c]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  8: (Monitor::preinit()+0xbb9) [0x55a32fc7eb09]
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  9: main()
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  10: __libc_start_main()
2021-12-09T14:56:40.103+00:00 xxxxtstmon01 ceph-mon[960]:  11: _start()

"ceph health detail" merely shows mon01 as down, plus the 5 crashes that occurred before the service stopped auto-restarting.

Any ideas please?

Thanks, Chris
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io