Hi,

I wanted to explore stretch mode in Pacific (16.2.4) and see how it behaves during a DC failure, and it looks like I'm hitting the same, or at least a similar, issue here. To rule out stretch mode itself I first removed the cluster and rebuilt it without stretch mode (three hosts in three DCs) and started rebooting nodes. First I rebooted one node and the cluster returned to HEALTH_OK; then I rebooted two of the three nodes and again everything recovered successfully. After that I rebuilt a five-node cluster with two DCs in stretch mode and three MONs, one of them a tiebreaker in a virtual third DC. The stretch rule was applied (4 replicas across all 4 nodes).
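
For context, the stretch mode setup roughly followed the documentation in [1]. From memory it was something like the following; the host names, site/bucket names and the rule name below are placeholders, not necessarily the exact ones I used:

ceph mon set election_strategy connectivity
ceph osd crush add-bucket site1 datacenter
ceph osd crush add-bucket site2 datacenter
ceph osd crush move site1 root=default
ceph osd crush move site2 root=default
ceph osd crush move pacific1 datacenter=site1    # and the other OSD hosts accordingly
ceph mon set_location pacific1 datacenter=site1
ceph mon set_location pacific3 datacenter=site2
ceph mon set_location pacific5 datacenter=site3  # tiebreaker MON
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# add a replicated "stretch_rule" to crush.txt with two
# "step take <siteN>" / "step chooseleaf firstn 2 type host" blocks
crushtool -c crush.txt -o crush-new.bin
ceph osd setcrushmap -i crush-new.bin
ceph mon enable_stretch_mode pacific5 stretch_rule datacenter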

To test a DC failure I simply shut down the two nodes in DC2. Although ceph reduced the pool's min_size to 1, I couldn't read from or write to a mapped RBD, although ceph itself was still responsive with two active MONs. When I booted the two nodes again the cluster was not able to recover: it ends up in a loop of restarting the MON containers (the OSDs do recover eventually) until systemd shuts them down because of too many restarts. For a couple of seconds I get a ceph status, but I never get all three MONs up, and when two MONs are up and I restart the missing one, a different MON is shut down.
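
For reference, this is roughly how I tested the client I/O; the pool and image names (and the rbd device name) here are just placeholders:

ceph health detail
ceph osd pool ls detail                  # shows the min_size ceph dropped to 1
rbd map testpool/testimg                 # placeholder pool/image
dd if=/dev/rbd0 of=/dev/null bs=4M count=1 iflag=direct   # read test
dd if=/dev/zero of=/dev/rbd0 bs=4M count=1 oflag=direct   # write test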

I also see the error message mentioned earlier in this thread:

heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7ff3b3aa5700' had timed out after 0.000000000s

I'll add some more information: a stack trace from the MON failure:

---snip---
2021-05-25T15:44:26.988562+02:00 pacific1 conmon[5132]: 5 mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable = 1 - now=2021-05-25T13:44:26.730359+0000 lease_expire=2021-05-25T13:44:30.270907+0000 has v0 lc 9839
2021-05-25T15:44:26.988638+02:00 pacific1 conmon[5132]: debug -5> 2021-05-25T13:44:26.726+0000 7ff3b1aa1700 2 mon.pacific1@0(leader) e13 send_reply 0x55e37aae3860 0x55e37affa9c0 auth_reply(proto 2 0 (0) Success) v1
2021-05-25T15:44:26.988714+02:00 pacific1 conmon[5132]: debug -4> 2021-05-25T13:44:26.726+0000 7ff3b1aa1700 5 mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable = 1 - now=2021-05-25T13:44:26.731084+0000 lease_expire=2021-05-25T13:44:30.270907+0000 has v0 lc 9839
2021-05-25T15:44:26.988790+02:00 pacific1 conmon[5132]: debug -3> 2021-05-25T13:44:26.726+0000 7ff3b1aa1700 2 mon.pacific1@0(leader) e13 send_reply 0x55e37b14def0 0x55e37ab11ba0 auth_reply(proto 2 0 (0) Success) v1
2021-05-25T15:44:26.988929+02:00 pacific1 conmon[5132]: debug -2> 2021-05-25T13:44:26.730+0000 7ff3b1aa1700 5 mon.pacific1@0(leader).osd e117 send_incremental [105..117] to client.84146
2021-05-25T15:44:26.989012+02:00 pacific1 conmon[5132]: debug -1> 2021-05-25T13:44:26.734+0000 7ff3b1aa1700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7ff3b1aa1700 time 2021-05-25T13:44:26.732857+0000
2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >= 9)
2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]: ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7ff3bf61a59c]
2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]: 2: /usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]: 3: (OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]: 4: (OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&, unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]: 5: (OSDMonitor::get_version(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]: 6: (OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]: 7: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool, boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]: 8: (OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]: 9: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82) [0x55e3779da402]
2021-05-25T15:44:26.989967+02:00 pacific1 conmon[5132]: 10: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d) [0x55e377a002ed]
2021-05-25T15:44:26.990046+02:00 pacific1 conmon[5132]: 11: (Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.990113+02:00 pacific1 conmon[5132]: 12: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55e377a2ffdc]
2021-05-25T15:44:26.990179+02:00 pacific1 conmon[5132]: 13: (DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.990255+02:00 pacific1 conmon[5132]: 14: (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.990330+02:00 pacific1 conmon[5132]: 15: /lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.990420+02:00 pacific1 conmon[5132]:  16: clone()
2021-05-25T15:44:26.990497+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990573+02:00 pacific1 conmon[5132]: debug 0> 2021-05-25T13:44:26.742+0000 7ff3b1aa1700 -1 *** Caught signal (Aborted) **
2021-05-25T15:44:26.990648+02:00 pacific1 conmon[5132]: in thread 7ff3b1aa1700 thread_name:ms_dispatch
2021-05-25T15:44:26.990723+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990806+02:00 pacific1 conmon[5132]: ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.990883+02:00 pacific1 conmon[5132]: 1: /lib64/libpthread.so.0(+0x12b20) [0x7ff3bd114b20]
2021-05-25T15:44:26.990958+02:00 pacific1 conmon[5132]:  2: gsignal()
2021-05-25T15:44:26.991034+02:00 pacific1 conmon[5132]:  3: abort()
2021-05-25T15:44:26.991110+02:00 pacific1 conmon[5132]: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7ff3bf61a5ed]
2021-05-25T15:44:26.991176+02:00 pacific1 conmon[5132]: 5: /usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.991251+02:00 pacific1 conmon[5132]: 6: (OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.991326+02:00 pacific1 conmon[5132]: 7: (OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&, unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.991393+02:00 pacific1 conmon[5132]: 8: (OSDMonitor::get_version(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.991460+02:00 pacific1 conmon[5132]: 9: (OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.991557+02:00 pacific1 conmon[5132]: 10: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool, boost::intrusive_ptr<MonOpRequest>)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.991628+02:00 pacific1 conmon[5132]: 11: (OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.991695+02:00 pacific1 conmon[5132]: 12: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xe82) [0x55e3779da402]
2021-05-25T15:44:26.991761+02:00 pacific1 conmon[5132]: 13: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x78d) [0x55e377a002ed]
2021-05-25T15:44:26.991827+02:00 pacific1 conmon[5132]: 14: (Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.991893+02:00 pacific1 conmon[5132]: 15: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55e377a2ffdc]
2021-05-25T15:44:26.991959+02:00 pacific1 conmon[5132]: 16: (DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.992025+02:00 pacific1 conmon[5132]: 17: (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.992091+02:00 pacific1 conmon[5132]: 18: /lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.992156+02:00 pacific1 conmon[5132]:  19: clone()
---snip---


I can't tell if this is due to the limited resources in my virtual cluster, but since the non-stretch setup behaves as expected, I figure this could be a problem with stretch mode. I can provide more information if required, just let me know what I can do.
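
If more mon debug output would help, I can collect it roughly like this (cephadm deployment; the fsid below is a placeholder):

ceph config set mon debug_mon 20
ceph config set mon debug_paxos 20
cephadm logs --name mon.pacific1 > mon.pacific1.log
journalctl -u ceph-<fsid>@mon.pacific1 --since "1 hour ago"
ceph crash ls            # and 'ceph crash info <id>' for the full backtraces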

Regards,
Eugen


[1] https://docs.ceph.com/en/latest/rados/operations/stretch-mode/


Quoting Adrian Nicolae <adrian.nico...@rcs-rds.ro>:

Hi guys,

I'm testing Ceph Pacific 16.2.4 in my lab before deciding whether to put it into production on a 1PB+ storage cluster with rgw-only access.

I noticed a weird issue with my mons:

- if I reboot a mon host, the ceph-mon container does not start after the reboot

- with 'ceph orch ps' I see the following output:

mon.node01               node01               running (20h)   4m ago     20h   16.2.4     8d91d370c2b8  0a2e86af94b2
mon.node02               node02               running (115m)  12s ago    115m  16.2.4     8d91d370c2b8  51f4885a1b06
mon.node03               node03               stopped         4m ago     19h   <unknown>  <unknown>     <unknown>

(where node03 is the host which was rebooted).

- I tried to start the mon container manually on node03 with '/bin/bash /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run' and got the following output:

debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 3314933069573799936, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 432629308056666112, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 432629308056666112, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 432629308056666112, adjusting msgr requires
cluster 2021-05-23T08:07:12.189243+0000 mgr.node01.ksitls (mgr.14164) 36380 : cluster [DBG] pgmap v36392: 417 pgs: 417 active+clean; 33 KiB data, 605 MiB used, 651 GiB / 652 GiB avail; 9.6 KiB/s rd, 0 B/s wr, 15 op/s
debug 2021-05-23T08:24:25.196+0000 7f9a9e358700  1 mon.node03@-1(???).paxosservice(auth 1..51) refresh upgraded, format 0 -> 3
debug 2021-05-23T08:24:25.208+0000 7f9a88176700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f9a88176700' had timed out after 0.000000000s
debug 2021-05-23T08:24:25.208+0000 7f9a9e358700  0 mon.node03@-1(probing) e5  my rank is now 1 (was -1)
debug 2021-05-23T08:24:25.212+0000 7f9a87975700  0 mon.node03@1(probing) e6  removed from monmap, suicide.

root@node03:/home/adrian# systemctl status ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service
● ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service - Ceph mon.node03 for c2d41ac4-baf5-11eb-865d-2dc838a337a3
     Loaded: loaded (/etc/systemd/system/ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Sun 2021-05-23 08:10:00 UTC; 16min ago
    Process: 1176 ExecStart=/bin/bash /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run (code=exited, status=0/SUCCESS)
    Process: 1855 ExecStop=/usr/bin/docker stop ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3-mon.node03 (code=exited, status=1/FAILURE)
    Process: 1861 ExecStopPost=/bin/bash /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.poststop (code=exited, status=0/SUCCESS)
   Main PID: 1176 (code=exited, status=0/SUCCESS)

The only fix I could find was to redeploy the mon with:

ceph orch daemon rm mon.node03 --force
ceph orch daemon add mon node03
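
Before redeploying, I suppose the 'removed from monmap, suicide' message could also be cross-checked against the current monmap from one of the surviving mons, e.g. with something like:

ceph mon dump                      # is node03 still listed here?
ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap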

However, even though it works after the redeploy, an issue like this doesn't give me a lot of confidence to use it in a production environment. I could reproduce it with 2 different mons, so it's not just a one-off.

My setup is based on Ubuntu 20.04 and docker instead of podman:

root@node01:~# docker -v
Docker version 20.10.6, build 370c289

Do you know a workaround for this issue, or is this a known bug? I noticed some other reports of the same behaviour on Octopus as well, and the solution at that time was to delete the /var/lib/ceph/mon folder.


Thanks.






_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

