Package: corosync
Version: 3.1.0-3
Severity: normal
Tags: patch upstream
Forwarded: https://github.com/corosync/corosync/issues/630

As reported by Lukey3332:

Sometimes corosync crashes at startup, but only if compression is enabled.

Distribution: Debian Bullseye

Corosync version:
Corosync Cluster Engine, version '3.1.0'
Copyright (c) 2006-2018 Red Hat, Inc.

Kronosnet version:
Package: libknet1
Source: kronosnet
Version: 1.20-4

Here is a backtrace:

#0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x555558619958) at ../nptl/pthread_mutex_lock.c:67
#1  0x00007ffff7e6c728 in pmtud_reschedule (knet_h=0x555555577320 <_logsys_log_printf>, knet_h@entry=0x555558619958) at threads_common.c:42
#2  get_global_wrlock (knet_h=knet_h@entry=0x555555577320 <_logsys_log_printf>) at threads_common.c:61
#3  0x00007ffff7e64316 in knet_handle_compress (knet_h=0x555555577320 <_logsys_log_printf>, knet_handle_compress_cfg=0x7fffffffcea0) at compress.c:503
#4  0x000055555559ae8f in totemknet_configure_compression (knet_context=knet_context@entry=0x55555574d900, totem_config=totem_config@entry=0x7fffffffd310) at totemknet.c:1565
#5  0x000055555559c104 in totemknet_initialize (poll_handle=0x5555556ddd30, knet_context=0x55555574d900, totem_config=0x7fffffffd310, stats=<optimized out>, context=0x5555556eed20, deliver_fn=0x55555558c810 <main_deliver_fn>, iface_change_fn=0x55555558d9c0 <main_iface_change_fn>, mtu_changed=0x55555558c290 <totempg_mtu_changed>, target_set_completed=0x55555558ddd0 <target_set_completed>) at totemknet.c:1149
#6  0x0000555555588550 in totemnet_initialize (loop_pt=loop_pt@entry=0x5555556ddd30, net_context=net_context@entry=0x5555557026f8, totem_config=totem_config@entry=0x7fffffffd310, stats=0x555555702728, context=context@entry=0x5555556eed20, deliver_fn=deliver_fn@entry=0x55555558c810 <main_deliver_fn>, iface_change_fn=0x55555558d9c0 <main_iface_change_fn>, mtu_changed=0x55555558c290 <totempg_mtu_changed>, target_set_completed=0x55555558ddd0 <target_set_completed>) at totemnet.c:343
#7  0x000055555559541a in totemsrp_initialize (poll_handle=poll_handle@entry=0x5555556ddd30, srp_context=srp_context@entry=0x5555556760f0 <totemsrp_context>, totem_config=totem_config@entry=0x7fffffffd310, stats=stats@entry=0x5555556760c0 <totempg_stats>, deliver_fn=deliver_fn@entry=0x5555555970b0 <totempg_deliver_fn>, confchg_fn=confchg_fn@entry=0x5555555965a0 <totempg_confchg_fn>, waiting_trans_ack_cb_fn=0x555555596560 <totempg_waiting_trans_ack_cb>) at totemsrp.c:981
#8  0x0000555555597c28 in totempg_initialize (poll_handle=0x5555556ddd30, totem_config=totem_config@entry=0x7fffffffd310) at totempg.c:824
#9  0x000055555555e0de in main (argc=-11504, argv=<optimized out>, envp=<optimized out>) at main.c:1526

corosync.conf:

# Please read the corosync.conf.5 manual page
totem {
    version: 2
    cluster_name: tele-clu
    key: <snip>
    crypto_cipher: aes256
    crypto_hash: sha256
    knet_compression_model: zlib
    knet_compression_level: 6
    link_mode: passive
    interface {
        linknumber: 0
        knet_link_priority: 1
    }
    interface {
        linknumber: 1
        knet_link_priority: 0
    }
    token: 5000
}

logging {
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off
    # Log to standard error. When in doubt, set to yes. Useful when
    # running in the foreground (when invoking "corosync -f")
    to_stderr: yes
    # Log to a log file. When set to "no", the "logfile" option
    # must not be set.
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    # Log to the system log daemon. When in doubt, set to yes.
    to_syslog: yes
    # Log debug messages (very verbose). When in doubt, leave off.
    debug: off
    # Log messages with time stamps. When in doubt, set to hires (or on)
    #timestamp: hires
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    # Enable and configure quorum subsystem (default: off)
    # see also corosync.conf.5 and votequorum.5
    provider: corosync_votequorum
}

nodelist {
    node {
        # Hostname of the node
        name: tele-clu-01
        # Cluster membership node identifier
        nodeid: 1
        ring0_addr: 192.168.233.1
        ring1_addr: 192.168.178.241
    }
    node {
        # Hostname of the node
        name: tele-clu-02
        # Cluster membership node identifier
        nodeid: 2
        ring0_addr: 192.168.233.2
        ring1_addr: 192.168.178.242
    }
    node {
        # Hostname of the node
        name: tele-clu-03
        # Cluster membership node identifier
        nodeid: 3
        ring0_addr: 192.168.233.6
        ring1_addr: 192.168.178.243
    }
}

------------------------------

As commented by fabbione:

It turns out the issue is in corosync configuration handling when doing compress.

Fix for master is here: https://github.com/corosync/corosync/pull/631

Same patch applies to 3.1.1

Backport to 3.1.0 attached to the upstream issue.
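
Since the crash is only seen with compression enabled, a quick way to tell whether a given node's configuration is exposed is to look for a knet_compression_model setting other than "none". The following is a minimal sketch, not part of the upstream fix: the function name compression_enabled is hypothetical, and for simplicity it scans the whole file rather than only the totem section.

```python
import re


def compression_enabled(conf_text: str) -> bool:
    """Return True if corosync.conf text sets knet_compression_model
    to something other than 'none'.

    Simplification (assumption): the key is matched anywhere in the file,
    not only inside the totem { } section.
    """
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # Skip comment lines such as "#timestamp: hires".
            continue
        match = re.match(r"knet_compression_model\s*:\s*(\S+)", line)
        if match:
            return match.group(1).lower() != "none"
    # Key absent: compression is not configured.
    return False


if __name__ == "__main__":
    # Path as used on Debian; adjust if the config lives elsewhere.
    with open("/etc/corosync/corosync.conf", encoding="utf-8") as fh:
        print("compression enabled:", compression_enabled(fh.read()))
```

With the configuration quoted above (knet_compression_model: zlib) the check returns True, so that node would be a candidate for the patch.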