- **Milestone**: 5.19.07 --> future
---
**[tickets:#2950] clmd: crash after split brain event**
**Status:** accepted
**Milestone:** future
**Created:** Tue Oct 30, 2018 11:21 AM UTC by Gary Lee
**Last Updated:** Wed Jul 03, 2019 06:28 AM UTC
**Owner:** nobody
After applying [#2935] so that one SC is kept up after a split brain, clmd
sometimes crashes:
~~~
2018-10-30 22:04:38.926 SC-1 osafimmnd[211]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
2018-10-30 22:04:38.926 SC-1 osafimmpbed: NO Update epoch 4 committing with ccbId:10000002a/4294967338
2018-10-30 22:04:39.699 SC-1 osafclmd[275]: ER saImmOiImplementerSet failed rc: 6, exiting
2018-10-30 22:04:39.701 SC-1 osafamfnd[304]: ER safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
2018-10-30 22:04:39.701 SC-1 osafamfnd[304]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
~~~
CLMD trace:
~~~
<143>1 2018-10-31T10:38:11.036839+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20598"] 275:clm/clmd/clms_imm.cc:820 >> clms_retry_pending_rtupdates
<143>1 2018-10-31T10:38:11.036848+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20599"] 275:clm/clmd/clms_imm.cc:823 << clms_retry_pending_rtupdates: Implementerset yet to happen, try later
<143>1 2018-10-31T10:38:11.036861+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20600"] 275:clm/clmd/clms_main.cc:490 TR There is an IMM task to be tried again. setting poll time out to 500
<143>1 2018-10-31T10:38:11.099564+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20601"] 278:mds/mds_dt_trans.c:755 >> mdtm_process_poll_recv_data_tcp
<139>1 2018-10-31T10:38:11.09987+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20602"] 600:clm/clmd/clms_imm.cc:2771 ER saImmOiImplementerSet failed rc: 6, exiting
~~~
Increasing the maximum waiting time (`max_waiting_time_ms`) from 60 to 120 seconds appears to fix the issue:
~~~
diff --git a/src/clm/clmd/clms_imm.cc b/src/clm/clmd/clms_imm.cc
index 017607d..cea4755 100644
--- a/src/clm/clmd/clms_imm.cc
+++ b/src/clm/clmd/clms_imm.cc
@@ -42,7 +42,7 @@ static uint32_t clms_lock_send_no_start_cbk(CLMS_CLUSTER_NODE *nodeop);
static const SaVersionT immVersion = {'A', 2, 1};
const unsigned int sleep_delay_ms = 500;
-const unsigned int max_waiting_time_ms = 60 * 1000; /* 60 seconds */
+const unsigned int max_waiting_time_ms = 120 * 1000; /* 120 seconds */
/**
* Initialize the track response patricia tree for the node
~~~
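For readers unfamiliar with the pattern the patch tunes, here is a minimal sketch of a bounded retry loop driven by a sleep delay and a maximum waiting time. It is not the actual clmd code: `retry_with_budget`, `RetryCode`, `kOk` and `kTryAgain` are hypothetical names chosen for illustration. The value 6 matches the `rc: 6` in the log, which in the SA Forum AIS error enumeration should correspond to `SA_AIS_ERR_TRY_AGAIN`. The sketch assumes the loop retries only on that code and gives up once the budget is spent.

~~~
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-ins for the AIS return codes. Only the numeric value 6
// (logged as "rc: 6") comes from the ticket; the names are illustrative.
enum RetryCode { kOk = 1, kTryAgain = 6 };

// Calls `op` until it returns something other than kTryAgain, or until
// about `max_waiting_time_ms` has been spent sleeping between attempts.
// Returns the last code produced by `op`.
inline int retry_with_budget(const std::function<int()>& op,
                             unsigned int sleep_delay_ms,
                             unsigned int max_waiting_time_ms) {
  unsigned int waited_ms = 0;
  int rc = op();
  while (rc == kTryAgain && waited_ms < max_waiting_time_ms) {
    std::this_thread::sleep_for(std::chrono::milliseconds(sleep_delay_ms));
    waited_ms += sleep_delay_ms;
    rc = op();  // try again after the delay
  }
  return rc;
}
~~~

Under these assumptions, a 500 ms delay with a 60 second budget allows 120 retries after the first attempt, and the patched 120 second budget allows 240. If the last attempt still returns the try-again code, the caller sees it as a failure, which would be consistent with the `failed rc: 6, exiting` line above.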