- Description has changed:

Diff:

~~~~

--- old
+++ new
@@ -30,7 +30,7 @@
  
  const unsigned int sleep_delay_ms = 500;
 -const unsigned int max_waiting_time_ms = 60 * 1000; /* 60 seconds */
-+const unsigned int max_waiting_time_ms = 180 * 1000; /* 60 seconds */
++const unsigned int max_waiting_time_ms = 120 * 1000; /* 120 seconds */
  
  /**
   * Initialize the track response patricia tree for the node

~~~~




---

**[tickets:#2950] clmd: crash after split brain event**

**Status:** accepted
**Milestone:** 5.18.12
**Created:** Tue Oct 30, 2018 11:21 AM UTC by Gary Lee
**Last Updated:** Wed Oct 31, 2018 12:36 AM UTC
**Owner:** nobody


After applying [#2935] so that one SC is kept up after a split brain, clmd
sometimes crashes:

~~~
2018-10-30 22:04:38.926 SC-1 osafimmnd[211]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
2018-10-30 22:04:38.926 SC-1 osafimmpbed: NO Update epoch 4 committing with ccbId:10000002a/4294967338
2018-10-30 22:04:39.699 SC-1 osafclmd[275]: ER saImmOiImplementerSet failed rc: 6, exiting
2018-10-30 22:04:39.701 SC-1 osafamfnd[304]: ER safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
2018-10-30 22:04:39.701 SC-1 osafamfnd[304]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
~~~

CLMD trace:

~~~
<143>1 2018-10-31T10:38:11.036839+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20598"] 275:clm/clmd/clms_imm.cc:820 >> clms_retry_pending_rtupdates
<143>1 2018-10-31T10:38:11.036848+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20599"] 275:clm/clmd/clms_imm.cc:823 << clms_retry_pending_rtupdates: Implementerset yet to happen, try later
<143>1 2018-10-31T10:38:11.036861+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20600"] 275:clm/clmd/clms_main.cc:490 TR There is an IMM task to be tried again. setting poll time out to 500
<143>1 2018-10-31T10:38:11.099564+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20601"] 278:mds/mds_dt_trans.c:755 >> mdtm_process_poll_recv_data_tcp
<139>1 2018-10-31T10:38:11.09987+11:00 SC-1 osafclmd 275 osafclmd [meta sequenceId="20602"] 600:clm/clmd/clms_imm.cc:2771 ER saImmOiImplementerSet failed rc: 6, exiting
~~~

Increasing the maximum waiting time (`max_waiting_time_ms`) from 60 to 120 seconds appears to fix the issue.

~~~
diff --git a/src/clm/clmd/clms_imm.cc b/src/clm/clmd/clms_imm.cc
index 017607d..cea4755 100644
--- a/src/clm/clmd/clms_imm.cc
+++ b/src/clm/clmd/clms_imm.cc
@@ -42,7 +42,7 @@ static uint32_t clms_lock_send_no_start_cbk(CLMS_CLUSTER_NODE *nodeop);
 static const SaVersionT immVersion = {'A', 2, 1};
 
 const unsigned int sleep_delay_ms = 500;
-const unsigned int max_waiting_time_ms = 60 * 1000; /* 60 seconds */
+const unsigned int max_waiting_time_ms = 120 * 1000; /* 120 seconds */
 
 /**
  * Initialize the track response patricia tree for the node
~~~
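
For readers unfamiliar with how the two constants interact, here is a minimal sketch of a bounded retry loop. It is not the code in `clms_imm.cc`. It assumes that rc 6 in the log is `SA_AIS_ERR_TRY_AGAIN` and that the implementer-set call is retried every `sleep_delay_ms` until `max_waiting_time_ms` has elapsed. The `Rc` enum and `RetryWithDeadline` helper are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for the AIS return codes; 6 mirrors the "rc: 6"
// seen in the log (assumed to be SA_AIS_ERR_TRY_AGAIN).
enum class Rc { kOk = 1, kTryAgain = 6 };

// Calls `op` repeatedly while it reports kTryAgain, sleeping `sleep_delay`
// between attempts, and gives up once `max_wait` has elapsed.
// Returns the result of the last attempt.
Rc RetryWithDeadline(const std::function<Rc()>& op,
                     std::chrono::milliseconds sleep_delay,
                     std::chrono::milliseconds max_wait) {
  assert(sleep_delay.count() > 0);  // a zero delay would busy-loop
  const auto deadline = std::chrono::steady_clock::now() + max_wait;
  Rc rc = op();
  while (rc == Rc::kTryAgain &&
         std::chrono::steady_clock::now() < deadline) {
    std::this_thread::sleep_for(sleep_delay);
    rc = op();
  }
  return rc;
}
```

Under these assumptions, a caller that treats a final `kTryAgain` as fatal exits if the operation is still not ready when the deadline passes, which would be consistent with the crash above. A larger `max_wait` gives the operation more time to become ready.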


