- Description has changed:

Diff:

~~~~

--- old
+++ new
@@ -23,11 +23,15 @@
 ...
 ~~~
 </br>
+The main problem is the standby IMMD also broadcast D2ND_DISCARD_NODE message when it receives an NCSMDS_DOWN from IMMND. See immd_process_immnd_down().
+
+If the NCSMDS_DOWN event comes to the 2 IMMDs at the same time, the 2 D2ND_DISCARD_NODE messages will be stamped with the same number. One of the 2 will be discarded by IMMNDs, no problem here.
+But if there's a latency of NCSMDS_DOWN event, an other fevs message (in this case it's D2ND_DISCARD_IMPL for @OpenSafImmPBE) will be discarded by IMMNDs, that will cause fevs message loss.
 
 Details of the problem is explained here
 </br>
 
-http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjAKFElnhCTUywCYDxo5EUN0A5LAZzzzACMB7AD2S8AbgFMw5bDgA0TKgC45OecgDKAFQCCrAEIBNZAFpJC5JoDC61ADUAogB0EjgGajhbZDF4BXJOORQHjgADAAsAMwAbJxMrB6GAMRgogAmAHwmlCqu7gD6ALaibGwgAOainDwCQmISclk5Hl6+EP6ByCERAOyVfIIi-vXyAFSjbKIIKcj53DDTbKXIzrwSjfOLneFdjtzeKFAoXoUeELwmOMgANiCtMRSURgncl96iGbHs2W5sBQvIbBAQPlgKlkAB3A4ACw6YS2vWqAzqFGUqghEBg0NOZksNls-0BrUcOz2yEhIDYCAA5ChSrwUBBIaJprN1kswLx8pkiQg2GcGUy1s0-BJ2gCoJdLjCItE8B94klUu9kV88scSuV4f1aud5IKfMKAkFYdsnAhuKIYCBvONzvjxZKyRTqchkjBRFAxFN+cy5vkFtJHAd8UDgCdGUtvqy0dDNibHJoYBBvCAJQBPaTIb1rP2LNiQnyXKbm4PA0HRmHhUIADjuUkez1eSpYnwjqr+AJDZahUrhXD6NUGmDiiiH7ACpQQKyZKW8wEusBuoOzfwAFLGAJTc3mZ8NqspM2Nsjm29qXXgA5AAQivN+vd9vfYR2qUI1GrvdYh9rOWq0jOZ7cYIOo4Z6i0bQeCmyQgCkqYAdyADqmjIOgRTqkyQoQPIh4ANQdFeAC8AF4EAA
+http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjACgGUAVAIQEoAoUSWeEJNTLAJjwEEBhIy66ORFBnQA5LAGcKFMACMA9gA9ksgG4BTMI2z5i5ADRCW7LmQBcMAK5gwqhgDNVysQRsATDrPNITyHAAYALADMAGySMgpKahpComImMVhKCMgEHMzIUGLILrIA7ghhcooq6pq4hKSmwhwE2AQA+lgA8gDqwsgONhCSkgbalQC0AMTWLgB8CXEsoo2oqWwASlj1wk1YAKLIYhAgALbAqi7IuVAQABY+AYEA7L1MrJzcADxD0gA25qoDkyaizMtYOYcRbLDAABQAMshbLINAABJoHBAEEC2VC7XZgkjrQoRErRe5GbgmeyOZwINweBiZZAIWQoczAFwgCCHZAQWQAc1U51KJ3OZRwAB0ECKxLIMihtntgFlechdqoxGIQNzjqcLn4grcKAYHsZhqMJphYiZpgCgSD6uCodL9mz+Zqrjq+hVyC9OdYbN9CY9TOgSBwxMoFUqVWqYRotTdccUooK3aYwWAoAwPCh5blwAhU5yRSKWmxkOgw6rVMgYFSICZo9dkABqHzIACEAF5LtqeuE46UfkQzuXJtlMjBwEdzbN5ktrehIaHlWX8whpKpR+YxOXZLZsoy3rAWWzSVkEOZdiuwEv+yyAORZXJnACeyARSJRaIxWM2NLpKFs5jebxPi4I5jocsaRL2vrGL8NR1I0rTtJ0SCXmcNJISglaKnKEp6sgbwHho5z0IKQA
 
 </br>
 ~~~

~~~~

- **status**: unassigned --> accepted
- **assigned_to**: Hung Nguyen
- Attachments has changed:

Diff:

~~~~

--- old
+++ new
@@ -0,0 +1 @@
+logs.7z (250.9 kB; application/octet-stream)

~~~~




---

** [tickets:#2029] imm: fevs message lost during failover**

**Status:** accepted
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 11:05 AM UTC by Hung Nguyen
**Last Updated:** Tue Sep 13, 2016 11:05 AM UTC
**Owner:** Hung Nguyen
**Attachments:**

- [logs.7z](https://sourceforge.net/p/opensaf/tickets/2029/attachment/logs.7z) 
(250.9 kB; application/octet-stream)


A fevs message is lost when failing over between the 2 SCs.

</br>
~~~
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. 
Marking it as doomed 232 <754, 2010f> (@OpenSafImmPBE)
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer locally disconnected. 
Marking it as doomed 233 <755, 2010f> (OsafImmPbeRt_B)
...
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: NO Implementer disconnected 233 <755, 
2010f> (OsafImmPbeRt_B)
~~~
</br>

The IMMNDs never receive the D2ND_DISCARD_IMPL for @OpenSafImmPBE, so that 
applier stays marked as dying.

</br>
~~~
Sep  8 11:50:02 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:50:03 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:50:04 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
...
Sep  8 11:59:08 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:59:09 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
Sep  8 11:59:10 SC-2-1 osafimmnd[4241]: NO ImmModel::getPbeBSlave reports 
missing PbeBSlave locally => unsafe
...
~~~
</br>
The main problem is that the standby IMMD also broadcasts a D2ND_DISCARD_NODE 
message when it receives an NCSMDS_DOWN from an IMMND. See 
immd_process_immnd_down().

If the NCSMDS_DOWN event reaches the 2 IMMDs at the same time, the 2 
D2ND_DISCARD_NODE messages are stamped with the same number. One of the 2 
is discarded by the IMMNDs, which causes no problem.
But if the NCSMDS_DOWN event arrives with some latency, another fevs message 
(in this case the D2ND_DISCARD_IMPL for @OpenSafImmPBE) is discarded by the 
IMMNDs instead, and that fevs message is lost.
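
The two cases above can be sketched with a toy model. This is not OpenSAF code: `FevsMsg`, `ToyImmnd`, `deliver` and the `immd-a`/`immd-b` labels are hypothetical names, and the rule "drop any message whose number is not newer than the last accepted one" is an assumed simplification of how an IMMND detects a duplicate fevs number. The number 10437 is borrowed from the log below purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FevsMsg:
    seq: int     # fevs number stamped by the sending IMMD
    kind: str    # message type, e.g. "D2ND_DISCARD_NODE"
    origin: str  # hypothetical label for the sending IMMD


@dataclass
class ToyImmnd:
    last_seq: int  # highest fevs number accepted so far
    accepted: list = field(default_factory=list)
    discarded: list = field(default_factory=list)

    def receive(self, msg: FevsMsg) -> bool:
        # Assumed rule: a number that is not newer than the last accepted
        # one is treated as a duplicate and dropped, whatever the payload.
        if msg.seq <= self.last_seq:
            self.discarded.append(msg)
            return False
        self.last_seq = msg.seq
        self.accepted.append(msg)
        return True


def deliver(messages, last_seq):
    """Feed messages to a fresh toy IMMND in arrival order."""
    node = ToyImmnd(last_seq=last_seq)
    for msg in messages:
        node.receive(msg)
    return node


# Case 1: both IMMDs see NCSMDS_DOWN together and stamp the same number
# on the same kind of message. The dropped copy is redundant.
simultaneous = deliver([
    FevsMsg(10437, "D2ND_DISCARD_NODE", "immd-a"),
    FevsMsg(10437, "D2ND_DISCARD_NODE", "immd-b"),
], last_seq=10436)

# Case 2: the extra D2ND_DISCARD_NODE shares its number with a different
# message. The message that arrives second is dropped and never applied.
delayed = deliver([
    FevsMsg(10437, "D2ND_DISCARD_NODE", "immd-b"),
    FevsMsg(10437, "D2ND_DISCARD_IMPL", "immd-a"),
], last_seq=10436)

for name, node in (("simultaneous", simultaneous), ("delayed", delayed)):
    print(name,
          "accepted:", [m.kind for m in node.accepted],
          "discarded:", [m.kind for m in node.discarded])
```

In the second case the toy receiver drops D2ND_DISCARD_IMPL only because its number was already used, which is the kind of loss this ticket describes. The arrival order and the sender of each message are assumptions made for the sketch.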

Details of the problem are explained here:
</br>

http://sequencediagram.org/index.html?initialData=A4QwTgLglgxloDsIAICSBZdARAjACgGUAVAIQEoAoUSWeEJNTLAJjwEEBhIy66ORFBnQA5LAGcKFMACMA9gA9ksgG4BTMI2z5i5ADRCW7LmQBcMAK5gwqhgDNVysQRsATDrPNITyHAAYALADMAGySMgpKahpComImMVhKCMgEHMzIUGLILrIA7ghhcooq6pq4hKSmwhwE2AQA+lgA8gDqwsgONhCSkgbalQC0AMTWLgB8CXEsoo2oqWwASlj1wk1YAKLIYhAgALbAqi7IuVAQABY+AYEA7L1MrJzcADxD0gA25qoDkyaizMtYOYcRbLDAABQAMshbLINAABJoHBAEEC2VC7XZgkjrQoRErRe5GbgmeyOZwINweBiZZAIWQoczAFwgCCHZAQWQAc1U51KJ3OZRwAB0ECKxLIMihtntgFlechdqoxGIQNzjqcLn4grcKAYHsZhqMJphYiZpgCgSD6uCodL9mz+Zqrjq+hVyC9OdYbN9CY9TOgSBwxMoFUqVWqYRotTdccUooK3aYwWAoAwPCh5blwAhU5yRSKWmxkOgw6rVMgYFSICZo9dkABqHzIACEAF5LtqeuE46UfkQzuXJtlMjBwEdzbN5ktrehIaHlWX8whpKpR+YxOXZLZsoy3rAWWzSVkEOZdiuwEv+yyAORZXJnACeyARSJRaIxWM2NLpKFs5jebxPi4I5jocsaRL2vrGL8NR1I0rTtJ0SCXmcNJISglaKnKEp6sgbwHho5z0IKQA

</br>
~~~
Sep  8 11:50:00 SC-2-1 osafimmd[4226]: WA IMMND DOWN on active controller 2 
detected at standby immd!! 1. Possible failover
...
Sep  8 11:50:00 SC-2-1 osafimmd[4226]: WA Message count:10437 + 1 != 10437
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: WA DISCARD DUPLICATE FEVS message:10437
Sep  8 11:50:00 SC-2-1 osafimmnd[4241]: WA Error code 2 returned for message 
type 82 - ignoring
~~~
</br>

The logs are attached.

