- **status**: review --> fixed
- **Comment**:

changeset:   5933:bb53270bfe18
tag:         tip
parent:      5929:468f7cf19611
user:        Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
date:        Wed Sep 24 15:48:12 2014 +0200
summary:     IMM: Failure to send completed to PBE defaulted to ccb-recovery [#1127]

changeset:   5932:2505c06b19ca
branch:      opensaf-4.5.x
parent:      5928:3cd62e8831a7
user:        Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
date:        Wed Sep 24 15:48:12 2014 +0200
summary:     IMM: Failure to send completed to PBE defaulted to ccb-recovery [#1127]

changeset:   5931:3fff80ea7b42
branch:      opensaf-4.4.x
parent:      5927:832244b78b65
user:        Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
date:        Wed Sep 24 15:52:11 2014 +0200
summary:     IMM: Failure to send completed to PBE defaulted to ccb-recovery [#1127]

changeset:   5930:214972614415
branch:      opensaf-4.3.x
parent:      5926:72def88cf2f8
user:        Anders Bjornerstedt <anders.bjornerst...@ericsson.com>
date:        Wed Sep 24 15:52:11 2014 +0200
summary:     IMM: Failure to send completed to PBE defaulted to ccb-recovery [#1127]




---

**[tickets:#1127] IMM: Failure to send completed to PBE can cause cluster restart.**

**Status:** fixed
**Milestone:** 4.3.3
**Created:** Tue Sep 23, 2014 07:58 AM UTC by Anders Bjornerstedt
**Last Updated:** Wed Sep 24, 2014 11:22 AM UTC
**Owner:** Anders Bjornerstedt

This ticket is similar to #1096:

 http://sourceforge.net/p/opensaf/tickets/1096/

The PBE detaches after having received the ccb-operations for a ccb but before
having received the completed-callback. In this case there are no OIs so
the completed-callback to PBE is to be sent directly when handling the apply
downcall from the user.

Detachment itself (of the PBE or any IMM client) arrives over fevs, so that
is actually not the problem. The client node is only removed in conjunction
with clearing of the implementer in ImmModel. Thus a non-null pbeConn returned
from ImmModel means the client node must exist. This is an invariant,
i.e. an assertable condition.
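
The invariant can be illustrated with a small stand-alone sketch. This is not the OpenSAF code: `client_node`, `attach_pbe`, `detach_client` and `pbe_client_for_apply` are hypothetical names, and the fixed-size table only stands in for the real client list and ImmModel. The point it tries to show is that, as long as the implementer is cleared in the same step that removes the client node, a non-zero PBE connection implies the lookup succeeds, so the lookup can be guarded by an assert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the IMMND client list and ImmModel. */
#define MAX_CLIENTS 8

struct client_node {
    unsigned int conn;                     /* 0 means the slot is free */
};

static struct client_node clients[MAX_CLIENTS];
static unsigned int pbe_implementer_conn;  /* 0 means no PBE attached */

/* Look up a client node by connection id; returns NULL if absent. */
static struct client_node *find_client(unsigned int conn)
{
    if (conn == 0)
        return NULL;
    for (size_t i = 0; i < MAX_CLIENTS; i++) {
        if (clients[i].conn == conn)
            return &clients[i];
    }
    return NULL;
}

/* Attach the PBE: create the client node and record the implementer. */
static int attach_pbe(unsigned int conn)
{
    if (conn == 0)
        return -1;
    for (size_t i = 0; i < MAX_CLIENTS; i++) {
        if (clients[i].conn == 0) {
            clients[i].conn = conn;
            pbe_implementer_conn = conn;
            return 0;
        }
    }
    return -1;                             /* table full */
}

/* Detach: the implementer is cleared together with the client node. */
static void detach_client(unsigned int conn)
{
    struct client_node *cl = find_client(conn);
    if (cl == NULL)
        return;
    if (pbe_implementer_conn == conn)
        pbe_implementer_conn = 0;
    cl->conn = 0;
}

/* Resolve the PBE client node on the apply path. */
static struct client_node *pbe_client_for_apply(void)
{
    unsigned int pbe_conn = pbe_implementer_conn;
    if (pbe_conn == 0)
        return NULL;                       /* no PBE attached */
    struct client_node *cl = find_client(pbe_conn);
    assert(cl != NULL);                    /* invariant: non-zero conn implies node exists */
    return cl;
}
```

In this sketch the assert documents a condition that should never fail, as opposed to a runtime error path that needs handling.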

The problem that *does* exist in immnd_evt_proc_ccb_apply is that the send
itself over MDS may fail, due to a race with a PBE going down. In that case
the code in immnd_evt_proc_ccb_apply explicitly aborts. This happens on all
nodes and results in a cluster restart.

It is this abort() on send failure which is wrong. The other abort on client
node not found should be changed to an assert.

So the fix is to remove the abort on send failure and instead "drop" the
ccb apply into the recovery case, letting the apply result be resolved by
the PBE restart/recovery.
Indeed, it is conceivable that the PBE may have received the completed&commit
message even if the sending IMMND receives an error from MDS on the send.
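
A minimal sketch of the intended control flow, under stated assumptions: `apply_outcome`, `send_completed_fn` and `send_completed_to_pbe` are hypothetical names invented for illustration, and the callback stands in for the MDS send. It is not the actual patch. It only shows a send failure being turned into a "resolve via recovery" outcome instead of an abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of trying to deliver the completed message to the PBE. */
enum apply_outcome {
    APPLY_AWAIT_PBE_REPLY,     /* send succeeded; wait for the PBE reply */
    APPLY_DEFER_TO_RECOVERY    /* send failed; outcome decided by PBE restart/recovery */
};

/* Stand-in for the MDS send; returns true on success. */
typedef bool (*send_completed_fn)(unsigned int ccb_id);

static enum apply_outcome send_completed_to_pbe(unsigned int ccb_id,
                                                send_completed_fn send)
{
    assert(send != NULL);      /* programming error, not a runtime condition */

    if (!send(ccb_id)) {
        /*
         * Do not abort. The PBE may be going down, or it may even have
         * received the message despite the reported error, so the result
         * is left for recovery to resolve.
         */
        return APPLY_DEFER_TO_RECOVERY;
    }
    return APPLY_AWAIT_PBE_REPLY;
}

/* Two trivial senders used to exercise both paths. */
static bool send_ok(unsigned int ccb_id)    { (void)ccb_id; return true; }
static bool send_fails(unsigned int ccb_id) { (void)ccb_id; return false; }
```

Calling `send_completed_to_pbe(7, send_fails)` yields `APPLY_DEFER_TO_RECOVERY`, while `send_completed_to_pbe(7, send_ok)` yields `APPLY_AWAIT_PBE_REPLY`. In neither case does the process terminate.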


---
