Hi,
I said somewhere at the beginning of this review process that the root
cause of the problem
is the dubious feature of implicit class/implementer-set, in particular
for appliers.
Unfortunately, even though this feature is just about useless from the
functional perspective,
it does have a performance advantage for reconnecting Ois/appliers.
Regardless if the OI/applier
relies on the implicit class/object implementer-set or does explicit
class/object implementer-set
the IDEMPOTENCY case is valuable for performance. I mean the case where
the OI/applier attaches
and automatically gets the class/object bindings set. This is valuable
even if the OI/applier
does explicit class/object implementer set. Such a call can be resolved
locally (does not need to
go over fevs) if the local info already indicates that the binding is
set. Only if the local info is missing
in regard to that particular class/object impl/appl set will it need to
go distributed. This is more valuable
than it appears at first look. The main performance issue is during
imm-sync when class/object oi/applier set is
rejected with TRY_AGAIN, *except* if it is already set for that
class/object.
An applier is just a special kind of OI and in general we dont want
different connection behavior for
appliers than regular OIs. It complicates documentation, which most
users dont read anyway.
Besides that there is the backwards compatibility issue.
If there is any scenario where the implicit class/object set could
sensibly be used by an application,
it would be where the class/object-applier has been set up at one node
but is ordered by the
application (directly or indirectly) to be disconnected and reconnected.
The best example is an si-swap
or similar where the applier is moved from one processor to another in a
manner coordinated or
at least known by the application. Because that is the case where the
application could conceivably
rely on implicit class/object oi/applier set, since it could *know* that
it was set previously.
Currently just about everything around this works as expected. The
exception is where a joining node
gets synced. It does not get the total applier info synced. Not one
application has reported any problem
with this over the couple of years this mechanism has been in place. The
real problem reported by this
ticket is the final lkine in the problem description:
PL3 osafimmnd[392]: ER Sync-verify: Established node has different
Implementer-id: 5 for name: @whatever, sync says 6.
That should not be possible because the root applier information (name,
id, node-location) *is* synced.
The reason it *does* happen is simply because a case of attach of
implicit applier-set and class-applier-set
gets rejected at veteran nodes due to ccb interference but accepted at a
recently synced node that has no
info on class/object applier status and hence does not detect the
ccb-conflict. The implicit mechanism is
executed even if the application does not rely on it. That is even if
the application does explicit class-applier-set,
this will get executed *after* the basic implementer-set; and the basic
implementer-set does the implicit check
when it finds the class binding already existing.
Zorans patch fixes the problem by only performing the class/obj check at
the node where the client resides.
The local pre-check will work only at veteran nodes and if it catches
ccb interference the request is rejected
locally. If the request is at a freshly synced node with no
class-obj-applier bindings then the check will not
work, i.e. it will not detect any ccb -interference. But neither will
the check detect ccb-interference at other
nodes because at remote nodes the check is NOT DONE. But the
applier-clss-ccb interference does not
matter at other nodes because the applier does not exist there. So the
end result is that the applier
attaches at the synced node and does not get any class/obj callbacks
until after the local client explicitly
sets such class/obj bindings.
So the only feature that does not work after Zorans patch (assuming I
have understood it) is the IMPLICT
attach of clas/object applier at a freshly synced node. But this I claim
is covered in the documentation/README:
Applier bindings are even weaker in this "starting context" sense
because applier bindings to object/class are not synced. The only
applier data that is synced (after a node restart) is the existence of
the applier name and whether ot not any client is currently attached as
that applier.
So I think we are covered.
/AndersBj
On 10/09/2015 08:55 AM, Neelakanta Reddy wrote:
> Hi zoran,
>
> Comments inline.
>
> /Neel.
> On Thursday 08 October 2015 07:56 PM, Zoran Milinkovic wrote:
>> Hi Neelakanta,
>>
>> From the sync, a new joined node gets only applier name and node id where
>> the applier is created. The new node does not know anything about on which
>> objects and classes appliers are set.
>> That's a difference from implementers where each node knows on which objects
>> or classes implementers are set.
> I agree.
>> This lack of information for appliers may make problems if already used
>> applier name is tried to be set and it fails. On new joined node, applier
>> set can succeed. In this case we have inconsistent system which ends up in
>> the node reboot.
> I agree, since the problem leads to node reboot, this ticket must be
> fixed completely in this release.
>> Because applier set info (set on objects and classes) resides only on the
>> node where the applier is created, the originating node first detach the
>> applier locally, and then send discard implementer (applier) to all nodes
>> sending IMMD_EVT_ND2D_DISCARD_IMPL to IMMD.
> Here the meaning of the originating node means from where the request is
> arrived.
> There are two cases:
> case 1: applier set request arrives from the veteran nodes
>
> Then the present immModel_implementerSet (with this patch) will discard
> the implementer.
>
> case 2: applier set request comes from the newly sync nodes
>
> In this case since the new node does not have object and class applier
> information then in the newly synced nodes the applier got connected
> and in the veteran nodes the applier is not connected. (This is similar
> to the problem reported in the ticket)
>> Object and class implementer set don't have anything with implementer set.
>> That are two different kind of calls.
>>
> yes, the class/objectimplementer set are different calls from
> saImmOiImplementerSet .
>
> To generalize for the appliers:
>
> 1. In the ImmModel::discardImplementer if it is an applier, remove the
> applier and class binding (i.e remove the disconnected applier from
> class info)
>
> 2. allow the connection of implementer(applier case only) even when
> there are on-going CCBs.
> applier checks need to be removed in immModel_implementerSet
>
> Even though the implementerset call succeeds (appliers are connected).
> But, the callbacks will be sent only when object/classimplementerset is
> called.
>
> 3. when the object/classimplementerset is called then information is
> available to check if there are any ongoing CCBs , then discard the
> implementer.
>
> Eg:
>
> The primary middleware usage of applier is AMF service.
> The AMF service also while connecting as an applier, sets
> classImplementerset for all AMF classes.
>
>
>> Best regards,
>> Zoran
>>
>>
>> -----Original Message-----
>> From: Neelakanta Reddy [mailto:[email protected]]
>> Sent: Thursday, October 08, 2015 2:02 PM
>> To: Zoran Milinkovic
>> Cc: [email protected]
>> Subject: Re: [PATCH 1 of 1] imm: synchronize applier set on all nodes [#1504]
>>
>> Hi zoran,
>>
>> The below patch does not solve the problem indicated in the ticket.
>>
>> .
>>
>> if (originatedAtThisNd) { /*Send reply to client from this ND. */
>> immnd_client_node_get(cb, clnt_hdl, &cl_node);
>> - if (cl_node == NULL) {
>> + if ((cl_node == NULL) || (discardImplementer == SA_TRUE)) {
>>
>>
>> The above lines will only send IMMD_EVT_ND2D_DISCARD_IMPL when the
>> implementer is connected from the originating node.
>> In case , if the same applier is connected from the newly joined node
>> DISCARD_IMPL is not sent.
>>
>> The solution is DISCARD_IMPL in immnd_evt_proc_cl_impl_set and
>> immnd_evt_proc_obj_impl_set.
>> Allow creation of applier in immnd_evt_proc_impl_set_rsp(i.e Remove the
>> applier checks, so that the applier get connected with
>> saImmOiImplementerSet(). If there is any active CCBs then the applier will
>> be discarded either in class/object implementer set.
>>
>>
>> /Neel.
>>
>> On Wednesday 07 October 2015 08:16 PM, Zoran Milinkovic wrote:
>>> osaf/services/saf/immsv/immnd/ImmModel.cc | 12 +++++++++---
>>> osaf/services/saf/immsv/immnd/ImmModel.hh | 3 ++-
>>> osaf/services/saf/immsv/immnd/immnd_evt.c | 15 ++++++++++-----
>>> osaf/services/saf/immsv/immnd/immnd_init.h | 2 +-
>>> 4 files changed, 22 insertions(+), 10 deletions(-)
>>>
>>>
>>> If applier fails on the local node, the global discard implementer message
>>> will be broadcasted to all nodes.
>>>
>>> diff --git a/osaf/services/saf/immsv/immnd/ImmModel.cc
>>> b/osaf/services/saf/immsv/immnd/ImmModel.cc
>>> --- a/osaf/services/saf/immsv/immnd/ImmModel.cc
>>> +++ b/osaf/services/saf/immsv/immnd/ImmModel.cc
>>> @@ -1589,7 +1589,7 @@ SaAisErrorT
>>> immModel_implementerSet(IMMND_CB *cb, const IMMSV_OCTET_STRING*
>>> implName,
>>> SaUint32T implConn, SaUint32T implNodeId,
>>> SaUint32T implId, MDS_DEST mds_dest,
>>> - SaUint32T implTimeout)
>>> + SaUint32T implTimeout, SaBoolT *discardImplementer)
>>> {
>>> return
>>> ImmModel::instance(&cb->immModel)->implementerSet(implName,
>>> @@ -1597,7 +1597,8 @@ immModel_implementerSet(IMMND_CB *cb, co
>>> implNodeId,
>>> implId,
>>> (SaUint64T) mds_dest,
>>> - implTimeout);
>>> + implTimeout,
>>> + discardImplementer);
>>> }
>>>
>>> SaAisErrorT
>>> @@ -13030,12 +13031,15 @@ ImmModel::implementerSet(const IMMSV_OCT
>>> SaUint32T nodeId,
>>> SaUint32T implementerId,
>>> SaUint64T mds_dest,
>>> - SaUint32T implTimeout)
>>> + SaUint32T implTimeout,
>>> + SaBoolT *discardImplementer)
>>> {
>>> SaAisErrorT err = SA_AIS_OK;
>>> CcbVector::iterator i;
>>> TRACE_ENTER();
>>>
>>> + *discardImplementer = SA_FALSE;
>>> +
>>> if(immNotWritable() && !protocol43Allowed()) {
>>> TRACE_LEAVE();
>>> return SA_AIS_ERR_TRY_AGAIN; @@ -13108,6 +13112,7 @@
>>> ImmModel::implementerSet(const IMMSV_OCT
>>> TRACE("TRY_AGAIN: ccb %u is active
>>> on object '%s' "
>>> "bound to class applier '%s'.
>>> Can not re-attach applier",
>>> ccb->mId,
>>> omit->first.c_str(), implName.c_str());
>>> + *discardImplementer = SA_TRUE;
>>> err = SA_AIS_ERR_TRY_AGAIN;
>>> goto done;
>>> }
>>> @@ -13121,6 +13126,7 @@ ImmModel::implementerSet(const IMMSV_OCT
>>> TRACE("TRY_AGAIN: ccb %u is active on
>>> object '%s' "
>>> "bound to object applier '%s'.
>>> Can not re-attach applier",
>>> ccb->mId,
>>> omit->first.c_str(), implName.c_str());
>>> + *discardImplementer = SA_TRUE;
>>> err = SA_AIS_ERR_TRY_AGAIN;
>>> goto done;
>>> }
>>> diff --git a/osaf/services/saf/immsv/immnd/ImmModel.hh
>>> b/osaf/services/saf/immsv/immnd/ImmModel.hh
>>> --- a/osaf/services/saf/immsv/immnd/ImmModel.hh
>>> +++ b/osaf/services/saf/immsv/immnd/ImmModel.hh
>>> @@ -397,7 +397,8 @@ public:
>>> SaUint32T nodeId,
>>> SaUint32T ownerId,
>>> SaUint64T mds_dest,
>>> - SaUint32T implTimeout);
>>> + SaUint32T implTimeout,
>>> + SaBoolT *discardImplementer);
>>>
>>> SaAisErrorT classImplementerSet(
>>> const struct
>>> ImmsvOiImplSetReq* req, diff --git
>>> a/osaf/services/saf/immsv/immnd/immnd_evt.c
>>> b/osaf/services/saf/immsv/immnd/immnd_evt.c
>>> --- a/osaf/services/saf/immsv/immnd/immnd_evt.c
>>> +++ b/osaf/services/saf/immsv/immnd/immnd_evt.c
>>> @@ -9523,6 +9523,7 @@ static void immnd_evt_proc_impl_set_rsp(
>>> NCS_NODE_ID nodeId;
>>> SaUint32T conn;
>>> SaUint32T implId = 0;
>>> + SaBoolT discardImplementer = SA_FALSE;
>>>
>>> osafassert(evt);
>>> osafassert(!originatedAtThisNd || reply_dest ==
>>> cb->immnd_mdest_id); @@ -9539,13 +9540,13 @@ static void
>>> immnd_evt_proc_impl_set_rsp(
>>>
>>> err = immModel_implementerSet(cb,
>>> &(evt->info.implSet.impl_name),
>>> (originatedAtThisNd) ? conn : 0, nodeId, implId,
>>> - reply_dest, evt->info.implSet.oi_timeout);
>>> + reply_dest, evt->info.implSet.oi_timeout,
>>> &discardImplementer);
>>>
>>> if (originatedAtThisNd) { /*Send reply to client from
>>> this ND. */
>>> immnd_client_node_get(cb, clnt_hdl, &cl_node);
>>> - if (cl_node == NULL) {
>>> + if ((cl_node == NULL) || (discardImplementer == SA_TRUE)) {
>>> /* Client was down */
>>> - TRACE_5("Failed to get client node, discarding
>>> implementer id:%u for connection: %u", implId, conn);
>>> + TRACE_5("Discarding implementer id:%u for connection:
>>> %u", implId,
>>> +conn);
>>> memset(&send_evt, '\0', sizeof(IMMSV_EVT));
>>> send_evt.type = IMMSV_EVT_TYPE_IMMD;
>>> send_evt.info.immd.type =
>>> IMMD_EVT_ND2D_DISCARD_IMPL; @@ -9556,8
>>> +9557,12 @@ static void immnd_evt_proc_impl_set_rsp(
>>> /* Mark the implementer as dying to make sure
>>> no upcall is sent to the client.
>>> The implementer will be really discarded
>>> when global-discard message comes */
>>> immModel_discardImplementer(cb, implId,
>>> SA_FALSE, NULL, NULL);
>>> - return;
>>> - } else if (cl_node->mIsStale) {
>>> +
>>> + if(cl_node == NULL) {
>>> + return;
>>> + }
>>> + }
>>> + if (cl_node->mIsStale) {
>>> LOG_WA("IMMND - Client went down so no
>>> response");
>>> return;
>>> }
>>> diff --git a/osaf/services/saf/immsv/immnd/immnd_init.h
>>> b/osaf/services/saf/immsv/immnd/immnd_init.h
>>> --- a/osaf/services/saf/immsv/immnd/immnd_init.h
>>> +++ b/osaf/services/saf/immsv/immnd/immnd_init.h
>>> @@ -225,7 +225,7 @@ extern "C" {
>>> SaAisErrorT
>>> immModel_implementerSet(IMMND_CB *cb, const
>>> IMMSV_OCTET_STRING *implName,
>>> SaUint32T implConn, SaUint32T
>>> implNodeId, SaUint32T implId,
>>> - MDS_DEST mds_dest, SaUint32T implTimeout);
>>> + MDS_DEST mds_dest, SaUint32T implTimeout,
>>> SaBoolT
>>> +*discardImplementer);
>>>
>>> SaAisErrorT
>>> immModel_implementerClear(IMMND_CB *cb, const struct
>>> ImmsvOiImplSetReq *req,
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel