Hi ,

yes, I agree that the applier connection on  newly sync node is 
documented and this is covered.

If the ticket(description of the ticket needs to be modified) is to fix 
the applier re-connection in veteran nodes,
based on information available in veteran nodes, then the check has to 
be made locally than sending to FEVS(distributed).

so, then the first patch sent for review, in immModel_implIsFree would 
be sufficient.
Moving the checks above " if(impl->mId == 0)" would be good.

Thanks,
Neel.

On Friday 09 October 2015 02:08 PM, Anders Björnerstedt wrote:
> Hi,
>
> I said somewhere at the beginning of this review process that the root 
> cause of the problem
> is the dubious feature of implicit class/implementer-set, in 
> particular for appliers.
>
> Unfortunately, even though this feature is just about useless from the 
> functional perspective,
> it does have a performance advantage for reconnecting Ois/appliers. 
> Regardless if the OI/applier
> relies on the implicit class/object implementer-set or does explicit 
> class/object implementer-set
> the IDEMPOTENCY case is valuable for performance. I mean the case 
> where the OI/applier attaches
> and automatically gets the class/object bindings set. This is valuable 
> even if the OI/applier
> does explicit class/object implementer set. Such a call can be 
> resolved locally (does not need to
> go over fevs) if the local info already indicates that the binding is 
> set. Only if the local info is missing
> in regard to that particular class/object impl/appl set will it need 
> to go distributed. This is more valuable
> than it appears at first look. The main performance issue is during 
> imm-sync when class/object oi/applier set is
> rejected with TRY_AGAIN, *except* if it is already set for that 
> class/object.
>
> An applier is just a special kind of OI and in general we dont want 
> different connection behavior for
> appliers than regular OIs. It complicates documentation, which most 
> users dont read anyway.
> Besides that  there is the backwards compatibility issue.
>
> If there is any scenario where the implicit class/object set could 
> sensibly be used by an application,
> it would be where the class/object-applier has been set up at one node 
> but is ordered by the
> application (directly or indirectly) to be disconnected and 
> reconnected. The best example is an si-swap
> or similar where the applier is moved from one processor to another in 
> a manner coordinated or
> at least known by the application. Because that is the case where the 
> application could conceivably
> rely on implicit class/object oi/applier set, since it could *know* 
> that it was set previously.
>
> Currently just about everything around this works as expected. The 
> exception is where a joining node
> gets synced. It does not get the total applier info synced. Not one 
> application has reported any problem
> with this over the couple of years this mechanism has been in place. 
> The real problem reported by this
> ticket is the final lkine in the problem description:
>
>     PL3 osafimmnd[392]: ER Sync-verify: Established node has different 
> Implementer-id: 5 for name: @whatever, sync says 6.
>
> That should not be possible because the root applier information 
> (name, id, node-location) *is* synced.
> The reason it *does* happen is simply because a case of attach of 
> implicit applier-set and class-applier-set
> gets rejected at veteran nodes due to ccb interference but accepted at 
> a recently synced node that has no
> info on class/object applier status and hence does not detect the 
> ccb-conflict. The implicit mechanism is
> executed even if the application does not rely on it. That is even if 
> the application does explicit class-applier-set,
> this will get executed *after* the basic implementer-set; and the 
> basic implementer-set does the implicit check
> when it finds the class binding already existing.
>
> Zorans patch fixes the problem by only performing the class/obj check 
> at the node where the client resides.
> The local pre-check will work only at veteran nodes and if it catches 
> ccb interference the request is rejected
> locally. If the request is at a freshly synced node with no 
> class-obj-applier bindings then the check will not
> work, i.e. it will not detect any ccb -interference. But neither will 
> the check detect ccb-interference at other
> nodes because at remote nodes the check is NOT DONE. But the 
> applier-clss-ccb interference does not
> matter at other nodes because the applier does not exist there. So the 
> end result is that the applier
> attaches at the synced node and does not get any class/obj callbacks 
> until after the local client explicitly
> sets such class/obj bindings.
>
> So the only feature that does not work after Zorans patch (assuming I 
> have understood it) is the IMPLICT
> attach of clas/object applier at a freshly synced node. But this I 
> claim is covered in the documentation/README:
>
>     Applier bindings are even weaker in this "starting context" sense
>     because applier bindings to object/class are not synced. The only
>     applier data that is synced (after a node restart) is the 
> existence of
>     the applier name and whether ot not any client is currently 
> attached as
>     that applier.
>
> So I think we are covered.
>
> /AndersBj
>
>
>
>
>
> On 10/09/2015 08:55 AM, Neelakanta Reddy wrote:
>> Hi zoran,
>>
>> Comments inline.
>>
>> /Neel.
>> On Thursday 08 October 2015 07:56 PM, Zoran Milinkovic wrote:
>>> Hi Neelakanta,
>>>
>>>   From the sync, a new joined node gets only applier name and node 
>>> id where the applier is created. The new node does not know anything 
>>> about on which objects and classes appliers are set.
>>> That's a difference from implementers where each node knows on which 
>>> objects or classes implementers are set.
>> I agree.
>>> This lack of information for appliers may make problems if already 
>>> used applier name is tried to be set and it fails. On new joined 
>>> node, applier set can succeed. In this case we have inconsistent 
>>> system which ends up in the node reboot.
>> I agree, since the problem leads to node reboot, this ticket must be
>> fixed completely in this release.
>>> Because applier set info (set on objects and classes) resides only 
>>> on the node where the applier is created, the originating node first 
>>> detach the applier locally, and then send discard implementer 
>>> (applier) to all nodes sending IMMD_EVT_ND2D_DISCARD_IMPL to IMMD.
>> Here the meaning of the originating node means from where the request is
>> arrived.
>> There are two cases:
>> case 1: applier set request arrives from the veteran nodes
>>
>> Then  the present immModel_implementerSet (with this patch) will discard
>> the implementer.
>>
>> case 2: applier set request comes from the newly sync nodes
>>
>> In this case since the new node does not have object and class applier
>> information then in the newly synced nodes the applier got connected
>> and in the veteran nodes the applier is not connected. (This is similar
>> to the problem reported in the ticket)
>>> Object and class implementer set don't have anything with 
>>> implementer set. That are two different kind of calls.
>> yes, the class/objectimplementer set are different calls from
>> saImmOiImplementerSet .
>>
>> To generalize for the appliers:
>>
>> 1. In the ImmModel::discardImplementer if it is an applier, remove the
>> applier and class binding (i.e remove the disconnected applier from
>> class info)
>>
>> 2.  allow the connection of implementer(applier case only) even when
>> there are on-going CCBs.
>> applier checks need to be removed in immModel_implementerSet
>>
>> Even though the implementerset call succeeds (appliers are connected).
>> But, the callbacks will be sent only when object/classimplementerset is
>> called.
>>
>> 3. when the object/classimplementerset  is called then information is
>> available to check if there are any ongoing CCBs , then discard the
>> implementer.
>>
>> Eg:
>>
>> The primary middleware usage of applier is AMF service.
>> The AMF service also while connecting as an applier, sets
>> classImplementerset for all AMF classes.
>>
>>
>>> Best regards,
>>> Zoran
>>>
>>>
>>> -----Original Message-----
>>> From: Neelakanta Reddy [mailto:[email protected]]
>>> Sent: Thursday, October 08, 2015 2:02 PM
>>> To: Zoran Milinkovic
>>> Cc: [email protected]
>>> Subject: Re: [PATCH 1 of 1] imm: synchronize applier set on all 
>>> nodes [#1504]
>>>
>>> Hi zoran,
>>>
>>> The below patch does not solve the problem indicated  in the ticket.
>>>
>>> .
>>>
>>>         if (originatedAtThisNd) {    /*Send reply to client from 
>>> this ND. */
>>>             immnd_client_node_get(cb, clnt_hdl, &cl_node);
>>> -        if (cl_node == NULL) {
>>> +        if ((cl_node == NULL) || (discardImplementer == SA_TRUE)) {
>>>
>>>
>>> The above lines will only send IMMD_EVT_ND2D_DISCARD_IMPL when the 
>>> implementer is connected from the originating node.
>>> In case , if the same applier is connected from the newly joined 
>>> node DISCARD_IMPL is not sent.
>>>
>>> The solution is DISCARD_IMPL in immnd_evt_proc_cl_impl_set and 
>>> immnd_evt_proc_obj_impl_set.
>>> Allow creation of applier in immnd_evt_proc_impl_set_rsp(i.e Remove 
>>> the applier checks, so that the applier get connected with 
>>> saImmOiImplementerSet(). If there is any active CCBs then the 
>>> applier will be discarded either in class/object implementer set.
>>>
>>>
>>> /Neel.
>>>
>>> On Wednesday 07 October 2015 08:16 PM, Zoran Milinkovic wrote:
>>>> osaf/services/saf/immsv/immnd/ImmModel.cc  |  12 +++++++++---
>>>>     osaf/services/saf/immsv/immnd/ImmModel.hh  |   3 ++-
>>>>     osaf/services/saf/immsv/immnd/immnd_evt.c  |  15 ++++++++++-----
>>>>     osaf/services/saf/immsv/immnd/immnd_init.h |   2 +-
>>>>     4 files changed, 22 insertions(+), 10 deletions(-)
>>>>
>>>>
>>>> If applier fails on the local node, the global discard implementer 
>>>> message will be broadcasted to all nodes.
>>>>
>>>> diff --git a/osaf/services/saf/immsv/immnd/ImmModel.cc
>>>> b/osaf/services/saf/immsv/immnd/ImmModel.cc
>>>> --- a/osaf/services/saf/immsv/immnd/ImmModel.cc
>>>> +++ b/osaf/services/saf/immsv/immnd/ImmModel.cc
>>>> @@ -1589,7 +1589,7 @@ SaAisErrorT
>>>>     immModel_implementerSet(IMMND_CB *cb, const IMMSV_OCTET_STRING* 
>>>> implName,
>>>>         SaUint32T implConn, SaUint32T implNodeId,
>>>>         SaUint32T implId, MDS_DEST mds_dest,
>>>> -    SaUint32T implTimeout)
>>>> +    SaUint32T implTimeout, SaBoolT *discardImplementer)
>>>>     {
>>>>         return
>>>> ImmModel::instance(&cb->immModel)->implementerSet(implName,
>>>> @@ -1597,7 +1597,8 @@ immModel_implementerSet(IMMND_CB *cb, co
>>>>                 implNodeId,
>>>>                 implId,
>>>>                 (SaUint64T) mds_dest,
>>>> -            implTimeout);
>>>> +            implTimeout,
>>>> +            discardImplementer);
>>>>     }
>>>>         SaAisErrorT
>>>> @@ -13030,12 +13031,15 @@ ImmModel::implementerSet(const IMMSV_OCT
>>>>         SaUint32T nodeId,
>>>>         SaUint32T implementerId,
>>>>         SaUint64T mds_dest,
>>>> -    SaUint32T implTimeout)
>>>> +    SaUint32T implTimeout,
>>>> +    SaBoolT *discardImplementer)
>>>>     {
>>>>         SaAisErrorT err = SA_AIS_OK;
>>>>         CcbVector::iterator i;
>>>>         TRACE_ENTER();
>>>>     +    *discardImplementer = SA_FALSE;
>>>> +
>>>>         if(immNotWritable() && !protocol43Allowed()) {
>>>>             TRACE_LEAVE();
>>>>             return SA_AIS_ERR_TRY_AGAIN; @@ -13108,6 +13112,7 @@
>>>> ImmModel::implementerSet(const IMMSV_OCT
>>>>                                         TRACE("TRY_AGAIN: ccb %u is 
>>>> active on object '%s' "
>>>>                                            "bound to class applier 
>>>> '%s'. Can not re-attach applier",
>>>>                                            ccb->mId,
>>>> omit->first.c_str(), implName.c_str());
>>>> +                                    *discardImplementer = SA_TRUE;
>>>>                                         err = SA_AIS_ERR_TRY_AGAIN;
>>>>                                         goto done;
>>>>                                     }
>>>> @@ -13121,6 +13126,7 @@ ImmModel::implementerSet(const IMMSV_OCT
>>>>                                     TRACE("TRY_AGAIN: ccb %u is 
>>>> active on object '%s' "
>>>>                                           "bound to object applier 
>>>> '%s'. Can not re-attach applier",
>>>>                                            ccb->mId,
>>>> omit->first.c_str(), implName.c_str());
>>>> +                                *discardImplementer = SA_TRUE;
>>>>                                     err = SA_AIS_ERR_TRY_AGAIN;
>>>>                                     goto done;
>>>>                                 }
>>>> diff --git a/osaf/services/saf/immsv/immnd/ImmModel.hh
>>>> b/osaf/services/saf/immsv/immnd/ImmModel.hh
>>>> --- a/osaf/services/saf/immsv/immnd/ImmModel.hh
>>>> +++ b/osaf/services/saf/immsv/immnd/ImmModel.hh
>>>> @@ -397,7 +397,8 @@ public:
>>>>                                            SaUint32T nodeId,
>>>>                                            SaUint32T ownerId,
>>>>                                            SaUint64T mds_dest,
>>>> -                                       SaUint32T implTimeout);
>>>> +                                       SaUint32T implTimeout,
>>>> +                                       SaBoolT *discardImplementer);
>>>>                 SaAisErrorT         classImplementerSet(
>>>>                                                 const struct
>>>> ImmsvOiImplSetReq* req, diff --git
>>>> a/osaf/services/saf/immsv/immnd/immnd_evt.c
>>>> b/osaf/services/saf/immsv/immnd/immnd_evt.c
>>>> --- a/osaf/services/saf/immsv/immnd/immnd_evt.c
>>>> +++ b/osaf/services/saf/immsv/immnd/immnd_evt.c
>>>> @@ -9523,6 +9523,7 @@ static void immnd_evt_proc_impl_set_rsp(
>>>>         NCS_NODE_ID nodeId;
>>>>         SaUint32T conn;
>>>>         SaUint32T implId = 0;
>>>> +    SaBoolT discardImplementer = SA_FALSE;
>>>>             osafassert(evt);
>>>>         osafassert(!originatedAtThisNd || reply_dest ==
>>>> cb->immnd_mdest_id); @@ -9539,13 +9540,13 @@ static void
>>>> immnd_evt_proc_impl_set_rsp(
>>>>             err = immModel_implementerSet(cb, 
>>>> &(evt->info.implSet.impl_name),
>>>>                 (originatedAtThisNd) ? conn : 0, nodeId, implId,
>>>> -            reply_dest, evt->info.implSet.oi_timeout);
>>>> +            reply_dest, evt->info.implSet.oi_timeout, 
>>>> &discardImplementer);
>>>>             if (originatedAtThisNd) {    /*Send reply to client 
>>>> from this ND. */
>>>>             immnd_client_node_get(cb, clnt_hdl, &cl_node);
>>>> -        if (cl_node == NULL) {
>>>> +        if ((cl_node == NULL) || (discardImplementer == SA_TRUE)) {
>>>>                 /* Client was down */
>>>> -            TRACE_5("Failed to get client node, discarding 
>>>> implementer id:%u for connection: %u", implId, conn);
>>>> +            TRACE_5("Discarding implementer id:%u for connection: 
>>>> %u", implId,
>>>> +conn);
>>>>                 memset(&send_evt, '\0', sizeof(IMMSV_EVT));
>>>>                 send_evt.type = IMMSV_EVT_TYPE_IMMD;
>>>>                 send_evt.info.immd.type = 
>>>> IMMD_EVT_ND2D_DISCARD_IMPL; @@ -9556,8
>>>> +9557,12 @@ static void immnd_evt_proc_impl_set_rsp(
>>>>                 /* Mark the implementer as dying to make sure no 
>>>> upcall is sent to the client.
>>>>                    The implementer will be really discarded when 
>>>> global-discard message comes */
>>>>                 immModel_discardImplementer(cb, implId, SA_FALSE, 
>>>> NULL, NULL);
>>>> -            return;
>>>> -        } else if (cl_node->mIsStale) {
>>>> +
>>>> +            if(cl_node == NULL) {
>>>> +                return;
>>>> +            }
>>>> +        }
>>>> +        if (cl_node->mIsStale) {
>>>>                 LOG_WA("IMMND - Client went down so no response");
>>>>                 return;
>>>>             }
>>>> diff --git a/osaf/services/saf/immsv/immnd/immnd_init.h
>>>> b/osaf/services/saf/immsv/immnd/immnd_init.h
>>>> --- a/osaf/services/saf/immsv/immnd/immnd_init.h
>>>> +++ b/osaf/services/saf/immsv/immnd/immnd_init.h
>>>> @@ -225,7 +225,7 @@ extern "C" {
>>>>         SaAisErrorT
>>>>             immModel_implementerSet(IMMND_CB *cb, const 
>>>> IMMSV_OCTET_STRING *implName,
>>>>                         SaUint32T implConn, SaUint32T implNodeId, 
>>>> SaUint32T implId,
>>>> -                    MDS_DEST mds_dest, SaUint32T implTimeout);
>>>> +                    MDS_DEST mds_dest, SaUint32T implTimeout, SaBoolT
>>>> +*discardImplementer);
>>>>             SaAisErrorT
>>>>             immModel_implementerClear(IMMND_CB *cb, const struct
>>>> ImmsvOiImplSetReq *req,
>>
>> ------------------------------------------------------------------------------
>>  
>>
>> _______________________________________________
>> Opensaf-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>


------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to