Hi Neel and Mahesh,

Below you mention both 200K objects and 2000K objects. The latter is two million 
objects, which is an order of magnitude above the maximum allowed number of objects.
I will assume you meant 200K objects.

It seems this test does not even exercise failover.
If you have not already done so, I would suggest repeating the test:

a) Without the #952 patch, but still on the default branch. This is to verify 
that the problem was triggered by the #952 patch.
b) With the #952 patch, but with far fewer objects, say 2000. This is to check 
whether the problem is only observed with a larger number of objects.

I am assuming here that the problem is deterministically reproducible.

/AndersBj

-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: den 3 juli 2015 07:37
To: Anders Björnerstedt; reddy.neelaka...@oracle.com; Zoran Milinkovic
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] imm:checkpoint only FEVS header for sync messages 
[#952] v2

Hi Neel,

On 7/2/2015 2:54 PM, Anders Björnerstedt wrote:
> Ack from me.
> Not tested.

I was trying to test with 200K objects and observed some issues. Please verify 
before pushing.

1) bring up SC-1 active  with 2000K objects
2) bring up PL-3
3) bring up PL-4
4) try to bring up SC-2 as standby
5) you will observe osafimmnd restarts on the payload(s), and they will never 
re-join


==================================================================================================================================
Jul  3 10:49:27 PL-4 osafamfnd[3651]: NO 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State UNINSTANTIATED => 
INSTANTIATING
Jul  3 10:49:27 PL-4 osafamfwd[3661]: Started
Jul  3 10:49:27 PL-4 osafckptnd[3671]: Started
Jul  3 10:49:27 PL-4 osaflcknd[3681]: Started
Jul  3 10:49:27 PL-4 osafmsgnd[3699]: Started
Jul  3 10:49:27 PL-4 osafimmnd[3624]: NO Implementer connected: 12 
(MsgQueueService132111) <49, 2040f>
Jul  3 10:49:27 PL-4 osafsmfnd[3710]: Started
Jul  3 10:49:27 PL-4 osafamfnd[3651]: NO 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING => 
INSTANTIATED
Jul  3 10:49:27 PL-4 osafamfnd[3651]: NO Assigning 
'safSi=NoRed8,safApp=OpenSAF' ACTIVE to 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
Jul  3 10:49:27 PL-4 osafamfnd[3651]: NO Assigned 
'safSi=NoRed8,safApp=OpenSAF' ACTIVE to 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
Jul  3 10:49:27 PL-4 opensafd: OpenSAF(4.5.0 - ) services successfully 
started
done
PL-4:~ # Jul  3 10:49:42 PL-4 kernel: [  568.167588] tipc: Established 
link <1.1.4:eth2-1.1.2:eth2> on network plane B
Jul  3 10:49:42 PL-4 kernel: [  568.168970] tipc: Established link 
<1.1.4:eth0-1.1.2:eth3> on network plane A
Jul  3 10:49:43 PL-4 osafimmnd[3624]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Jul  3 10:50:09 PL-4 osafamfnd[3651]: NO 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' component restart probation 
timer started (timeout: 60000000000 ns)
Jul  3 10:50:09 PL-4 osafamfnd[3651]: NO Restarting a component of 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Jul  3 10:50:09 PL-4 osafamfnd[3651]: NO 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 
'avaDown' : Recovery is 'componentRestart'
Jul  3 10:50:09 PL-4 osafimmnd[3751]: Started
Jul  3 10:50:09 PL-4 osafimmnd[3751]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Jul  3 10:50:09 PL-4 osafimmnd[3751]: NO Fevs count adjusted to 203641 
preLoadPid: 0
Jul  3 10:50:09 PL-4 osafimmnd[3751]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jul  3 10:50:09 PL-4 osafimmnd[3751]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jul  3 10:50:09 PL-4 osafimmnd[3751]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jul  3 10:50:09 PL-4 osafimmnd[3751]: NO NODE STATE-> IMM_NODE_ISOLATED
Jul  3 10:50:13 PL-4 osafamfnd[3651]: NO Restarting a component of 
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (comp restart count: 2)
Jul  3 10:50:13 PL-4 osafamfnd[3651]: NO 
'safComp=IMMND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 
'avaDown' : Recovery is 'componentRestart'
Jul  3 10:50:13 PL-4 osafimmnd[3773]: Started
Jul  3 10:50:13 PL-4 osafimmnd[3773]: NO Persistent Back-End capability 
configured, Pbe file:imm.db (suffix may get added)
Jul  3 10:50:13 PL-4 osafimmnd[3773]: NO Fevs count adjusted to 203732 
preLoadPid: 0
Jul  3 10:50:13 PL-4 osafimmnd[3773]: NO SERVER STATE: 
IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jul  3 10:50:14 PL-4 osafimmnd[3773]: NO SERVER STATE: 
IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jul  3 10:50:14 PL-4 osafimmnd[3773]: NO SERVER STATE: 
IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jul  3 10:50:14 PL-4 osafimmnd[3773]: NO NODE STATE-> IMM_NODE_ISOLATED
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCompBaseType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSUBaseType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSGBaseType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfAppBaseType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSvcBaseType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCSBaseType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCompGlobalAttributes
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCompType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCSType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCtCsType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfHealthcheckType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSvcType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSvcTypeCSTypes
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSUType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSutCompType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSGType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfAppType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCluster
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfNode
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfNodeGroup
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfNodeSwBundle
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfApplication
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSG
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSI
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCSI
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCSIAttribute
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSU
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfComp
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfHealthcheck
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfCompCsType
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSIDependency
Jul  3 10:50:26 PL-4 osafimmnd[3773]: NO Sync client discarded 
classimplementer set. Impl-id:13 Class:SaAmfSIRankedSU
Jul  3 10:50:28 PL-4 osafimmnd[3773]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Jul  3 10:50:29 PL-4 osafimmnd[3773]: NO SERVER STATE: 
IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Jul  3 10:50:30 PL-4 osafimmnd[3773]: NO Implementer connected: 14 
(MsgQueueService131599) <0, 2020f>
==================================================================================================================================

-AVM

On 7/2/2015 2:54 PM, Anders Björnerstedt wrote:
> Ack from me.
> Not tested.
> Good work!
>
> One thought that struck me is that the message types:
>
>     IMMND_EVT_A2ND_IMM_FEVS_2
>     IMMD_EVT_ND2D_FEVS_REQ_2
>     IMMND_EVT_D2ND_GLOB_FEVS_REQ_2
>
> should (in some later cleanup) be renamed to reflect that they are only used 
> for imm-sync.
> e.g.   IMMND_EVT_A2ND_IMM_SYNC_FEVS
> Not for this ticket though.
>
> /AndersBj
>
>
> -----Original Message-----
> From: reddy.neelaka...@oracle.com [mailto:reddy.neelaka...@oracle.com]
> Sent: den 1 juli 2015 16:16
> To: Anders Björnerstedt; Zoran Milinkovic; mahesh.va...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [PATCH 1 of 1] imm:checkpoint only FEVS header for sync messages 
> [#952] v2
>
>   osaf/services/saf/immsv/immd/immd_evt.c   |  15 ++++++++++++---
>   osaf/services/saf/immsv/immnd/immnd_evt.c |   9 ++++++++-
>   2 files changed, 20 insertions(+), 4 deletions(-)
>
>
> At the time of sync, when checkpointing to the standby IMMD for 
> IMMND_EVT_D2ND_GLOB_FEVS_REQ_2, the fevs message buffer will be set to NULL 
> and the message size will be set to 0, so that the MBCSV checkpointing 
> happens only for the header.
>
> diff --git a/osaf/services/saf/immsv/immd/immd_evt.c 
> b/osaf/services/saf/immsv/immd/immd_evt.c
> --- a/osaf/services/saf/immsv/immd/immd_evt.c
> +++ b/osaf/services/saf/immsv/immd/immd_evt.c
> @@ -251,7 +251,7 @@ uint32_t immd_evt_proc_fevs_req(IMMD_CB
>       /* Populate & Send the FEVS Event to IMMND */
>       memset(&send_evt, 0, sizeof(IMMSV_EVT));
>       send_evt.type = IMMSV_EVT_TYPE_IMMND;
> -     send_evt.info.immnd.type = (evt->type == IMMD_EVT_ND2D_FEVS_REQ_2)?
> +     send_evt.info.immnd.type = ((evt->type == IMMD_EVT_ND2D_FEVS_REQ_2)||(evt->type == 0))?
>               IMMND_EVT_D2ND_GLOB_FEVS_REQ_2: IMMND_EVT_D2ND_GLOB_FEVS_REQ;
>   
>       if ((evt->type == 0) && (fevs_req->sender_count > 0)) {
> @@ -266,8 +266,8 @@ uint32_t immd_evt_proc_fevs_req(IMMD_CB
>       send_evt.info.immnd.info.fevsReq.msg.size = fevs_req->msg.size;
>       /*Borrow the buffer from the input message instead of copying */
>       send_evt.info.immnd.info.fevsReq.msg.buf = fevs_req->msg.buf;
> -     send_evt.info.immnd.info.fevsReq.isObjSync = (evt->type == IMMD_EVT_ND2D_FEVS_REQ_2)?
> -             (fevs_req->isObjSync):0x0;
> +     send_evt.info.immnd.info.fevsReq.isObjSync = ((evt->type == IMMD_EVT_ND2D_FEVS_REQ_2) ||
> +                     (evt->type == 0 ))? (fevs_req->isObjSync):0x0;
>   
>       TRACE_5("immd_evt_proc_fevs_req send_count:%llu size:%u",
>               send_evt.info.immnd.info.fevsReq.sender_count, send_evt.info.immnd.info.fevsReq.msg.size);
> @@ -280,6 +280,15 @@ uint32_t immd_evt_proc_fevs_req(IMMD_CB
>               mbcp_msg.type = IMMD_A2S_MSG_FEVS;
>               mbcp_msg.info.fevsReq = send_evt.info.immnd.info.fevsReq;
>   
> +             /* FEVS_REQ_2 messages are object sync messages. since this is mbcsv checkpointing
> +                to standby, at the time of sync checkpointing complete fevs event is not required.
> +                Checkpointing the header is sufficient to have the standby SC in sync with the fevs count.*/
> +
> +             if(evt->type == IMMD_EVT_ND2D_FEVS_REQ_2){
> +                     mbcp_msg.info.fevsReq.msg.size = 0;
> +                     mbcp_msg.info.fevsReq.msg.buf = NULL;
> +                     mbcp_msg.info.fevsReq.isObjSync = 0x0;
> +             }
>               /*Checkpoint the message to standby director.
>                  Syncronous call=>wait for ack */
>               proc_rc = immd_mbcsv_sync_update(cb, &mbcp_msg);
> diff --git a/osaf/services/saf/immsv/immnd/immnd_evt.c 
> b/osaf/services/saf/immsv/immnd/immnd_evt.c
> --- a/osaf/services/saf/immsv/immnd/immnd_evt.c
> +++ b/osaf/services/saf/immsv/immnd/immnd_evt.c
> @@ -8702,7 +8702,7 @@ static uint32_t immnd_evt_proc_fevs_rcv(
>       SaBoolT originatedAtThisNd = (m_IMMSV_UNPACK_HANDLE_LOW(clnt_hdl) == cb->node_id);
>   
>       if (originatedAtThisNd) {
> -             osafassert(!reply_dest || (reply_dest == cb->immnd_mdest_id));
> +             osafassert(!reply_dest || (reply_dest == cb->immnd_mdest_id) || isObjSync );
>               if (cb->fevs_replies_pending) {
>                       --(cb->fevs_replies_pending);   /*flow control towards IMMD */
>               }
> @@ -8731,6 +8731,12 @@ static uint32_t immnd_evt_proc_fevs_rcv(
>               }
>       }
>   
> +     if ((evt->type == IMMND_EVT_D2ND_GLOB_FEVS_REQ_2) && (msg->size == 0) && (msg->buf == NULL)){
> +             // This is  sync message Re-broadcasted by IMMD standby because of failover
> +             TRACE("Re-broadcasted FEVS at the time of sync");
> +             goto done;
> +     }
> +
>       /*NORMAL CASE: Received the expected in-order message. */
>   
>       SaAisErrorT err = SA_AIS_OK;
> @@ -8749,6 +8755,7 @@ static uint32_t immnd_evt_proc_fevs_rcv(
>               }
>       }
>   
> + done:
>       cb->highestProcessed++;
>       dequeue_outgoing(cb);
>       TRACE_LEAVE();

