One more thing: ideally, checkpoint write operations are NOT 
recommended inside callbacks.
What exactly is your requirement?

Is error handling done properly?

-AVM

On 1/8/2016 8:49 AM, A V Mahesh wrote:
> Hi,
>
>>> (so the standby code does not run until the active code is done).
>
> If the above is the sequence of checkpoint writing, you should be having 
> the problem even with 40MB and higher.
> Can you please cross-check for any system limitation, such as /dev/shm, 
> etc.?
>
> By the way, which OpenSAF changeset are you using?
>
> -AVM
>
>
> On 1/8/2016 1:25 AM, Tony Hart wrote:
>> Should also mention that this is using the synchronous API calls.
>>
>>> On Jan 7, 2016, at 10:55 AM, Tony Hart <[email protected]> wrote:
>>>
>>> OpenSAF 4.5.1
>>>
>>> We’re seeing an issue where checkpoints are not syncing between two 
>>> nodes (the data in one is different from the other).   There are two 
>>> separate nodes (A and B) one will have the active instance of the 
>>> process and the other the standby instance.  The checkpoint is 
>>> created, opened and initialized in the active instance’s AMF ACTIVE 
>>> callback.  Then the checkpoint is opened in the standby instance's 
>>> AMF standby callback (so the standby code does not run until the 
>>> active code is done).
>>>
>>> NodeA
>>> on_active() {
>>>
>>>     Create a checkpoint with (SA_CKPT_WR_ALL_REPLICAS | 
>>> SA_CKPT_CHECKPOINT_COLLOCATED)
>>>     Initialize the checkpoint data (first 32 bytes is filled with a 
>>> pattern)
>>> }
>>>
>>> NodeB
>>> on_standby() {
>>>     Open the same checkpoint
>>>     Read first 32 bytes and check for fill pattern.
>>> }
>>>
>>> On NodeB what we occasionally see is that the check fails: instead 
>>> of reading the fill pattern it sees zeros.  It doesn’t matter how 
>>> long the checkpoint is left open, we never see the fill pattern.
>>>
>>> Here is a dump of the shared memory file from the two nodes. Our 
>>> data starts at 06448 (0xf33d).  You can see on the standby copy that 
>>> it's zero.
>>>
>>> Other checkpoints work fine.  The difference with this one is that 
>>> it's much bigger than the others (~20MB); if we increase the size of 
>>> the checkpoint to 40MB we see the failure all the time.  So the 
>>> problem seems to be related to the size of the checkpoint.
>>>
>>> NodeA (active)
>>> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69391_13
>>> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
>>> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
>>> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
>>> *
>>> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
>>> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
>>> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
>>> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
>>> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
>>> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
>>> *
>>> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
>>> 0000620 0020 02bc 0000 0000 0000 0000 0000 0000
>>> 0000640 7f01 568e 0000 0000 f33d b33f 0578 0000
>>> 0000660 8000 0000 0000 0000 0000 0000 0000 0000
>>> 0000700 0000 0000 0000 0000 0000 0000 0000 0000
>>> *
>>> 0001000
>>>
>>>
>>> NodeB (standby)
>>> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69647_13
>>> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
>>> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
>>> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
>>> *
>>> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
>>> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
>>> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
>>> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
>>> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
>>> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
>>> *
>>> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
>>> 0000620 0000 0000 0000 0000 0000 0000 0000 0000
>>> *
>>> 0001000
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>  
>>>
>>> _______________________________________________
>>> Opensaf-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
>

