No errors are returned from the osaf calls, and no error logs are 
generated.

What problems would writing to the checkpoint in an AMF callback cause?

—
tony

> On Jan 7, 2016, at 10:23 PM, A V Mahesh <[email protected]> wrote:
> 
> 
> One more thing: ideally, checkpoint-write operations are NOT 
> recommended in CALLBACKs.
> What exactly is your requirement?
> 
> Is error handling done properly?
> 
> -AVM
> 
> On 1/8/2016 8:49 AM, A V Mahesh wrote:
>> Hi,
>> 
>>>> (so the standby code does not run until the active code is done).
>> 
>> If the above is the sequence of checkpoint writing, you should be having 
>> the problem even with 40MB and higher.
>> Can you please cross-check for any system limitation, such as on /dev/shm/, 
>> etc.?
>> 
>> By the way, which OpenSAF changeset are you using?
>> 
>> -AVM
>> 
>> 
>> On 1/8/2016 1:25 AM, Tony Hart wrote:
>>> Should also mention that this is using the synchronous API calls.
>>> 
>>>> On Jan 7, 2016, at 10:55 AM, Tony Hart <[email protected]> wrote:
>>>> 
>>>> OpenSAF 4.5.1
>>>> 
>>>> We’re seeing an issue where checkpoints are not syncing between two 
>>>> nodes (the data in one is different from the other).  There are two 
>>>> separate nodes (A and B); one will have the active instance of the 
>>>> process and the other the standby instance.  The checkpoint is 
>>>> created, opened and initialized in the active instance’s AMF ACTIVE 
>>>> callback.  Then the checkpoint is opened in the standby instance’s 
>>>> AMF STANDBY callback (so the standby code does not run until the 
>>>> active code is done).
>>>> 
>>>> NodeA
>>>> on_active() {
>>>> 
>>>>    Create a checkpoint with (SA_CKPT_WR_ALL_REPLICAS | 
>>>> SA_CKPT_CHECKPOINT_COLLOCATED)
>>>>    Initialize the checkpoint data (first 32 bytes is filled with a 
>>>> pattern)
>>>> }
>>>> 
>>>> NodeB
>>>> on_standby() {
>>>>    Open the same checkpoint
>>>>    Read first 32 bytes and check for fill pattern.
>>>> }
>>>> 
>>>> On NodeB what we occasionally see is that the check fails: instead 
>>>> of reading the fill pattern it sees zeros.  It doesn’t matter how 
>>>> long the checkpoint is left open; we never see the fill pattern.
>>>> 
>>>> Here is a dump of the shared memory file from the two nodes. Our 
>>>> data starts at 06448 (0xf33d).  You can see on the standby copy that 
>>>> it’s zero.
>>>> 
>>>> Other checkpoints work fine.  The difference with this one is that 
>>>> it’s much bigger than the others (~20MB).  If we increase the size of 
>>>> the checkpoint to 40MB we see the failure all the time.  So the 
>>>> problem seems to be related to the size of the checkpoint.
>>>> 
>>>> NodeA (active)
>>>> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69391_13
>>>> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
>>>> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
>>>> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
>>>> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
>>>> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
>>>> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
>>>> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
>>>> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
>>>> 0000620 0020 02bc 0000 0000 0000 0000 0000 0000
>>>> 0000640 7f01 568e 0000 0000 f33d b33f 0578 0000
>>>> 0000660 8000 0000 0000 0000 0000 0000 0000 0000
>>>> 0000700 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0001000
>>>> 
>>>> 
>>>> NodeB (standby)
>>>> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69647_13
>>>> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
>>>> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
>>>> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
>>>> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
>>>> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
>>>> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
>>>> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
>>>> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
>>>> 0000620 0000 0000 0000 0000 0000 0000 0000 0000
>>>> *
>>>> 0001000
>>>> 
>>>> 
>>>> ------------------------------------------------------------------------------
>>>>  
>>>> 
>>>> _______________________________________________
>>>> Opensaf-users mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
>> 
> 
> 
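
For anyone repeating the replica comparison in the quoted message, the
by-eye diff of the two od dumps can be automated. What follows is only a
minimal sketch in plain C, not part of OpenSAF: the helper names
first_mismatch and first_file_mismatch are hypothetical, and it assumes
the standby's /dev/shm replica file has been copied somewhere readable
from the machine doing the comparison.

```c
#include <stdio.h>
#include <stddef.h>

/* Return the offset of the first differing byte in a[0..n) and b[0..n),
 * or -1 if the two ranges are identical. */
long first_mismatch(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: compare up to `limit` bytes of two files.
 * Returns the first differing offset, -1 if the compared bytes match,
 * or -2 if either file cannot be opened. If one file ends early within
 * `limit`, the shorter file's length is reported as the mismatch. */
long first_file_mismatch(const char *path_a, const char *path_b, size_t limit)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    long result = -1;
    size_t offset = 0;

    if (fa == NULL || fb == NULL) {
        result = -2;
    } else {
        unsigned char ba[4096], bb[4096];
        while (offset < limit) {
            size_t want = limit - offset;
            if (want > sizeof ba)
                want = sizeof ba;
            size_t na = fread(ba, 1, want, fa);
            size_t nb = fread(bb, 1, want, fb);
            size_t common = na < nb ? na : nb;
            long m = first_mismatch(ba, bb, common);
            if (m >= 0) {
                result = (long)offset + m;
                break;
            }
            if (na != nb) {            /* one file is shorter */
                result = (long)(offset + common);
                break;
            }
            if (na == 0)               /* both files ended */
                break;
            offset += na;
        }
    }
    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return result;
}
```

Calling first_file_mismatch on the active and standby replica files with
a limit of 512 covers the same range as the od -N 512 dumps, which first
diverge on the line at octal offset 0000620. Using a larger limit would
show whether the rest of the checkpoint data is also missing on the
standby.
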

