Should also mention that this is using the synchronous API calls.

> On Jan 7, 2016, at 10:55 AM, Tony Hart <[email protected]> wrote:
> 
> OpenSAF 4.5.1
> 
> We’re seeing an issue where checkpoints are not syncing between two nodes
> (the data in one differs from the other). There are two separate nodes, A
> and B: one has the active instance of the process and the other the standby
> instance. The checkpoint is created, opened and initialized in the active
> instance’s AMF ACTIVE callback. The checkpoint is then opened in the standby
> instance’s AMF STANDBY callback (so the standby code does not run until the
> active code is done).
> 
> NodeA
> on_active() {
> 
>    Create a checkpoint with (SA_CKPT_WR_ALL_REPLICAS | SA_CKPT_CHECKPOINT_COLLOCATED)
>    Initialize the checkpoint data (the first 32 bytes are filled with a pattern)
> }
> 
> NodeB
> on_standby() {
>    Open the same checkpoint
>    Read first 32 bytes and check for fill pattern.
> }
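The standby-side step ("Read first 32 bytes and check for fill pattern") can be sketched in plain C, independent of the checkpoint API. This is a minimal sketch under assumptions: the fill pattern below is hypothetical (the real one is not given here), and `fill_pattern`, `check_pattern` and `all_zero` are illustrative names, not OpenSAF functions. Checking for "all zeros" separately from "wrong data" helps tell a replica that never received the write apart from one holding corrupted data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the region the active writes and the standby checks. */
#define PATTERN_LEN 32

/* Hypothetical fill pattern: byte i is 0xA5 XOR i (never zero for i < 32).
 * The real pattern used by the application is not shown in the report. */
static void fill_pattern(uint8_t *buf, size_t len) {
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(0xA5u ^ i);
}

/* Standby-side check: true only if the first PATTERN_LEN bytes match. */
static bool check_pattern(const uint8_t *buf) {
    uint8_t expected[PATTERN_LEN];
    assert(buf != NULL);
    fill_pattern(expected, PATTERN_LEN);
    return memcmp(buf, expected, PATTERN_LEN) == 0;
}

/* True if every byte is zero: the symptom seen on NodeB. */
static bool all_zero(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}
```

In the real application, `fill_pattern` would prepare the buffer the active writes with `saCkptCheckpointWrite()`, and `check_pattern`/`all_zero` would inspect the buffer the standby fills with `saCkptCheckpointRead()`.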
> 
> On NodeB we occasionally see the check fail: instead of the fill pattern, it
> reads zeros. No matter how long the checkpoint is left open, we never see the
> fill pattern.
> 
> Here is a dump of the shared memory files from the two nodes. Our data starts
> at octal offset 0650 (the word 0xf33d). You can see that on the standby copy
> it is zero.
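Reading the dumps below takes two conversions: `od` prints each line's address in octal, and with `-x` each column is one 2-byte word printed in host byte order. A minimal sketch of that arithmetic, assuming the nodes are little-endian (e.g. x86); `od_byte_offset` and `od_words_to_u32` are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Byte offset of a word in `od -x` output: the line address is octal and
 * each column holds one 2-byte word, so column n (from 0) starts 2*n bytes in. */
static unsigned long od_byte_offset(const char *octal_addr, unsigned column) {
    assert(octal_addr != NULL);
    return strtoul(octal_addr, NULL, 8) + 2UL * column;
}

/* Assuming a little-endian host, two consecutive od -x words w0 w1 are the
 * low and high halves of one 32-bit value. */
static uint32_t od_words_to_u32(uint16_t w0, uint16_t w1) {
    return ((uint32_t)w1 << 16) | w0;
}
```

For NodeA, the word `f33d` in column 4 (counting from 0) of line `0000640` is at byte 424 (octal 0650), and together with the following word `b33f` it reads as the 32-bit value 0xb33ff33d.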
> 
> Other checkpoints work fine. The difference with this one is that it's much
> bigger than the others (~20MB); if we increase the checkpoint size to 40MB we
> see the failure every time. So the problem seems to be related to the size of
> the checkpoint.
> 
> NodeA (active)
> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69391_13
> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
> 0000620 0020 02bc 0000 0000 0000 0000 0000 0000
> 0000640 7f01 568e 0000 0000 f33d b33f 0578 0000
> 0000660 8000 0000 0000 0000 0000 0000 0000 0000
> 0000700 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0001000
> 
> 
> NodeB (standby)
> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69647_13
> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
> 0000620 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0001000
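Comparing the two dumps word by word shows where the replicas first diverge. The sketch below transcribes the words of lines 0000600 to 0000640 from both dumps (a `*` line in `od` output stands for repeats of the previous line, so NodeB's 0000640 line is all zeros) and returns the index of the first differing word. `first_diff` is a hypothetical helper, not part of any OpenSAF tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NWORDS 24 /* three od lines of eight 16-bit words each */

/* Words from lines 0000600, 0000620 and 0000640 of the NodeA dump. */
static const uint16_t node_a[NWORDS] = {
    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0001, 0x0000, /* 0600 */
    0x0020, 0x02bc, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, /* 0620 */
    0x7f01, 0x568e, 0x0000, 0x0000, 0xf33d, 0xb33f, 0x0578, 0x0000, /* 0640 */
};

/* Same lines from the NodeB dump; the omitted elements (0620, 0640) are zero. */
static const uint16_t node_b[NWORDS] = {
    0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0001, 0x0000, /* 0600 */
};

/* Index of the first word that differs, or n if the arrays are equal. */
static size_t first_diff(const uint16_t *a, const uint16_t *b, size_t n) {
    size_t i = 0;
    assert(a != NULL && b != NULL);
    while (i < n && a[i] == b[i])
        i++;
    return i;
}
```

With these words the first difference is word 8, i.e. byte 384 + 2 * 8 = 400 (octal 0620), so the standby copy is zero from line 0000620 onward in this region, not only in the 32-byte pattern.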
> 
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users
