I should also mention that this is using the synchronous API calls.

> On Jan 7, 2016, at 10:55 AM, Tony Hart <[email protected]> wrote:
>
> OpenSAF 4.5.1
>
> We’re seeing an issue where checkpoints are not syncing between two nodes
> (the data in one is different from the other). There are two separate nodes
> (A and B); one will have the active instance of the process and the other the
> standby instance. The checkpoint is created, opened and initialized in the
> active instance’s AMF ACTIVE callback. Then the checkpoint is opened in the
> standby instance’s AMF standby callback (so the standby code does not run
> until the active code is done).
>
> NodeA
> on_active() {
>     Create a checkpoint with (SA_CKPT_WR_ALL_REPLICAS |
>         SA_CKPT_CHECKPOINT_COLLOCATED)
>     Initialize the checkpoint data (first 32 bytes are filled with a pattern)
> }
>
> NodeB
> on_standby() {
>     Open the same checkpoint
>     Read first 32 bytes and check for fill pattern.
> }
>
> On NodeB what we occasionally see is that the check fails: instead of reading
> the fill pattern it sees zeros. No matter how long the checkpoint is left
> open, we never see the fill pattern.
>
> Here is a dump of the shared memory file from the two nodes. Our data starts
> at 06448 (0xf33d). You can see on the standby copy that it’s zero.
>
> Other checkpoints work fine. The difference with this one is that it’s much
> bigger than the others, ~20MB; if we increase the size of the checkpoint to
> 40MB we see the failure all the time. So the problem seems to be related to
> the size of the checkpoint.
>
> NodeA (active)
> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69391_13
> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
> 0000620 0020 02bc 0000 0000 0000 0000 0000 0000
> 0000640 7f01 568e 0000 0000 f33d b33f 0578 0000
> 0000660 8000 0000 0000 0000 0000 0000 0000 0000
> 0000700 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0001000
>
>
> NodeB (standby)
> $ od -x -N 512 /dev/shm/opensaf_safCkpt\=SwitchMgr__69647_13
> 0000000 000d 0000 0000 0000 0013 6173 4366 706b
> 0000020 3d74 7753 7469 6863 674d 5f72 0035 0000
> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0000420 0009 0000 0000 0000 0020 02bc 0000 0000
> 0000440 5800 f847 000d 0000 0001 0000 0000 0000
> 0000460 0020 02bc 0000 0000 001a 0000 0000 0000
> 0000500 0004 0000 0000 0000 0000 0000 0001 0000
> 0000520 0000 0000 0101 0000 a031 bc91 0f0f 0001
> 0000540 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0000600 0000 0000 0000 0000 0000 0000 0001 0000
> 0000620 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0001000
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users
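
In case it helps anyone reproduce the comparison, here is a rough sketch of a helper that finds where two `od -x` dumps first diverge. It is not part of OpenSAF; the function names `parse_od_x` and `first_difference` are made up for illustration. It assumes the default `od -x` layout seen in the dumps above: an octal byte offset, then 16-bit hex words, with a lone `*` standing in for rows identical to the previous one. It compares 16-bit words as printed, so it makes no assumption about byte order.

```python
def parse_od_x(text):
    """Expand `od -x` output into a flat list of 16-bit words.

    A lone '*' means the previous row repeats until the next printed offset.
    """
    words = []
    prev_row = None
    pending_repeat = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields == ["*"]:
            pending_repeat = True
            continue
        offset = int(fields[0], 8)  # od prints byte offsets in octal
        if pending_repeat and prev_row:
            # Re-insert the suppressed rows up to the next printed offset.
            while len(words) * 2 < offset:
                words.extend(prev_row)
            del words[offset // 2:]  # trim if the last copy overshot
            pending_repeat = False
        row = [int(w, 16) for w in fields[1:]]
        if row:
            words.extend(row)
            prev_row = row
    return words


def first_difference(a, b):
    """Return the byte offset of the first differing word, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i * 2
    if len(a) != len(b):
        return min(len(a), len(b)) * 2
    return None
```

With each dump saved to a text file, something like `oct(first_difference(parse_od_x(dump_a), parse_od_x(dump_b)))` gives the offset in the same octal form that `od` prints. Fed the two dumps above, it should report 0000620, the first row where the NodeB copy is all zeros.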
