Hi Alex,

On 1/8/2014 4:16 AM, Alex Jones wrote:
>       I can create and write all the sections.  And from another node
> I run saCkptCheckpointStatusGet, and the information all looks good.
> Everything is there.  I see no errors from any CKPT API calls.
>
>       The problem comes when I call saCkptActiveReplicaSet from this
> other node.  After I do this, saCkptCheckpointStatusGet now returns
> all the same information except the number of sections is no longer
> 7500 but 0.  If I do this test with 50,000 sections only about 3,000
> entries get synced.  And iterating through the sections shows that
> there are only 3,000 sections.

Let us go one by one. Let me first address your initial issue, where
you opened the checkpoint on both nodes first, created and wrote all the
sections on node A, then called saCkptActiveReplicaSet from the other
node B, and saCkptCheckpointStatusGet on node B returned all the same
information except that the number of sections was no longer 7500 but 0.

Is this problem reproducible with
sectionCreationAttributes.expirationTime set to SA_TIME_ONE_DAY?
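
For reference, the time values mentioned in this thread are SaTimeT values, which are signed 64-bit nanosecond counts. Below is a minimal sketch of the unit arithmetic. The sketch_* and SKETCH_* names are hypothetical stand-ins, not part of the CKPT API; they only mirror the SA_TIME_* values from saAis.h so that the snippet builds without the OpenSAF headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SaTimeT (a signed 64-bit nanosecond count). */
typedef int64_t sketch_time_t;

/* Stand-ins mirroring SA_TIME_ONE_SECOND, SA_TIME_ONE_DAY and SA_TIME_END. */
#define SKETCH_TIME_ONE_SECOND 1000000000LL
#define SKETCH_TIME_ONE_DAY    86400000000000LL
#define SKETCH_TIME_END        0x7FFFFFFFFFFFFFFFLL

/* Convert a non-negative nanosecond value to whole seconds. */
static int64_t sketch_ns_to_seconds(sketch_time_t ns)
{
    assert(ns >= 0);
    return ns / SKETCH_TIME_ONE_SECOND;
}

/* Convert whole seconds to nanoseconds, e.g. to build a longer timeout. */
static sketch_time_t sketch_seconds_to_ns(int64_t seconds)
{
    /* Guard against overflowing the 64-bit nanosecond range. */
    assert(seconds >= 0 && seconds <= SKETCH_TIME_END / SKETCH_TIME_ONE_SECOND);
    return seconds * SKETCH_TIME_ONE_SECOND;
}
```

With these helpers, sketch_ns_to_seconds(1000000000LL) gives 1, so the `timeout = 1000000000` quoted below is one second, and SA_TIME_ONE_DAY corresponds to 86400 seconds.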

Let me first resolve your initial problem. This fix will resolve the
synchronization issue between the nodes,
so most of the issues will be resolved.

-AVM

On 1/7/2014 5:54 PM, Alex Jones wrote:
> AVM,
>
>     I get SA_AIS_ERR_TIMEOUT even when I pass SA_TIME_END as the 
> timeout value.  Is this not a bug?  the synchronous CheckpointOpen 
> call doesn't work at all in this scenario.  It never succeeds.
>
>     I can reproduce the problem with 
> sectionCreationAttributes.expirationTime set to SA_TIME_ONE_DAY.
>
>     You should be able to reproduce the problem with the code I sent 
> in the last e-mail.
>
> Alex
>
> On 01/06/2014 10:31 PM, A V Mahesh wrote:
>> Hi Alex,
>>
>> The CheckpointOpen call failing with SA_AIS_ERR_TIMEOUT is NOT a bug. It
>> is expected if you pass a small timeout value (`timeout = 1000000000`)
>> to the saCkptCheckpointOpen(....,timeout ...) call when the ckpt has very
>> large data/sections. Just increasing the timeout will avoid the
>> SA_AIS_ERR_TIMEOUT.
>>
>> Let us focus on your original issue/scenario. Are you able to
>> reproduce the problem with sectionCreationAttributes.expirationTime
>> set to SA_TIME_ONE_DAY?
>>
>> -AVM
>>
>> On 1/7/2014 1:17 AM, Alex Jones wrote:
>>> AVM,
>>>
>>>     I've been playing around with your test program, and have gotten 
>>> it to fail.
>>>
>>>     I made the following changes:
>>>
>>>  1. Change init_dataX to be 1024k bytes, so that you are
>>>     initializing the section to be 1024k.
>>>  2. Also, don't start the program on node B until A has finished
>>>     writing/creating all the sections.
>>>  3. Before hitting the enter key on node B, wait for the OpenAsync
>>>     call to finish.
>>>
>>>     You might notice the CheckpointOpen call failing now with 
>>> SA_AIS_ERR_TIMEOUT.  I had to turn this into OpenAsync, and add a 
>>> thread to process CkptDispatch messages.  This uncovers another bug 
>>> in OpenAsync.  I've attached the mods to your program here.
>>>
>>>    The OpenAsync callback will be called twice, both times with 
>>> error == SA_AIS_ERR_TIMEOUT.  If I call OpenAsync again when I get 
>>> this error, the next callback returns success, but the callback gets 
>>> called twice with success and with two different checkpoint handles!
>>>
>>> Alex
>>>
>>>
>>> On 01/06/2014 06:18 AM, A V Mahesh wrote:
>>>> Hi Alex,
>>>>
>>>> I have created 10K sections (please find the attached test
>>>> applications `Alex_test_node_A_app.c` & `Alex_test_node_B_app.c`)
>>>> with your specified scenario & configuration, and I haven't observed any
>>>> issue with sections on the other node.
>>>>
>>>> Try to reproduce the problem on your setup & let me know the result.
>>>>
>>>> One more important point: what value did you configure for
>>>> `sectionCreationAttributes.expirationTime`?
>>>> I configured SA_TIME_ONE_DAY.
>>>>
>>>> Steps to run the application:
>>>>
>>>> ===================================================================================================================
>>>>
>>>> Compile :
>>>>
>>>> NODE-A# gcc Alex_test_node_A_app.c -o checkpoint_A -lSaCkpt
>>>> NODE-A# gcc Alex_test_node_B_app.c -o checkpoint_B -lSaCkpt
>>>>
>>>>
>>>> Run :
>>>>
>>>> 1) saCkptCheckpointOpen On node A
>>>>
>>>> NODE-A# ./checkpoint_A
>>>>
>>>> CPSV:CPA:ONsaCkptSectionCreate  Waiting to Create Sections
>>>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>>>> saCkptSectionCreate Press <Enter> key to continue...
>>>>
>>>> 2) saCkptCheckpointOpen() same ckpt On node B
>>>>
>>>> NODE-B# ./checkpoint_B
>>>>
>>>> CPSV:CPA:ONsaCkptSectionIterationInitialize Waiting to read Sections
>>>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>>>> saCkptActiveReplicaSet saCkptSectionIterationInitialize Press <Enter>
>>>> key to continue...
>>>>
>>>>
>>>> 3) saCkptSectionCreate() on node A and read saCkptCheckpointStatusGet()
>>>>
>>>> NODE-A#
>>>>    checkpointStatus.numberOfSections : 10000
>>>>    checkpointStatus.memoryUsed :756000
>>>>     checkpointCreationAttributes.creationFlags;10
>>>>    checkpointCreationAttributes.checkpointSize;10240000
>>>>    checkpointCreationAttributes.retentionDuration;60000000000
>>>>    checkpointCreationAttributes.maxSections;10000
>>>>    checkpointCreationAttributes.maxSectionSize;1024
>>>>    checkpointCreationAttributes.maxSectionIdSize;64
>>>>    ================================
>>>> saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>>>> <Enter> key to continue...
>>>> saCkptCheckpoint Press <Enter> key to continue...
>>>>
>>>>
>>>> 4) saCkptActiveReplicaSet() on node B and read saCkptCheckpointStatusGet()
>>>>
>>>> NODE-B#
>>>>    checkpointStatus.numberOfSections : 10000
>>>>    checkpointStatus.memoryUsed :756000
>>>>     checkpointCreationAttributes.creationFlags;10
>>>>    checkpointCreationAttributes.checkpointSize;10240000
>>>>    checkpointCreationAttributes.retentionDuration;60000000000
>>>>    checkpointCreationAttributes.maxSections;10000
>>>>    checkpointCreationAttributes.maxSectionSize;1024
>>>>    checkpointCreationAttributes.maxSectionIdSize;64
>>>>
>>>>    saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>>>> <Enter> key to continue...
>>>>    saCkptCheckpoint Press <Enter> key to continue..
>>>>
>>>> ================================================================================================================================
>>>>
>>>> -AVM
>>>>
>>>>
>>>> On 1/6/2014 12:32 PM, A V Mahesh wrote:
>>>>> Hi Alex,
>>>>>
>>>>> We have never tested with 7500 sections; we will test and let you know.
>>>>> Can you please share your test application?
>>>>> That would allow us to respond quickly.
>>>>>
>>>>> -AVM
>>>>>
>>>>> On 1/3/2014 8:23 PM, Alex Jones wrote:
>>>>>> Hello All,
>>>>>>
>>>>>>       I'm experimenting with the checkpoint service, and some things
>>>>>> don't appear to work.
>>>>>>
>>>>>>       The saCkptActiveReplicaSet and
>>>>>> saCkptCheckpointSynchronize[Async] don't appear to work when the
>>>>>> checkpoint has section numbers greater than around 5500.
>>>>>>
>>>>>>       I've created a checkpoint with 7500 sections, each section being
>>>>>> 1024 bytes.  The checkpoint is co-located and the "active replica"
>>>>>> bit is set.
>>>>>>
>>>>>>       I can create and write all the sections.  And from another node
>>>>>> I run saCkptCheckpointStatusGet, and the information all looks good.
>>>>>> Everything is there.  I see no errors from any CKPT API calls.
>>>>>>
>>>>>>       The problem comes when I call saCkptActiveReplicaSet from this
>>>>>> other node.  After I do this, saCkptCheckpointStatusGet now returns
>>>>>> all the same information except the number of sections is no longer
>>>>>> 7500 but 0.  If I do this test with 50,000 sections only about 3,000
>>>>>> entries get synced.  And iterating through the sections shows that
>>>>>> there are only 3,000 sections.
>>>>>>
>>>>>>       Calling saCkptCheckpointSynchronize[Async] in this situation has
>>>>>> no effect, either.
>>>>>>
>>>>>>       After looking through the code I see a comment in
>>>>>> cpnd_evt_proc_ckpt_arep_set that says "/* ###TBD sync up is missing
>>>>>> with old active if now this fellow is becoming active. */"  So, it
>>>>>> doesn't appear that syncing is being done in the
>>>>>> saCkptActiveReplicaSet, which it should be.
>>>>>>
>>>>>>       Can someone comment?
>>>>>>
>>>>>>       I'm going to fix this and post a patch unless someone else is
>>>>>> already working on it, but I didn't see a bug for it.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>>
>>>>>> Rapidly troubleshoot problems before they affect your business. Most IT
>>>>>> organizations don't have a clear picture of how application performance
>>>>>> affects their revenue. With AppDynamics, you get 100% visibility into
>>>>>> your
>>>>>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of
>>>>>> AppDynamics Pro!
>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>>>>>>   
>>>>>>
>>>>>> _______________________________________________
>>>>>> Opensaf-devel mailing list
>>>>>> Opensaf-devel@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>>>
>>
>
