AVM,

     I get SA_AIS_ERR_TIMEOUT even when I pass SA_TIME_END as the
timeout value.  Is this not a bug?  The synchronous CheckpointOpen call
doesn't work at all in this scenario.  It never succeeds.
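
For scale: SaTimeT values are expressed in nanoseconds, so the `timeout = 1000000000` discussed further down the thread is one second, while SA_TIME_END is the largest signed 64-bit value. Below is a minimal sketch of that arithmetic. The `sketch_` names are hypothetical local stand-ins that mirror the constants as I understand them from saAis.h; nothing here includes the real header or calls into the CKPT library.

```c
#include <stdint.h>

/* Local stand-ins mirroring the AIS time constants (assumption: same values
 * as in saAis.h; the real header is deliberately not included here). */
typedef int64_t sketch_time_t;                  /* stands in for SaTimeT (ns) */
#define SKETCH_TIME_ONE_SECOND 1000000000LL
#define SKETCH_TIME_ONE_DAY    (86400LL * SKETCH_TIME_ONE_SECOND)
#define SKETCH_TIME_END        INT64_MAX        /* 0x7FFFFFFFFFFFFFFF */

/* Convert a nanosecond timeout to whole seconds (truncating). */
static int64_t sketch_ns_to_seconds(sketch_time_t ns)
{
    return ns / SKETCH_TIME_ONE_SECOND;
}

/* Return 1 if the timeout means "wait without a practical limit". */
static int sketch_is_unbounded(sketch_time_t timeout)
{
    return timeout == SKETCH_TIME_END;
}
```

With these stand-ins, `sketch_ns_to_seconds(1000000000LL)` is 1 and `sketch_is_unbounded(SKETCH_TIME_END)` is 1, which is why a timeout error with SA_TIME_END looks suspicious.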

     I can reproduce the problem with 
sectionCreationAttributes.expirationTime set to SA_TIME_ONE_DAY.

     You should be able to reproduce the problem with the code I sent in 
the last e-mail.

Alex

On 01/06/2014 10:31 PM, A V Mahesh wrote:
> Hi Alex,
>
> The CheckpointOpen call failing with SA_AIS_ERR_TIMEOUT is NOT a bug. It
> is expected if you pass too small a timeout value (`timeout = 1000000000`)
> to the saCkptCheckpointOpen(...., timeout, ...) call when the ckpt holds
> very large data/sections. Just increasing the timeout will avoid the
> SA_AIS_ERR_TIMEOUT.
>
> Let us focus on your original issue/scenario: are you able to
> reproduce the problem with sectionCreationAttributes.expirationTime
> set to SA_TIME_ONE_DAY?
>
> -AVM
>
> On 1/7/2014 1:17 AM, Alex Jones wrote:
>> AVM,
>>
>>     I've been playing around with your test program, and have gotten 
>> it to fail.
>>
>>     I made the following changes:
>>
>>  1. Change init_dataX to be 1024k bytes, so that you are initializing
>>     the section to be 1024k.
>>  2. Also, don't start the program on node B until A has finished
>>     writing/creating all the sections.
>>  3. Before hitting the enter key on node B, wait for the OpenAsync
>>     call to finish.
>>
>>     You might notice the CheckpointOpen call failing now with 
>> SA_AIS_ERR_TIMEOUT.  I had to turn this into OpenAsync, and add a 
>> thread to process CkptDispatch messages.  This uncovers another bug 
>> in OpenAsync.  I've attached the mods to your program here.
>>
>>    The OpenAsync callback will be called twice, both times with error 
>> == SA_AIS_ERR_TIMEOUT.  If I call OpenAsync again when I get this 
>> error, the next callback returns success, but the callback gets 
>> called twice with success and with two different checkpoint handles!
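
One way an application might defend itself against the duplicate callbacks described above is to accept only the first delivery per invocation id and to use a fresh id for each retry. The following is only a sketch under those assumptions: every `sketch_` type and function is a hypothetical stand-in (for SaInvocationT, SaCkptCheckpointHandleT, SaAisErrorT and the body of an open callback), and it masks the symptom rather than fixing OpenAsync itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real types come from saAis.h / saCkpt.h. */
typedef uint64_t sketch_invocation_t;
typedef uint64_t sketch_ckpt_handle_t;
typedef enum { SKETCH_OK = 1, SKETCH_ERR_TIMEOUT = 5 } sketch_error_t;

/* One pending async open request, keyed by the invocation id we chose. */
typedef struct {
    sketch_invocation_t invocation;
    bool completed;               /* set once the first callback is consumed */
    sketch_error_t result;
    sketch_ckpt_handle_t handle;  /* meaningful only if result == SKETCH_OK */
} sketch_open_request_t;

/* Body of a hypothetical open callback. Returns true if this delivery was
 * accepted, false if it was a duplicate or belongs to another invocation. */
static bool sketch_on_open_done(sketch_open_request_t *req,
                                sketch_invocation_t invocation,
                                sketch_ckpt_handle_t handle,
                                sketch_error_t error)
{
    if (req == NULL || req->invocation != invocation || req->completed)
        return false;
    req->completed = true;
    req->result = error;
    req->handle = (error == SKETCH_OK) ? handle : 0;
    return true;
}

/* Prepare a retry after a timeout: a fresh invocation id means late
 * callbacks for the old request no longer match. */
static void sketch_prepare_retry(sketch_open_request_t *req,
                                 sketch_invocation_t new_invocation)
{
    req->invocation = new_invocation;
    req->completed = false;
    req->result = SKETCH_ERR_TIMEOUT;
    req->handle = 0;
}
```

A handle delivered by a rejected duplicate success would presumably still need to be closed by the caller; the sketch leaves that decision out.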
>>
>> Alex
>>
>>
>> On 01/06/2014 06:18 AM, A V Mahesh wrote:
>>> Hi Alex,
>>>
>>> I have created 10K sections (please find the attached test
>>> applications `Alex_test_node_A_app.c` & `Alex_test_node_B_app.c`)
>>> with your specified scenario & configuration, and I haven't observed any
>>> issue with sections on another node.
>>>
>>> Try to reproduce the problem on your setup & let me know the result.
>>>
>>> One more important point: what value did you configure for
>>> `sectionCreationAttributes.expirationTime`?
>>> I configured SA_TIME_ONE_DAY.
>>>
>>> Steps to run the application:
>>>
>>> ===================================================================================================================
>>>
>>> Compile :
>>>
>>> NODE-A# gcc Alex_test_node_A_app.c -o checkpoint_A -lSaCkpt
>>> NODE-A# gcc Alex_test_node_B_app.c -o checkpoint_B -lSaCkpt
>>>
>>>
>>> Run :
>>>
>>> 1) saCkptCheckpointOpen On node A
>>>
>>> NODE-A# ./checkpoint_A
>>>
>>> CPSV:CPA:ONsaCkptSectionCreate  Waiting to Create Sections
>>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>>> saCkptSectionCreate Press <Enter> key to continue...
>>>
>>> 2) saCkptCheckpointOpen() same ckpt On node B
>>>
>>> NODE-B# ./checkpoint_B
>>>
>>> CPSV:CPA:ONsaCkptSectionIterationInitialize Waiting to read Sections
>>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>>> saCkptActiveReplicaSet saCkptSectionIterationInitialize Press <Enter>
>>> key to continue...
>>>
>>>
>>> 3) saCkptSectionCreate() on node A, then read saCkptCheckpointStatusGet()
>>>
>>> NODE-A#
>>>    checkpointStatus.numberOfSections : 10000
>>>    checkpointStatus.memoryUsed :756000
>>>     checkpointCreationAttributes.creationFlags;10
>>>    checkpointCreationAttributes.checkpointSize;10240000
>>>    checkpointCreationAttributes.retentionDuration;60000000000
>>>    checkpointCreationAttributes.maxSections;10000
>>>    checkpointCreationAttributes.maxSectionSize;1024
>>>    checkpointCreationAttributes.maxSectionIdSize;64
>>>    ================================
>>> saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>>> <Enter> key to continue...
>>> saCkptCheckpoint Press <Enter> key to continue...
>>>
>>>
>>> 4) saCkptActiveReplicaSet() on node B, then read saCkptCheckpointStatusGet()
>>>
>>> NODE-B#
>>>    checkpointStatus.numberOfSections : 10000
>>>    checkpointStatus.memoryUsed :756000
>>>     checkpointCreationAttributes.creationFlags;10
>>>    checkpointCreationAttributes.checkpointSize;10240000
>>>    checkpointCreationAttributes.retentionDuration;60000000000
>>>    checkpointCreationAttributes.maxSections;10000
>>>    checkpointCreationAttributes.maxSectionSize;1024
>>>    checkpointCreationAttributes.maxSectionIdSize;64
>>>
>>>    saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>>> <Enter> key to continue...
>>>    saCkptCheckpoint Press <Enter> key to continue..
>>>
>>> ================================================================================================================================
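
The size figures in the status output above can be cross-checked with simple arithmetic: checkpointSize 10240000 equals maxSections (10000) times maxSectionSize (1024), and memoryUsed 756000 spread over 10000 sections averages about 75 bytes per section. Here is a small sketch of that check; the `sketch_` helper names are hypothetical and not part of the CKPT API.

```c
#include <stdint.h>

/* Bytes needed if every section were filled to its maximum size. */
static uint64_t sketch_full_checkpoint_bytes(uint64_t max_sections,
                                             uint64_t max_section_size)
{
    return max_sections * max_section_size;
}

/* Average bytes in use per section (integer division, 0 if no sections). */
static uint64_t sketch_avg_bytes_per_section(uint64_t memory_used,
                                             uint64_t number_of_sections)
{
    if (number_of_sections == 0)
        return 0;
    return memory_used / number_of_sections;
}
```

By the same arithmetic, the 7500-section checkpoint of 1024-byte sections from the original report would need 7680000 bytes when full.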
>>>
>>> -AVM
>>>
>>>
>>> On 1/6/2014 12:32 PM, A V Mahesh wrote:
>>>> Hi Alex,
>>>>
>>>> We have never tested 7500 sections; we will test and let you know.
>>>> Can you please share your test application?
>>>> That will allow us to respond quickly.
>>>>
>>>> -AVM
>>>>
>>>> On 1/3/2014 8:23 PM, Alex Jones wrote:
>>>>> Hello All,
>>>>>
>>>>>       I'm experimenting with the checkpoint service, and some things
>>>>> don't appear to work.
>>>>>
>>>>>       The saCkptActiveReplicaSet and
>>>>> saCkptCheckpointSynchronize[Async] calls don't appear to work when the
>>>>> checkpoint has more than around 5500 sections.
>>>>>
>>>>>       I've created a checkpoint with 7500 sections, each section being
>>>>> 1024 bytes.  The checkpoint is co-located and the "active replica"
>>>>> bit is set.
>>>>>
>>>>>       I can create and write all the sections.  And from another node
>>>>> I run saCkptCheckpointStatusGet, and the information all looks good.
>>>>> Everything is there.  I see no errors from any CKPT API calls.
>>>>>
>>>>>       The problem comes when I call saCkptActiveReplicaSet from this
>>>>> other node.  After I do this, saCkptCheckpointStatusGet now returns
>>>>> all the same information except the number of sections is no longer
>>>>> 7500 but 0.  If I do this test with 50,000 sections only about 3,000
>>>>> entries get synced.  And iterating through the sections shows that
>>>>> there are only 3,000 sections.
>>>>>
>>>>>       Calling saCkptCheckpointSynchronize[Async] in this situation has
>>>>> no effect, either.
>>>>>
>>>>>       After looking through the code I see a comment in
>>>>> cpnd_evt_proc_ckpt_arep_set that says "/* ###TBD sync up is missing
>>>>> with old active if now this fellow is becoming active. */"  So, it
>>>>> doesn't appear that syncing is being done in the
>>>>> saCkptActiveReplicaSet, which it should be.
>>>>>
>>>>>       Can someone comment?
>>>>>
>>>>>       I'm going to fix this and post a patch unless someone else is
>>>>> already working on it, but I didn't see a bug for it.
>>>>>
>>>>> Alex
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Opensaf-devel mailing list
>>>>> Opensaf-devel@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>>
>
