AVM,

I get SA_AIS_ERR_TIMEOUT even when I pass SA_TIME_END as the timeout value. Is this not a bug? The synchronous CheckpointOpen call doesn't work at all in this scenario; it never succeeds.
I can reproduce the problem with sectionCreationAttributes.expirationTime set to SA_TIME_ONE_DAY. You should be able to reproduce the problem with the code I sent in the last e-mail.

Alex

On 01/06/2014 10:31 PM, A V Mahesh wrote:
> Hi Alex,
>
> CheckpointOpen call failing with SA_AIS_ERR_TIMEOUT is NOT a bug; it
> is expected if you pass a small timeout value `timeout = 1000000000`
> to the saCkptCheckpointOpen(....,timeout ...) call when the ckpt has
> very large data/sections. Just increasing the timeout will avoid the
> SA_AIS_ERR_TIMEOUT.
>
> Let us focus on your original issue/scenario: are you able to
> reproduce the problem with sectionCreationAttributes.expirationTime
> set to SA_TIME_ONE_DAY?
>
> -AVM
>
> On 1/7/2014 1:17 AM, Alex Jones wrote:
>> AVM,
>>
>> I've been playing around with your test program, and have gotten
>> it to fail.
>>
>> I made the following changes:
>>
>> 1. Change init_dataX to be 1024k bytes, so that you are initializing
>>    the section to be 1024k.
>> 2. Also, don't start the program on node B until A has finished
>>    writing/creating all the sections.
>> 3. Before hitting the enter key on node B, wait for the OpenAsync
>>    call to finish.
>>
>> You might notice the CheckpointOpen call failing now with
>> SA_AIS_ERR_TIMEOUT. I had to turn this into OpenAsync, and add a
>> thread to process CkptDispatch messages. This uncovers another bug
>> in OpenAsync. I've attached the mods to your program here.
>>
>> The OpenAsync callback will be called twice, both times with error
>> == SA_AIS_ERR_TIMEOUT. If I call OpenAsync again when I get this
>> error, the next callback returns success, but the callback gets
>> called twice with success and with two different checkpoint handles!
>>
>> Alex
>>
>>
>> On 01/06/2014 06:18 AM, A V Mahesh wrote:
>>> Hi Alex,
>>>
>>> I have created 10K sections (please find the attached test
>>> applications `Alex_test_node_A_app.c` & `Alex_test_node_B_app.c`)
>>> with your specified scenario & configuration, and I haven't observed
>>> any issue with sections on another node.
>>>
>>> Try to reproduce the problem on your setup & let me know the result.
>>>
>>> One more important point: what value did you configure for
>>> `sectionCreationAttributes.expirationTime`?
>>> I configured SA_TIME_ONE_DAY.
>>>
>>> Steps to run the application:
>>>
>>> ======================================================================
>>>
>>> Compile:
>>>
>>> NODE-A# gcc Alex_test_node_A_app.c -o checkpoint_A -lSaCkpt
>>> NODE-A# gcc Alex_test_node_B_app.c -o checkpoint_B -lSaCkpt
>>>
>>>
>>> Run:
>>>
>>> 1) saCkptCheckpointOpen() on node A
>>>
>>> NODE-A# ./checkpoint_A
>>>
>>> CPSV:CPA:ONsaCkptSectionCreate Waiting to Create Sections
>>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>>> saCkptSectionCreate Press <Enter> key to continue...
>>>
>>>
>>> 2) saCkptCheckpointOpen() of the same ckpt on node B
>>>
>>> NODE-B# ./checkpoint_B
>>>
>>> CPSV:CPA:ONsaCkptSectionIterationInitialize Waiting to read Sections
>>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>>> saCkptActiveReplicaSet saCkptSectionIterationInitialize Press <Enter>
>>> key to continue...
>>>
>>>
>>> 3) saCkptSectionCreate() on node A, then read saCkptCheckpointStatusGet()
>>>
>>> NODE-A#
>>> checkpointStatus.numberOfSections : 10000
>>> checkpointStatus.memoryUsed :756000
>>> checkpointCreationAttributes.creationFlags;10
>>> checkpointCreationAttributes.checkpointSize;10240000
>>> checkpointCreationAttributes.retentionDuration;60000000000
>>> checkpointCreationAttributes.maxSections;10000
>>> checkpointCreationAttributes.maxSectionSize;1024
>>> checkpointCreationAttributes.maxSectionIdSize;64
>>> ================================
>>> saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>>> <Enter> key to continue...
>>> saCkptCheckpoint Press <Enter> key to continue...
>>>
>>>
>>> 4) saCkptActiveReplicaSet() on node B, then saCkptCheckpointStatusGet()
>>>
>>> NODE-B#
>>> checkpointStatus.numberOfSections : 10000
>>> checkpointStatus.memoryUsed :756000
>>> checkpointCreationAttributes.creationFlags;10
>>> checkpointCreationAttributes.checkpointSize;10240000
>>> checkpointCreationAttributes.retentionDuration;60000000000
>>> checkpointCreationAttributes.maxSections;10000
>>> checkpointCreationAttributes.maxSectionSize;1024
>>> checkpointCreationAttributes.maxSectionIdSize;64
>>>
>>> saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press
>>> <Enter> key to continue...
>>> saCkptCheckpoint Press <Enter> key to continue..
>>>
>>> ======================================================================
>>>
>>> -AVM
>>>
>>>
>>> On 1/6/2014 12:32 PM, A V Mahesh wrote:
>>>> Hi Alex,
>>>>
>>>> We never tested the 7500 sections; we will test and let you know.
>>>> Can you please share your test application? That will allow us
>>>> to respond quickly.
>>>>
>>>> -AVM
>>>>
>>>> On 1/3/2014 8:23 PM, Alex Jones wrote:
>>>>> Hello All,
>>>>>
>>>>> I'm experimenting with the checkpoint service, and some things
>>>>> don't appear to work.
>>>>>
>>>>> The saCkptActiveReplicaSet and
>>>>> saCkptCheckpointSynchronize[Async] don't appear to work when the
>>>>> checkpoint has section numbers greater than around 5500.
>>>>>
>>>>> I've created a checkpoint with 7500 sections, each section being
>>>>> 1024 bytes. The checkpoint is co-located and the "active replica"
>>>>> bit is set.
>>>>>
>>>>> I can create and write all the sections. And from another node
>>>>> I run saCkptCheckpointStatusGet, and the information all looks good.
>>>>> Everything is there. I see no errors from any CKPT API calls.
>>>>>
>>>>> The problem comes when I call saCkptActiveReplicaSet from this
>>>>> other node. After I do this, saCkptCheckpointStatusGet now returns
>>>>> all the same information except the number of sections is no longer
>>>>> 7500 but 0. If I do this test with 50,000 sections only about 3,000
>>>>> entries get synced. And iterating through the sections shows that
>>>>> there are only 3,000 sections.
>>>>>
>>>>> Calling saCkptCheckpointSynchronize[Async] in this situation has
>>>>> no effect, either.
>>>>>
>>>>> After looking through the code I see a comment in
>>>>> cpnd_evt_proc_ckpt_arep_set that says "/* ###TBD sync up is missing
>>>>> with old active if now this fellow is becoming active. */" So, it
>>>>> doesn't appear that syncing is being done in the
>>>>> saCkptActiveReplicaSet, which it should be.
>>>>>
>>>>> Can someone comment?
>>>>>
>>>>> I'm going to fix this and post a patch unless someone else is
>>>>> already working on it, but I didn't see a bug for it.
>>>>>
>>>>> Alex
>>>>>
>>>>> _______________________________________________
>>>>> Opensaf-devel mailing list
>>>>> Opensaf-devel@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel