Hi Anders,

Comments below.

/Neel.

On Tuesday 08 October 2013 02:37 PM, Anders Bjornerstedt wrote:
> About this:
>>> The slave PBE is not able to do saImmOiClassImplementerSet
>>>
>>> Oct  4 15:05:58 Slot-4 osafimmnd[3039]: NO ERR_TRY_AGAIN: ccb 9657 is active on object cscfRdn=75387 of class neNumber. Can not add class applier
>>> Oct  4 15:06:04 Slot-4 last message repeated 10 times
>>> Oct  4 15:06:04 Slot-4 osafimmpbed: ER saImmOiClassImplementerSet for neNumber failed 6
>>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer locally disconnected. Marking it as doomed 131 <429, 2020f> (@OpenSafImmPBE)
>>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer locally disconnected. Marking it as doomed 132 <430, 2020f> (OsafImmPbeRt_B)
>>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer disconnected 131 <429, 2020f> (@OpenSafImmPBE)
>>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer disconnected 132 <430, 2020f> (OsafImmPbeRt_B)
>>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: WA SLAVE PBE process has apparently died at non coord
>>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO STARTING SLAVE PBE process.
>> Not a serious problem, assuming it does not happen often.
>> That is, this is a performance problem.
>> The slave will restart and should hopefully succeed in initializing
>> the next time.
>> New CCBs can not be started when the IMM is not persistent writable,
>> and with 2PBE the IMM is not persistent writable unless both PBEs
>> are available.
>> So this problem should disappear once ccb 9657 has been aborted.
> Just realized that something similar to the above could probably cause
> problems.
> In the above scenario I assume you had some kind of loop generating
> ccbs repeatedly (in sequence) where they are applied. The above ccb
> would then get aborted inside its attempt to apply, and the next ccb
> would be rejected at the operation level (before apply).
>
The use case is a loop that generates CCBs repeatedly.
> But what if you simply had a lingering CCB, not being applied, just lingering?
> Think of an operator starting something and then going for coffee.
> Then that *would* currently prevent the slave from rejoining.
> I need to add a mechanism similar to the one in imm-sync, where ongoing
> (non-critical) ccbs are given a period of grace and then aborted from
> below by the imm.
> No period of grace would be involved in this new case, since all such
> non-critical ccbs are doomed anyway. Another possible, and probably
> simpler, solution is to allow the 2PBE applier to attach even when
> there is an ongoing ccb, i.e. relax the above guard only for this
> 2PBE applier.
Allowing the 2PBE applier to attach would be a good approach.
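A rough sketch of what "relax the guard only for the 2PBE applier" could look like. This is a simplified, hypothetical model: check_class_applier_set, applier_request and busy_objects are illustrative names, not real osafimmnd identifiers, and the real check lives inside the IMM's class-applier handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the guard that currently answers ERR_TRY_AGAIN
 * ("Can not add class applier"). These names are NOT OpenSAF identifiers. */
typedef enum { APPLIER_SET_OK, APPLIER_SET_TRY_AGAIN } applier_set_result;

typedef struct {
    bool is_2pbe_applier; /* true only for the slave PBE's applier */
} applier_request;

/* busy_objects: number of objects of the class that a non-terminated CCB
 * currently has operations on (e.g. ccb 9657 on cscfRdn=75387). */
applier_set_result check_class_applier_set(const applier_request *req,
                                           size_t busy_objects)
{
    if (busy_objects == 0) {
        return APPLIER_SET_OK;        /* no CCB in progress on this class */
    }
    if (req->is_2pbe_applier) {
        /* Relaxed guard: the 2PBE applier attaches anyway. Operations it
         * missed are expected to be caught when that CCB is prepared. */
        return APPLIER_SET_OK;
    }
    return APPLIER_SET_TRY_AGAIN;     /* unchanged guard for other appliers */
}
```

With a lingering ccb on the class, an ordinary applier still gets TRY_AGAIN, while the slave PBE's applier is let through instead of looping until it gives up and restarts.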

I have one more comment:

Also delivering the callbacks for each CCB operation to the 2PBE applier
would avoid having to set a class implementer for each config class for
the 2PBE applier.
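A hypothetical sketch of this suggestion, assuming the IMM decides per applier whether to deliver a CCB operation callback. The function and field names are illustrative only, and restricting the implicit delivery to configuration classes is an assumption based on the comment above.

```c
#include <stdbool.h>

/* Hypothetical model of the suggestion; not real OpenSAF code. */
typedef struct {
    bool is_2pbe_applier;      /* the slave PBE's applier */
    bool registered_for_class; /* explicit class applier set for the op's class */
} applier_info;

/* Returns true if this applier should receive the callback for a CCB
 * operation on an object whose class is (or is not) a config class. */
bool should_deliver_ccb_op(const applier_info *a, bool class_is_config)
{
    if (a->registered_for_class) {
        return true;  /* normal path: explicit saImmOiClassImplementerSet */
    }
    /* Suggested path: the 2PBE applier implicitly gets every operation on
     * config classes, so no per-class implementer set is needed. */
    return a->is_2pbe_applier && class_is_config;
}
```

Besides saving one call per config class, this would also remove the ERR_TRY_AGAIN failure point seen in the log, since the slave never has to attach to a class that an ongoing ccb is busy with.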

> That of course means that the slave PBE would risk having missed
> receiving some operations included in that ongoing non-critical ccb.
> But this would be caught in the apply of that ccb. The prepare
> protocol between the PBEs would time out and the ccb would be aborted,
> because the operation count would never be complete at the slave.
>
> /AndersBj
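A minimal sketch of why the missed operations are harmless, assuming the slave compares the operation count it expects for the ccb with the operations it actually received and gives up after a prepare timeout. slave_prepare_check, the verdict values and the millisecond timeout are hypothetical, not the actual osafimmpbed code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slave-side check in the PBE prepare protocol;
 * names and the timeout are assumptions, not OpenSAF code. */
typedef enum { SLAVE_ACK, SLAVE_WAIT, SLAVE_ABORT } slave_verdict;

typedef struct {
    uint32_t ops_expected; /* operation count for the ccb known at prepare */
    uint32_t ops_received; /* operations this slave actually received */
} slave_ccb_state;

slave_verdict slave_prepare_check(const slave_ccb_state *ccb,
                                  uint64_t waited_ms, uint64_t timeout_ms)
{
    if (ccb->ops_received == ccb->ops_expected) {
        return SLAVE_ACK;   /* complete: the slave can persist and reply */
    }
    if (waited_ms < timeout_ms) {
        return SLAVE_WAIT;  /* late operations may still arrive */
    }
    /* Operations sent before the applier attached never arrive, so the
     * count never completes and the ccb is aborted on timeout. */
    return SLAVE_ABORT;
}
```

In the lingering-ccb case the slave would stay in SLAVE_WAIT until the timeout and then abort, which matches the outcome described above: the ccb was doomed anyway.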

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel