Re: [Pvfs2-developers] terminating state machines

Phil Carns Wed, 26 Jul 2006 14:52:54 -0700

Sam Lang wrote:

On Jul 26, 2006, at 3:41 PM, Walter B. Ligon III wrote:
Yeah, the idea is that the SM code would call the job function. Depending on the state actions to do it seems like asking for trouble, with all the details that have to be kept up with.
Actually, there are already job structs used by the SM code; now I've had to add a context id to the smcb, and there will be job calls. I think you are right though, the amount of dependency is pretty small.
As for the job funcs, I think I'd need one new one to post the parent job, establishing a counter. The child job would look up the counter, decrement it, and if it is zero, call job_null to relaunch the parent, or just replicate what job_null does, whichever seems easiest.
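A minimal sketch of the counter scheme described above. Every name here (parent_wait, post_parent_wait, child_done, mark_relaunched) is hypothetical and not taken from the PVFS2 source; the relaunch callback only stands in for whatever job_null would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a parent SM waiting on its children.
 * None of these names come from the PVFS2 source. */
struct parent_wait {
    int remaining;               /* children that have not yet terminated */
    void (*relaunch)(void *sm);  /* stand-in for what job_null would do */
    void *parent_sm;             /* opaque pointer to the parent SM */
};

/* Parent side: post the wait, establishing the counter. */
void post_parent_wait(struct parent_wait *w, int nchildren,
                      void (*relaunch)(void *), void *parent_sm)
{
    assert(w != NULL && nchildren > 0);
    w->remaining = nchildren;
    w->relaunch = relaunch;
    w->parent_sm = parent_sm;
}

/* Child side: called once when a child terminates.
 * Returns 1 if this call released the parent, 0 otherwise. */
int child_done(struct parent_wait *w)
{
    assert(w != NULL && w->remaining > 0);
    if (--w->remaining == 0) {
        w->relaunch(w->parent_sm);
        return 1;
    }
    return 0;
}

/* Example relaunch callback: just records that the parent was released. */
void mark_relaunched(void *sm)
{
    *(int *)sm = 1;
}
```

No locking appears in the sketch; it assumes a single thread drives all state machines, so it would need revisiting if children could terminate concurrently.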
I would rather see the parent get relaunched by the normal job test code by putting itself in the job completion queue once it's finished. This could happen in a job_sm_test call like I suggested in my previous email. Also, instead of a counter that a test function would check, and the child state machines would have to decrement, I'd prefer the parent job keep an array of child state machines (it does this anyway, no?) and check each element in the array for completion of the state machine. That way the children aren't competing to lock the same state to notify of completion; the parent just checks each one.
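A rough sketch of the array-based alternative, assuming a hypothetical child_sm record and a hypothetical all_children_complete() test routine in the spirit of the proposed job_sm_test (which does not exist in this form in the source):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child record; a real state machine struct is richer. */
struct child_sm {
    int complete;    /* set by the child when it reaches its final state */
    int error_code;  /* result the parent may inspect afterwards */
};

/* Returns 1 once every child in the array has completed, 0 otherwise.
 * Only the tester reads the flags, so the children never contend for
 * a shared counter. */
int all_children_complete(const struct child_sm *children, size_t count)
{
    assert(children != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!children[i].complete)
            return 0;
    }
    return 1;
}
```

The trade-off is that each test walks the array instead of reading one integer.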

There doesn't need to be any locking: the main server thread only executes one state function or one transition at a time. The counter also doesn't need to be visible; it could be hidden inside the job call, which could lock or not lock as it sees fit.

The parent also couldn't be the one checking the elements in an array like that; it would have to be done from within the job code somewhere (which I think you described in your previous email). That means that somewhere in the job code (or request scheduler, etc.) something will have to do the following on every job_testcontext() call:


for each active sm
        for each child within that sm
                check state

Which could get expensive depending on how extensively we use the child/parallel sm model.
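To make the cost concrete, here is a small sketch of that nested scan. The sm_children struct and scan_active_sms() are hypothetical names for illustration only. The function counts how many state checks one pass performs, which in this non-short-circuiting form is the total number of children across all active SMs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of one active SM and its children. */
struct sm_children {
    const int *child_complete;  /* one completion flag per child */
    size_t nchildren;
};

/* One full scan, as would run on every test call in the polling design.
 * Stores the number of SMs whose children have all completed in *ready
 * (an SM with zero children counts as ready) and returns the number of
 * state checks performed. The inner loop deliberately does not stop at
 * the first incomplete child, so the return value is the worst case. */
size_t scan_active_sms(const struct sm_children *active, size_t nactive,
                       size_t *ready)
{
    size_t checks = 0;

    assert(ready != NULL);
    *ready = 0;
    for (size_t i = 0; i < nactive; i++) {
        size_t done = 0;
        for (size_t j = 0; j < active[i].nchildren; j++) {
            checks++;
            if (active[i].child_complete[j])
                done++;
        }
        if (done == active[i].nchildren)
            (*ready)++;
    }
    return checks;
}
```

With a hidden counter, by contrast, each child termination costs one decrement and nothing is rescanned.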

The implicit call is the child's call when it terminates. The parent's call could be implicit too, or done by the state action.
Doesn't this require child state machines to only function in the child state machine context? I'd prefer to just have generic state machines that can be used as a child state machine or as a top-level state machine.

I would prefer that too :) Is this going to work, Walt? It would be nice if the state machine processing code transparently handled triggering different termination functions depending on whether it was a top-level sm or not, without the state functions themselves knowing any better.
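One way this could look, as a sketch only: the processing code picks the termination path from a parent pointer, so the same state machine works in either role. The struct sm layout, the term_action enum, and sm_terminate() are all assumptions made for this example, not PVFS2 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of terminating an SM. */
enum term_action { TERM_COMPLETE_REQUEST, TERM_NOTIFY_PARENT };

/* Hypothetical SM control block. */
struct sm {
    struct sm *parent;      /* NULL for a top-level SM */
    int children_running;   /* children this SM is still waiting on */
};

/* Called by the SM processing code (not by the state functions) when an
 * SM reaches its terminating state. The state functions never need to
 * know whether they ran as a child or at the top level. */
enum term_action sm_terminate(struct sm *s)
{
    assert(s != NULL);
    if (s->parent == NULL)
        return TERM_COMPLETE_REQUEST;   /* top level: finish the request */

    assert(s->parent->children_running > 0);
    s->parent->children_running--;      /* child: report to the parent */
    return TERM_NOTIFY_PARENT;
}
```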

As of this moment we really haven't taken any pains to keep the SM independent from the job system; in fact you have to have the job system to drive things, so in some sense it's not really an issue.
I vote for making the interfaces as separate as possible. If someone else wants to use the state machine code somewhere else, it would be nice to allow them to take it as-is (mpich2 guys were talking about using it, but I think they ended up doing something else). Also, independent layers make testing and debugging easier in my view.
In the current code, the sm_p is passed through to the job descriptor as a void*, and we just cast back to a sm_p in the while loop that does the job_testcontext and then drives the state machines again. The use of job_status does bring the job code into the state machine code, but it seems like mostly only the error_code field is used within the state actions, and the rest of that structure could be independent of the state machine code.
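The void* pass-through pattern described above might be sketched like this. The structs and drive_completed() are simplified, hypothetical stand-ins; the real job descriptor and job_status carry more fields, and the real loop calls job_testcontext to obtain the completed jobs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down completed-job record. */
struct completed_job {
    void *user_ptr;   /* opaque pointer supplied when the job was posted */
    int error_code;   /* the status field the state actions mostly use */
};

/* Hypothetical, pared-down state machine. */
struct state_machine {
    int last_error;
    int times_driven;
};

/* For each completed job, cast the opaque pointer back to the state
 * machine and drive it. The job layer never sees the SM type, and the
 * SM only needs the error code from the job layer.
 * Returns the number of state machines driven. */
int drive_completed(struct completed_job *done, int count)
{
    int driven = 0;

    for (int i = 0; i < count; i++) {
        struct state_machine *sm_p = (struct state_machine *)done[i].user_ptr;
        assert(sm_p != NULL);
        sm_p->last_error = done[i].error_code;
        sm_p->times_driven++;   /* stand-in for running the next state */
        driven++;
    }
    return driven;
}
```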
-sam
Any more comments?  (Sam, I hope this addresses some of yours.)

Walt

Phil Carns wrote:
Walter B. Ligon III wrote:
OK, guys, I have another issue I want input on. When child SMs terminate they have to notify their parent. The parent has to wait for all the children to terminate. So I've been thinking to use the job subsystem for this: the parent would post a job to wait for N children, and each child would post a job, the last one releasing the parent.
Now I see two ways to implement this - one is to implement this directly in the state machine code. The parent simply stops running (because it does not schedule a job yet returns DEFERRED). Each child decrements a counter, and when it hits 0 the parent is restarted. This is a little ugly because the waiting parent is not being held on any list or queue (up to now all waiting SMs are in the job subsystem), also the last terminating child becomes the parent as it starts executing the parent code. Things can get weird when one SM starts children that start children, and so on.
Now the other way to implement this is with the job subsystem as I suggested above. Much cleaner except for one thing: up to now the state machine subsystem has had no dependency at all on the job subsystem. If we do it this way, this function only works with the job system intact. I'd prefer not to do this, but it does seem the cleanest, most logical means.
I like the job approach. I guess this is an extra dependency because the sms would be calling these particular job functions implicitly, rather than relying on the state functions to handle those posts and releases? We definitely haven't done that before, but at least in this case the job function that the sm infrastructure would be depending on is the simplest one in the arsenal :) It shouldn't be hard for someone to reimplement that particular functionality if they wanted to use the state machine mechanism in another project.

If you weren't planning on these job calls to be implicit, then I'm not sure where the extra dependency is; we already use jobs to trigger all of the other "normal" transitions.

This reminded me of a question, though: is there going to be a standard mechanism for the children to report each of their independent error codes to the parent sm? Or do the children need to just keep a reference to the parent sm structure and manually fill in an array or something?

I guess I have a broader question of how data that the children generate (like a handle value or an attr structure) gets transferred to the parent. Does the parent copy this stuff from the child after the child finishes, or does the child copy it to the parent before it exits? I think we talked about this before at some point but I forgot what the plan is. It would be nice if we made the developer define macros or something to dictate which input parameters need to be filled in when invoking a child and which output parameters can be retrieved when it finishes. Otherwise it starts getting tricky to remember what fields need to be set in the sm structure before kicking something off.
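As one possible shape for such a contract (purely illustrative; the struct names, fields, and macros below are invented for this sketch and do not exist in PVFS2), each kind of child could declare separate input and output structs, with macros marking where inputs are set and outputs are read:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contract for one kind of child SM. */
struct child_in  { int target_server; };          /* set by the parent */
struct child_out { long handle; int error_code; };/* set by the child */

struct child_frame {
    struct child_in in;
    struct child_out out;
    int in_set;          /* nonzero once the inputs have been filled in */
};

/* Macros that make the contract explicit at the call sites. */
#define CHILD_SET_INPUT(frame, server) \
    do { (frame)->in.target_server = (server); (frame)->in_set = 1; } while (0)
#define CHILD_GET_HANDLE(frame) ((frame)->out.handle)

/* Parent side, after all children have finished: return the first
 * nonzero child error code, or 0 if every child succeeded. */
int first_child_error(const struct child_frame *frames, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert(frames[i].in_set);   /* catches a child launched unprepared */
        if (frames[i].out.error_code != 0)
            return frames[i].out.error_code;
    }
    return 0;
}
```

In this variant the parent owns the frames and reads them after completion, which is only one of the two copy directions raised above.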
-Phil
-Phil
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers


