Re: [Pvfs2-developers] terminating state machines

Sam Lang Wed, 26 Jul 2006 14:47:10 -0700


On Jul 26, 2006, at 4:37 PM, Walter B. Ligon III wrote:

Sam Lang wrote:
On Jul 26, 2006, at 3:41 PM, Walter B. Ligon III wrote:
Yeah, the idea is that the SM code would call the job function. Depending on the state actions to do it seems like asking for trouble, all the details that have to be kept up with.
Actually, there are already job structs used by the SM code, now I've had to add a context id to the smcb and there will be job calls. I think you are right though, the amount of dependency is pretty small.
As for the job funcs, I think I'd need one new one to post the parent job, establishing a counter. The child job would look up the counter, decrement it, and if zero, call job_null to relaunch the parent, or just replicate what job_null does, whatever seems easiest.
I would rather see the parent get relaunched by the normal job test code by putting itself in the job completion queue once it's finished.
That's what I'm talking about.
This could happen in a job_sm_test call like I suggested in my previous email. Also, instead of a counter that a test function would check, and the child state machines would have to decrement, I'd prefer the parent job keep an array of child state machines (it does this anyway, no?) and check each element in the array for completion of the state machine. That way the children aren't competing to lock the same state to notify of completion, the parent just checks each one.
That's going to be tricky, and would probably perform worse than a counter. The primary problem is that the parent isn't running, so it can't really check anything.

It's not running, but it could work similarly to the request scheduler code. A job_sm_post would add the sm job to a pending queue, and job_sm_test could be called in job_testcontext, just like PINT_request_scheduler_testworld is called. The job_sm_test call would check the pending sm job queue (look at each one and check all the children SMs for completion). Once an sm job is completed, it gets added to the job completion queue, and the while loop that drives the state machines will start it up again.
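A minimal single-threaded sketch of the polling idea, assuming a hypothetical `struct sm` that records its own children and a `terminated` flag (these are not the real PVFS2 smcb or job structures). `job_sm_post` and `job_sm_test` are the names proposed above, but the signatures here are made up for illustration. The test routine scans the pending parents and moves any parent whose children have all finished onto a completion queue, from which the driver loop would restart it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types for illustration only; not the PVFS2 smcb/job structs. */
#define MAX_CHILDREN 8
#define MAX_PENDING 16

struct sm {
    bool terminated;                   /* set when this SM finishes */
    struct sm *children[MAX_CHILDREN]; /* parent keeps direct child pointers */
    int n_children;
};

/* Parents waiting on children, and parents ready to be restarted. */
static struct sm *pending[MAX_PENDING];
static int n_pending;
static struct sm *completed[MAX_PENDING];
static int n_completed;

/* Hypothetical job_sm_post: park a parent until all its children finish. */
static void job_sm_post(struct sm *parent)
{
    assert(n_pending < MAX_PENDING);
    pending[n_pending++] = parent;
}

/* True once every child of this parent has terminated. */
static bool all_children_done(const struct sm *parent)
{
    for (int i = 0; i < parent->n_children; i++)
        if (!parent->children[i]->terminated)
            return false;
    return true;
}

/* Hypothetical job_sm_test: would be called from the test loop, the way
 * the request scheduler is polled. Moves finished parents to the
 * completion queue and returns how many were moved. */
static int job_sm_test(void)
{
    int moved = 0;
    for (int i = 0; i < n_pending; ) {
        if (all_children_done(pending[i])) {
            assert(n_completed < MAX_PENDING);
            completed[n_completed++] = pending[i];
            pending[i] = pending[--n_pending]; /* swap-remove; recheck slot i */
            moved++;
        } else {
            i++;
        }
    }
    return moved;
}
```

Children never touch shared state here; the cost is a scan of every pending parent's children on each test call. In a multithreaded build the `terminated` flag would need to be atomic or lock-protected.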

The implicit call is the child's call when it terminates. The parent's call could be implicit too, or done by the state action.
Doesn't this require child state machines to only function in the child state machine context? I'd prefer to just have generic state machines that can be used as a child state machine or as a top-level state machine.
No, not at all. When all state machines terminate they check to see if they have a parent (SMs started directly as a result of a syscall or request have a NULL parent) and if so they then enter into the routine to see if they are the last child, and if so they release the parent.
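A minimal sketch of that termination hook, assuming a hypothetical `struct smcb` with a `parent` backpointer and a C11 atomic counter of outstanding children. `sm_wait_for_children`, `sm_terminate`, and `release_parent` are illustrative names only, and `release_parent` merely records the parent where the real code would put it on the job completion queue. Because `atomic_fetch_sub` returns the value before the decrement, exactly one child (the last one) sees 1 and releases the parent:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical SM control block; not the real PVFS2 smcb. */
struct smcb {
    struct smcb *parent;      /* NULL for SMs started by a syscall or request */
    atomic_int children_left; /* children the parent is still waiting on */
};

/* Stand-in for putting the parent on the job completion queue.
 * Not thread-safe; a real queue would need its own lock. */
#define MAX_RELEASED 16
static struct smcb *released[MAX_RELEASED];
static int n_released;

static void release_parent(struct smcb *parent)
{
    assert(n_released < MAX_RELEASED);
    released[n_released++] = parent;
}

/* Parent side: set the count before launching children, then defer. */
static void sm_wait_for_children(struct smcb *parent, int n)
{
    atomic_store(&parent->children_left, n);
}

/* Called by every SM as it terminates. */
static void sm_terminate(struct smcb *sm)
{
    if (sm->parent == NULL)
        return; /* top-level SM: nobody to notify */

    /* fetch_sub returns the old value, so only the last child sees 1 */
    if (atomic_fetch_sub(&sm->parent->children_left, 1) == 1)
        release_parent(sm->parent);
}
```

Top-level SMs pay only the NULL `parent` check on exit; the counter is the one piece of state that all siblings share.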

That seems like a needless check. Many state machines don't have parents, after all. Why not just keep the direction from parent to child, instead of requiring children to keep a backpointer to the parent?

As of this moment we really haven't taken any pains to keep the SM independent from the job system; in fact, you have to have the job system to drive things, so in some sense it's not really an issue.
I vote for making the interfaces as separate as possible. If someone else wants to use the state machine code somewhere else, it would be nice to allow them to take it as-is (mpich2 guys were talking about using it, but I think they ended up doing something else). Also, independent layers make testing and debugging easier in my view.
I agree, that's why I asked the question. Again, I could do it without the job layer at all and quite easily, but if I want the parent to pop out of the job_test call, then I'm going to have to call some things in the job interface. I could leave it to the SM programmer to do that, but then the SM really doesn't have a complete implementation; half of what it does depends on the SM programmer. As it is, there's already stuff that has to be provided as infrastructure to use the SM, and that's going to include something that wakes the SMs when they are done with their current task, which is currently the job system, so this isn't adding much.

Just to clarify, I'm only arguing that the state machine code be independent of the job code (not vice-versa). Adding job_sm_post and job_sm_test functions that look at state machine pointers should prevent the need for state machines to know about jobs.


-sam

In the current code, the sm_p is passed through to the job descriptor as a void*, and we just cast back to a sm_p in the while loop that does the job_testcontext and then drives the state machines again. The use of job_status does bring the job code into the state machine code, but it seems like mostly only the error_code field is used within the state actions, and the rest of that structure could be independent of the state machine code.
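A minimal sketch of that layering, using hypothetical `struct job_desc`, `struct sm`, `sm_continue`, and `drive_completed` (not the actual PVFS2 definitions). The job layer stores the state machine only as an opaque `void *`, the state machine layer sees nothing but an error code, and the driver loop is the one place that casts back. Here the array of completed jobs is passed in directly rather than being filled by `job_testcontext`:

```c
#include <assert.h>
#include <stddef.h>

/* --- job layer: knows nothing about state machines --- */
struct job_status {
    int error_code; /* the field state actions mostly use */
};

struct job_desc {
    void *user_ptr; /* opaque; the job layer never dereferences it */
    struct job_status status;
};

/* --- state machine layer: sees only an error code from jobs --- */
struct sm {
    int steps_run;
    int last_error;
};

/* Hypothetical "run the next state" entry point. */
static void sm_continue(struct sm *sm_p, int error_code)
{
    sm_p->steps_run++;
    sm_p->last_error = error_code;
}

/* --- driver glue: the only code that knows both layers --- */
static void drive_completed(struct job_desc *done, int count)
{
    for (int i = 0; i < count; i++) {
        struct sm *sm_p = done[i].user_ptr; /* cast back from void* */
        assert(sm_p != NULL);
        sm_continue(sm_p, done[i].status.error_code);
    }
}
```

With this split, only `error_code` would need to cross from the job layer into state actions.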
Yeah, again, that's pretty much what I'm proposing. I don't think we're saying much different.
Walt
-sam
Any more comments? (Sam, I hope this addresses some of yours.)

Walt

Phil Carns wrote:
Walter B. Ligon III wrote:
OK, guys, I have another issue I want input on. When child SMs terminate they have to notify their parent. The parent has to wait for all the children to terminate. So I've been thinking to use the job subsystem for this: the parent would post a job to wait for N children, and each child would post a job, the last one releasing the parent.
Now I see two ways to implement this. One is to implement this directly in the state machine code. The parent simply stops running (because it does not schedule a job, yet returns DEFERRED). Each child decrements a counter, and when it hits 0 the parent is restarted. This is a little ugly because the waiting parent is not being held on any list or queue (up to now all waiting SMs are in the job subsystem); also, the last terminating child becomes the parent as it starts executing the parent code. Things can get weird when one SM starts children that start children, and so on.
Now the other way to implement this is with the job subsystem as I suggested above. Much cleaner except for one thing: up to now the state machine subsystem has had no dependency at all on the job subsystem. If we do it this way, this function only works with the job system intact. I'd prefer not to do this, but it does seem the cleanest, most logical means.
I like the job approach. I guess this is an extra dependency because the sms would be calling these particular job functions implicitly, rather than relying on the state functions to handle those posts and releases? We definitely haven't done that before, but at least in this case the job function that the sm infrastructure would be depending on is the simplest one in the arsenal :) It shouldn't be hard for someone to reimplement that particular functionality if they wanted to use the state machine mechanism in another project.

If you weren't planning on these job calls to be implicit, then I'm not sure where the extra dependency is; we already use jobs to trigger all of the other "normal" transitions.

This reminded me of a question, though: is there going to be a standard mechanism for the children to report each of their independent error codes to the parent sm? Or do the children need to just keep a reference to the parent sm structure and manually fill in an array or something?

I guess I have a broader question of how data that the children generate (like a handle value or an attr structure) gets transferred to the parent. Does the parent copy this stuff from the child after the child finishes, or does the child copy it to the parent before it exits? I think we talked about this before at some point but I forgot what the plan is. It would be nice if we made the developer define macros or something to dictate what input parameters need to be filled in when invoking a child and what output parameters can be retrieved when it finishes. Otherwise it starts getting tricky to remember what fields need to be set in the sm structure before kicking something off.
-Phil
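One possible convention for Phil's question, sketched with hypothetical names (`struct getattr_frame`, `child_getattr`, `parent_collect`): the parent owns one frame per child, split into `in` fields it must set before launch and `out` fields the child fills before it terminates. Since each child writes only its own slot, siblings never contend, and the parent reads the results once all children are done. The child body, its fake result, and its error value are stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-child frame. The in/out split documents what the
 * parent must fill in before launch and what it may read afterwards. */
struct getattr_frame {
    struct { uint64_t handle; } in;                /* set by parent */
    struct { int error_code; uint64_t size; } out; /* set by child */
};

#define MAX_CHILDREN 4

struct parent_sm {
    int n_children;
    struct getattr_frame frames[MAX_CHILDREN]; /* one slot per child */
};

/* Stand-in child body: writes only to its own frame. */
static void child_getattr(struct getattr_frame *f)
{
    if (f->in.handle == 0) {
        f->out.error_code = -1; /* hypothetical "invalid handle" code */
        f->out.size = 0;
    } else {
        f->out.error_code = 0;
        f->out.size = f->in.handle * 10; /* fake result for illustration */
    }
}

/* Parent, after all children terminate: sum the successful sizes and
 * return the first nonzero child error (0 if every child succeeded). */
static int parent_collect(const struct parent_sm *p, uint64_t *total_size)
{
    assert(p->n_children <= MAX_CHILDREN);
    int first_error = 0;
    *total_size = 0;
    for (int i = 0; i < p->n_children; i++) {
        const struct getattr_frame *f = &p->frames[i];
        if (f->out.error_code != 0) {
            if (first_error == 0)
                first_error = f->out.error_code;
        } else {
            *total_size += f->out.size;
        }
    }
    return first_error;
}
```

Whether the frames live in the parent (as here) or the child copies results back on exit is the open design choice; keeping them in the parent means nothing has to be copied.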
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers