On Jul 26, 2006, at 3:41 PM, Walter B. Ligon III wrote:
Yeah, the idea is that the SM code would call the job function itself.
Relying on the state actions to do it seems like asking for
trouble, given all the details that have to be kept up with.
Actually, the SM code already uses job structs; now
I've had to add a context id to the smcb, and there will be job
calls as well. I think you are right, though: the amount of
dependency is pretty small.
As for the job funcs, I think I'd need one new one to post the
parent job, establishing a counter. The child job would look up
the counter, decrement it, and if it reaches zero, call job_null to
relaunch the parent (or just replicate what job_null does),
whichever seems easiest.
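A minimal C sketch of the counter scheme described above, assuming C11 atomics for the decrement. The names child_join, join_post_parent, join_child_done and the relaunch callback are hypothetical; the callback stands in for whatever would call job_null (or replicate it) to restart the parent.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the parent state machine control block. */
struct parent_sm {
    int relaunched;            /* set when the parent is restarted */
};

/* Hypothetical join record created when the parent posts its wait job. */
struct child_join {
    atomic_int remaining;      /* children still running */
    struct parent_sm *parent;  /* whom to restart when the count hits zero */
    void (*relaunch)(struct parent_sm *); /* assumed wrapper around job_null */
};

/* Parent side: post the wait, establishing the counter. */
void join_post_parent(struct child_join *j, struct parent_sm *p, int nchildren,
                      void (*relaunch)(struct parent_sm *))
{
    atomic_init(&j->remaining, nchildren);
    j->parent = p;
    j->relaunch = relaunch;
}

/* Child side: called once as each child terminates.
 * Returns 1 if this child was the last one and relaunched the parent. */
int join_child_done(struct child_join *j)
{
    int before = atomic_fetch_sub(&j->remaining, 1);
    assert(before > 0);        /* more completions than children is a bug */
    if (before == 1) {
        j->relaunch(j->parent);
        return 1;
    }
    return 0;
}

/* Trivial relaunch used for illustration only. */
void mark_relaunched(struct parent_sm *p) { p->relaunched = 1; }
```

The atomic fetch-and-subtract guarantees that exactly one child observes the count going from 1 to 0, so the parent is relaunched once even if children finish concurrently.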
I would rather see the parent get relaunched by the normal job test
code by putting itself in the job completion queue once it's
finished. This could happen in a job_sm_test call like I suggested
in my previous email. Also, instead of a counter that a test
function would check, and the child state machines would have to
decrement, I'd prefer the parent job keep an array of child state
machines (it does this anyway, no?) and check each element in the
array for completion of the state machine. That way the children
aren't competing to lock the same state to notify of completion, the
parent just checks each one.
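A sketch of the alternative where the parent keeps an array of its children and a test call scans it. struct child_sm, struct parent_sm and parent_children_done are hypothetical names, and the plain bool flag assumes the test runs in the same thread that drives the children (an atomic flag would be needed otherwise).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical child record: each child writes only its own flag,
 * so children never contend on a shared counter. */
struct child_sm {
    bool complete;
    int error_code;
};

/* Hypothetical parent that owns the array of its children. */
struct parent_sm {
    struct child_sm *children;
    size_t nchildren;
};

/* Hypothetical test in the spirit of a job_sm_test() call: returns true
 * once every child has finished, at which point the job layer could put
 * the parent on its normal completion queue. */
bool parent_children_done(const struct parent_sm *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < p->nchildren; i++) {
        if (!p->children[i].complete)
            return false;
    }
    return true;
}
```

Each child touches only its own element, so there is nothing shared to lock; the trade-off is an O(N) scan on every test call.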
The implicit call is the child's call when it terminates. The
parent's call could be implicit too, or done by the state action.
Doesn't this require child state machines to only function in the
child state machine context? I'd prefer to just have generic state
machines that can be used as a child state machine or as a top-level
state machine.
As of this moment we really haven't taken any pains to keep the SM
independent from the job system, in fact you have to have the job
system to drive things, so in some sense it's not really an issue.
I vote for making the interfaces as separate as possible. If someone
else wants to use the state machine code somewhere else, it would be
nice to allow them to take it as-is (mpich2 guys were talking about
using it, but I think they ended up doing something else). Also,
independent layers make testing and debugging easier in my view.
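One way to keep the layers separate, sketched under the assumption that the SM core talks to its driver only through a small table of function pointers. struct sm_notify_ops, sm_terminate and count_finished are hypothetical; a job-system build would fill the table with calls that post or complete jobs, and another project could supply its own.

```c
#include <assert.h>
#include <stddef.h>

struct sm;  /* opaque to the notification layer */

/* Hypothetical hook table: the only thing the state machine core needs
 * from whatever layer drives it. */
struct sm_notify_ops {
    void (*child_finished)(struct sm *parent, struct sm *child, void *ctx);
    void *ctx;
};

struct sm {
    struct sm *parent;                /* NULL for a top-level machine */
    const struct sm_notify_ops *ops;  /* supplied by the driving layer */
};

/* Called by the core when a machine terminates. The same machine works as
 * a child or as a top-level SM: only the presence of a parent differs. */
void sm_terminate(struct sm *m)
{
    assert(m != NULL);
    if (m->parent != NULL && m->ops != NULL && m->ops->child_finished != NULL)
        m->ops->child_finished(m->parent, m, m->ops->ctx);
}

/* Example driver hook that just counts notifications; a job system would
 * post or complete a job here instead. */
void count_finished(struct sm *parent, struct sm *child, void *ctx)
{
    (void)parent;
    (void)child;
    ++*(int *)ctx;
}
```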
In the current code, the sm_p is passed through to the job descriptor
as a void*, and we just cast back to a sm_p in the while loop that
does the job_testcontext and then drives the state machines again.
The use of job_status does bring in the job code into the state
machine code, but it seems like mostly only the error_code field is
used within the state actions, and the rest of that structure could
be independent of the state machine code.
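A sketch of the void* round trip described above. The types and drive_completed are hypothetical stand-ins, not the actual PVFS2 job structures; the point is that the job layer stores only an opaque pointer plus a status, and only the driver loop casts back to the state machine type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal state machine type. */
struct sm_state {
    int steps_run;
};

/* Hypothetical job status: state actions mostly read error_code. */
struct job_status {
    int error_code;
};

/* Hypothetical completed-job record, as a testcontext-style call might
 * return it. The job layer only sees an opaque void*. */
struct job_completion {
    void *user_ptr;
    struct job_status status;
};

/* Drive each completed job's state machine; the cast back from void* is
 * the only point where job results and state machines meet. */
int drive_completed(struct job_completion *done, int count)
{
    int driven = 0;
    for (int i = 0; i < count; i++) {
        struct sm_state *sm_p = (struct sm_state *)done[i].user_ptr;
        if (sm_p == NULL)
            continue;
        /* A real driver would run the next state action here, passing
         * done[i].status.error_code along. */
        sm_p->steps_run++;
        driven++;
    }
    return driven;
}
```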
-sam
Any more comments? (Sam, I hope this addresses some of yours.)
Walt
Phil Carns wrote:
Walter B. Ligon III wrote:
OK, guys, I have another issue I want input on. When child SMs
terminate they have to notify their parent. The parent has to
wait for all the children to terminate. So I've been thinking to
use the job subsystem for this: the parent would post a job to
wait for N children,
and each child would post a job, the last one releasing the parent.
Now I see two ways to implement this - one is to implement this
directly in the state machine code. The parent simply stops
running (because it does not schedule a job, yet it returns
DEFERRED). Each child decrements a counter, and when it hits 0
the parent is restarted. This is a little ugly because the
waiting parent is not being held on any list or queue (up to now
all waiting SMs are in the job subsystem), also the last
terminating child becomes the parent as it starts executing the
parent code. Things can get weird when one SM starts children
that start children, and so on.
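A sketch of the first option (a counter kept in the state machine code itself), written to show the nesting problem. All names, including SM_DEFERRED as a constant, are hypothetical, and the sketch assumes each resumed parent finishes immediately. The last grandchild ends up running the child's remaining code, which in turn runs the root's, all on one call stack.

```c
#include <assert.h>
#include <stddef.h>

#define SM_DEFERRED 1   /* hypothetical: parent stops without posting a job */

/* Hypothetical state machine with an in-SM join counter. */
struct sm {
    struct sm *parent;
    int pending_children;   /* counter the children decrement */
    int resumed;            /* set when this SM's code runs again */
    int resumed_at_depth;   /* call depth at which it was resumed */
};

static int call_depth = 0;  /* tracks how deeply resumptions nest */

void sm_child_terminated(struct sm *child);

/* Parent side: start waiting. Nothing holds the parent on any queue. */
int sm_wait_children(struct sm *p, int n)
{
    p->pending_children = n;
    return SM_DEFERRED;
}

/* Resume a parent directly on the terminating child's stack. If the parent
 * is itself a child, it terminates and the chain continues upward. */
static void sm_resume(struct sm *p)
{
    call_depth++;
    p->resumed = 1;
    p->resumed_at_depth = call_depth;
    if (p->parent != NULL)
        sm_child_terminated(p);   /* assume the parent finishes at once */
    call_depth--;
}

void sm_child_terminated(struct sm *child)
{
    struct sm *p = child->parent;
    if (p == NULL)
        return;
    assert(p->pending_children > 0);
    if (--p->pending_children == 0)
        sm_resume(p);
}
```

With a root, one intermediate child and two leaves, the root is resumed two calls deep on the second leaf's stack, which is the "last terminating child becomes the parent" effect.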
Now the other way to implement this is with the job subsystem as
I suggested above. Much cleaner except for one thing: up to now
the state machine subsystem has had no dependency at all on the
job subsystem. If we do it this way, this function only works
with the job system intact. I'd prefer not to do this, but it
does seem the cleanest, most logical means.
I like the job approach. I guess this is an extra dependency
because the SMs would be calling these particular job functions
implicitly, rather than relying on the state functions to handle
those posts and releases? We definitely haven't done that before,
but at least in this case the job function that the SM
infrastructure would be depending on is the simplest one in the
arsenal :) It shouldn't be hard for someone to reimplement that
particular functionality if they wanted to use the state machine
mechanism in another project.
If you weren't planning for these job calls to be implicit, then
I'm not sure where the extra dependency is; we already use jobs to
trigger all of the other "normal" transitions.
This reminded me of a question, though: is there going to be a
standard mechanism for the children to report each of their
independent error codes to the parent sm? Or do the children need
to just keep a reference to the parent sm structure and manually
fill in an array or something?
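One possible answer to the error-code question, sketched with hypothetical names: the parent owns a fixed-size array, each child is given its index when spawned, and the parent summarizes the results after the join. MAX_CHILDREN and the "first nonzero error wins" policy are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8   /* hypothetical fixed limit for the sketch */

/* Hypothetical parent-owned results array: each child is handed its
 * index at spawn time and writes only its own slot. */
struct parent_results {
    int nchildren;
    int error_code[MAX_CHILDREN];
};

/* Child side: record this child's final error code. */
void child_report_error(struct parent_results *r, int child_index, int err)
{
    assert(child_index >= 0 && child_index < r->nchildren);
    r->error_code[child_index] = err;
}

/* Parent side, after the join: return the first nonzero error by child
 * index (or 0) and store how many children failed in *nfailed. */
int parent_first_error(const struct parent_results *r, int *nfailed)
{
    int first = 0;
    *nfailed = 0;
    for (int i = 0; i < r->nchildren; i++) {
        if (r->error_code[i] != 0) {
            if (first == 0)
                first = r->error_code[i];
            (*nfailed)++;
        }
    }
    return first;
}
```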
I guess I have a broader question of how data that the children
generate (like a handle value or an attr structure) gets
transferred to the parent. Does the parent copy this stuff from
the child after the child finishes, or does the child copy it to
the parent before it exits? I think we talked about this before
at some point but I forgot what the plan is. It would be nice if
we made the developer define macros or something to dictate which
input parameters need to be filled in when invoking a child
and which output parameters can be retrieved when it finishes.
Otherwise it starts getting tricky to remember what fields need to
be set in the sm structure before kicking something off.
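A sketch of the macro idea, assuming each child state machine declares its inputs and outputs as two structs via a hypothetical DECLARE_CHILD_SM macro. The lookup child, its fields, and the fake handle computation are made up for illustration; the parent fills in frame.in before starting the child and reads frame.out after it finishes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: declares a child's explicit interface as an input
 * struct, an output struct, and a frame holding both. */
#define DECLARE_CHILD_SM(name, IN_FIELDS, OUT_FIELDS) \
    struct name##_in  { IN_FIELDS };                   \
    struct name##_out { OUT_FIELDS };                  \
    struct name##_frame {                              \
        struct name##_in  in;                          \
        struct name##_out out;                         \
    }

/* Hypothetical lookup child: what the parent must set, and what it gets. */
DECLARE_CHILD_SM(lookup,
                 int fs_id; char path[64];,
                 int error_code; unsigned long long handle;);

/* Hypothetical child body: reads only f->in and writes only f->out.
 * The "handle" is a fake value computed for illustration. */
void lookup_run(struct lookup_frame *f)
{
    if (f->in.path[0] != '/') {
        f->out.error_code = -1;
        f->out.handle = 0;
        return;
    }
    f->out.error_code = 0;
    f->out.handle = (unsigned long long)f->in.fs_id * 1000ULL
                    + strlen(f->in.path);
}
```

Because the inputs and outputs live in separate structs, the compiler documents the contract: anything the child needs is in lookup_in, and anything the parent may read back is in lookup_out.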
-Phil
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers