Re: [Pvfs2-developers] terminating state machines

2006-07-27 Thread Phil Carns


Hmm...I had been thinking about a flow implementation that used the  new 
concurrent state machine code...it sounds like that's a bad idea  
because the testing and restarting would take too long to switch  
between bmi and trove?  We use the post/test model through pvfs2  
though, so maybe I don't understand the issue.


I don't think that is a bad idea.  There were really two separate but 
related problems in one of the older flow protocol implementations; I 
can try to describe them a little more here if I can remember:


- explicitly tracking and testing each trove and bmi operation: It 
basically kept arrays that listed pending trove and bmi ops, and would 
call testsome() to service them.  This was a problem because of the time 
it took to keep running up and down those arrays (whether building them 
at the flow level or testing them at the trove/bmi level).  The solution 
is to just use testcontext() and let trove/bmi tell you when something 
finishes, without managing extra state.


- thread switch time: the architecture here was set up at one time to 
have one thread pushing the test functions for bmi, another thread 
pushing the test functions for trove, while another thread was 
processing the flow and posting new operations.  The problem here is 
that it (at the time) took too long to jump between the pushing 
threads and the processing thread when an operation finished that 
should trigger progress on the flow. This led to the thread-mgr.c code 
and associated callbacks.  The callbacks actually drive the flow 
progress and post new operations.  That means that the same thread that 
pushes testcontext() gets to trigger the next post, without waiting on 
the latency of waking up a different thread to do something (using 
condition variable etc.).  I managed to reuse the thread-mgr for the job 
code as well, so that one testcontext() call triggers callbacks to both 
the job and flow interfaces.


I don't think either of the above issues precludes different flow 
protocol implementations, and they are really kind of orthogonal to 
whether state machines are used or not.  The first issue is solved just 
by using testcontext() rather than manually tracking operations.


The second issue could be solved in a variety of ways, some of which may 
be better than what we have now.  The callback approach is efficient 
enough, but it is hard to debug.  Of course it is also possible that the 
thread switch (i.e. condition signal) latency is low enough nowadays that 
you don't even need to worry about it anymore.  I last looked at this 
problem before NPTL arrived on the scene.


At any rate I think a state machine based flow protocol could dodge 
issue #2 in any of a few ways:

- lucking out with a faster modern thread implementation
- being smarter about how thread work is divided up
- using callbacks as we do now, and making the state machine mechanism 
thread safe so that it can be driven directly from those callbacks 
rather than from a testcontext() work loop


On a related note, it is important to remember that trove has its own 
internal thread as well, so on the trove push side (depending on your 
design) you could have to worry about a chain of two threads that have to 
be woken up to get something done at completion time.  The trove part of 
that chain can't be avoided without changing the API.


Sorry about the tangent here, but I figured I may as well share some 
warnings about things to look out for here.  I think it would be good to 
have a cleaner flow protocol implementation.


I think I'm lost now.  What do you mean by replace?  The states are  
still isolated, jobs trigger the transitions, only one state action  
gets executed at a time, there still may be a time gap between  
completion of any given child and when the parent picks up  processing 
again, and there are still frames.  I think both  approaches will look 
the same when running unless I missed  something.  If Walt puts a 
longjmp() in there we can both hit him  over the head.



Heh.  Don't give him ideas! ;-)

I was operating under the constraint that a state machine can only post 
a job for itself.  If I understand the current plan correctly, using 
job_null in the child state machine to post a job for the parent breaks 
that constraint, and so in some sense is a replace (the job_null 
actually takes the parent smcb pointer).  I think you're probably right 
that it's not a big difference either way; it's just cleaner in my head 
to only have state machines posting jobs for themselves.


I see what you are saying.  I guess it depends on how you look at it.  I 
had kind of started thinking of the jobs as a signalling mechanism, since 
they are the construct that signals a state machine to make its next 
transition.  The job_null() approach just makes it so that a child state 
machine is what triggers this particular signal, rather than a 
bmi/trove/dev/req_sched/flow component.  I know this is a change in the 
model and adds a dependency that 

Re: [Pvfs2-developers] terminating state machines

2006-07-27 Thread Walter B. Ligon III



Phil Carns wrote:
I think I'm getting voted down here, so I should probably just  
shutup, but I don't think in practice we're going to have that many  
child state machines that iterating through the list is at all  
costly.  I'm arguing for simpler mechanisms that fit in with the job  
subsystem over something more fancy and possibly slightly better  
performing.



Well, as far as the number of SMs goes, I would rather not risk it.  I 
still hope this is lightweight enough that we could eventually use it in 
more places that would generate a lot of children (like a re-architected 
sys-io implementation), though I don't know if that will pan out in 
practice.  I got bitten by a similar assumption in the flow protocol- it 
used to track all of its posted operations for testing rather than 
relying on someone to notify it of completion.  Admittedly the flow 
protocol is a more obvious case and I should have known better, but at 
the time it seemed reasonable :)


I think that the way that you describe would work fine too, but it  
would require a little more active work to check the status of the  
array of child SMs and would require more code to keep track of them.



Probably a bit more code, yes, but it seems cleaner than keeping  
around backpointers and checking for parents.  Instead of driving all  
state machines from one place, this event notification scheme  
essentially replaces the last child state machine with the parent,  
which seems like a bit of a hack and harder to debug.



I think I'm lost now.  What do you mean by replace?  The states are 
still isolated, jobs trigger the transitions, only one state action gets 
executed at a time, there still may be a time gap between completion of 
any given child and when the parent picks up processing again, and there 
are still frames.  I think both approaches will look the same when 
running unless I missed something.  If Walt puts a longjmp() in there we 
can both hit him over the head.


What? What?  How else would I do it?  ;-)



I think having a pointer to the parent actually improves debuggability 
(though I'm not sure this approach actually requires it; all you really 
need is either a job descriptor or a pointer to a counter).  If I have a 
state machine that does something bad or gets stuck it would be nice to 
be able to work backwards to find out who invoked it, without having to 
search for it in a separate data structure.


I don't mean to keep struggling with this issue- I honestly think that 
both approaches are pretty good, and if Walt implements it the way I 
think he is going to, then 95% of developers won't notice the difference 
anyway.  At this point I am mostly hammering away to make sure I am not 
missing a larger issue...


-Phil


--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
___
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers


Re: [Pvfs2-developers] terminating state machines

2006-07-27 Thread Walter B. Ligon III



Sam Lang wrote:


On Jul 26, 2006, at 6:16 PM, Phil Carns wrote:

I think I'm getting voted down here, so I should probably just   
shutup, but I don't think in practice we're going to have that  many  
child state machines that iterating through the list is at  all  
costly.  I'm arguing for simpler mechanisms that fit in with  the 
job  subsystem over something more fancy and possibly slightly  
better  performing.



Well, as far as the number of SMs goes, I would rather not risk  it.  
I still hope this is lightweight enough that we could  eventually use 
it in more places that would generate a lot of  children (like a 
re-architected sys-io implementation), though I  don't know if that 
will pan out in practice.  I got bitten by a  similar assumption in 
the flow protocol- it used to track all of  its posted operations for 
testing rather than relying on someone to  notify it of completion.  
Admittedly the flow protocol is a more  obvious case and I should have 
known better, but at the time it  seemed reasonable :)




Hmm...I had been thinking about a flow implementation that used the  new 
concurrent state machine code...it sounds like that's a bad idea  
because the testing and restarting would take too long to switch  
between bmi and trove?  We use the post/test model through pvfs2  
though, so maybe I don't understand the issue.


I think that the way that you describe would work fine too, but  it  
would require a little more active work to check the status  of the  
array of child SMs and would require more code to keep  track of them.



Probably a bit more code, yes, but it seems cleaner than keeping   
around backpointers and checking for parents.  Instead of driving  
all  state machines from one place, this event notification  scheme  
essentially replaces the last child state machine with the  parent,  
which seems like a bit of a hack and harder to debug.



I think I'm lost now.  What do you mean by replace?  The states are  
still isolated, jobs trigger the transitions, only one state action  
gets executed at a time, there still may be a time gap between  
completion of any given child and when the parent picks up  processing 
again, and there are still frames.  I think both  approaches will look 
the same when running unless I missed  something.  If Walt puts a 
longjmp() in there we can both hit him  over the head.



Heh.  Don't give him ideas! ;-)

I was operating under the constraint that a state machine can only  post 
a job for itself.  If I understand the current plan correctly,  using 
job_null in the child state machine to post a job for the  parent breaks 
that constraint, and so in some sense is a replace (the  job_null 
actually takes the parent smcb pointer).  I think you're  probably right 
that it's not a big difference either way; it's just  cleaner in my head 
to only have state machines posting jobs for  themselves.


I think having a pointer to the parent actually improves  debuggability 
(though I'm not sure this approach actually requires  it; all you 
really need is either a job descriptor or a pointer to  a counter).  
If I have a state machine that does something bad or  gets stuck it 
would be nice to be able to work backwards to find  out who invoked 
it, without having to search for it in a separate  data structure.


I don't mean to keep struggling with this issue- I honestly think  
that both approaches are pretty good, and if Walt implements it the  
way I think he is going to, then 95% of developers won't notice the  
difference anyway.  At this point I am mostly hammering away to  make 
sure I am not missing a larger issue...



Walt probably got more discussion than he bargained for, but at the  
least, lively discussion keeps me awake in the afternoon ;-).


-sam



-Phil



Good discussion.  Phil has convinced me the level of dependency is low, 
and unless I completely misunderstand Sam, the complexity of the parent 
pointer/job_null approach is a lot less than the alternative, and I like 
low complexity.  I also think debugging will be simpler.  So that's 
where I'm going.


I'll have to think of other topics to get you guys going from time to 
time!  ;-)


Now off to figure out a way to use setjmp/longjmp in my implementation!

Walt
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University


Re: [Pvfs2-developers] terminating state machines

2006-07-27 Thread Phil Carns


Thanks for the detailed explanation Phil.  I hadn't thought about the  
context switches that might slow down flow.  I was primarily thinking  
of something that would be cleaner, and easier to modify and test for  
different scenarios.  If at some point I get around to playing with a  
flow impl that uses the concurrent state machine framework, I'll open  
up the discussion again to avoid any of the pitfalls you described.


Cleaner and easier to modify would be great!

I just remembered that there are a couple of test programs in the tree 
to look at the thread context switch overhead, in case they are helpful 
to figure out if it is still a concern:


pvfs2/test/io/job/thread-bench2.c
pvfs2/test/io/job/thread-bench3.c

One of those just goes through a bunch of iterations relaying a 
condition across threads to see how long it takes.  The second one does 
the same thing, except with 2 relays instead of one (to mimic the trove 
side of things).  I haven't run these on a decent machine in years.  I 
will also add a disclaimer that the test programs are old and quite 
possibly wrong :)


We also have the benefit of your small-io optimization now too, so it 
isn't as critical as it used to be for the flow to keep the latency down 
on small transfers...


-Phil


[Pvfs2-developers] terminating state machines

2006-07-26 Thread Walter B. Ligon III


OK, guys, I have another issue I want input on.  When child SMs 
terminate they have to notify their parent.  The parent has to wait for 
all the children to terminate.  So I've been thinking to use the job 
subsystem for this: the parent would post a job to wait for N children, 
and each child would post a job, the last one releasing the parent.

Now I see two ways to implement this.  One is to implement it directly 
in the state machine code.  The parent simply stops running (because it 
does not schedule a job, yet returns DEFERRED).  Each child decrements a 
counter, and when it hits 0 the parent is restarted.  This is a little 
ugly because the waiting parent is not being held on any list or queue 
(up to now all waiting SMs are in the job subsystem), and the last 
terminating child becomes the parent as it starts executing the parent 
code.  Things can get weird when one SM starts children that start 
children, and so on.


Now the other way to implement this is with the job subsystem as I 
suggested above.  Much cleaner except for one thing:  up to now the 
state machine subsystem has had no dependency at all on the job 
subsystem.  If we do it this way, this function only works with the job 
system intact.  I'd prefer not to do this, but it does seem the 
cleanest, most logical means.


Comments?

Walt
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University


Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Phil Carns

Walter B. Ligon III wrote:


OK, guys, I have another issue I want input on.  When child SMs 
terminate they have to notify their parent.  The parent has to wait for 
all the children to terminate.  So I've been thinking to use the job 
subsystem for this: the parent would post a job to wait for N children,

and each child would post a job, the last one releasing the parent.

Now I see two ways to implement this - one is to implement this directly 
in the state machine code.  The parent simply stops running (because it 
does not schedule a job yet returns DEFERRED).  Each child decrements a 
counter, and when it hits 0 the parent is restarted.  This is a little 
ugly because the waiting parent is not being held on any list or queue 
(up to now all waiting SMs are in the job subsystem), also the last 
terminating child becomes the parent as it starts executing the parent 
code.  Things can get weird when one SM starts children that start 
children, and so on.


Now the other way to implement this is with the job subsystem as I 
suggested above.  Much cleaner except for one thing:  up to now the 
state machine subsystem has had no dependency at all on the job 
subsystem.  If we do it this way, this function only works with the job 
system intact.  I'd prefer not to do this, but it does seem the 
cleanest, most logical means.


I like the job approach.  I guess this is an extra dependency because 
the sms would be calling these particular job functions implicitly, 
rather than relying on the state functions to handle those posts and 
releases?  We definitely haven't done that before, but at least in this 
case the job function that the sm infrastructure would be depending on 
is the simplest one in the arsenal :)  It shouldn't be hard for someone 
to reimplement that particular functionality if they wanted to use the 
state machine mechanism in another project.


If you weren't planning on these job calls to be implicit, then I'm not 
sure where the extra dependency is- we already use jobs to trigger all 
of the other normal transitions.


This reminded me of a question, though- is there going to be a standard 
mechanism for the children to report each of their independent error 
codes to the parent sm?  Or do the children need to just keep a 
reference to the parent sm structure and manually fill in an array or 
something?


I guess I have a broader question of how data that the children generate 
(like a handle value or an attr structure) gets transferred to the 
parent.  Does the parent copy this stuff from the child after the child 
finishes, or does the child copy it to the parent before it exits?  I 
think we talked about this before at some point, but I forgot what the 
plan is.  It would be nice if we made the developer define macros or 
something to dictate what input parameters need to be filled in when 
invoking a child and what output parameters can be retrieved when it 
finishes.  Otherwise it starts getting tricky to remember what fields 
need to be set in the sm structure before kicking something off.


-Phil


Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Sam Lang


On Jul 26, 2006, at 12:37 PM, Walter B. Ligon III wrote:



OK, guys, I have another issue I want input on.  When child SMs  
terminate they have to notify their parent.  The parent has to wait  
for all the children to terminate.  So I've been thinking to use  
the job subsystem for this: the parent would post a job to wait for  
N children,

and each child would post a job, the last one releasing the parent.

Now I see two ways to implement this - one is to implement this  
directly in the state machine code.  The parent simply stops  
running (because it does not schedule a job yet returns DEFERRED).   
Each child decrements a counter, and when it hits 0 the parent is  
restarted.  This is a little ugly because the waiting parent is not  
being held on any list or queue (up to now all waiting SMs are in  
the job subsystem), also the last terminating child becomes the  
parent as it starts executing the parent code.  Things can get  
weird when one SM starts children that start children, and so on.


Now the other way to implement this is with the job subsystem as I  
suggested above.  Much cleaner except for one thing:  up to now the  
state machine subsystem has had no dependency at all on the job  
subsystem.  If we do it this way, this function only works with the  
job system intact.  I'd prefer not to do this, but it does seem the  
cleanest, most logical means.




I don't see why the two have to be dependent for this to work.  Do  
you mean that by the parent posting a job, the state machine stepping  
code would handle the actual posting?  I was assuming that the parent  
state action could just call job_concurrent_sm_post (or whatever it's  
called).


Could it be similar to the request scheduler job posting code?  The  
parent state action could call job_concurrent_sm_post with an array  
of the child sms, which just calls sm_post and adds the parent sm and  
its array to an operation queue.  Then a job_concurrent_sm_test  
function could test for completion of a parent sm by looking at all  
the sms in the array to see if they completed.  The job_testcontext  
code would have to be modified of course (maybe rework the  
do_one_test_cycle_req_sched function to also test parent sm jobs),  
but all of that still seems to be independent of the state machine  
code (i.e. someone could use the state machine code separately and  
drive state machines using something other than the job framework).   
I don't know if all that makes sense in the context of the changes  
you've made, but that's what I had in mind when I suggested posting a  
job for the parent.


-sam


Comments?

Walt
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University





Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Walter B. Ligon III



Phil Carns wrote:

Walter B. Ligon III wrote:



OK, guys, I have another issue I want input on.  When child SMs 
terminate they have to notify their parent.  The parent has to wait 
for all the children to terminate.  So I've been thinking to use the 
job subsystem for this: the parent would post a job to wait for N 
children,

and each child would post a job, the last one releasing the parent.

Now I see two ways to implement this - one is to implement this 
directly in the state machine code.  The parent simply stops running 
(because it does not schedule a job yet returns DEFERRED).  Each child 
decrements a counter, and when it hits 0 the parent is restarted.  
This is a little ugly because the waiting parent is not being held on 
any list or queue (up to now all waiting SMs are in the job 
subsystem), also the last terminating child becomes the parent as it 
starts executing the parent code.  Things can get weird when one SM 
starts children that start children, and so on.


Now the other way to implement this is with the job subsystem as I 
suggested above.  Much cleaner except for one thing:  up to now the 
state machine subsystem has had no dependency at all on the job 
subsystem.  If we do it this way, this function only works with the 
job system intact.  I'd prefer not to do this, but it does seem the 
cleanest, most logical means.



I like the job approach.  I guess this is an extra dependency because 
the sms would be calling these particular job functions implicitly, 
rather than relying on the state functions to handle those posts and 
releases?  We definitely haven't done that before, but at least in this 
case the job function that the sm infrastructure would be depending on 
is the simplest one in the arsenal :)  It shouldn't be hard for someone 
to reimplement that particular functionality if they wanted to use the 
state machine mechanism in another project.


If you weren't planning on these job calls to be implicit, then I'm not 
sure where the extra dependency is- we already use jobs to trigger all 
of the other normal transitions.


This reminded me of a question, though- is there going to be a standard 
mechanism for the children to report each of their independent error 
codes to the parent sm?  Or do the children need to just keep a 
reference to the parent sm structure and manually fill in an array or 
something?


I guess I have a broader question of how data that the children generate 
(like a handle value or an attr structure) gets transferred to the 
parent.  Does the parent copy this stuff from the child after the child 
finishes, or does the child copy it to the parent before it exits?  I 
think we talked about this before at some point but I forgot what the 
plan is.  It would be nice if we made the developer define macros or 
something to dictate what the input parameters need to be filled in when 
invoking a child and what output parameters can be retrieved when it 
finishes.  Otherwise it starts getting tricky to remember what fields 
need to be set in the sm structure before kicking something off.




Phil, first your questions: the parent will push a frame onto a stack 
for each child it is starting.  A frame is everything that used to be in 
either an s_op or sm_p on the server or client, except for the stuff that 
actually runs the SM (now in an smcb).  The parent can pass in anything 
it wants by filling in the fields appropriately.  When each child runs, 
that struct will appear to be its current frame.  Each child can leave 
that frame in any condition it wants, with any values or buffers the 
child wants to leave for the parent.  After the children are done, the 
parent can pop each frame off the stack and do what it wants with it. 
Thus there is plenty of flexibility in how you want to handle passing 
things in and out, all under control of the server or client code.


As for providing macros for setting up and tearing down frames, we can 
certainly do that.  I'm not sure how much that really helps, but we can 
do it.


Now, an implementation question: one approach to this job/counter thing 
is to have two job calls, one for the parent and one for the children. 
Another approach is for the parent to simply set a counter and not call 
anything.  The children come along, decrement the count, and if it hits 
zero, call job_null() to awaken the parent.  That requires no 
modification in the job layer and minimizes the dependency.  What do you 
think?  Should the job layer have more of a role, or keep it to a 
minimum?


Walt

--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University


Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Sam Lang


On Jul 26, 2006, at 3:41 PM, Walter B. Ligon III wrote:

Yeah, the idea is that the SM code would call the job function.  
Depending on the state actions to do it seems like asking for  
trouble, given all the details that have to be kept up with.


Actually, there are already job structs used by the SM code; now  
I've had to add a context id to the smcb, and there will be job  
calls.  I think you are right though, the amount of dependency is  
pretty small.


As for the job funcs, I think I'd need one new one to post the  
parent job, establishing a counter.  The child job would look up  
the counter, decrement it, and if zero, call job_null to relaunch the  
parent, or just replicate what job_null does, whatever seems easiest.



I would rather see the parent get relaunched by the normal job test  
code by putting itself in the job completion queue once it's  
finished.  This could happen in a job_sm_test call like I suggested  
in my previous email.  Also, instead of a counter that a test  
function would check and the child state machines would have to  
decrement, I'd prefer the parent job keep an array of child state  
machines (it does this anyway, no?) and check each element in the  
array for completion of the state machine.  That way the children  
aren't competing to lock the same state to notify of completion; the  
parent just checks each one.


The implicit call is the child's call when it terminates.  The  
parent's call could be implicit too, or done by the state action.


Doesn't this require child state machines to only function in the  
child state machine context?  I'd prefer to just have generic state  
machines that can be used as a child state machine or as a top-level  
state machine.




As of this moment we really haven't taken any pains to keep the SM  
independent from the job system; in fact you have to have the job  
system to drive things, so in some sense it's not really an issue.


I vote for making the interfaces as separate as possible.  If someone  
else wants to use the state machine code somewhere else, it would be  
nice to allow them to take it as-is (mpich2 guys were talking about  
using it, but I think they ended up doing something else).  Also,  
independent layers make testing and debugging easier in my view.


In the current code, the sm_p is passed through to the job descriptor  
as a void*, and we just cast back to an sm_p in the while loop that  
does the job_testcontext and then drives the state machines again.   
The use of job_status does bring the job code into the state  
machine code, but it seems like mostly only the error_code field is  
used within the state actions, and the rest of that structure could  
be independent of the state machine code.


-sam



Any more comments?  (Sam, I hope this addresses some of yours)

Walt

Phil Carns wrote:

Walter B. Ligon III wrote:


OK, guys, I have another issue I want input on.  When child SMs  
terminate they have to notify their parent.  The parent has to  
wait for all the children to terminate.  So I've been thinking to  
use the job subsystem for this: the parent would post a job to  
wait for N children,

and each child would post a job, the last one releasing the parent.

Now I see two ways to implement this - one is to implement this  
directly in the state machine code.  The parent simply stops  
running (because it does not schedule a job yet returns  
DEFERRED).  Each child decrements a counter, and when it hits 0  
the parent is restarted.  This is a little ugly because the  
waiting parent is not being held on any list or queue (up to now  
all waiting SMs are in the job subsystem), also the last  
terminating child becomes the parent as it starts executing the  
parent code.  Things can get weird when one SM starts children  
that start children, and so on.


Now the other way to implement this is with the job subsystem as  
I suggested above.  Much cleaner except for one thing:  up to now  
the state machine subsystem has had no dependency at all on the  
job subsystem.  If we do it this way, this function only works  
with the job system intact.  I'd prefer not to do this, but it  
does seem the cleanest, most logical means.
I like the job approach.  I guess this is an extra dependency  
because the sms would be calling these particular job functions  
implicitly, rather than relying on the state functions to handle  
those posts and releases?  We definitely haven't done that before,  
but at least in this case the job function that the sm  
infrastructure would be depending on is the simplest one in the  
arsenal :)  It shouldn't be hard for someone to reimplement that  
particular functionality if they wanted to use the state machine  
mechanism in another project.
If you weren't planning on these job calls to be implicit, then  
I'm not sure where the extra dependency is- we already use jobs to  
trigger all of the other normal transitions.
This reminded me of a question, though- is there 

Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Phil Carns


Phil, first your questions:  The parent will push a frame onto a stack 
for each child it is starting.  A frame is everything that used to be in 
either a s_op or sm_p on the server or client, except for the stuff that 
actually runs the SM (now in an smcb).  The parent can pass in anything 
it wants by filling in the fields appropriately.  When each child runs 
that struct will appear to be its current frame.  Each child can leave 
that frame in any condition it wants, with any values of buffers the 
child wants to leave for the parent.  After the children are done the 
parent can pop each frame off the stack and do what it wants with it. 
Thus there is plenty of flexibility on how you want to handle passing 
things in and out, all under control of the server or client code.


Sounds great.

As for providing macros for setting up and tearing down frames, we can 
certainly do that.  I'm not sure how much that really helps, but we can 
do it.


I think it would be nice to help prevent programmer error.  The same 
thing was done with the protocol request structures (see all the 
PINT_SERVREQ_*_FILL macros used in the client sms).  If you have a 
macro, then neglecting to pass in one of the required input fields 
results in a compiler error.  Otherwise the compiler can't help to tell 
you if you have set all of the frame fields that you were supposed to 
set.  There is no technical advantage, it just makes setting the fields 
a little more foolproof.


Same goes for the output of a frame after completion, although I'm not 
sure what the macro would look like there, or if it is possible. 
Probably a given frame will have several fields - some are input, some 
are output, some are scratch area for the state functions, etc.  Someone 
coming along later trying to reuse the SM may not know (without some 
tricky code digging) which fields are the output fields that it can 
count on to be correctly filled in after completion.   For example, 
maybe there is a field called parent_handle in there- is it filled in? 
 If so, is it guaranteed to be filled in, or did I just happen to get 
it this time because of the particular path the sm took?   I don't know what 
the best way is to make this explicit, maybe some kind of macro, maybe 
putting a special prefix on the names of the output fields, any other 
ideas?  Maybe we just use comments :)


Now, an implementation question - one approach to this job/counter thing 
is to have two job calls, one for the parent, and one for the children. 
Another approach is for the parent to simply set a counter and not call 
anything.  The children come along, decrement the count, and if zero, 
call job_null() to awaken the parent.  Requires no modification in the 
job layer, minimizes dependency.  What do you think?  Should the job 
layer have more of a role, or keep it minimal?


Not a big deal to me either way. Especially if all of these calls are 
implicit in the state processing code - no one is really going to see 
them normally anyway.


-Phil
___
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers


Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Phil Carns




I don't see why the two have to be dependent for this to work.  Do you 
mean that by the parent posting a job, the state machine stepping code would 
handle the actual posting?  I was assuming that the parent state 
action could just call job_concurrent_sm_post (or whatever it's called).


Could it be similar to the request scheduler job posting code?  The  
parent state action could call job_concurrent_sm_post with an array  of 
the child sms, which just calls sm_post and adds the parent sm and  its 
array to an operation queue.  Then a job_concurrent_sm_test  function 
could test for completion of a parent sm by looking at all  the sms in 
the array to see if they completed.  The job_testcontext  code would 
have to be modified of course (maybe rework the  
do_one_test_cycle_req_sched function to also test parent sm jobs),  but 
all of that still seems to be independent of the state machine  code 
(i.e. someone could use the state machine code separately and  drive 
state machines using something other than the job framework).   I don't 
know if all that makes sense in the context of the changes  you've made, 
but that's what I had in mind when I suggested posting a  job for the 
parent.


I think I follow what you are describing, but I am not entirely sure. 
If so, I think there is one advantage to the approach that Walt has been 
hashing out thus far.  I think that what Walt is describing is 
event-driven, in a sense.  No one has to actively look to see if all of 
the children have finished.  Instead, the children each send 
notification (by calling a release function or manually decrementing a 
counter) in their completion function, with the parent eventually 
getting a single notification (representing all of the children) through 
the existing job completion queue mechanism.


I think that the way that you describe would work fine too, but it would 
require a little more active work to check the status of the array of 
child SMs and would require more code to keep track of them.


I think you are right though, that you could pull off your version 
without the children actually having to make a job_* call.


-Phil


Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Walter B. Ligon III



Phil Carns wrote:


Phil, first your questions:  The parent will push a frame onto a 
stack for each child it is starting.  A frame is everything that used 
to be in either a s_op or sm_p on the server or client, except for the 
stuff that actually runs the SM (now in an smcb).  The parent can pass 
in anything it wants by filling in the fields appropriately.  When 
each child runs that struct will appear to be its current frame.  
Each child can leave that frame in any condition it wants, with any 
values of buffers the child wants to leave for the parent.  After the 
children are done the parent can pop each frame off the stack and do 
what it wants with it. Thus there is plenty of flexibility on how you 
want to handle passing things in and out, all under control of the 
server or client code.



Sounds great.

As for providing macros for setting up and tearing down frames, we can 
certainly do that.  I'm not sure how much that really helps, but we 
can do it.



I think it would be nice to help prevent programmer error.  The same 
thing was done with the protocol request structures (see all the 
PINT_SERVREQ_*_FILL macros used in the client sms).  If you have a 
macro, then neglecting to pass in one of the required input fields 
results in a compiler error.  Otherwise the compiler can't help to tell 
you if you have set all of the frame fields that you were supposed to 
set.  There is no technical advantage, it just makes setting the fields 
a little more foolproof.


Same goes for the output of a frame after completion, although I'm not 
sure what the macro would look like there, or if it is possible. 
Probably a given frame will have several fields - some are input, some 
are output, some are scratch area for the state functions, etc.  Someone 
coming along later trying to reuse the SM may not know (without some 
tricky code digging) which fields are the output fields that it can 
count on to be correctly filled in after completion.   For example, 
maybe there is a field called parent_handle in there- is it filled in? 
 If so, is it guaranteed to be filled in, or did I just happen to get it 
this time because of the particular path the sm took?   I don't know what the 
best way is to make this explicit, maybe some kind of macro, maybe 
putting a special prefix on the names of the output fields, any other 
ideas?  Maybe we just use comments :)


OK, I see what you mean.  I think that's kind of a syntax level thing - 
IOW I don't think it affects the underlying mechanism.  So yeah, we should 
have that, and we'll work on it once the mechanism works.




Now, an implementation question - one approach to this job/counter 
thing is to have two job calls, one for the parent, and one for the 
children. Another approach is for the parent to simply set a counter 
and not call anything.  The children come along, decrement the count, 
and if zero, call job_null() to awaken the parent.  Requires no 
modification in the job layer, minimizes dependency.  What do you 
think?  Should the job layer have more of a role, or keep it minimal?



Not a big deal to me either way. Especially if all of these calls are 
implicit in the state processing code - no one is really going to see 
them normally anyway.


OK, I think everyone has weighed in on this, and I think I'll use the 
minimal method.  The only real diff is Sam's preference not to use a 
counter.  We can go around on that, but I'm leaning towards a counter, 
at least for the initial implementation.


-Phil


--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University


Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Phil Carns

Sam Lang wrote:


On Jul 26, 2006, at 3:41 PM, Walter B. Ligon III wrote:

Yeah, the idea is that the SM code would call the job function.  
Depending on the state actions to do it seems like asking for  
trouble, all the details that have to be kept up with.


Actually, there are already job structs used by the SM code, now  I've 
had to add a context id to the smcb and there will be job  calls.  I 
think you are right though, the amount of dependency is  pretty small.


As for the job funcs I think I'd need one new one to post the  parent 
job, establishing a counter.  The child job would look up  the 
counter, decrement, and if zero, call job_null to relaunch the  
parent, or just

replicate what job_null does, whatever seems the easiest.


I would rather see the parent get relaunched by the normal job test  
code by putting itself in the job completion queue once it's  finished.  
This could happen in a job_sm_test call like I suggested  in my previous 
email.  Also, instead of a counter that a test  function would check, 
and the child state machines would have to  decrement, I'd prefer the 
parent job keep an array of child state  machines (it does this anyway, 
no?) and check each element in the  array for completion of the state 
machine.  That way the children  aren't competing to lock the same state 
to notify of completion, the  parent just checks each one.


There doesn't need to be any locking- the main server thread only 
executes one state function or one transition at a time.  The counter 
also doesn't need to be visible- it could be hidden inside the job call, 
which could lock or not lock as it sees fit.


The parent also couldn't be the one checking the elements in an array 
like that - it would have to be done from within the job code somewhere 
(which I think you described in your previous email).  That means that 
somewhere in the job code (or request scheduler, etc.) something will 
have to do the following on every job_testcontext() call:


for each active sm
for each child within that sm
check state

Which could get expensive depending on how extensively we use the 
child/parallel sm model.




The implicit call is the child's call when it terminates.  The  
parent's call could be implicit too, or done by the state action.


Doesn't this require child state machines to only function in the  child 
state machine context?  I'd prefer to just have generic state  machines 
that can be used as a child state machine or as a top-level  state machine.


I would prefer that too :)  Is this going to work Walt?  It would be 
nice if the state machine processing code handled transparently 
triggering different termination functions depending on whether it was a 
top level sm or not without the state functions themselves knowing any 
better.


As of this moment we really haven't taken any pains to keep the SM  
independent from the job system, in fact you have to have the job  
system to drive things, so in some sense it's not really an issue.



I vote for making the interfaces as separate as possible.  If someone  
else wants to use the state machine code somewhere else, it would be  
nice to allow them to take it as-is (mpich2 guys were talking about  
using it, but I think they ended up doing something else).  Also,  
independent layers make testing and debugging easier in my view.


In the current code, the sm_p is passed through to the job descriptor  
as a void*, and we just cast back to a sm_p in the while loop that  does 
the job_testcontext and then drives the state machines again.   The use 
of job_status does bring the job code into the state  machine code,  
but it seems like mostly only the error_code field is  used within the 
state actions, and the rest of that structure could  be independent of 
the state machine code.


-sam



Any more comments?  (Sam, I hope this addresses some of yours)

Walt

Phil Carns wrote:


Walter B. Ligon III wrote:



OK, guys, I have another issue I want input on.  When child SMs  
terminate they have to notify their parent.  The parent has to  wait 
for all the children to terminate.  So I've been thinking to  use 
the job subsystem for this: the parent would post a job to  wait for 
N children,

and each child would post a job, the last one releasing the parent.

Now I see two ways to implement this - one is to implement this  
directly in the state machine code.  The parent simply stops  
running (because it returns DEFERRED without scheduling a job).  
Each child decrements a counter, and when it hits 0  the parent is 
restarted.  This is a little ugly because the  waiting parent is not 
being held on any list or queue (up to now  all waiting SMs are in 
the job subsystem), also the last  terminating child becomes the 
parent as it starts executing the  parent code.  Things can get 
weird when one SM starts children  that start children, and so on.


Now the other way to implement this is with the job subsystem as  I 

Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Sam Lang


On Jul 26, 2006, at 5:06 PM, Phil Carns wrote:



I don't see why the two have to be dependent for this to work.   
Do you mean that by the parent posting a job, the state machine  
stepping code would handle the actual posting?  I was assuming  
that the parent state action could just call  
job_concurrent_sm_post (or whatever it's called).
Could it be similar to the request scheduler job posting code?   
The  parent state action could call job_concurrent_sm_post with an  
array  of the child sms, which just calls sm_post and adds the  
parent sm and  its array to an operation queue.  Then a  
job_concurrent_sm_test  function could test for completion of a  
parent sm by looking at all  the sms in the array to see if they  
completed.  The job_testcontext  code would have to be modified of  
course (maybe rework the  do_one_test_cycle_req_sched function to  
also test parent sm jobs),  but all of that still seems to be  
independent of the state machine  code (i.e. someone could use the  
state machine code separately and  drive state machines using  
something other than the job framework).   I don't know if all  
that makes sense in the context of the changes  you've made, but  
that's what I had in mind when I suggested posting a  job for the  
parent.


I think I follow what you are describing, but I am not entirely  
sure. If so, I think there is one advantage to the approach that  
Walt has been hashing out thus far.  I think that what Walt is  
describing is event-driven, in a sense.  No one has to actively  
look to see if all of the children have finished.  Instead, the  
children each send notification (by calling a release function or  
manually decrementing a counter) in their completion function, with  
the parent eventually getting a single notification (representing  
all of the children) through the existing job completion queue  
mechanism.


I think I'm getting voted down here, so I should probably just  
shutup, but I don't think in practice we're going to have that many  
child state machines that iterating through the list is at all  
costly.  I'm arguing for simpler mechanisms that fit in with the job  
subsystem over something more fancy and possibly slightly better  
performing.




I think that the way that you describe would work fine too, but it  
would require a little more active work to check the status of the  
array of child SMs and would require more code to keep track of them.


Probably a bit more code yes, but it seems cleaner than keeping  
around backpointers and checking for parents.  Instead of driving all  
state machines from one place, this event notification scheme  
essentially replaces the last child state machine with the parent,  
which seems like a bit of hack and harder to debug.


-sam



I think you are right though, that you could pull off your version  
without the children actually having to make a job_* call.


-Phil





Re: [Pvfs2-developers] terminating state machines

2006-07-26 Thread Sam Lang


On Jul 26, 2006, at 6:16 PM, Phil Carns wrote:

I think I'm getting voted down here, so I should probably just   
shutup, but I don't think in practice we're going to have that  
many  child state machines that iterating through the list is at  
all  costly.  I'm arguing for simpler mechanisms that fit in with  
the job  subsystem over something more fancy and possibly slightly  
better  performing.


Well, as far as the number of SMs goes, I would rather not risk  
it.  I still hope this is lightweight enough that we could  
eventually use it in more places that would generate a lot of  
children (like a re-architected sys-io implementation), though I  
don't know if that will pan out in practice.  I got bitten by a  
similar assumption in the flow protocol- it used to track all of  
its posted operations for testing rather than relying on someone to  
notify it of completion.  Admittedly the flow protocol is a more  
obvious case and I should have known better, but at the time it  
seemed reasonable :)




Hmm...I had been thinking about a flow implementation that used the  
new concurrent state machine code...it sounds like that's a bad idea  
because the testing and restarting would take too long to switch  
between bmi and trove?  We use the post/test model through pvfs2  
though, so maybe I don't understand the issue.


I think that the way that you describe would work fine too, but  
it  would require a little more active work to check the status  
of the  array of child SMs and would require more code to keep  
track of them.


Probably a bit more code yes, but it seems cleaner than keeping   
around backpointers and checking for parents.  Instead of driving  
all  state machines from one place, this event notification  
scheme  essentially replaces the last child state machine with the  
parent,  which seems like a bit of hack and harder to debug.


I think I'm lost now.  What do you mean by replace?  The states are  
still isolated, jobs trigger the transitions, only one state action  
gets executed at a time, there still may be a time gap between  
completion of any given child and when the parent picks up  
processing again, and there are still frames.  I think both  
approaches will look the same when running unless I missed  
something.  If Walt puts a longjmp() in there we can both hit him  
over the head.



Heh.  Don't give him ideas! ;-)

I was operating under the constraint that a state machine can only  
post a job for itself.  If I understand the current plan correctly,  
using job_null in the child state machine to post a job for the  
parent breaks that constraint, and so in some sense is a replace (the  
job_null actually takes the parent smcb pointer).  I think you're  
probably right that it's not a big difference either way, it's just  
cleaner in my head to only have state machines posting jobs for  
themselves.


I think having a pointer to the parent actually improves  
debugability (though I'm not sure this approach actually requires  
it, all you really need is either a job descriptor or a pointer to  
a counter).  If I have a state machine that does something bad or  
gets stuck it would be nice to be able to work backwards to find  
out who invoked it, without having to search for it in a separate  
data structure.


I don't mean to keep struggling with this issue- I honestly think  
that both approaches are pretty good, and if Walt implements it the  
way I think he is going to, then 95% of developers won't notice the  
difference anyway.  At this point I am mostly hammering away to  
make sure I am not missing a larger issue...


Walt probably got more discussion than he bargained for, but at the  
least, lively discussion keeps me awake in the afternoon ;-).


-sam



-Phil


