Sam Lang wrote:
On Jul 26, 2006, at 6:16 PM, Phil Carns wrote:
I think I'm getting voted down here, so I should probably just
shut up, but I don't think in practice we're going to have that many
child state machines that iterating through the list is at all
costly. I'm arguing for simpler mechanisms that fit in with the
job subsystem over something more fancy and possibly slightly
better performing.
Well, as far as the number of SMs goes, I would rather not risk it.
I still hope this is lightweight enough that we could eventually use
it in more places that would generate a lot of children (like a
re-architected sys-io implementation), though I don't know if that
will pan out in practice. I got bitten by a similar assumption in
the flow protocol: it used to track all of its posted operations for
testing rather than relying on someone to notify it of completion.
Admittedly the flow protocol is a more obvious case and I should have
known better, but at the time it seemed reasonable :)
Hmm...I had been thinking about a flow implementation that used the new
concurrent state machine code...it sounds like that's a bad idea
because the testing and restarting would take too long to switch
between bmi and trove? We use the post/test model throughout pvfs2
though, so maybe I don't understand the issue.
I think that the way that you describe would work fine too, but it
would require a little more active work to check the status of the
array of child SMs and would require more code to keep track of them.
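To make the "active work" concrete, here is a minimal sketch in C of the polling alternative, where the parent scans an array of child records every time it is tested. The `child_sm` struct, its `done` flag, and both function names are hypothetical stand-ins for illustration; they are not the actual PVFS2 state machine types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child record; NOT the real PVFS2 smcb layout. */
struct child_sm {
    int done; /* nonzero once the child has reached its terminal state */
};

/* Polling: the parent walks the whole array to count unfinished children.
 * Each call costs O(n) in the number of children. */
static size_t pending_children(const struct child_sm *kids, size_t n)
{
    size_t pending = 0;

    assert(kids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!kids[i].done)
            pending++;
    }
    return pending;
}

/* The parent may resume only when no child is still pending. */
static int parent_may_resume(const struct child_sm *kids, size_t n)
{
    return pending_children(kids, n) == 0;
}
```

If the parent is re-tested after every child completion, the total work grows roughly with the square of the number of children, which is the cost being weighed against the extra bookkeeping of a notification scheme.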
Probably a bit more code yes, but it seems cleaner than keeping
around backpointers and checking for parents. Instead of driving
all state machines from one place, this event notification scheme
essentially replaces the last child state machine with the parent,
which seems like a bit of a hack and harder to debug.
I think I'm lost now. What do you mean by replace? The states are
still isolated, jobs trigger the transitions, only one state action
gets executed at a time, there still may be a time gap between
completion of any given child and when the parent picks up processing
again, and there are still frames. I think both approaches will look
the same when running unless I missed something. If Walt puts a
longjmp() in there we can both hit him over the head.
Heh. Don't give him ideas! ;-)
I was operating under the constraint that a state machine can only post
a job for itself. If I understand the current plan correctly, using
job_null in the child state machine to post a job for the parent breaks
that constraint, and so in some sense is a replace (the job_null
actually takes the parent smcb pointer). I think you're probably right
that it's not a big difference either way; it's just cleaner in my head
to only have state machines posting jobs for themselves.
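For comparison, the parent pointer plus counter idea discussed in this thread might look roughly like the sketch below. It assumes each control block holds a pointer to its parent and a count of running children, and that the last child to finish posts an immediately completing job on the parent's behalf. The struct `sm_cb`, its fields, and `post_null_job` are hypothetical names; in particular `post_null_job` only stands in for the role of job_null and does not reflect its real signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical control block; field names are illustrative only. */
struct sm_cb {
    struct sm_cb *parent;  /* NULL for a top-level state machine */
    int children_running;  /* children that have not yet terminated */
    int runnable;          /* set once a job has been queued for this SM */
};

/* Stand-in for posting a job that completes immediately for `target`.
 * A real job subsystem would enqueue a completion instead. */
static void post_null_job(struct sm_cb *target)
{
    target->runnable = 1;
}

/* Register a child with its parent before the child starts running. */
static void start_child(struct sm_cb *parent, struct sm_cb *child)
{
    child->parent = parent;
    child->children_running = 0;
    child->runnable = 0;
    parent->children_running++;
}

/* Called when a child reaches its terminal state.
 * Only the last child to finish wakes the parent. */
static void child_terminated(struct sm_cb *child)
{
    struct sm_cb *parent = child->parent;

    if (parent == NULL)
        return; /* top-level state machine: nobody to notify */

    assert(parent->children_running > 0);
    if (--parent->children_running == 0)
        post_null_job(parent);
}
```

In this sketch the child is the one that triggers the parent's job, which is exactly the point of contention above: it costs one decrement per completion, but a state machine ends up posting a job for something other than itself.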
I think having a pointer to the parent actually improves debuggability
(though I'm not sure this approach actually requires it, all you
really need is either a job descriptor or a pointer to a counter).
If I have a state machine that does something bad or gets stuck it
would be nice to be able to work backwards to find out who invoked
it, without having to search for it in a separate data structure.
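As a rough illustration of the debugging argument, a parent pointer lets a small helper walk from a stuck state machine back to whoever invoked it. The sketch below assumes a hypothetical `sm_cb` with a `name` and a `parent` field; neither the struct nor `sm_trace` comes from the PVFS2 source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical control block carrying only what the trace needs. */
struct sm_cb {
    const char *name;      /* label for the state machine, for debugging */
    struct sm_cb *parent;  /* NULL for a top-level state machine */
};

/* Write the invocation chain "child <- parent <- root" into buf.
 * Returns the number of state machines fully written; stops early
 * if the buffer is too small for the next entry. */
static int sm_trace(const struct sm_cb *sm, char *buf, size_t len)
{
    size_t used = 0;
    int hops = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (; sm != NULL; sm = sm->parent, hops++) {
        int n = snprintf(buf + used, len - used, "%s%s",
                         hops ? " <- " : "", sm->name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output error or truncation: stop walking */
        used += (size_t)n;
    }
    return hops;
}
```

Called from a debugger or a log statement on a stuck child, this yields the chain of invokers without consulting any separate table of running state machines.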
I don't mean to keep struggling with this issue; I honestly think
that both approaches are pretty good, and if Walt implements it the
way I think he is going to, then 95% of developers won't notice the
difference anyway. At this point I am mostly hammering away to make
sure I am not missing a larger issue...
Walt probably got more discussion than he bargained for, but at the
least, lively discussion keeps me awake in the afternoon ;-).
-sam
-Phil
Good discussion. Phil has convinced me the level of dependency is low,
and unless I completely misunderstand Sam, the complexity of the parent
pointer/job_null approach is a lot less than the alternative, and I like
low complexity. I also think debugging will be simpler. So that's
where I'm going.
I'll have to think of other topics to get you guys going from time to
time! ;-)
Now off to figure out a way to use setjmp/longjmp in my implementation!
Walt
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers