On Jan 6, 2009, at 9:40 PM, Rob Ross <rr...@mcs.anl.gov> wrote:
the fact that zoidfs is blocking is irrelevant to how the server
implements servicing the calls. -- rob
The server implements servicing the calls by calling zoidfs functions,
which in turn call PVFS.
-sam
On Jan 6, 2009, at 9:15 PM, Sam Lang wrote:
On Jan 6, 2009, at 7:51 PM, Rob Ross <rr...@mcs.anl.gov> wrote:
Hi Sam,
My take on your email was that you were combining the two issues,
so I wanted to make sure that we were in agreement that the
alternative API was preferred (not that I think we should
necessarily do anything about it at the moment). I'm glad we are
in agreement.
The terms "scheduling" and "priority" are being tossed around here
in a way that I don't think is appropriate. The current
testcontext does neither prioritization nor scheduling, and
neither would the proposed modified API (as described thus far).
The current BMI behavior is more like a bug than anything else,
although changing the behavior at this point would require some
significant regression testing.
testcontext is setting a priority, and I can only assume it's a desired
priority for our servers. A separate thread in our server that
called testunexpected and fired off the state machines would be
fairly straightforward and prevent any starvation that might occur.
In other words, we want the behavior, I just disagree with the
notion that the behavior should be set by bmi tcp.
The API is an orthogonal issue.
The I/O forwarding system probably ought to use the non-blocking
PVFS calls so that it can better deal with this scenario anyway,
right?
zoidfs is a blocking API.
-sam
Rob
On Jan 6, 2009, at 5:54 PM, Sam Lang wrote:
On Jan 6, 2009, at 5:03 PM, Rob Ross wrote:
I think if we had this alternative design and one wanted to have
different priorities, one would look for messages under
different contexts as you say. But when you don't care about
priority, it would be nice to be able to get everything in one
call.
I think you're arguing for a single testcontext function, instead
of the testcontext/testunexpected split. I agree with that, but
Phil and I are arguing about something else. Where should
scheduling decisions be made? Within a BMI method, or by the API
consumer? I'm arguing for the latter. Changing the API to be
more consistent or user friendly doesn't affect where we choose
to set the priority.
-sam
Rob
On Jan 6, 2009, at 4:57 PM, Sam Lang wrote:
Changing the API as you describe would actually bring back the
original problem. As is, the BMI_tcp_testcontext call knows
that there are unexpected messages waiting, so it returns
immediately (expecting a call to testunexpected to follow).
This is a specific policy hard-coded in the tcp method.
With just a single testcontext call and all expected and
unexpected messages going to that context, the tcp code would
have to put all the unexpected messages at the top of the
context to give them priority. This would fix the particular
problem that Nawab has, but it's still dictating policy (which
messages get priority) from within the particular BMI method.
I agree that forcing the application to define the policy (with
threads or timeouts) is moving the problem elsewhere, but it's
moving the problem to where it belongs. It's our pvfs server
that wants unexpected messages to have priority, the bmi code
itself shouldn't dictate that priority. We could define
interfaces to BMI that allow the policy to be set, but that's
even further from where we are now.
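
The point about where the priority decision lives can be illustrated with a toy model. This is only a sketch under stated assumptions: every name below (toy_policy, toy_queues, toy_service_one) is hypothetical and none of it is BMI code. The transport layer just holds two counts of completed messages, and the consumer passes in the policy that decides which one is serviced first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these exist in BMI. */
enum toy_policy { TOY_UNEXPECTED_FIRST, TOY_EXPECTED_FIRST };
enum toy_source { TOY_NONE, TOY_FROM_UNEXPECTED, TOY_FROM_EXPECTED };

/* Counts of completed messages waiting to be collected. */
struct toy_queues {
    int unexpected;
    int expected;
};

/* Service one completed message. The caller's policy decides which
 * queue wins when both are non-empty; the other queue is still
 * serviced whenever the preferred one is empty. */
enum toy_source toy_service_one(struct toy_queues *q, enum toy_policy p)
{
    assert(q != NULL);
    int prefer_unexpected = (p == TOY_UNEXPECTED_FIRST);

    if (q->unexpected > 0 && (prefer_unexpected || q->expected == 0)) {
        q->unexpected--;
        return TOY_FROM_UNEXPECTED;
    }
    if (q->expected > 0) {
        q->expected--;
        return TOY_FROM_EXPECTED;
    }
    return TOY_NONE;
}
```

In this model a server would pass TOY_UNEXPECTED_FIRST, while a single-threaded client daemon making blocking calls would pass TOY_EXPECTED_FIRST, and the queue code itself would not need to guess.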
-sam
On Jan 6, 2009, at 2:52 PM, Rob Ross wrote:
Yeah, a special named context for unexpected messages would be a
clean way to have done things... -- Rob
On Jan 6, 2009, at 2:49 PM, Phil Carns wrote:
Yeah, I don't particularly like adding special cases either.
I feel like making the consumer play with timeouts or use an
extra thread would be just as much of a hack/workaround,
though. It's just moving the problem elsewhere.
Fundamentally it seems more like a BMI API flaw. It would
have made more sense (for example) if unexpected messages
were assigned to a specific context and the testunexpected()
and testcontext() functions were combined. The consumer
could then use a single test call to retrieve both unexpected
and normal messages at once if they are in the same context
(as in the pvfs2-server use case). Testing on a different
context would ignore the presence of unexpected messages (as
in the problem-triggering use case here).
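
A rough sketch of what such a combined call could look like, as a toy model only. The names toy_msg and toy_test are hypothetical and are not part of the BMI API; the assumption is simply that every completion, expected or unexpected, carries the context it was assigned to, and a single test call filters on that context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration; none of these names exist in BMI. */
enum toy_kind { TOY_EXPECTED, TOY_UNEXPECTED };

struct toy_msg {
    int context;          /* context this completion was assigned to */
    enum toy_kind kind;   /* expected or unexpected */
    int taken;            /* nonzero once handed back to a caller */
};

/* Single test call: collect up to max completions that belong to ctx,
 * expected and unexpected alike. Completions assigned to any other
 * context are left untouched. Returns the number collected. */
size_t toy_test(struct toy_msg *queue, size_t n, int ctx,
                struct toy_msg **out, size_t max)
{
    size_t found = 0;
    assert(queue != NULL && out != NULL);

    for (size_t i = 0; i < n && found < max; i++) {
        if (!queue[i].taken && queue[i].context == ctx) {
            queue[i].taken = 1;
            out[found++] = &queue[i];
        }
    }
    return found;
}
```

With unexpected messages assigned to context 0, a server testing context 0 gets both kinds in one call, while a client testing context 1 never sees the unexpected messages and so has no reason to return early because of them.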
There are other ways to deal with it, that's just an
example. We just need the API to better express the
intention of the caller (preferably in one function) so that
BMI doesn't have to optimize by guessing about what else is
going on.
That is more work than just adding a flag, though :) It
probably depends on if we think the use case is going to be
around long enough to justify tweaking the API.
-Phil
Sam Lang wrote:
I've committed the set_info fix for this. I'm not crazy
about it, but it should work for now. In the long term, we
should probably move away from method-specific hacks like
this. I.e. it should be up to the API consumer (our server)
to adjust timeouts or call testunexpected in a separate
thread.
Nawab, in the zoidfs init code after initializing BMI you
need to call:
int check = 0;
BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
-sam
On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
Sam Lang wrote:
Hi All,
I think Nawab has found a bug (or untested code path) in
the BMI tcp method. He's running a daemon that both
receives unexpected requests (as a server), and receives
expected responses (as a client).
In the BMI_testcontext call, if there aren't any completed
(expected) operations, and there are completed unexpected
receives, we return immediately, assuming that
BMI_testunexpected will be called in turn. I think the
idea here is that we want to keep our latency down for
unexpected messages, instead of doing work on expected
messages while unexpected messages are waiting in the
hopper. But the daemon is single-threaded and makes
blocking PVFS_sys_* calls, so we essentially spin forever
calling BMI_testcontext over and over.
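
The spin can be reproduced in a toy model. This is a sketch of the behavior as described above, not the actual tcp method code: toy_tcp, toy_testcontext and toy_blocking_wait are hypothetical names, and the polling loop is capped so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical and are not the BMI API. */
struct toy_tcp {
    int unexpected_ready;   /* completed unexpected receives, not yet collected */
    int expected_pending;   /* expected receives still outstanding */
    bool check_unexpected;  /* the early-return behavior under discussion */
};

/* Returns the number of expected operations completed by this call. */
int toy_testcontext(struct toy_tcp *t)
{
    if (t->check_unexpected && t->unexpected_ready > 0) {
        /* Return at once, assuming the caller will collect the
         * unexpected messages next. No network progress is made. */
        return 0;
    }
    /* Stand-in for the real network progress step: one expected
     * receive completes per call. */
    if (t->expected_pending > 0) {
        t->expected_pending--;
        return 1;
    }
    return 0;
}

/* A blocking client call that polls toy_testcontext only and never
 * collects unexpected messages. Returns the number of polls used,
 * or -1 if nothing completed within max_polls. */
int toy_blocking_wait(struct toy_tcp *t, int max_polls)
{
    assert(t != NULL && max_polls > 0);
    for (int i = 1; i <= max_polls; i++) {
        if (toy_testcontext(t) > 0)
            return i;
    }
    return -1;
}
```

With check_unexpected set and one unexpected message waiting, toy_blocking_wait never completes however many polls it is given; with the check cleared, the same state completes on the first poll.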
I'm not sure of the best way to fix this. Easy fixes
would be to remove the check for completed unexpected
receives, and/or do tcp_do_work for a shorter timeout.
It seems like we have a special case for blocking
PVFS_sys_* calls. We want to ignore unexpected receives
just in that case, and actually call tcp_do_work. In
other contexts, I think we want the behavior that we have
now, where we assume that a BMI_testunexpected call will
follow a BMI_testcontext call. We could modify the
testcontext call to take a separate parameter, but that
seems messy. We might also be able to handle this with
separate BMI contexts somehow...
I haven't dug in the code yet to see if I see any more
elegant way to handle it, but I wanted to mention that if
you want to add a special flag to toggle the behavior, it
might be better to just set it globally with the set_info()
function rather than modifying the testcontext() api. That
way you don't have to change any of the other BMI methods.
There are already a couple of similar set_info() calls to
toggle BMI behavior for different use cases.
-Phil
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers