Re: [Pvfs2-developers] bmi testcontext/testunexpected

Sam Lang Tue, 06 Jan 2009 19:51:53 -0800


On Jan 6, 2009, at 9:40 PM, Rob Ross <rr...@mcs.anl.gov> wrote:

the fact that zoidfs is blocking is irrelevant to how the server implements servicing the calls. -- rob

The server implements servicing the calls by calling zoidfs functions, which in turn call PVFS.

-sam

On Jan 6, 2009, at 9:15 PM, Sam Lang wrote:
On Jan 6, 2009, at 7:51 PM, Rob Ross <rr...@mcs.anl.gov> wrote:
Hi Sam,
My take on your email was that you were combining the two issues, so I wanted to make sure that we were in agreement that the alternative API was preferred (not that I think we should necessarily do anything about it at the moment). I'm glad we are in agreement.
The terms "scheduling" and "priority" are being tossed around here in a way that I don't think is appropriate. The current testcontext does neither prioritization nor scheduling, and neither would the proposed modified API (as described thus far). The current BMI behavior is more like a bug than anything else, although changing the behavior at this point would require some significant regression testing.
testcontext is setting a priority; I can only assume it's a desired priority for our servers. A separate thread in our server that called testunexpected and fired off the state machines would be fairly straightforward and would prevent any starvation that might occur.
In other words, we want the behavior; I just disagree with the notion that the behavior should be set by bmi tcp.
The API is an orthogonal issue.
The I/O forwarding system probably ought to use the non-blocking PVFS calls so that it can better deal with this scenario anyway, right?
zoidfs is a blocking API.
-sam
Rob

On Jan 6, 2009, at 5:54 PM, Sam Lang wrote:
On Jan 6, 2009, at 5:03 PM, Rob Ross wrote:
I think if we had this alternative design and one wanted to have different priorities, one would look for messages under different contexts as you say. But when you don't care about priority, it would be nice to be able to get everything in one call.
I think you're arguing for a single testcontext function, instead of the testcontext/testunexpected split. I agree with that, but Phil and I are arguing about something else. Where should scheduling decisions be made? Within a BMI method, or by the API consumer? I'm arguing for the latter. Changing the API to be more consistent or user friendly doesn't affect where we choose to set the priority.
-sam
Rob

On Jan 6, 2009, at 4:57 PM, Sam Lang wrote:
Changing the API as you describe would actually bring back the original problem. As is, the BMI_tcp_testcontext call knows that there are unexpected messages waiting, so it returns immediately (expecting a call to testunexpected to follow). This is a specific policy hard-coded in the tcp method.
With just a single testcontext call and all expected and unexpected messages going to that context, the tcp code would have to put all the unexpected messages at the top of the context to give them priority. This would fix the particular problem that Nawab has, but it's still dictating policy (which messages get priority) from within the particular BMI method.
I agree that forcing the application to define the policy (with threads or timeouts) is moving the problem elsewhere, but it's moving the problem to where it belongs. It's our pvfs server that wants unexpected messages to have priority; the bmi code itself shouldn't dictate that priority. We could define interfaces to BMI that allow the policy to be set, but that's even further from where we are now.
-sam
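The argument above (policy belongs to the API consumer, not to the BMI method) can be illustrated with a small sketch. This is not BMI code: every name below (toy_ready, toy_policy, toy_next_action) is hypothetical and exists only for illustration. It assumes a transport that merely reports what is ready, and leaves the choice of what to service next to the caller.

```c
#include <assert.h>

/* Hypothetical, policy-free transport state: it only reports what is ready. */
struct toy_ready {
    int expected;     /* completed expected operations */
    int unexpected;   /* queued unexpected messages */
};

/* Policies a consumer might choose (illustrative names, not BMI symbols). */
enum toy_policy {
    TOY_UNEXPECTED_FIRST,   /* e.g. a server that wants low request latency */
    TOY_EXPECTED_ONLY       /* e.g. a blocking client-side call */
};

enum toy_action { TOY_IDLE, TOY_HANDLE_EXPECTED, TOY_HANDLE_UNEXPECTED };

/* The consumer, not the transport, decides what to service next. */
static enum toy_action toy_next_action(const struct toy_ready *r,
                                       enum toy_policy p)
{
    assert(r->expected >= 0 && r->unexpected >= 0);
    if (p == TOY_UNEXPECTED_FIRST && r->unexpected > 0)
        return TOY_HANDLE_UNEXPECTED;
    if (r->expected > 0)
        return TOY_HANDLE_EXPECTED;
    return TOY_IDLE;
}
```

With both kinds of work ready, a caller using TOY_UNEXPECTED_FIRST services the unexpected message, while a caller using TOY_EXPECTED_ONLY services the expected completion and never looks at the unexpected queue.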

On Jan 6, 2009, at 2:52 PM, Rob Ross wrote:
Yeah, a special named context for unexpected messages would be a clean way to have done things... -- Rob
On Jan 6, 2009, at 2:49 PM, Phil Carns wrote:
Yeah, I don't particularly like adding special cases either.
I feel like making the consumer play with timeouts or use an extra thread would be just as much of a hack/workaround, though. It's just moving the problem elsewhere.
Fundamentally it seems more like a BMI API flaw. It would have made more sense (for example) if unexpected messages were assigned to a specific context and the testunexpected() and testcontext() functions were combined. The consumer could then use a single test call to retrieve both unexpected and normal messages at once if they are in the same context (as in the pvfs2-server use case). Testing on a different context would ignore the presence of unexpected messages (as in the problem-triggering use case here).
There are other ways to deal with it; that's just an example. We just need the API to better express the intention of the caller (preferably in one function) so that BMI doesn't have to optimize by guessing about what else is going on.
That is more work than just adding a flag, though :) It probably depends on whether we think the use case is going to be around long enough to justify tweaking the API.
-Phil
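The alternative design suggested above (unexpected messages assigned to a context, one combined test call) might look roughly like the following toy model. It is only a sketch under assumptions: toy_bmi, toy_post and toy_test are hypothetical names, not part of the BMI API, and a fixed-size array stands in for the real completion queues.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MSGS 16

enum toy_kind { TOY_EXPECTED, TOY_UNEXPECTED };

struct toy_msg {
    enum toy_kind kind;
    int context;   /* context the completion is delivered to */
    int id;
};

/* Hypothetical stand-in for the method's completion state. */
struct toy_bmi {
    struct toy_msg queue[TOY_MAX_MSGS];
    size_t count;
    int unexpected_context;   /* context that all unexpected messages go to */
};

/* Queue a completed message; unexpected ones are routed to the one
 * context designated for them, whatever context the caller passes. */
static int toy_post(struct toy_bmi *b, enum toy_kind kind, int context, int id)
{
    if (b->count == TOY_MAX_MSGS)
        return -1;
    if (kind == TOY_UNEXPECTED)
        context = b->unexpected_context;
    b->queue[b->count++] = (struct toy_msg){ kind, context, id };
    return 0;
}

/* Single combined test call: remove up to max messages that belong to
 * the given context, in arrival order, and leave everything else queued. */
static size_t toy_test(struct toy_bmi *b, int context,
                       struct toy_msg *out, size_t max)
{
    size_t taken = 0, kept = 0;
    for (size_t i = 0; i < b->count; i++) {
        if (b->queue[i].context == context && taken < max)
            out[taken++] = b->queue[i];
        else
            b->queue[kept++] = b->queue[i];
    }
    b->count = kept;
    return taken;
}
```

In this model a server that tests on the designated context gets unexpected and normal messages from one call, while a blocking client call that tests on a different context never sees the unexpected messages at all.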

Sam Lang wrote:
I've committed the set_info fix for this. I'm not crazy about it, but it should work for now. In the long term, we should probably move away from method specific hacks like this. I.e. it should be up to the API consumer (our server) to adjust timeouts or call testunexpected in a separate thread.
Nawab, in the zoidfs init code after initializing BMI you need to call:
int check = 0;
BMI_set_info(0, BMI_TCP_CHECK_UNEXPECTED, &check);
-sam
On Dec 23, 2008, at 2:01 PM, Phil Carns wrote:
Sam Lang wrote:
Hi All,
I think Nawab has found a bug (or untested code path) in the BMI tcp method. He's running a daemon that both receives unexpected requests (as a server), and receives expected responses (as a client).
In the BMI_testcontext call, if there aren't any completed (expected) operations, and there are completed unexpected receives, we return immediately, assuming that BMI_testunexpected will be called in turn. I think the idea here is that we want to keep our latency down for unexpected messages, instead of doing work on expected messages while unexpected messages are waiting in the hopper. But the daemon is single threaded, and making blocking PVFS_sys_* calls, so we essentially spin forever calling BMI_testcontext over and over.
I'm not sure of the best way to fix this. Easy fixes would be to remove the check for completed unexpected receives, and/or do tcp_do_work for a shorter timeout.
It seems like we have a special case for blocking PVFS_sys_* calls. We want to ignore unexpected receives just in that case, and actually call tcp_do_work. In other contexts, I think we want the behavior that we have now, where we assume that a BMI_testunexpected call will follow a BMI_testcontext call. We could modify the testcontext call to take a separate parameter, but that seems messy. We might also be able to handle this with separate BMI contexts somehow...
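The spin described above can be reproduced in a few lines with a toy model. This is a sketch only, not the bmi_tcp source: toy_method, toy_do_work, toy_testcontext and toy_blocking_wait are hypothetical names, and the check_unexpected field merely models the early-return policy under discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are real BMI symbols. */
struct toy_method {
    int completed_expected;   /* expected ops ready to be reaped */
    int pending_unexpected;   /* unexpected receives nobody has collected */
    int inflight_expected;    /* expected ops that need progress to finish */
    bool check_unexpected;    /* models the early-return policy */
};

/* Models the progress step: complete one in-flight expected operation. */
static void toy_do_work(struct toy_method *m)
{
    if (m->inflight_expected > 0) {
        m->inflight_expected--;
        m->completed_expected++;
    }
}

/* Returns the number of expected completions reaped by this call. */
static int toy_testcontext(struct toy_method *m)
{
    if (m->completed_expected == 0 && m->check_unexpected &&
        m->pending_unexpected > 0) {
        /* Early return: assumes the caller will test for unexpected
         * messages next, so no progress is made on expected ops. */
        return 0;
    }
    if (m->completed_expected == 0)
        toy_do_work(m);
    int n = m->completed_expected;
    m->completed_expected = 0;
    return n;
}

/* A blocking caller that only ever polls toy_testcontext. Returns the
 * number of polls needed, or -1 if it gave up after max_polls. */
static int toy_blocking_wait(struct toy_method *m, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 1; i <= max_polls; i++) {
        if (toy_testcontext(m) > 0)
            return i;
    }
    return -1;
}
```

With one pending unexpected message and one in-flight expected operation, the blocking caller never completes while check_unexpected is true, and completes on the first poll once the check is turned off.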
I haven't dug in the code yet to see if I see any more elegant way to handle it, but I wanted to mention that if you want to add a special flag to toggle the behavior, it might be better to just set it globally with the set_info() function rather than modifying the testcontext() api. That way you don't have to change any of the other BMI methods. There are already a couple of similar set_info() calls to toggle BMI behavior for different use cases.
-Phil
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
