[EMAIL PROTECTED] wrote on Tue, 20 Mar 2007 11:12 -0600:
> According to the log, you're getting IBV_WC_WR_FLUSH_ERR returned by the 
> check_cq function which does all the polling for openIB.
> The IB spec says this about the error:
> "Work Request Flushed Error - A Work Request was in process or 
> outstanding when the QP transitioned into the Error State."
> 
> It doesn't go any further into the details of this error, but generally 
> whenever the QP is sent into an error state,
> it is considered to be a fatal error by most of the IB community. 
> (correct me if I'm wrong, please)
> This leads me to believe that you may still have underlying network 
> problems.
> Have you been able to successfully run the various openIB test programs 
> like ibv_rc_pingpong, or tried the latest NetPIPE release, 
> which has openIB support? (It may not give a pretty answer other than 
> crashing if you have network problems, though :-/ )
> 
> If the network ends up not being the problem, we've got a serious 
> problem here in the code, as we should never be putting the QP into 
> erroneous states.
> 
> Also, Pete, the spec doesn't say anything about async errors being 
> flagged for an error like this. Is this a case where we might be able to 
> get useful information about the QP before or as it goes into an error 
> state via async events?

Concur.  Network problems.  Server had a network error, the client
noticed and flushed the pending receive.  Then told you about it and
exited.

Take a look at the server log and see if it registered any complaints.
And try some long-running network-level tests to see if you can find
any problems there, as Kyle suggests.

                -- Pete

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
