[EMAIL PROTECTED] wrote on Tue, 20 Mar 2007 11:12 -0600:
> According to the log, you're getting IBV_WC_WR_FLUSH returned by the
> check_cq function, which does all the polling for openIB.
> The IB spec says this about the error:
> "Work Request Flushed Error - A Work Request was in process or
> outstanding when the QP transitioned into the Error State."
>
> It doesn't go any further into the details of this error, but generally
> whenever the QP is sent into an error state,
> it is considered to be a fatal error by most of the IB community.
> (Correct me if I'm wrong, please.)
> This leads me to believe that you may still have underlying network
> problems.
> Have you been able to successfully run the various openIB test programs
> like ibv_rc_pingpong, or possibly tried the latest NetPIPE release,
> which has openIB support? (It may not give a pretty answer other than
> crashing if you have network problems, though :-/ )
>
> If the network ends up not being the problem, we've got a serious
> problem here in the code, as we should never be putting the QP into
> erroneous states.
>
> Also, Pete, the spec doesn't say anything about async errors being
> flagged for an error like this. Is this a case where we might be able to
> get useful information about the QP before or as it goes into an error
> state via async events?
Concur. Network problems. The server had a network error; the client
noticed and flushed the pending receive, then told you about it and
exited.
Take a look at the server log and see if it registered any complaints.
And try some long-running network-level tests to see if you can find
any problems there, as Kyle suggests.
-- Pete