There can be several causes:
- The retry count is too small. Try setting it to the maximum value, 7.
- The timeout may be too small, so the HCA retries too often. Try increasing it to 21.
- The PSNs on the two sides may not be synchronized.
- The link failed.
- The QP on the other side was closed or moved to the error state.
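
To give a feel for what the timeout and retry values above mean in wall-clock time, here is a minimal sketch. It assumes the usual InfiniBand encoding, where the QP timeout attribute is a 5-bit exponent and the wait time is 4.096 usec * 2^timeout (0 means wait forever), and it approximates the total wait as one initial attempt plus retry_cnt retries. The function and macro names are made up for illustration and are not part of VAPI; real HCAs may round the timeout differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field limits: timeout is a 5-bit exponent, retry count is 3 bits. */
#define IB_MAX_TIMEOUT_EXP 31u
#define IB_MAX_RETRY_CNT   7u

/* Local ACK timeout in nanoseconds for a given exponent.
 * 4.096 usec = 4096 ns, so the result is 4096 * 2^timeout_exp.
 * Returns 0 for exponent 0, which means "wait indefinitely". */
uint64_t ack_timeout_ns(unsigned timeout_exp)
{
    assert(timeout_exp <= IB_MAX_TIMEOUT_EXP);
    if (timeout_exp == 0)
        return 0;
    return (uint64_t)4096 << timeout_exp;
}

/* Rough estimate of how long the sender keeps trying before the
 * completion fails with a retry-exceeded error: the first attempt
 * plus retry_cnt retries, each waiting one ACK timeout. */
uint64_t retry_window_ns(unsigned timeout_exp, unsigned retry_cnt)
{
    assert(retry_cnt <= IB_MAX_RETRY_CNT);
    return ack_timeout_ns(timeout_exp) * (uint64_t)(retry_cnt + 1u);
}
```

Under these assumptions, timeout = 21 gives about 8.6 seconds per attempt, and with retry count 7 the sender would wait roughly 69 seconds before reporting the error.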
 
If this error occurs at the beginning of the application, it can indicate that the QP configuration is wrong.
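
As an illustration of the kind of configuration mismatch meant here, the following sketch checks that the parameters exchanged by two sides of a reliable connection agree: each side's destination QP number must be the peer's QP number, and each side's send PSN must equal the peer's receive PSN (PSNs are 24 bits wide). The struct and function names are hypothetical and are not taken from VAPI or MVAPICH; in a real application these values would come from whatever the two processes exchanged at connection setup.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSN_MASK 0xFFFFFFu  /* PSNs are 24-bit values */

/* Hypothetical record of what one side programmed into its QP. */
struct qp_side {
    uint32_t qp_num;       /* this side's QP number */
    uint32_t dest_qp_num;  /* QP number it believes the peer has */
    uint32_t sq_psn;       /* initial PSN for packets it sends */
    uint32_t rq_psn;       /* initial PSN it expects to receive */
};

/* Returns true if the two sides' QP numbers and starting PSNs match up. */
bool qp_sides_consistent(const struct qp_side *a, const struct qp_side *b)
{
    return a->dest_qp_num == b->qp_num &&
           b->dest_qp_num == a->qp_num &&
           (a->sq_psn & PSN_MASK) == (b->rq_psn & PSN_MASK) &&
           (b->sq_psn & PSN_MASK) == (a->rq_psn & PSN_MASK);
}
```

A check like this can be logged right after the connection parameters are exchanged, so a mismatch shows up before the first send times out.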
 
Tziporet
-----Original Message-----
From: Sreenivasulu Pulichintala [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 09, 2004 12:49 PM
To: [EMAIL PROTECTED]
Subject: RE: [openib-general] VAPI_RETRY_EXC_ERR

The corresponding IB macro is IB_COMP_RETRY_EXC_ERR.

 

-----Original Message-----
From: Sreenivasulu Pulichintala
Sent: Tuesday, November 09, 2004 3:56 PM
To: [EMAIL PROTECTED]
Subject: [openib-general] VAPI_RETRY_EXC_ERR

 

Hi,

 

I use the MPICH 1.2.5 and MVAPICH 0.9.2 stack, and when I run some of my Fortran applications, the application sometimes crashes with the following error:

 

===

Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor code=81
mpi_latency: mpid/ch_vapi/viacheck.c:2109: viutil_spinandwaitcq: Assertion `sc->status == VAPI_SUCCESS' failed.
Timeout alarm signaled
Cleaning up all processes ...done.
Killed by signal 15.
===
 
In which cases can I get this error? Is it because of RESYNC?
 
Any help with this would be highly appreciated.
 
Thanks
Sree
 

 

_______________________________________________
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
