There can be several causes:
- The retry count is too small. Try setting it to the maximum value, 7.
- The timeout is too small, so the HCA retries too often. Try increasing it to 21.
- The PSNs on the two sides are not synchronized.
- The link failed.
- The QP on the other side was closed or moved to the error state.
If this error occurs at the beginning of the application, it can indicate that the QP configuration is wrong.
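To give a feel for what the values 7 and 21 above mean in time, here is a minimal sketch in C. It assumes the standard InfiniBand encoding, where the local ACK timeout is 4.096 usec * 2^timeout and a value of 0 means no timeout. The function and macro names are hypothetical helpers for illustration, not part of VAPI or any other verbs API. The worst-case figure assumes one full timeout per attempt, which is only a rough estimate since the HCA may wait longer than the nominal value.

```c
#include <assert.h>

/* Hypothetical helper names, for illustration only.
 * Largest values that fit the 3-bit retry count and 5-bit timeout fields. */
#define MAX_RETRY_CNT 7
#define MAX_TIMEOUT   31

/* Nominal local ACK timeout in microseconds for an encoded timeout value:
 * 4.096 usec * 2^timeout. The value 0 means "no timeout" and is reported
 * here as 0.0, as is any out-of-range value. */
double ack_timeout_usec(unsigned timeout)
{
    if (timeout == 0 || timeout > MAX_TIMEOUT)
        return 0.0;
    return 4.096 * (double)(1ULL << timeout);
}

/* Rough estimate of how long the sender waits before the completion fails
 * with a retry-exceeded error: the first attempt plus retry_cnt retries,
 * each assumed to take one nominal timeout. */
double worst_case_wait_usec(unsigned timeout, unsigned retry_cnt)
{
    assert(retry_cnt <= MAX_RETRY_CNT);
    return ack_timeout_usec(timeout) * (double)(retry_cnt + 1);
}
```

With timeout 21, ack_timeout_usec(21) comes to roughly 8.59 seconds per attempt, so with retry count 7 the sender would keep trying for about 69 seconds under these assumptions before reporting the error.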
Tziporet
The corresponding IB macro is IB_COMP_RETRY_EXC_ERR.
-----Original Message-----
From: Sreenivasulu Pulichintala
Sent: Tuesday, November 09, 2004 3:56 PM
To: [EMAIL PROTECTED]
Subject: [openib-general] VAPI_RETRY_EXC_ERR
Hi,

I use the MPICH 1.2.5 and MVAPICH 0.9.2 stack, and when I run some of my Fortran applications, sometimes my application crashes, producing the following error:

===
Got completion with error, code=VAPI_RETRY_EXC_ERR, vendor code=81
mpi_latency: mpid/ch_vapi/viacheck.c:2109: viutil_spinandwaitcq: Assertion `sc->status == VAPI_SUCCESS' failed.
Timeout alarm signaled
Cleaning up all processes ...done.
Killed by signal 15.
===

In what possible cases do I get this error? Is it because of RESYNC? Any help in this regard is highly appreciated.

Thanks,
Sree
_______________________________________________
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general