Eugene,

This error indicates that somehow we're accessing the QP while the QP is in 
the "down" state. As the asynchronous thread is the one that sees this error, I 
wonder if it isn't looking up information about a QP that has already been 
destroyed by the main thread (as this only occurs in MPI_Finalize).
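
For reference, the async thread basically sits in a loop along these lines 
(a rough sketch of the libibverbs async-event API, not the actual openib BTL 
code; the function and variable names here are just for illustration). If the 
QP named in the event has already been torn down by the main thread, the 
handler would be looking at stale state:

    #include <stdio.h>
    #include <infiniband/verbs.h>

    /* Sketch of an async-event drain loop (illustrative, not openib BTL code).
     * 'ctx' is assumed to be the device context the BTL opened. */
    static void drain_async_events(struct ibv_context *ctx)
    {
        struct ibv_async_event event;

        while (ibv_get_async_event(ctx, &event) == 0) {
            if (event.event_type == IBV_EVENT_QP_ACCESS_ERR) {
                /* event.element.qp points at the offending QP; if the main
                 * thread already destroyed it during MPI_Finalize, this
                 * refers to a QP that no longer usefully exists. */
                fprintf(stderr, "QP access error on QP %p\n",
                        (void *) event.element.qp);
            }
            ibv_ack_async_event(&event);
        }
    }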

Can you look in the syslog to see if there is any additional info related to 
this issue?

  George.



"All the books in the world contain no more information than is broadcast as 
video in a single large American city in a single year. Not all bits have equal 
value.". -- Carl Sagan

On Dec 30, 2010, at 20:43, Eugene Loh <eugene....@oracle.com> wrote:

> I was running a bunch of np=4 test programs over two nodes.  Occasionally, 
> *one* of the codes would see an IBV_EVENT_QP_ACCESS_ERR during 
> MPI_Finalize().  I traced the code and ran another program that mimicked the 
> particular MPI calls made by that program.  This other program, too, would 
> occasionally trigger this error.  I never saw the problem with other tests.  
> The rate of incidence varied from consecutive runs (I saw this once), to roughly 
> 1 in hundreds of runs (more typically), to even less frequently -- I've had 
> thousands of consecutive runs with no problems.  (The tests run a few seconds 
> apiece.)  The traffic pattern is sends from non-zero ranks to rank 0, with 
> root-0 gathers, and lots of Allgathers.  The largest messages are 1000 bytes.  
> The problem appears to always occur on rank 3.
> 
> Now, I wouldn't mind someone telling me, based on that little information, 
> what the problem is here, but I guess I don't expect that.  What I am asking 
> is what IBV_EVENT_QP_ACCESS_ERR means.  Again, it's seen during MPI_Finalize. 
>  The async thread is seeing this.  What is this error trying to tell me?
