Prentice Bisbal wrote:
John Hearns wrote:

2009/6/18 Prentice Bisbal <[email protected]>

    John Hearns wrote:
    > Can you log into node36 and run ibstat or ibstatus?
    >

Looks good to me!
Links are up and it sees a subnet manager. As Greg says, it looks like
something is wonky in the script that reports the node status.

It's actually an MPI job (HPL using OpenMPI) which is reporting the
problem.

The head scratching continues...


Hi Prentice, list

Just in case you haven't seen this ...
Are you using OpenMPI 1.3.0 or 1.3.1?
Those versions have a memory leak bug when using IB.
The solution for the memory leak is to upgrade to 1.3.2.
A workaround is to pass -mca mpi_leave_pinned 0 to mpirun.
See:

http://www.open-mpi.org/community/lists/announce/2009/04/0030.php
https://svn.open-mpi.org/trac/ompi/ticket/1853
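
In case it helps, here is a rough sketch of the check described above:
it takes a version string, decides whether it is one of the two affected
releases, and only then adds the workaround to the mpirun command line.
The function names, the "./xhpl" binary name, and the assumption that
the version string looks like the output of "mpirun --version" are
illustrative, not taken from anyone's actual setup.

```python
import re

# Releases named in this message as having the IB memory leak.
AFFECTED_VERSIONS = {(1, 3, 0), (1, 3, 1)}


def parse_version(text):
    """Pull (major, minor, patch) out of a string such as
    'mpirun (Open MPI) 1.3.1' (assumed format)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def mpirun_command(version_text, nprocs, program):
    """Build an mpirun argument list, adding the mpi_leave_pinned
    workaround only for the affected releases."""
    cmd = ["mpirun", "-np", str(nprocs)]
    if parse_version(version_text) in AFFECTED_VERSIONS:
        # Workaround from the announcement: disable leave-pinned.
        cmd += ["-mca", "mpi_leave_pinned", "0"]
    cmd.append(program)
    return cmd


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(" ".join(mpirun_command("mpirun (Open MPI) 1.3.1", 4, "./xhpl")))
```

The same parameter can also be set through the environment as
OMPI_MCA_mpi_leave_pinned=0, which avoids touching the job script's
mpirun line. Upgrading to 1.3.2 remains the real fix.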

My HPL with OpenMPI 1.3.1 crashed when using lots of memory.
I upgraded to 1.3.2, which fixed the problem,
and I haven't looked at the error messages,
so your problem may be different.
However, memory leaks can produce weird errors that are hard to diagnose.

My $0.02.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
