Re: [OMPI users] BLCR + Qlogic infiniband

2012-12-11 Thread Josh Hursey
With that configure string, Open MPI should fail in configure if it does
not find the BLCR libraries. Note that configure does not check that the
BLCR kernel module is loaded (you will need to check that manually).

The ompi_info command will also show you whether C/R is enabled, and
whether the 'blcr' component of the 'crs' framework appears in the listing
at the end. That is probably the best way to see if the build includes
this support.
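
A minimal sketch of both checks, for anyone who wants to script them. The
helper names are made up for illustration, and two things are assumptions
rather than guarantees: that ompi_info prints the component on a line of
the form "MCA crs: blcr (...)", and that the BLCR kernel modules have names
starting with "blcr". Adjust the patterns if your install differs.

```shell
#!/bin/sh
# Sketch: check an Open MPI install for BLCR checkpoint/restart support.
# Both helpers read text on stdin so they can be tried on saved output.

# Succeeds if the ompi_info listing contains a 'blcr' component in the
# 'crs' framework. The "MCA crs: blcr" line shape is an assumption.
has_blcr_crs() {
    grep -Eq 'MCA[[:space:]]+crs:[[:space:]]+blcr'
}

# Succeeds if lsmod output lists a module whose name starts with "blcr".
# The module name prefix is an assumption.
has_blcr_module() {
    awk '$1 ~ /^blcr/ { found = 1 } END { exit !found }'
}

# Only run the live checks when the tools are actually on PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_blcr_crs; then
        echo "ompi_info lists the blcr crs component"
    else
        echo "ompi_info does not list a blcr crs component"
    fi
fi

if command -v lsmod >/dev/null 2>&1; then
    if lsmod | has_blcr_module; then
        echo "a blcr kernel module is loaded"
    else
        echo "no blcr kernel module is loaded"
    fi
fi
```

Make sure the ompi_info you run is the one from the install you are
testing (here, the one under the --prefix you configured with).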


On Tue, Dec 4, 2012 at 4:43 AM, William Hay wrote:

>
>
>
> On 28 November 2012 11:14, William Hay wrote:
>
>> I'm trying to build openmpi with support for BLCR plus qlogic infiniband
>> (plus grid engine).  Everything seems to compile OK and checkpoints are
>> taken but whenever I try to restore a checkpoint I get the following error:
>> - do_mmap(, 2aaab18c7000, 1000, ...) failed:
>> ffea
>> - mmap failed: /dev/ipath
>> - thaw_threads returned error, aborting. -22
>> - thaw_threads returned error, aborting. -22
>> Restart failed: Invalid argument
>>
>> This occurs whether I specify psm or openib as the btl.
>>
>> This looks like the sort of thing I would expect to be handled by the
>> blcr supporting code in openmpi.  So I guess I have a couple of questions.
>> 1) Are Infiniband and BLCR support in openmpi compatible?
>> 2) Are there any special tricks necessary to get them working together?
>>
> A third question occurred to me that may be relevant.  How do I verify
> that my openmpi install has blcr support built in?  I would have thought
> this would mean that either mpiexec or binaries built with mpicc would have
> libcr linked in.  However running ldd doesn't report this in either case.
>  I'm setting LD_PRELOAD to point to it but I would have thought openmpi
> would need to register a callback with blcr and it would be easier to do
> this if the library were linked in rather than trying to detect whether it
> has been LD_PRELOADed.  I'm building with the following options:
> ./configure --prefix=/home/ccaawih/openmpi-blcr --with-openib
> --without-psm --with-blcr=/usr --with-blcr-libdir=/usr/lib64 --with-ft=cr
> --enable-ft-thread --enable-mpi-threads --with-sge
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>



-- 
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey


Re: [OMPI users] BLCR + Qlogic infiniband

2012-12-04 Thread William Hay
On 28 November 2012 11:14, William Hay wrote:

> I'm trying to build openmpi with support for BLCR plus qlogic infiniband
> (plus grid engine).  Everything seems to compile OK and checkpoints are
> taken but whenever I try to restore a checkpoint I get the following error:
> - do_mmap(, 2aaab18c7000, 1000, ...) failed:
> ffea
> - mmap failed: /dev/ipath
> - thaw_threads returned error, aborting. -22
> - thaw_threads returned error, aborting. -22
> Restart failed: Invalid argument
>
> This occurs whether I specify psm or openib as the btl.
>
> This looks like the sort of thing I would expect to be handled by the blcr
> supporting code in openmpi.  So I guess I have a couple of questions.
> 1) Are Infiniband and BLCR support in openmpi compatible?
> 2) Are there any special tricks necessary to get them working together?
>

A third question occurred to me that may be relevant.  How do I verify
that my openmpi install has blcr support built in?  I would have thought
this would mean that either mpiexec or binaries built with mpicc would have
libcr linked in.  However running ldd doesn't report this in either case.
 I'm setting LD_PRELOAD to point to it but I would have thought openmpi
would need to register a callback with blcr and it would be easier to do
this if the library were linked in rather than trying to detect whether it
has been LD_PRELOADed.  I'm building with the following options:
./configure --prefix=/home/ccaawih/openmpi-blcr --with-openib --without-psm
--with-blcr=/usr --with-blcr-libdir=/usr/lib64 --with-ft=cr
--enable-ft-thread --enable-mpi-threads --with-sge


Re: [OMPI users] BLCR + Qlogic infiniband

2012-11-30 Thread Josh Hursey
The openib BTL and BLCR support in Open MPI were working about a year ago
(when I last checked). The psm BTL is not supported at the moment though.

From the error, I suspect that we are not fully closing the openib btl
driver before the checkpoint, so when we try to restart it looks for a
resource that is no longer present. I created a ticket for us to
investigate further if you want to follow it:
  https://svn.open-mpi.org/trac/ompi/ticket/3417

Unfortunately, I do not know who is currently supporting that code path (I
might pick it back up at some point, but cannot promise anything in the
near future). But I will keep an eye on the ticket and see what I can do.
If it is what I think it is, then it should not take too much work to get
it working again.

-- Josh

On Wed, Nov 28, 2012 at 5:14 AM, William Hay wrote:

> I'm trying to build openmpi with support for BLCR plus qlogic infiniband
> (plus grid engine).  Everything seems to compile OK and checkpoints are
> taken but whenever I try to restore a checkpoint I get the following error:
> - do_mmap(, 2aaab18c7000, 1000, ...) failed:
> ffea
> - mmap failed: /dev/ipath
> - thaw_threads returned error, aborting. -22
> - thaw_threads returned error, aborting. -22
> Restart failed: Invalid argument
>
> This occurs whether I specify psm or openib as the btl.
>
> This looks like the sort of thing I would expect to be handled by the blcr
> supporting code in openmpi.  So I guess I have a couple of questions.
> 1) Are Infiniband and BLCR support in openmpi compatible?
> 2) Are there any special tricks necessary to get them working together?
>
>





[OMPI users] BLCR + Qlogic infiniband

2012-11-28 Thread William Hay
I'm trying to build openmpi with support for BLCR plus qlogic infiniband
(plus grid engine).  Everything seems to compile OK and checkpoints are
taken but whenever I try to restore a checkpoint I get the following error:
- do_mmap(, 2aaab18c7000, 1000, ...) failed:
ffea
- mmap failed: /dev/ipath
- thaw_threads returned error, aborting. -22
- thaw_threads returned error, aborting. -22
Restart failed: Invalid argument

This occurs whether I specify psm or openib as the btl.

This looks like the sort of thing I would expect to be handled by the blcr
supporting code in openmpi.  So I guess I have a couple of questions.
1) Are Infiniband and BLCR support in openmpi compatible?
2) Are there any special tricks necessary to get them working together?
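
For reference, the numeric codes in the log above appear to agree with each
other. On Linux, errno 22 is EINVAL ("Invalid argument"), which matches both
the -22 from thaw_threads and the final "Restart failed" message. The "ffea"
after do_mmap looks truncated in the output; assuming it is the low 16 bits
of the return value, it is the same -22 in two's complement. A small sketch
of that arithmetic (the helper names are made up, and the errno table only
lists a few entries):

```shell
#!/bin/sh
# Sketch: interpret the numeric codes seen in the restart log.

# Interpret a 4-digit hex value as a signed 16-bit (two's complement)
# integer. Treating "ffea" as 16 bits wide is an assumption.
hex16_to_signed() {
    v=$(( 0x$1 ))
    if [ "$v" -ge 32768 ]; then
        v=$(( v - 65536 ))
    fi
    echo "$v"
}

# Map a few Linux errno numbers to their names; the sign is ignored
# because the kernel reports errors as negative errno values.
errno_name() {
    case "${1#-}" in
        12) echo ENOMEM ;;
        13) echo EACCES ;;
        22) echo EINVAL ;;
        *)  echo UNKNOWN ;;
    esac
}

code=$(hex16_to_signed ffea)
echo "$code $(errno_name "$code")"   # prints "-22 EINVAL"
```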