Can you try with the current trunk head (r24296)?
I just committed a fix for a C/R bug in which restarts were getting stuck.
This will likely affect the migration functionality, but I have not had an
opportunity to test it yet.

Another thing to check is that prelink is turned off on all of your machines.
  https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html#prelink
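
A rough way to check this on a node is sketched below. It assumes a Red Hat-style system where prelink is controlled by a PRELINKING= line in /etc/sysconfig/prelink; that path and the helper name prelink_status are assumptions for illustration, so adjust them for your distribution.

```shell
# Hypothetical helper: report whether prelinking looks enabled, based on a
# Red Hat-style config file. The default path is an assumption.
prelink_status() {
    conf="${1:-/etc/sysconfig/prelink}"
    if [ ! -f "$conf" ]; then
        # No config file found: prelink is probably not installed here.
        echo "no-config"
    elif grep -Eq '^[[:space:]]*PRELINKING=yes' "$conf"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Usage on the local node:
#   prelink_status
```

If any node reports "enabled", the FAQ entry above describes how to turn prelinking off and undo it for binaries that are already prelinked.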

Let me know if the problem persists, and I'll dig into it a bit more.

Thanks,
Josh

On Jan 24, 2011, at 11:37 AM, Hugo Meyer wrote:

> Hello all,
> 
> I have a problem when I try to use the ompi-migrate command.
> 
> For example, I run the following application on one node of a cluster
> (both processes will run on the same node):
> 
> mpirun -np 2 -am ft-enable-cr ./whoami 10 10
> 
> Then, on the same node, I try to migrate the processes to another node:
> 
> ompi-migrate -x node9 -t node3 14914
> 
> And then I get this message:
> 
> [clus9:15620] *** Process received signal ***
> [clus9:15620] Signal: Segmentation fault (11)
> [clus9:15620] Signal code: Address not mapped (1)
> [clus9:15620] Failing at address: (nil)
> [clus9:15620] [ 0] /lib64/libpthread.so.0 [0x2aaaac0b8d40]
> [clus9:15620] *** End of error message ***
> Segmentation fault
> 
> I suspect that something is wrong with the thread level, but I have
> configured Open MPI like this:
> 
> ../configure --prefix=/home/hmeyer/desarrollo/ompi-code/binarios/ 
> --enable-debug --enable-debug-symbols --enable-trace --with-ft=cr 
> --disable-ipv6 --enable-opal-multi-threads --enable-ft-thread --without-hwloc 
> --disable-vt --with-blcr=/soft/blcr-0.8.2/ 
> --with-blcr-libdir=/soft/blcr-0.8.2/lib/
> 
> Checkpoint and restart work fine, but when I restart an application that
> has more than one process, it is restored and runs up to the last line
> before MPI_FINALIZE(), and then the processes never finish. I assume that
> they never call MPI_FINALIZE(). With a single process, ompi-checkpoint and
> ompi-restart work great.
> 
> Best regards.
> 
> Hugo Meyer
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey

