Re: [OMPI devel] Vprotocol pessimist - Open MPI 1.4.1 and 1.4.2a1r22558

2010-02-24 Thread Aurélien Bouteiller
Hi, 

The instructions you found are now obsolete. I'll update them; thank you for 
pointing this out.

The procedure for using uncoordinated checkpointing is now:
mpirun -mca vprotocol pessimist -mca pml ob1,v [regular arguments]
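
As a rough illustration, the sketch below combines these options with the 
arguments from your original command line. The -np, -machinefile and ep.B.8 
values are simply copied from your report, and the shell variable names are 
only placeholders I made up for readability. It prints the assembled command 
so it can be checked before anything is launched.

```shell
# Values copied from the original report; adjust them for your own cluster.
NP=2
MACHINEFILE=/etc/machinefile
APP=ep.B.8

# Select the pessimist vprotocol and put both ob1 and v in the pml include list.
CMD="mpirun -mca vprotocol pessimist -mca pml ob1,v -np $NP -machinefile $MACHINEFILE $APP"

# Dry run: show the command line that would be executed.
echo "$CMD"

# To launch for real, uncomment the next line (assumes no spaces in the values above).
# $CMD
```

With pml_base_verbose=10 still set, the selection log should then no longer 
report "component v not in the include list" if v is picked up as intended.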

The version available in trunk does not support actual restart because the 
required runtime support is missing; it is limited to evaluating the 
performance cost of fault tolerance in failure-free runs. There is an ongoing 
proposal to include such support in the main branch. However, we do have a 
branched version of Open MPI that includes all the necessary support, which I 
can provide on request. Please also keep in mind that this is an ongoing 
research effort that has not yet matured enough to be used in a production 
environment.

Aurelien Bouteiller
--
Dr. Aurelien Bouteiller
Innovative Computing Laboratory at the University of Tennessee



On Feb 6, 2010, at 10:21, Caciano Machado wrote:
> Hi,
> 
> I'm following the instructions found at
> https://svn.open-mpi.org/trac/ompi/wiki/EventLog_CR to run an
> application with the vprotocol pessimist enabled. I believe that I'm
> doing something wrong but I can't figure out the problem.
> 
> I have compiled Open MPI 1.4.1 and 1.4.2a1r22558 with the parameters:
> ./configure --prefix=/usr/local/openmpi-v/ --with-ft=cr
> --with-blcr=/usr/local/blcr/
> 
> Here is my configuration file:
> vprotocol_pessimist_priority=10
> pml_base_verbose=10
> pbl_v_verbose=500
> 
> The command line:
> mpirun -am /etc/v -np 2 -machinefile /etc/machinefile ep.B.8
> 
> And the mpirun output:
> ##3
> [xiru-10:03440] mca: base: components_open: Looking for pml components
> [xiru-10:03440] mca: base: components_open: opening pml components
> [xiru-10:03440] mca: base: components_open: found loaded component cm
> [xiru-10:03440] mca: base: components_open: component cm has no
> register function
> [xiru-10:03440] mca: base: component_find: unable to open
> /usr/local/openmpi-v/lib/openmpi/mca_mtl_mx: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> 
> [xiru-10:03440] mca: base: components_open: component cm open function
> successful
> [xiru-10:03440] mca: base: components_open: found loaded component crcpw
> [xiru-10:03440] mca: base: components_open: component crcpw has no
> register function
> [xiru-10:03440] mca: base: components_open: component crcpw open
> function successful
> [xiru-10:03440] mca: base: components_open: found loaded component csum
> [xiru-10:03440] mca: base: components_open: component csum has no
> register function
> [xiru-10:03440] mca: base: component_find: unable to open
> /usr/local/openmpi-v/lib/openmpi/mca_btl_mx: perhaps a missing symbol,
> or compiled for a different version of Open MPI? (ignored)
> [xiru-10:03440] mca: base: components_open: component csum open
> function successful
> [xiru-10:03440] mca: base: components_open: found loaded component ob1
> [xiru-10:03440] mca: base: components_open: component ob1 has no
> register function
> [xiru-10:03440] mca: base: components_open: component ob1 open
> function successful
> [xiru-10:03440] mca: base: components_open: found loaded component v
> [xiru-10:03440] mca: base: components_open: component v has no register 
> function
> [xiru-10:03440] mca: base: components_open: component v open function 
> successful
> --
> [[65326,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>  Host: xiru-10.portoalegre.grenoble.grid5000.fr
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --
> [xiru-10:03440] select: initializing pml component cm
> [xiru-10:03440] select: init returned failure for component cm
> [xiru-10:03440] select: component crcpw not in the include list
> [xiru-10:03440] select: component csum not in the include list
> [xiru-10:03440] select: initializing pml component ob1
> [xiru-10:03440] select: init returned priority 20
> [xiru-10:03440] select: component v not in the include list
> [xiru-10:03440] selected ob1 best priority 20
> [xiru-10:03440] select: component ob1 selected
> [xiru-10:03440] mca: base: close: component cm closed
> [xiru-10:03440] mca: base: close: unloading component cm
> [xiru-10:03440] mca: base: close: component crcpw closed
> [xiru-10:03440] mca: base: close: unloading component crcpw
> [xiru-10:03440] mca: base: close: component csum closed
> [xiru-10:03440] mca: base: close: unloading component csum
> [xiru-10:03440] mca: base: close: component v closed
> [xiru-10:03440] mca: base: close: unloading component v
> ...
> 
> #3
> 
> It seems that the vprotocol module is not loading properly. Does
> anyone have a solution to 

[OMPI devel] Vprotocol pessimist - Open MPI 1.4.1 and 1.4.2a1r22558

2010-02-06 Thread Caciano Machado
Hi,

I'm following the instructions found at
https://svn.open-mpi.org/trac/ompi/wiki/EventLog_CR to run an
application with the vprotocol pessimist enabled. I believe that I'm
doing something wrong but I can't figure out the problem.

I have compiled Open MPI 1.4.1 and 1.4.2a1r22558 with the parameters:
./configure --prefix=/usr/local/openmpi-v/ --with-ft=cr --with-blcr=/usr/local/blcr/

Here is my configuration file:
vprotocol_pessimist_priority=10
pml_base_verbose=10
pbl_v_verbose=500

The command line:
mpirun -am /etc/v -np 2 -machinefile /etc/machinefile ep.B.8

And the mpirun output:
##3
[xiru-10:03440] mca: base: components_open: Looking for pml components
[xiru-10:03440] mca: base: components_open: opening pml components
[xiru-10:03440] mca: base: components_open: found loaded component cm
[xiru-10:03440] mca: base: components_open: component cm has no
register function
[xiru-10:03440] mca: base: component_find: unable to open
/usr/local/openmpi-v/lib/openmpi/mca_mtl_mx: perhaps a missing symbol,
or compiled for a different version of Open MPI? (ignored)

[xiru-10:03440] mca: base: components_open: component cm open function
successful
[xiru-10:03440] mca: base: components_open: found loaded component crcpw
[xiru-10:03440] mca: base: components_open: component crcpw has no
register function
[xiru-10:03440] mca: base: components_open: component crcpw open
function successful
[xiru-10:03440] mca: base: components_open: found loaded component csum
[xiru-10:03440] mca: base: components_open: component csum has no
register function
[xiru-10:03440] mca: base: component_find: unable to open
/usr/local/openmpi-v/lib/openmpi/mca_btl_mx: perhaps a missing symbol,
or compiled for a different version of Open MPI? (ignored)
[xiru-10:03440] mca: base: components_open: component csum open
function successful
[xiru-10:03440] mca: base: components_open: found loaded component ob1
[xiru-10:03440] mca: base: components_open: component ob1 has no
register function
[xiru-10:03440] mca: base: components_open: component ob1 open
function successful
[xiru-10:03440] mca: base: components_open: found loaded component v
[xiru-10:03440] mca: base: components_open: component v has no register function
[xiru-10:03440] mca: base: components_open: component v open function successful
--
[[65326,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: xiru-10.portoalegre.grenoble.grid5000.fr

Another transport will be used instead, although this may result in
lower performance.
--
[xiru-10:03440] select: initializing pml component cm
[xiru-10:03440] select: init returned failure for component cm
[xiru-10:03440] select: component crcpw not in the include list
[xiru-10:03440] select: component csum not in the include list
[xiru-10:03440] select: initializing pml component ob1
[xiru-10:03440] select: init returned priority 20
[xiru-10:03440] select: component v not in the include list
[xiru-10:03440] selected ob1 best priority 20
[xiru-10:03440] select: component ob1 selected
[xiru-10:03440] mca: base: close: component cm closed
[xiru-10:03440] mca: base: close: unloading component cm
[xiru-10:03440] mca: base: close: component crcpw closed
[xiru-10:03440] mca: base: close: unloading component crcpw
[xiru-10:03440] mca: base: close: component csum closed
[xiru-10:03440] mca: base: close: unloading component csum
[xiru-10:03440] mca: base: close: component v closed
[xiru-10:03440] mca: base: close: unloading component v
...

#3

It seems that the vprotocol module is not loading properly. Does
anyone have a solution to run Open MPI with this module?

Regards,
Caciano Machado