Re: [OMPI users] How to build OMPI with Checkpoint/restart.

2009-09-17 Thread Joshua Hursey


On Sep 16, 2009, at 8:30 AM, Marcin Stolarek wrote:


Hi,

It seems I solved my problem. The root of the error was that I hadn't
loaded the blcr module, so I couldn't checkpoint even a single-threaded
application.


I am glad to hear that you have things working now.


However, I still can't find MCA:blcr in the output of ompi_info --all, even though it's working.


This may have been a red herring, sorry. I think ompi_info will only
show the 'none' component because of the way it searches for components
in the system. This is a bug in how the CRS selection logic interacts
with ompi_info. I will make a note and file a bug to look into fixing it.
Unfortunately I do not have a workaround other than looking in the
install directory for the mca_crs_blcr.so file.


-- Josh
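
A minimal sketch of the workaround described above, assuming a POSIX shell. The function name check_crs_blcr and its prefix argument are illustrative and not part of Open MPI; the lib/openmpi subdirectory follows the layout Josh gives elsewhere in this thread.

```shell
# check_crs_blcr PREFIX: report whether the BLCR CRS component files exist
# under an Open MPI install prefix. Hypothetical helper, not an Open MPI tool.
check_crs_blcr() {
    prefix=$1
    status=0
    for ext in so la; do
        file="$prefix/lib/openmpi/mca_crs_blcr.$ext"
        if [ -f "$file" ]; then
            echo "found:   $file"
        else
            echo "missing: $file"
            status=1
        fi
    done
    # Non-zero if at least one of the two files is absent.
    return $status
}
```

For example, check_crs_blcr "$install_dir" prints one line per file and returns non-zero if either file is missing.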



Re: [OMPI users] How to build OMPI with Checkpoint/restart.

2009-09-16 Thread Marcin Stolarek
Hi,

It seems I solved my problem. The root of the error was that I hadn't loaded
the blcr module, so I couldn't checkpoint even a single-threaded application.
However, I still can't find MCA:blcr in the output of ompi_info --all, even
though it's working.

marcin
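
One way to check for this condition up front, sketched under the assumption that "blcr module" refers to the BLCR Linux kernel module and that it is listed under the name blcr in /proc/modules. The helper name module_loaded is hypothetical.

```shell
# module_loaded NAME [FILE]: succeed if NAME appears as a loaded kernel module.
# FILE defaults to /proc/modules (Linux); the optional argument exists so the
# check can be exercised against a saved copy of that file.
module_loaded() {
    name=$1
    modules_file=${2:-/proc/modules}
    # /proc/modules lists one module per line; the first field is the name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}
```

A typical use would be: module_loaded blcr || echo "blcr kernel module is not loaded"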


Re: [OMPI users] How to build OMPI with Checkpoint/restart.

2009-09-15 Thread Marcin Stolarek
Hi,

I've done everything from the beginning:

rm  -r $ompi_install
make clean
make
make install

In $ompi_install, I've got the files you mentioned:
mstol@halo2:/home/guests/mstol/openmpi/lib/openmp# ls mca_crs_bl*
mca_crs_blcr.la  mca_crs_blcr.so

But when I try:
# ompi_info -all | grep "crs:"
mstol@halo2:/home/guests/mstol/openmpi/openmpi-1.3.3# ompi_info --all | grep "crs:"
 MCA crs: none (MCA v2.0, API v2.0, Component v1.3.3)
 MCA crs: parameter "crs_base_verbose" (current value: "0", data source: default value)
 MCA crs: parameter "crs" (current value: "none", data source: default value)
 MCA crs: parameter "crs_none_select_warning" (current value: "0", data source: default value)
 MCA crs: parameter "crs_none_priority" (current value: "0", data source: default value)

I don't have the crs: blcr component.

marcin



Re: [OMPI users] How to build OMPI with Checkpoint/restart.

2009-09-14 Thread Josh Hursey
The config.log looked fine, so I think you have fixed the configure  
problem that you previously posted about.


Though the config.log indicates that the BLCR component is scheduled
to be built, ompi_info does not indicate that it is available. I
suspect that the error below occurs because the CRS framework could not
find any CRS components to select (though an error saying so should
have been displayed).


I would check your Open MPI installation to make sure that it is the  
one that you configured with. Specifically I would check to make sure  
that in the installation location there are the following files:

  $install_dir/lib/openmpi/mca_crs_blcr.so
  $install_dir/lib/openmpi/mca_crs_blcr.la

If that checks out, then I would remove the old installation directory  
and try reinstalling fresh.


Let me know how it goes.

-- Josh
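
A rough sketch of the first check above, confirming that the ompi_info found on PATH belongs to the installation that was configured. The helper same_install and its arguments are hypothetical; the prefix argument stands for the --prefix given to configure.

```shell
# same_install PREFIX [TOOL]: succeed if TOOL (default: ompi_info) resolves on
# PATH to PREFIX/bin/TOOL. Hypothetical helper, not part of Open MPI.
same_install() {
    prefix=$1
    tool=${2:-ompi_info}
    found=$(command -v "$tool") || { echo "$tool not found on PATH"; return 2; }
    if [ "$found" = "$prefix/bin/$tool" ]; then
        echo "$tool comes from $prefix"
    else
        echo "$tool resolves to $found, not $prefix/bin/$tool"
        return 1
    fi
}
```

Running same_install "$install_dir" before inspecting the ompi_info output helps rule out an older installation shadowing the new one. The comparison is a plain string match, so a prefix reached through a symlink would be reported as a mismatch.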






Re: [OMPI users] How to build OMPI with Checkpoint/restart.

2009-09-13 Thread Marcin Stolarek
I've tried again. Here is what I get when trying to run
using 1.4a1r21964:

(terminus:~) mstol% mpirun --am ft-enable-cr ./a.out
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_cr_init() failed failed
  --> Returned value -1 instead of OPAL_SUCCESS
--
[terminus:06120] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[terminus:6120] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--

I've included config.log and the ompi_info --all output in the attachment.
LD_LIBRARY_PATH is set correctly.
Any ideas?

marcin





2009/9/12 Marcin Stolarek 

> Hi,
> I'm trying to compile Open MPI with checkpoint/restart support via BLCR. I'm
> not sure which path I should set as the value of the --with-blcr option.
> I'm using the 1.3.3 release; which version of BLCR should I use?
>
> I've compiled the newest version of BLCR with --prefix=$BLCR, and I've
> passed --with-blcr=$BLCR as an option to Open MPI's configure, but I received:
>
>
> configure:76646: checking if MCA component crs:blcr can compile
> configure:76648: result: no
>
> marcin
>
>
>
>
>
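
A small sketch for pulling that result out of config.log, assuming the result line directly follows the "checking if MCA component crs:blcr can compile" line, as in the excerpt quoted above. The function name is illustrative only.

```shell
# crs_blcr_configure_result [LOG]: print the outcome that configure recorded
# for the crs:blcr component check in LOG (default: ./config.log).
crs_blcr_configure_result() {
    log=${1:-config.log}
    # Find the "checking" line, then print the text after "result:" on the
    # first result line that follows it.
    awk '/checking if MCA component crs:blcr can compile/ { want = 1; next }
         want && /result:/ { sub(/.*result: */, ""); print; exit }' "$log"
}
```

Called as crs_blcr_configure_result config.log from the build directory, it prints the recorded answer; empty output means the check was not found in the log.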


info.tar.gz
Description: GNU Zip compressed data