Re: [OMPI users] How to build OMPI with Checkpoint/restart.
On Sep 16, 2009, at 8:30 AM, Marcin Stolarek wrote:

> Hi,
>
> It seems I solved my problem. The root of the error was that I hadn't
> loaded the blcr kernel module, so I couldn't checkpoint even a
> single-threaded application.

I am glad to hear that you have things working now.

> However, I still can't find the blcr MCA component in ompi_info --all,
> even though it's working.

This may have been a red herring, sorry. I think ompi_info will only show
the 'none' component due to the way it searches for components in the
system. This is a bug in how the CRS selection logic plays with ompi_info.
I will take a note / file a bug to look into fixing it. Unfortunately I do
not have a workaround other than looking in the install directory for the
mca_crs_blcr.so file.

-- Josh

> marcin
>
> 2009/9/15 Marcin Stolarek:
>> Hi,
>>
>> I've done everything from the beginning:
>>
>>   rm -r $ompi_install
>>   make clean
>>   make
>>   make install
>>
>> In $ompi_install I've got the files you mentioned:
>>
>>   mstol@halo2:/home/guests/mstol/openmpi/lib/openmp# ls mca_crs_bl*
>>   mca_crs_blcr.la  mca_crs_blcr.so
>>
>> but when I try:
>>
>>   mstol@halo2:/home/guests/mstol/openmpi/openmpi-1.3.3# ompi_info --all | grep "crs:"
>>   MCA crs: none (MCA v2.0, API v2.0, Component v1.3.3)
>>   MCA crs: parameter "crs_base_verbose" (current value: "0", data source: default value)
>>   MCA crs: parameter "crs" (current value: "none", data source: default value)
>>   MCA crs: parameter "crs_none_select_warning" (current value: "0", data source: default value)
>>   MCA crs: parameter "crs_none_priority" (current value: "0", data source: default value)
>>
>> I don't have the crs: blcr component.
>>
>> marcin
>>
>> 2009/9/14 Josh Hursey:
>>> The config.log looked fine, so I think you have fixed the configure
>>> problem that you previously posted about.
>>>
>>> Though the config.log indicates that the BLCR component is scheduled
>>> for compile, ompi_info does not indicate that it is available. I
>>> suspect that the error below is because the CRS could not find any CRS
>>> components to select (though there should have been an error displayed
>>> indicating as such).
>>>
>>> I would check your Open MPI installation to make sure that it is the
>>> one that you configured. Specifically, I would check that the
>>> installation location contains the following files:
>>>
>>>   $install_dir/lib/openmpi/mca_crs_blcr.so
>>>   $install_dir/lib/openmpi/mca_crs_blcr.la
>>>
>>> If that checks out, then I would remove the old installation directory
>>> and try reinstalling fresh.
>>>
>>> Let me know how it goes.
>>>
>>> -- Josh
>>>
>>> On Sep 13, 2009, at 5:49 AM, Marcin Stolarek wrote:
>>>> I've tried another time. Here is what I get when trying to run using
>>>> 1.4a1r21964:
>>>>
>>>>   (terminus:~) mstol% mpirun --am ft-enable-cr ./a.out
>>>>   --------------------------------------------------------------------------
>>>>   It looks like opal_init failed for some reason; your parallel process is
>>>>   likely to abort. There are many reasons that a parallel process can
>>>>   fail during opal_init; some of which are due to configuration or
>>>>   environment problems. This failure appears to be an internal failure;
>>>>   here's some additional information (which may only be relevant to an
>>>>   Open MPI developer):
>>>>
>>>>     opal_cr_init() failed
>>>>     --> Returned value -1 instead of OPAL_SUCCESS
>>>>   --------------------------------------------------------------------------
>>>>   [terminus:06120] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file
>>>>   runtime/orte_init.c at line 79
>>>>   --------------------------------------------------------------------------
>>>>   It looks like MPI_INIT failed for some reason; your parallel process is
>>>>   likely to abort. There are many reasons that a parallel process can
>>>>   fail during MPI_INIT; some of which are due to configuration or
>>>>   environment problems. This failure appears to be an internal failure;
>>>>   here's some additional information (which may only be relevant to an
>>>>   Open MPI developer):
>>>>
>>>>     ompi_mpi_init: orte_init failed
>>>>     --> Returned "Error" (-1) instead of "Success" (0)
>>>>   --------------------------------------------------------------------------
>>>>   *** An error occurred in MPI_Init
>>>>   *** before MPI was initialized
>>>>   *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>>   [terminus:6120] Abort before MPI_INIT completed successfully; not able
>>>>   to guarantee that all other processes were killed!
>>>>   --------------------------------------------------------------------------
>>>>   mpirun noticed that the job aborted, but has no info as to the process
>>>>   that caused that situation.
>>>>   --------------------------------------------------------------------------
>>>>
>>>> I've included config.log and the ompi_info --all output in the
>>>> attachment. LD_LIBRARY_PATH is set correctly.
>>>> Any idea?
>>>>
>>>> marcin
>>>>
>>>> 2009/9/12 Marcin Stolarek:
>>>>> Hi,
>>>>>
>>>>> I'm trying to compile Open MPI with checkpoint/restart via BLCR. I'm
>>>>> not sure which path I should set as the value of the --with-blcr
>>>>> option. I'm using the 1.3.3 release; which version of BLCR should I
>>>>> use?
>>>>>
>>>>> I've compiled the newest version of BLCR with --prefix=$BLCR, and
>>>>> passed --with-blcr=$BLCR as an option to the Open MPI configure, but
>>>>> I received:
>>>>>
>>>>>   configure:76646: checking if MCA component crs:blcr can compile
>>>>>   configure:76648: result: no
>>>>>
>>>>> marcin
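[Editor's note] The two checks that resolved this thread — the component files Josh asks for, and the blcr kernel module Marcin had forgotten to load — can be collected into a small script. This is a sketch, not part of the original exchange: the helper names `check_crs_blcr` and `blcr_module_loaded` and their arguments are my own; the file locations are the ones quoted in the thread.

```shell
#!/bin/sh
# Sanity checks for a BLCR-enabled Open MPI installation (sketch).

check_crs_blcr() {
  # The crs:blcr component is a dynamically opened plugin; if these files
  # are missing from the install prefix ($1), the runtime can only ever
  # select the 'none' CRS component.
  prefix=$1
  [ -f "$prefix/lib/openmpi/mca_crs_blcr.so" ] &&
  [ -f "$prefix/lib/openmpi/mca_crs_blcr.la" ]
}

blcr_module_loaded() {
  # Marcin's root cause: without the blcr kernel module loaded, even a
  # single-threaded process cannot be checkpointed. $1 is `lsmod` output.
  printf '%s\n' "$1" | grep -q '^blcr'
}

# Typical use (example paths, not the thread's actual values):
#   check_crs_blcr "$HOME/openmpi"  || echo "crs:blcr component not installed"
#   blcr_module_loaded "$(lsmod)"   || echo "try: sudo modprobe blcr"
```

Running both checks before `mpirun --am ft-enable-cr ...` would have caught each failure in this thread before the opaque `opal_cr_init() failed` abort.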
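[Editor's note] For the question that opened the thread — which path to pass to `--with-blcr` — a build recipe along these lines is what the Open MPI 1.3 checkpoint/restart instructions call for. Treat the exact flag set as an assumption to verify against your version's README: `--with-ft=cr` enables the C/R infrastructure, and `--with-blcr` takes the same prefix that BLCR's own `--prefix` was configured with. The paths below are placeholders.

```shell
#!/bin/sh
# Sketch of a checkpoint/restart-enabled Open MPI 1.3.x build.
# Assumes BLCR was built with: ./configure --prefix=$BLCR && make && make install
BLCR=${BLCR:-/usr/local/blcr}              # BLCR install prefix (assumed path)
OMPI_PREFIX=${OMPI_PREFIX:-$HOME/openmpi}  # Open MPI install prefix (assumed)

CONFIGURE_FLAGS="--prefix=$OMPI_PREFIX \
 --with-ft=cr \
 --with-blcr=$BLCR --with-blcr-libdir=$BLCR/lib"

echo "$CONFIGURE_FLAGS"
# Then, from the Open MPI source tree:
#   ./configure $CONFIGURE_FLAGS && make && make install
# and confirm in config.log that you now get:
#   checking if MCA component crs:blcr can compile... yes
```

If configure still reports `crs:blcr can compile... no`, the config.log entries around `configure:76646` show exactly which BLCR header or library test failed.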