Re: [OMPI users] Live process migration
Thanks, Josh.

On Thu, Dec 13, 2012 at 4:20 AM, Josh Hursey wrote:
> ompi-migrate is not in the 1.6 release. It is only available in the Open
> MPI trunk.
>
> --
> Joshua Hursey
> Assistant Professor of Computer Science
> University of Wisconsin-La Crosse
> http://cs.uwlax.edu/~jjhursey
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] openmpi-1.7rc5 cannot install when built with ./configure --with-ft=cr
Thanks, Ralph.

On Thu, Dec 13, 2012 at 1:54 AM, Ralph Castain wrote:
> The checkpoint/restart code in the 1.7 branch is almost certainly broken
> as the developer/maintainer of that code graduated and left for a colder
> climate. We do not yet have someone to take their place, so the future of
> that capability is somewhat in doubt.
>
> Afraid you'll have to stick with the 1.6 series for now.
[OMPI users] openmpi-1.7rc5 cannot install when built with ./configure --with-ft=cr
Hi all,

I am having trouble building openmpi-1.7rc5 with ./configure --with-ft=cr

openmpi-1.7rc5# ./configure --with-ft=cr
openmpi-1.7rc5# make all install

Error message:

base/errmgr_base_fns.c:565:13: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result [-Wunused-result]
base/errmgr_base_fns.c: In function 'orte_errmgr_base_migrate_state_str':
base/errmgr_base_fns.c:384:17: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result [-Wunused-result]
base/errmgr_base_fns.c: In function 'orte_errmgr_base_abort':
base/errmgr_base_fns.c:244:18: warning: ignoring return value of 'vasprintf', declared with attribute warn_unused_result [-Wunused-result]
make[2]: *** [base/errmgr_base_fns.lo] Error 1
make[2]: Leaving directory `/home/abolap/Downloads/openmpi-1.7rc5/orte/mca/errmgr'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/abolap/Downloads/openmpi-1.7rc5/orte'
make: *** [all-recursive] Error 1

It installs successfully when fault tolerance is not enabled in the build.

Please help.

Regards - Ifeanyi
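The output pasted above contains only warnings and make's own "Error 1" lines; the compiler diagnostic that actually stopped the build is not in the excerpt. A minimal sketch, assuming GCC-style `file:line:col: error:` diagnostics, for pulling just those lines out of a saved build log. The function name `compiler_errors` and the file name `make.log` are illustrative assumptions, not part of Open MPI.

```shell
# compiler_errors: read a build log on stdin and print only the lines that
# carry a GCC-style " error: " diagnostic, each prefixed with its line number
# in the log. Warning lines and make's "Error 1" summary lines do not match.
compiler_errors() {
  grep -n ' error: '
}

# Hypothetical usage (make.log is an assumed file name):
#   make all install 2>&1 | tee make.log
#   compiler_errors < make.log
```

Posting the lines this prints, rather than the surrounding warnings, usually gives list readers more to work with.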
Re: [OMPI users] Live process migration
This is what I did after installing DMTCP.

On one terminal:

# ./dmtcp_coordinator

On another terminal:

# ./dmtcp_checkpoint mpirun ./icpi

When I ran this command:

# ./dmtcp_command --checkpoint

it terminated with these messages:

[8147] WARNING at connectionmanager.cpp:263 in fdToDevice; REASON='JWARNING(false) failed'
Message: PTS Device not found
[8147] ERROR at connectionmanager.cpp:277 in fdToDevice; REASON='JASSERT(false) failed'
fd = 1
device = /dev/pts/2 (deleted)
Message: PTS Device not found in connection list
icpi (8147): Terminating...

What is the way to migrate a process with DMTCP? Please assist.

Regards - Ifeanyi

On Tue, Dec 11, 2012 at 5:10 PM, Jaroslaw Slawinski wrote:
> true, looks like the entire sourceforge is down
> best
> js
Re: [OMPI users] Live process migration
Hi Josh,

I can checkpoint but cannot migrate.

When I type

~openmpi-1.6# ompi-migrate ...

I get this error:

bash: ompi-migrate: command not found

Please assist.

Regards - Ifeanyi

On Wed, Dec 12, 2012 at 3:19 AM, Josh Hursey wrote:
> Process migration was implemented in Open MPI and working in the trunk a
> couple of years ago. It has not been well maintained for a few years though
> (hopefully that will change one day). So you can try it, but your results
> may vary.
>
> Some details are at the link below:
> http://osl.iu.edu/research/ft/ompi-cr/tools.php#ompi-migrate
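bash prints "command not found" when no executable of that name is on PATH. A minimal sketch for checking that up front, before trying to run a tool; the helper name `have_tool` is an illustrative assumption, and `command -v` is the standard POSIX lookup.

```shell
# have_tool: succeed if the named command can be resolved by the current
# shell (as an executable on PATH, a builtin, a function, or an alias).
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical usage:
#   have_tool ompi-migrate || echo "ompi-migrate is not on PATH in this install"
```

A failed lookup only says the tool is not visible from this shell; it does not by itself distinguish a tool that was never built from one installed outside PATH.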
Re: [OMPI users] Live process migration
Thanks, Josh.

I will give it a go.

Regards - Ifeanyi
Re: [OMPI users] Live process migration
Thanks, Jaroslaw.

I will try it ASAP; it appears that the DMTCP site at sourceforge.net is down at the moment.

Regards - ifeanyi

On Tue, Dec 11, 2012 at 4:11 PM, Jaroslaw Slawinski wrote:
> check DMTCP - it worked for me
> js
[OMPI users] Live process migration
Hi all,

Just wondering if live migration of processes is supported in Open MPI?

Or any idea of how to do live migration of processes, please?

Regards,
Ifeanyi
[OMPI users] MCA crs: none (MCA v2.0, API v2.0, Component v1.6.3)
Hi all,

I got this message when I issued this command:

root@node1:/home/abolap# ompi_info | grep crs
MCA crs: none (MCA v2.0, API v2.0, Component v1.6.3)

The installation looks okay, and I have reinstalled but still get the same result.

When I searched for a solution I found that this is a bug which Josh has filed (https://svn.open-mpi.org/trac/ompi/ticket/2097), but I cannot see a solution or workaround.

This is the initial post - http://www.digipedia.pl/usenet/thread/11269/6087/#post6031

Please assist.

Regards,
Ifeanyi
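In the output above, the only crs component listed is `none`. A minimal sketch, assuming the `MCA crs: <name> (...)` line format shown in the message, that tests whether `ompi_info` lists any other crs component. The function name `has_crs_component` is an illustrative assumption, and which other component names might appear depends on how the installation was configured.

```shell
# has_crs_component: read `ompi_info` output on stdin; succeed only if at
# least one "MCA crs:" line names a component other than "none".
has_crs_component() {
  grep 'MCA crs:' | grep -qv 'MCA crs: none'
}

# Hypothetical usage:
#   ompi_info | has_crs_component || echo "only the 'none' crs component is listed"
```

The check only reports what `ompi_info` prints; it does not say why a component is missing.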
Re: [OMPI users] checkpointing of NPB
Thanks, Josh, for the reply.

I will try patching, and possibly try other benchmarks.

Regards,
Ifeanyi

On Wed, Jun 20, 2012 at 11:51 PM, Josh Hursey wrote:
> Ifeanyi,
>
> I am usually the one that responds to checkpoint/restart questions,
> but unfortunately I do not have time to look into this issue at the
> moment (and probably won't for at least a few more months). There are
> a few other developers that work on the checkpoint/restart
> functionality that might be able to more immediately help you.
> Hopefully they will chime in.
>
> At one point in time (about a year ago) I was able to
> checkpoint/restart the NAS benchmarks (and other applications) without
> issue. From the error message that you posted earlier, it seems that
> something has broken in the 1.6 branch. Unfortunately, I do not have
> any advice on an alternative branch to try. The C/R functionality in
> the Open MPI trunk is known to be broken. There is a patch for the
> trunk making its way through testing at the moment. Once that is
> committed then you should be able to use the Open MPI trunk until
> someone fixes the 1.6 branch.
>
> Sorry I cannot be of much help. Hopefully others can assist.
>
> -- Josh
>
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
[OMPI users] checkpointing of NPB
Dear all,

Please help.

I configured Open MPI and it can checkpoint HPL.

However, whenever I try to checkpoint the NAS Parallel Benchmarks, it kills the application without an informative message.

How do I configure Open MPI 1.6 to checkpoint NPB? I really need help; I have been on this issue for the past few days without a solution.

Regards,
Ifeanyi
[OMPI users] checkpointing
Hi,

Please help.

I have installed openmpi-1.6. I have also tested the installation with different MPI applications, and my application executed successfully.

Whenever I run NPB-3.3 LU without checkpointing, it completes successfully. However, whenever I checkpoint the application, it aborts without checkpointing, with the following error:

"mpirun noticed that process rank 1 with PID 1048 on node node1 exited on signal 10 (User defined signal 1).
--
2 total processes killed (some possibly by mpirun during cleanup)"

However, when I ran HPL and checkpointed it, both the checkpoint and the application completed successfully.

I have tried to resolve this without success. I need assistance; I am a new user of Open MPI.

Regards,
Ifeanyi
Re: [OMPI users] checkpointing/restart of hpl
Thanks, Constantinos.

I have gone through the sites you sent me; however, whenever I try to enable FT, it is not enabled. This is what I got:

/openmpi-1.6# ./configure --enable-ft-thread --with-ft=cr --with-blcr=/usr/src/blcr-0.8.2

FT Checkpoint support: no (checkpoint thread: no)

Is there a special way to enable FT?

I also want to test with a real application that runs for about 30 minutes, but I cannot easily lay my hands on one.

Please help. Thanks in advance.

Regards,
Ifeanyi

On Tue, Jun 5, 2012 at 1:44 AM, Constantinos Makassikis <cmakassi...@gmail.com> wrote:
> Hi,
>
> you may start by looking at http://www.open-mpi.org/faq/?category=ft
> which leads you to
> https://svn.open-mpi.org/trac/ompi/wiki/ProcessFT_CR
> and
> http://osl.iu.edu/research/ft/ompi-cr/
>
> The latter is the most up-to-date link and probably what you are looking
> for.
>
> HTH,
>
> --
> Constantinos
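The configure summary quoted above reports "FT Checkpoint support: no" even though `--with-ft=cr` was passed. A minimal sketch that detects that summary line in saved configure output, so a build script can stop before running make. The function name `ft_disabled` and the file name `configure.out` are illustrative assumptions; the matched string is the one quoted in the message.

```shell
# ft_disabled: read configure output on stdin; succeed if the summary
# contains "FT Checkpoint support: no", the line quoted in the message above.
ft_disabled() {
  grep -q 'FT Checkpoint support: no'
}

# Hypothetical usage (configure.out is an assumed file name):
#   ./configure --enable-ft-thread --with-ft=cr --with-blcr=/usr/src/blcr-0.8.2 2>&1 | tee configure.out
#   ft_disabled < configure.out && echo "FT was not enabled; see config.log for details"
```

The `config.log` file that configure writes alongside its output records each individual check and is the usual place to look for the test that failed.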
[OMPI users] checkpointing/restart of hpl
Dear all,

I am a new user of Open MPI. I have already installed Open MPI and built HPL. I want to checkpoint/restart HPL and compare its performance.

Please can you point me to a useful link that will guide me through checkpointing HPL?

Thanks in advance.

Regards,
Ifeanyi