Hi Bryan,
Sorry for not replying earlier. I was travelling at the time and then
missed the email. Is this still an issue?
Thanks,
Kapil
On Thu, Oct 24, 2013 at 2:27 PM, Bryan F Putnam <[email protected]> wrote:
> Hi All,
>
> I've been successful in using DMTCP-2.0 to start, checkpoint, and restart
> a multi-process mpich-3 job when only a single host is involved. However, if
> there is more than one node involved, I'm seeing the following message, and
> the job doesn't start. I'm sure there is something simple that I'm missing.
>
> In this case for example, the $PBS_NODEFILE contains entries for two
> distinct nodes:
>
> carter-a069
> carter-a069
> carter-a068
> carter-a068
>
> Any help appreciated!
> Thanks,
> Bryan
>
>
> [Thu Oct 24 14:20:10 bfp@carter-a069:~/demos/MPI ] > dmtcp_launch mpiexec -f $PBS_NODEFILE -np 4 ./matmat2
> dmtcp_launch (DMTCP + MTCP) version 2.0
> Copyright (C) 2006-2013 Jason Ansel, Michael Rieker, Kapil Arya, and
> Gene Cooperman
> This program comes with ABSOLUTELY NO WARRANTY.
> This is free software, and you are welcome to redistribute it
> under certain conditions; see COPYING file for details.
> (Use flag "-q" to hide this message.)
>
> dmtcp_coordinator starting...
> Host: carter-a069.rcac.purdue.edu (172.18.80.109)
> Port: 7779
> Checkpoint Interval: disabled (checkpoint manually instead)
> Exit on last client: 1
> Backgrounding...
> [42000] NOTE at ssh.cpp:319 in prepareForExec; REASON='New ssh command'
> newCommand = /home/bfp/dmtcp/dmtcp-2.0/dmtcp/src/../../bin/dmtcp_ssh
> dmtcp_nocheckpoint /usr/bin/ssh -x carter-a068 dmtcp_launch --ssh-slave
> --host carter-a069.rcac.purdue.edu --port 7779 --ckptdir
> /home/bfp/demos/MPI dmtcp_sshd
> "/apps/rhel6/mpich/3.0.4_intel-13.1.1.163/bin/hydra_pmi_proxy"
> --control-port carter-a069:60562 --rmk pbs --launcher ssh --demux poll
> --pgid 0 --retries 10 --usize -2 --proxy-id 1
>
>
> _______________________________________________
> Dmtcp-forum mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>