Hi Kapil, we've set up an account on our cluster (which uses Torque-4) so that
Artem can take a look at this.
Thanks,
Bryan
----- Original Message -----
Hi Bryan,
Sorry for not replying earlier. I was travelling at the time and then missed
the email. Is this still an issue?
Thanks,
Kapil
On Thu, Oct 24, 2013 at 2:27 PM, Bryan F Putnam < [email protected] > wrote:
Hi All,
I've been able to use DMTCP-2.0 to start, checkpoint, and restart a
multi-process mpich-3 job when only a single host is involved. However, when
more than one node is involved, I see the message below and the job doesn't
start. I'm sure there is something simple that I'm missing.
In this case, for example, $PBS_NODEFILE contains entries for two distinct
nodes:
carter-a069
carter-a069
carter-a068
carter-a068
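For reference, here is a minimal sketch of how the nodefile listing above could be summarised as slots per host. It assumes the usual layout of one hostname per line, repeated once per allocated slot. The helper name slots_per_host is hypothetical and is not part of DMTCP, MPICH, or Torque.

```python
from collections import Counter


def slots_per_host(nodefile_text):
    """Count how often each hostname appears (assumed: one line per slot)."""
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


# Nodefile contents as quoted in this message; on a real job one would
# read the file named by the PBS_NODEFILE environment variable instead.
sample = "carter-a069\ncarter-a069\ncarter-a068\ncarter-a068\n"

counts = slots_per_host(sample)
for host, slots in counts.items():
    print(host, slots)
```

With the four entries above this reports two hosts with two slots each, which matches the -np 4 used in the launch command.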
Any help appreciated!
Thanks,
Bryan
[Thu Oct 24 14:20:10 bfp@carter-a069:~/demos/MPI ] > dmtcp_launch mpiexec -f $PBS_NODEFILE -np 4 ./matmat2
dmtcp_launch (DMTCP + MTCP) version 2.0
Copyright (C) 2006-2013 Jason Ansel, Michael Rieker, Kapil Arya, and
Gene Cooperman
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see COPYING file for details.
(Use flag "-q" to hide this message.)
dmtcp_coordinator starting...
Host: carter-a069.rcac.purdue.edu (172.18.80.109)
Port: 7779
Checkpoint Interval: disabled (checkpoint manually instead)
Exit on last client: 1
Backgrounding...
[42000] NOTE at ssh.cpp:319 in prepareForExec; REASON='New ssh command'
newCommand = /home/bfp/dmtcp/dmtcp-2.0/dmtcp/src/../../bin/dmtcp_ssh
dmtcp_nocheckpoint /usr/bin/ssh -x carter-a068 dmtcp_launch --ssh-slave --host
carter-a069.rcac.purdue.edu --port 7779 --ckptdir /home/bfp/demos/MPI
dmtcp_sshd "/apps/rhel6/mpich/3.0.4_intel-13.1.1.163/bin/hydra_pmi_proxy"
--control-port carter-a069:60562 --rmk pbs --launcher ssh --demux poll --pgid 0
--retries 10 --usize -2 --proxy-id 1
_______________________________________________
Dmtcp-forum mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum