Hi All,

I've been able to use DMTCP-2.0 to start, checkpoint, and restart a 
multi-process mpich-3 job when only a single host is involved. However, when 
more than one node is involved, I see the output shown below my signature, and 
the job doesn't start. I'm sure there is something simple that I'm missing.

In this case, for example, $PBS_NODEFILE contains entries for two distinct 
nodes:

carter-a069
carter-a069
carter-a068
carter-a068
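
For anyone wanting to confirm the same layout on their own allocation, here is a rough sketch of how the distinct hosts and total slots in a node file could be counted. The temporary file is only a stand-in for $PBS_NODEFILE, filled with the four entries above, and the variable names are made up for illustration.

```shell
# Stand-in for $PBS_NODEFILE, filled with the four entries listed above.
nodefile=$(mktemp)
printf '%s\n' carter-a069 carter-a069 carter-a068 carter-a068 > "$nodefile"

# Number of distinct hosts named in the file.
distinct_hosts=$(sort -u "$nodefile" | wc -l | tr -d ' ')

# Total number of entries, one per line (one per MPI slot).
total_slots=$(wc -l < "$nodefile" | tr -d ' ')

echo "distinct hosts: $distinct_hosts, slots: $total_slots"
rm -f "$nodefile"
```

In a real job, pointing the two pipelines at "$PBS_NODEFILE" instead of the temporary file should give the same kind of summary.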

Any help appreciated!
Thanks,
Bryan


[Thu Oct 24 14:20:10 bfp@carter-a069:~/demos/MPI ] > dmtcp_launch mpiexec -f $PBS_NODEFILE -np 4 ./matmat2
dmtcp_launch (DMTCP + MTCP) version 2.0
Copyright (C) 2006-2013  Jason Ansel, Michael Rieker, Kapil Arya, and
                                                       Gene Cooperman
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see COPYING file for details.
(Use flag "-q" to hide this message.)

dmtcp_coordinator starting...
    Host: carter-a069.rcac.purdue.edu (172.18.80.109)
    Port: 7779
    Checkpoint Interval: disabled (checkpoint manually instead)
    Exit on last client: 1
Backgrounding...
[42000] NOTE at ssh.cpp:319 in prepareForExec; REASON='New ssh command'
     newCommand = /home/bfp/dmtcp/dmtcp-2.0/dmtcp/src/../../bin/dmtcp_ssh dmtcp_nocheckpoint /usr/bin/ssh -x carter-a068 dmtcp_launch --ssh-slave --host carter-a069.rcac.purdue.edu --port 7779 --ckptdir /home/bfp/demos/MPI dmtcp_sshd "/apps/rhel6/mpich/3.0.4_intel-13.1.1.163/bin/hydra_pmi_proxy" --control-port carter-a069:60562 --rmk pbs --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
