Hi Kapil, we've set up an account on our cluster (which uses Torque-4) so that 
Artem can take a look at this. 

Thanks, 
Bryan 

----- Original Message -----



Hi Bryan, 


Sorry for not replying earlier. I was travelling at the time and then missed 
the email. Is this still an issue? 


Thanks, 
Kapil 



On Thu, Oct 24, 2013 at 2:27 PM, Bryan F Putnam < [email protected] > wrote: 


Hi All, 

I've been able to use DMTCP-2.0 to start, checkpoint, and restart a 
multi-process mpich-3 job when only a single host is involved. However, if 
more than one node is involved, I see the message below and the job 
doesn't start. I'm sure there is something simple that I'm missing. 

In this case, for example, $PBS_NODEFILE contains entries for two distinct 
nodes: 

carter-a069 
carter-a069 
carter-a068 
carter-a068 

Any help appreciated! 
Thanks, 
Bryan 


[Thu Oct 24 14:20:10 bfp@carter-a069:~/demos/MPI ] > dmtcp_launch mpiexec -f $PBS_NODEFILE -np 4 ./matmat2 
dmtcp_launch (DMTCP + MTCP) version 2.0 
Copyright (C) 2006-2013 Jason Ansel, Michael Rieker, Kapil Arya, and 
Gene Cooperman 
This program comes with ABSOLUTELY NO WARRANTY. 
This is free software, and you are welcome to redistribute it 
under certain conditions; see COPYING file for details. 
(Use flag "-q" to hide this message.) 

dmtcp_coordinator starting... 
Host: carter-a069.rcac.purdue.edu (172.18.80.109) 
Port: 7779 
Checkpoint Interval: disabled (checkpoint manually instead) 
Exit on last client: 1 
Backgrounding... 
[42000] NOTE at ssh.cpp:319 in prepareForExec; REASON='New ssh command' 
newCommand = /home/bfp/dmtcp/dmtcp-2.0/dmtcp/src/../../bin/dmtcp_ssh 
dmtcp_nocheckpoint /usr/bin/ssh -x carter-a068 dmtcp_launch --ssh-slave --host 
carter-a069.rcac.purdue.edu --port 7779 --ckptdir /home/bfp/demos/MPI 
dmtcp_sshd "/apps/rhel6/mpich/3.0.4_intel-13.1.1.163/bin/hydra_pmi_proxy" 
--control-port carter-a069:60562 --rmk pbs --launcher ssh --demux poll --pgid 0 
--retries 10 --usize -2 --proxy-id 1 
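
Since the failure only shows up once a second host is involved, it can help to confirm how many distinct hosts the node file lists and how many slots each one has before launching. The following is a minimal sketch using only standard tools (sort, uniq, awk). The name count_slots is a hypothetical helper introduced here for illustration; it is not part of Torque, MPICH, or DMTCP, and the temporary file simply stands in for $PBS_NODEFILE.

```shell
#!/bin/sh
# Summarize a PBS-style node file: print one line per distinct host
# followed by the number of slots (lines) listed for that host.
# count_slots is a hypothetical helper name, not a Torque or DMTCP command.
count_slots() {
    # $1: path to a node file with one hostname per line (one line per slot)
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Example using the node file contents quoted in the message above.
# A temporary file stands in for $PBS_NODEFILE (assumption for illustration).
nodefile=$(mktemp)
printf '%s\n' carter-a069 carter-a069 carter-a068 carter-a068 > "$nodefile"
count_slots "$nodefile"
rm -f "$nodefile"
```

Inside a real job, the same function could be called as count_slots "$PBS_NODEFILE". More than one output line means the launch spans several hosts, which is the case where the ssh command in the log above comes into play.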

_______________________________________________ 
Dmtcp-forum mailing list 
[email protected] 
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum 


