Hi Robert,
    Thanks for writing.  It's not obvious to me what's happening,
but here's a quick check to help diagnose it.
After starting the coordinator, could you run:
   lsof | grep dmtcp_coo
Alternatively, could you try:
   lsof | grep <PORT_NUM>
where PORT_NUM is the port number that the coordinator reported
(34511 in your output below)?

Let's verify that the coordinator is truly listening on the port
that it says it is.
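
If it's easier to script, here is a rough sketch of the same check.
It assumes that the file given to --port-file contains the port number
as its first run of digits (I haven't confirmed the exact file format),
and that lsof is installed on the coordinator's host.  The function
names read_port and check_listening are made up for illustration.

```shell
# read_port FILE
# Print the first run of digits found in FILE (assumed to be the port
# that the coordinator wrote out).  Returns non-zero if none is found.
read_port() {
    rp_port=$(grep -E -o '[0-9]+' "$1" 2>/dev/null | head -n 1)
    [ -n "$rp_port" ] && printf '%s\n' "$rp_port"
}

# check_listening PORT
# List the processes holding a listening TCP socket on PORT.
# -n and -P skip host and port name lookups, so the output shows
# the numeric port.  Prints nothing if no process is listening.
check_listening() {
    lsof -nP -iTCP:"$1" -sTCP:LISTEN
}
```

On the host where the coordinator was started, you would run something
like:  check_listening "$(read_port /path/to/your.port)"
and expect one line whose command name starts with dmtcp_coo.  No
output would mean that nothing is listening on that port.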

Kapil,
    Could you please check in your code with the --port-file option?
Then we can be sure that we're all testing the same source, and that
version differences aren't the issue.
    Also, I presume you've already tested something similar to what
Robert is doing below.  Is that correct?

Thanks,
- Gene

On Wed, Jun 12, 2013 at 05:17:26PM -0400, Robert William Leach wrote:
> Hi,
> 
> For the life of me, I cannot figure out why, when I run dmtcp_checkpoint, I 
> get an error about not being able to connect to the coordinator.  Here are 
> snippets from my script - it's all in 1 script - and the output I get from 
> each of these commands.  Help?
> 
> dmtcp_coordinator --port 0 --background --exit-on-last --port-file 
> /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.port
>  --ckptdir 
> /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.ckpt1
>  --tmpdir /panasas/scratch/rwleach/tmp
> 
> dmtcp_coordinator starting...
>     Port: 34511
>     Checkpoint Interval: disabled (checkpoint manually instead)
>     Exit on last client: 1
> The port number was written to file 
> (/panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.port)
> Backgrounding...
> 
> dmtcp_checkpoint --no-gzip --join --port 34511 --tmpdir 
> /panasas/scratch/rwleach/tmp --ckptdir 
> /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.ckpt1
>  --quiet /util/meme/4.6.0/bin/meme.bin 
> LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme -dna -mod zoops -minw 6 
> -maxw 25 -revcomp -nostatus -p 8 -o 
> LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memepeak150-8cores 
> -maxsize 30000000 1> 
> /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores
>  2> 
> /panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.err
>  &
> 
> [15030] ERROR at dmtcpcoordinatorapi.cpp:81 in 
> createNewConnectionToCoordinator; REASON='JASSERT(fd.isValid()) failed'
>      coordinatorAddr = d06n40b.ccr.buffalo.edu
>      coordinatorPort = 34511
> Message: Failed to connect to DMTCP coordinator
> meme.bin (15030): Terminating...
> 
> env | grep DMTCP
> 
> DMTCP_HOST=d06n40b.ccr.buffalo.edu
> DMTCP=/util/dmtcp/1.2.7
> DMTCP_CHECKPOINT_DIR=/panfs/panfs.ccr.buffalo.edu/projects/ccrstaff/rwleach/PROJECT/CRPC/MACS/LNCaP_control-LNCaP_input_peaks.bed.pad150.formeme.memeout-8cores.ckpt1
> DMTCP_GZIP=0
> DMTCP_TMPDIR=/panasas/scratch/rwleach/tmp
> 
> 
> http://SwingBuffalo.com/
> - Phone Swing Buffalo or sign up for our email list via the contact page on 
> our website!
> http://RhythmShuffle.com/
> http://LindyFix.com/
> 
> 

> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
> 
> Build for Windows Store.
> 
> http://p.sf.net/sfu/windows-dev2dev

> _______________________________________________
> Dmtcp-forum mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum

