Nitinder,

Can you try specifying --host for dmtcp_restart as well? If that does not help, the next thing to try would be dropping the --join flag.
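As a rough sketch of the first suggestion, the script below only assembles and prints a restart line for Node 2 with an explicit coordinator address and without --join; it does not run anything. The address, port, and checkpoint file name are copied from your message. The variable names are just for illustration, and I am assuming that dmtcp_restart in your version accepts --host and --port the same way dmtcp_checkpoint does (please confirm with dmtcp_restart --help).

```shell
#!/bin/sh
# Values copied from the quoted message; adjust them for your own setup.
COORD_HOST=192.168.32.192   # IP address of the machine running dmtcp_coordinator
COORD_PORT=7779             # port the coordinator reported at startup
CKPT=ckpt_dmtcp1_16886b7f9e541c55-41000-5457f5f2.dmtcp   # Node 2 checkpoint image

# Assumption: dmtcp_restart takes --host/--port like dmtcp_checkpoint does.
# The --join flag is deliberately left out here.
RESTART_CMD="dmtcp_restart --host $COORD_HOST --port $COORD_PORT $CKPT"

# Print the command instead of running it, so it can be reviewed first.
echo "$RESTART_CMD"
```

If the printed line looks right, run it on Node 2 while the coordinator is still up, and use the same form with the Node 1 checkpoint file on Node 1.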
Finally, what DMTCP version are you using? I would encourage you to check out our git repository and try with that. Here are the commands:

  git clone https://github.com/dmtcp/dmtcp.git dmtcp-git
  cd dmtcp-git
  ./configure && make

Kapil

On Mon, Nov 3, 2014 at 11:32 AM, Nitinder Mohan <[email protected]> wrote:

> Dear All,
>
> I am trying to use DMTCP and still learning to use it. I want to
> checkpoint across multiple nodes using IP addresses. I am starting small,
> with only two nodes to checkpoint. The application that I am trying to
> checkpoint is sample app "dmtcp1". This is what I have done so far:
>
> 1. dmtcp_coordinator is running on one of the nodes.
>
> 2. Local Node [Node 1] is connected to coordinator using following command:
>    dmtcp_checkpoint --host 127.0.1.1 --port 7779 test/dmtcp1
>    (as coordinator started on host address 127.0.1.1)
>
> 3. Remote Node [Node 2] is connected to coordinator using command:
>    dmtcp_checkpoint --host 192.168.32.192 --port 7779 test/dmtcp1
>    (as coordinator machine's IP address is 192.168.32.192)
>
> 4. Both the machines are connected to coordinator and counting.
>
> 5. Stop Node1 and Node 2 (Note that coordinator is still up and running)
>
> Now, the problem comes into play when restarting:
>
> *Step 1:* Restart on Node 1 using command:
>    dmtcp_restart --join ckpt_dmtcp1_16886b7f9e541c55-40000-5457a7f2.dmtcp
>    (Note the join flag for joining to running coordinator)
>
> This is the output shown:
>
> [2766] ERROR at coordinatorapi.cpp:567 in sendRecvHandshake;
> REASON='JASSERT(msg.type == DMT_ACCEPT) failed'
> dmtcp_restart (2766): Terminating...
>
> *Step 2:* Restart on Node 2 using command:
>    dmtcp_restart --join ckpt_dmtcp1_16886b7f9e541c55-41000-5457f5f2.dmtcp
>
> This is the output I get:
>
> dmtcp_coordinator starting...
>     Host: iiitd-HP-Compaq-8200-Elite-MT-PC (127.0.1.1)
>     Port: 7779
>     Checkpoint Interval: disabled (checkpoint manually instead)
>     Exit on last client: 1
>
> I am pretty sure I am missing something small and trivial.
>
> Any help will be deeply appreciated.
>
> Thanks and Regards
>
> Nitinder Mohan
> MTech (CE) IIIT Delhi
> http://home.iiitd.edu.in/~nitinder1369/
------------------------------------------------------------------------------
_______________________________________________
Dmtcp-forum mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
