Hi Rob,

Thanks for writing to us. It looks like you have hit a corner case in the SSH
plugin: it fails to recognize some of the arguments passed to the ssh
command. I have added temporary support that gets us past this error, but
restart may still have trouble. If possible, could you try the latest DMTCP
master from GitHub and see whether it works for you?

Here are the commands to do that:

git clone https://github.com/dmtcp/dmtcp.git
cd dmtcp && ./configure && make

PS: I have also created an issue on GitHub to track this:
https://github.com/dmtcp/dmtcp/issues/133
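For context on the assertion in the quoted log below: judging from its text, the SSH plugin computes `commandStart`, the index in the ssh argument vector where the remote command should begin, and asserts that the word there does not start with `-`. In Rob's run it landed on `-n`, which is an ssh option. One way this can happen is that OpenSSH's own parser also accepts options after the destination (for example `ssh node2 -n hostname`), so a parser that stops looking for options at the host name will take `-n` for the start of the command. The bash sketch below is a hypothetical illustration of a parser that handles that case. The function name `ssh_command_index` and the list of option letters are assumptions of this sketch, not DMTCP's actual ssh.cpp logic, and the exact command line Linda passes to ssh is not known here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT DMTCP's ssh.cpp: locate where the remote command
# starts in an OpenSSH command line. Usage:
#   ssh_command_index ssh [options] destination [options] command ...
# Prints the index of the first command word (argv[0], "ssh", is index 0),
# or -1 if there is no command.

# Option letters that take a value, attached (-p22) or as the next word (-p 22).
# ASSUMED from the OpenSSH ssh(1) synopsis; the set varies between versions.
SSH_OPTS_WITH_ARG="BbcDEeFIiJLlmOoPpQRSWw"

ssh_command_index() {
  local -a argv=("$@")
  local n=${#argv[@]} i=1 host_seen=0 arg j
  while (( i < n )); do
    arg=${argv[i]}
    if [[ $arg == -- ]]; then
      # "--" ends option parsing; if no destination was seen yet, it comes next.
      i=$(( i + 1 ))
      if (( ! host_seen && i < n )); then
        host_seen=1
        i=$(( i + 1 ))
      fi
      break
    elif [[ $arg == -?* ]]; then
      # A cluster of one-letter options such as -xn, -p22 or -p.
      i=$(( i + 1 ))
      for (( j = 1; j < ${#arg}; j++ )); do
        if [[ $SSH_OPTS_WITH_ARG == *"${arg:j:1}"* ]]; then
          # Nothing attached after the letter: the value is the next word.
          if (( j == ${#arg} - 1 )); then
            i=$(( i + 1 ))
          fi
          break
        fi
      done
    elif (( ! host_seen )); then
      # First non-option word is the destination. Keep scanning, because
      # options such as -n may still follow it.
      host_seen=1
      i=$(( i + 1 ))
    else
      break  # first non-option word after the destination: the command
    fi
  done
  if (( host_seen && i < n )); then
    echo "$i"
  else
    echo -1
  fi
}
```

With this sketch, `ssh_command_index ssh -x node2 -n hostname` prints 4, so the trailing `-n` is skipped as an option rather than mistaken for the command.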

Best,
Kapil


On Tue, Jun 30, 2015 at 12:11 PM, Kalescky, Robert John-Brown <
[email protected]> wrote:

>  Hello DMTCP forum,
>
>  I’m trying to checkpoint a Gaussian calculation using Linda, the message
> passing system they use. This test job is using two nodes connected with
> InfiniBand and is run through SLURM. I’m using the start_coordinator script
> as suggested. The job is started via:
>
>  start_coordinator -i 43200
>  dmtcp_launch --with-plugin libdmtcp_infiniband.so --batch-queue --infiniband --ckpt-open-files --modify-env g09_linda PD1_zmat.g09
>
>  The job then crashes shortly after starting with:
>
>  [53000] WARNING at socketconnection.cpp:187 in TcpConnection; REASON='JWARNING(false) failed'
>      type = 2
> Message: Datagram Sockets not supported. Hopefully, this is a short lived connection!
> [53000] WARNING at socketconnection.cpp:187 in TcpConnection; REASON='JWARNING(false) failed'
>      type = 2
> Message: Datagram Sockets not supported. Hopefully, this is a short lived connection!
> [53000] WARNING at socketconnection.cpp:187 in TcpConnection; REASON='JWARNING(false) failed'
>      type = 2
> Message: Datagram Sockets not supported. Hopefully, this is a short lived connection!
> [55000] ERROR at ssh.cpp:277 in prepareForExec; REASON='JASSERT(commandStart < nargs && argv[commandStart][0] != '-') failed'
>      commandStart = 3
>      nargs = 33
>      argv[commandStart] = -n
> Message: failed to parse ssh command line
> bash (55000): Terminating...
>
>  Any thoughts or suggestions are greatly appreciated.
>
>  Best regards,
> Rob
>
> *Robert Kalescky, Ph.D.*
>  HPC Applications Scientist
>  Center for Scientific Computation
>  Southern Methodist University
>  Perkins Administration 101H
>  Office: 214-768-2030
>  www.smu.edu/csc
>
>
>
>
>
> ------------------------------------------------------------------------------
> Don't Limit Your Business. Reach for the Cloud.
> GigeNET's Cloud Solutions provide you with the tools and support that
> you need to offload your IT needs and focus on growing your business.
> Configured For All Businesses. Start Your Cloud Today.
> https://www.gigenetcloud.com/
> _______________________________________________
> Dmtcp-forum mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
>