Dear all,

I have tried to checkpoint an MPI application using DMTCP, but it fails
with the following warning messages:


[40000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval; REASON='JWARNING(false) failed'
     _dataSockets[i]->socket().sockfd() = 9
     buffer.size() = 0
     WARN_INTERVAL_SEC = 10
Message: Still draining socket... perhaps remote host is not running under DMTCP?
[40000] WARNING at kernelbufferdrainer.cpp:124 in onTimeoutInterval; REASON='JWARNING(false) failed'
     _dataSockets[i]->socket().sockfd() = 7
     buffer.size() = 0
     WARN_INTERVAL_SEC = 10
Message: Still draining socket... perhaps remote host is not running under DMTCP?
......
......
......

I use the following sbatch script to submit the job:

#####################################SBATCH###########################
#!/bin/bash
# Put your SLURM options here
#SBATCH --partition=comeon
#SBATCH --time=01:15:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --job-name="dmtcp_job"
#SBATCH --output=dmtcp_ckpt_img/dmtcp-%j.out

start_coordinator()
{

    fname=dmtcp_command.$SLURM_JOBID
    h=$(hostname)
    check_coordinator=$(which dmtcp_coordinator)

    if [ -z "$check_coordinator" ]; then
        echo "No dmtcp_coordinator found. Check your DMTCP installation and PATH settings."
        # Exit with a non-zero status so the job is reported as failed
        exit 1
    fi

    dmtcp_coordinator --daemon --exit-on-last -p 0 --port-file $fname "$@" 1>/dev/null 2>&1

    p=`cat $fname`
    chmod +x $fname
    echo "#!/bin/bash" > $fname
    echo >> $fname
    echo "export PATH=$PATH" >> $fname
    echo "export DMTCP_COORD_HOST=$h" >> $fname
    echo "export DMTCP_COORD_PORT=$p" >> $fname
    echo "dmtcp_command \$@" >> $fname

    # Set up local environment for DMTCP
    export DMTCP_COORD_HOST=$h
    export DMTCP_COORD_PORT=$p
}

cd $SLURM_SUBMIT_DIR
start_coordinator -i 240
dmtcp_launch -h $h -p $p mpiexec ./mm.o

#########################################################################

I have also tried the --rm option of dmtcp_launch, but that does not work
either, and it produces no output at all.
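
To rule out a bad coordinator address as the cause, a check like the
following could be run right after start_coordinator. This is only a minimal
sketch: read_coord_port is a hypothetical helper name (not part of DMTCP), and
it assumes the port file contains nothing but the port number, as the
p=`cat $fname` line in the script above already does.

```shell
#!/bin/bash
# Hypothetical helper (not part of DMTCP): print the port stored in a
# port file, or return non-zero if the file is missing, empty, or does
# not contain a plain decimal number.
read_coord_port()
{
    local file=$1 port

    # -s is true only if the file exists and is non-empty
    [ -s "$file" ] || return 1

    # Assumption: the file holds just the port number; strip whitespace
    port=$(tr -d '[:space:]' < "$file")

    case $port in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac

    echo "$port"
}

# Possible use inside the job script, before the port file is overwritten:
#   p=$(read_coord_port "dmtcp_command.$SLURM_JOBID") || {
#       echo "Could not read the coordinator port" >&2
#       exit 1
#   }
```

If the helper fails, the coordinator probably did not start or did not write
its port, which would be a different problem from the draining warnings.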

Can anybody tell me how to solve this, please? I need help.


Regards,

Husen