Hello

I did that and still got the restart error: "cannot map initial resources into the restart allocation."

I also tried Open MPI 1.8.8 and got the same error message.


On 10/06/2015 07:06 PM, Jiajun Cao wrote:
Hi,

Could you replace

dmtcp_launch --rm mpirun --mca btl self,tcp ./<your binary>

with the following:

srun dmtcp_launch --rm ./<your binary>

Also, add the following env vars to the script:

export OMPI_MCA_mtl=^psm
export OMPI_MCA_btl=self,tcp

and try again?
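Below is a minimal sketch of how the launch part of slurm_launch.job might look after these two changes. It assumes the coordinator start-up section of that script stays unchanged above these lines. ./my_mpi_app is a hypothetical placeholder for your binary, and the #SBATCH values are only an example that matches the nodex:4 nodey:4 allocation from the error message.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
# Example resource request (2 nodes x 4 tasks); adjust for your cluster.

# ... coordinator start-up section from slurm_launch.job, unchanged ...

# Open MPI reads OMPI_MCA_<param> environment variables as MCA settings.
# "^psm" excludes the PSM MTL; "self,tcp" limits the BTLs to loopback and TCP.
export OMPI_MCA_mtl=^psm
export OMPI_MCA_btl=self,tcp

# srun starts one process per task, and each process runs under
# dmtcp_launch, instead of wrapping a single mpirun in dmtcp_launch.
# ./my_mpi_app is a hypothetical placeholder for your binary.
srun dmtcp_launch --rm ./my_mpi_app
```

Because the variables are exported, every rank that srun starts inherits the same transport settings.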

On Tue, Oct 6, 2015 at 4:41 PM, abderrahmane <[email protected]> wrote:

    Hello
    Thanks for the response.


    On 10/06/2015 02:18 PM, Jiajun Cao wrote:
    Hi,


    1. What kind of application are you running? Does it combine
    MATLAB and MPI? I'm asking because I haven't run any MPI-based
    MATLAB applications before.

    I just created a script that calculates a Fibonacci number and
    prints it out.
    2. What kind of environment are you using? Specifically, I'd like
    to know the MPI version, interconnect network type (Ethernet or
    InfiniBand), and how MPI and Slurm are integrated (i.e., in the
    cluster, what command do you use to run the application, srun or
    mpirun).

    I am using RHEL 7 and Open MPI 1.8 over InfiniBand. Slurm is
    integrated in the cluster environment. I used the script here:
    https://github.com/dmtcp/dmtcp/blob/master/plugin/batch-queue/job_examples/slurm_launch.job

    3. Do you get valid checkpoint image(s)? Also, please attach
    your job scripts.
    I do get the checkpoint images, but when I restart, I receive
    the error I sent.

    Thanks

    On Tue, Oct 6, 2015 at 1:29 PM, Kapil Arya
    <[email protected]> wrote:

        Jiajun, Artem,

        Can one of you take a look at this one?

        Kapil

        On Tue, Oct 6, 2015 at 12:31 PM, abderrahmane
        <[email protected]> wrote:

            Hello

            Thank you for your work on DMTCP. I have some
            questions:
            (P.S.: I run my MATLAB code using --rm mpirun and Slurm.)

            1- Is there a good way to run MATLAB code? I created a
            bash file and added the following:
                 matlab -nojvm < file.m

            2- Running the code above with DMTCP and MATLAB worked
            fine, but when I tried to restart it using the
            slurm_restart.job script from your GitHub repository
            with --rm mpirun, I received the following error:

            restart error: cannot map initial resources into the
            restart allocation.
            Allocated resources : *nodex:4 nodey:4

            Any ideas? Please feel free to ask me more questions.

            Best regards,

            
------------------------------------------------------------------------------
            _______________________________________________
            Dmtcp-forum mailing list
            [email protected]
            <mailto:[email protected]>
            https://lists.sourceforge.net/lists/listinfo/dmtcp-forum





