Hello,
I did that and still got "Restart error: cannot map initial resources into
the restart allocation."
I also tried Open MPI 1.8.8 and got the same error message.
On 10/06/2015 07:06 PM, Jiajun Cao wrote:
Hi,
Could you replace
dmtcp_launch --rm mpirun --mca btl self,tcp ./<your binary>
with the following:
srun dmtcp_launch --rm ./<your binary>
Also, add the following env vars to the script:
export OMPI_MCA_mtl=^psm
export OMPI_MCA_btl=self,tcp
and try again?
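As a rough sketch, the launch part of the job script might then look something like the following. This is only an illustration of the two changes above, not a tested job script: APP and RUN_JOB are hypothetical placeholder names (not part of DMTCP, Slurm, or Open MPI), and by default the script only prints the command so it can be checked outside a Slurm allocation.

```shell
#!/bin/bash
# Sketch of the launch portion of a Slurm job script (hypothetical layout).

# Settings suggested in this thread: exclude the PSM MTL and restrict
# Open MPI to the "self" and "tcp" BTL components.
export OMPI_MCA_mtl=^psm
export OMPI_MCA_btl=self,tcp

# APP is a hypothetical placeholder standing in for "./<your binary>".
APP="${APP:-./your_binary}"

# Launch through srun instead of mpirun, as suggested above.
launch_cmd=(srun dmtcp_launch --rm "$APP")

# RUN_JOB is a hypothetical switch: unless it is set to 1, the command is
# only printed, which makes it easy to inspect before submitting the job.
if [ "${RUN_JOB:-0}" = "1" ]; then
    "${launch_cmd[@]}"
else
    echo "${launch_cmd[*]}"
fi
```

Setting RUN_JOB=1 inside the actual job script would run the command instead of printing it.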
On Tue, Oct 6, 2015 at 4:41 PM, abderrahmane <[email protected]> wrote:
Hello,
Thanks for the response.
On 10/06/2015 02:18 PM, Jiajun Cao wrote:
Hi,
1. What kind of application are you running? Is there an
integration of matlab and mpi? I'm asking because I haven't run
any mpi-based matlab applications before.
I just created a script that calculates Fibonacci numbers and prints
them out.
2. What kind of environment are you using? Specifically, I'd like
to know the MPI version, interconnect network type (Ethernet or
InfiniBand), and how MPI and Slurm are integrated (i.e., in the
cluster, what command do you use to run the application, srun or
mpirun).
I am using RHEL 7 and Open MPI 1.8 over InfiniBand. Slurm is
integrated in a cluster environment; I used the script here:
https://github.com/dmtcp/dmtcp/blob/master/plugin/batch-queue/job_examples/slurm_launch.job
3. Do you get a valid checkpoint image(s)? Also, please attach
your job scripts.
I get the expected checkpoint images, but when I restart I receive the
error I sent earlier.
Thanks
On Tue, Oct 6, 2015 at 1:29 PM, Kapil Arya <[email protected]> wrote:
Jiajun, Artem,
Can one of you take a look at this one?
Kapil
On Tue, Oct 6, 2015 at 12:31 PM, abderrahmane <[email protected]> wrote:
Hello,
Thank you for your effort and work on DMTCP. I have some
questions.
(P.S.: I run my MATLAB code using --rm mpirun and Slurm.)
1- Is there a good way to run MATLAB code? I created a
bash file and added the following:
matlab -nojvm < file.m
2- Running the code above with DMTCP and MATLAB worked
fine, but when I tried to restart it using the
slurm_restart.job script from your GitHub repository
with --rm mpirun, I received the following error:
restart error: cannot map initial resources into the
restart allocation.
Allocated resources : *nodex:4 nodey:4
Any ideas? Please feel free to ask me more questions.
Best regards,
------------------------------------------------------------------------------
_______________________________________________
Dmtcp-forum mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum