P.S. There is no way to avoid that for now or in the near future, IMO.

2015-10-09 17:01 GMT+03:00 Artem Polyakov <[email protected]>:

> You don't need the "exact" allocation in terms of node names, but you do
> need to remember how many nodes and how many procs per node you had in the
> original allocation.
>
> 2015-10-09 16:39 GMT+03:00 MR.AB <[email protected]>:
>
>> Hey
>> Thank you for the email. Is there a way to make it work, or do I have to
>> have variables to "remember" the exact allocations?
>>
>> On Friday, October 9, 2015 4:34 AM, Artem Polyakov <[email protected]>
>> wrote:
>>
>> Hello,
>> Please note that one of the reasons may be non-equivalent allocations.
>> DMTCP cannot restore processes that were originally running on the same
>> node onto different nodes. This means that if you originally requested
>> the following allocation:
>>     cn[0-1], ppn = 4
>> and are trying to restart on:
>>     cn[0-3], ppn = 2
>> this won't work, even though the allocations are logically equivalent.
>>
>> 2015-10-08 16:00 GMT+03:00 abderrahmane <[email protected]>:
>>
>> Hello
>>
>> I did it and still got "Restart error: cannot map initial resources into
>> the restart allocation."
>>
>> I also used openmpi 1.8.8 and got the same error message.
>>
>> On 10/06/2015 07:06 PM, Jiajun Cao wrote:
>>
>> Hi,
>>
>> Could you replace
>>
>> dmtcp_launch --rm mpirun --mca btl self,tcp ./<your binary>
>>
>> with the following:
>>
>> srun dmtcp_launch --rm ./<your binary>
>>
>> Also, add the following env vars to the script:
>>
>> export OMPI_MCA_mtl=^psm
>> export OMPI_MCA_btl=self,tcp
>>
>> and try again?
>>
>> On Tue, Oct 6, 2015 at 4:41 PM, abderrahmane <[email protected]>
>> wrote:
>>
>> Hello
>> Thanks for the response.
>>
>> On 10/06/2015 02:18 PM, Jiajun Cao wrote:
>>
>> Hi,
>>
>> 1. What kind of application are you running? Is there an integration of
>> matlab and mpi? I'm asking because I haven't run any mpi-based matlab
>> applications before.
>>
>> I just created a script that calculates a Fibonacci number and prints it
>> out.
>>
>> 2. What kind of environment are you using? Specifically, I'd like to know
>> the MPI version, interconnect network type (Ethernet or InfiniBand), and
>> how MPI and Slurm are integrated (i.e., in the cluster, what command do
>> you use to run the application, srun or mpirun).
>>
>> I am using RHEL 7 and openmpi 1.8 over InfiniBand. Slurm is integrated in
>> a cluster environment; I used the script here:
>>
>> https://github.com/dmtcp/dmtcp/blob/master/plugin/batch-queue/job_examples/slurm_launch.job
>>
>> 3. Do you get a valid checkpoint image(s)? Also, please attach your job
>> scripts.
>>
>> I get the checkpoint needed, but when I restart I receive the error I
>> sent.
>>
>> Thanks
>>
>> On Tue, Oct 6, 2015 at 1:29 PM, Kapil Arya <[email protected]> wrote:
>>
>> Jiajun, Artem,
>>
>> Can one of you take a look at this one?
>>
>> Kapil
>>
>> On Tue, Oct 6, 2015 at 12:31 PM, abderrahmane <[email protected]>
>> wrote:
>>
>> Hello
>>
>> Thank you for the effort and work (dmtcp). I do have some questions:
>> (P.S.: I run my matlab code using --rm mpirun and slurm.)
>>
>> 1- Is there a good way to run matlab code? I created a bash file and
>> added the following:
>> matlab -nojvm < file.m
>>
>> 2- Running the code above with dmtcp and matlab worked fine, but when I
>> tried to restart the code using the slurm_restart.job script from your
>> github with --rm mpirun, I received the following error:
>>
>> restart error: cannot map initial resources into the restart allocation.
>> Allocated resources : *nodex:4 nodey:4
>>
>> Any ideas? Please feel free to ask me more questions.
>>
>> Best regards,
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Dmtcp-forum mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>
>> --
>> С Уважением, Поляков Артем Юрьевич
>> Best regards, Artem Y. Polyakov
>
> --
> С Уважением, Поляков Артем Юрьевич
> Best regards, Artem Y. Polyakov

--
С Уважением, Поляков Артем Юрьевич
Best regards, Artem Y. Polyakov
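
For anyone who wants to "remember" the allocation shape as suggested in the thread, here is a minimal sketch, not an official DMTCP feature. It assumes the job is submitted with --ntasks-per-node, so that Slurm sets SLURM_NTASKS_PER_NODE alongside SLURM_JOB_NUM_NODES. The helper names save_alloc_shape and check_alloc_shape and the file name alloc_shape.txt are hypothetical. The check requires the node count and procs per node to match exactly, which is a conservative assumption and may be stricter than what DMTCP itself needs.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DMTCP or Slurm) that record the shape of
# the original allocation and compare it with the restart allocation.

# Write "<nodes> <procs-per-node>" of the current allocation to a file.
save_alloc_shape() {
    shape_file=$1
    if [ -z "$SLURM_JOB_NUM_NODES" ] || [ -z "$SLURM_NTASKS_PER_NODE" ]; then
        echo "allocation shape is not available in the environment" >&2
        return 1
    fi
    printf '%s %s\n' "$SLURM_JOB_NUM_NODES" "$SLURM_NTASKS_PER_NODE" > "$shape_file"
}

# Return 0 only if the current allocation has the same number of nodes and
# the same procs per node as the saved one (assumption: exact match).
check_alloc_shape() {
    shape_file=$1
    read -r saved_nodes saved_ppn < "$shape_file" || return 1
    if [ "$saved_nodes" != "$SLURM_JOB_NUM_NODES" ] ||
       [ "$saved_ppn" != "$SLURM_NTASKS_PER_NODE" ]; then
        echo "restart allocation differs from original: had ${saved_nodes} nodes x ${saved_ppn} procs per node" >&2
        return 1
    fi
}
```

One way to use it: call `save_alloc_shape alloc_shape.txt` in the launch job script before dmtcp_launch, and `check_alloc_shape alloc_shape.txt || exit 1` at the top of the restart job script, so a mismatched request fails early with a readable message.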
