P.S. There is no way to avoid that for now or in the near future, IMO.
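
A minimal sketch of one way to "remember" the allocation shape (node count and procs per node) between the launch job and the restart job. The function names and the one-line state-file format are hypothetical and not part of DMTCP. It assumes the launch job was submitted with --ntasks-per-node, since Slurm only sets SLURM_NTASKS_PER_NODE in that case.

```shell
#!/bin/bash
# Sketch: record the allocation shape at launch, reuse it at restart.
# All function names and the state-file format are hypothetical.

# Write "<nodes> <procs-per-node>" to the file given as $1.
# SLURM_NTASKS_PER_NODE is only set when --ntasks-per-node was requested.
save_alloc_shape() {
    if [ -z "${SLURM_JOB_NUM_NODES:-}" ] || [ -z "${SLURM_NTASKS_PER_NODE:-}" ]; then
        echo "allocation shape not available in the environment" >&2
        return 1
    fi
    printf '%s %s\n' "$SLURM_JOB_NUM_NODES" "$SLURM_NTASKS_PER_NODE" > "$1"
}

# Print sbatch options that request the shape stored in the file $1.
restart_args() {
    local nodes ppn
    read -r nodes ppn < "$1" || return 1
    printf '%s\n' "--nodes=$nodes --ntasks-per-node=$ppn"
}

# Succeed only if the shape in file $1 equals nodes=$2, ppn=$3.
same_shape() {
    local nodes ppn
    read -r nodes ppn < "$1" || return 1
    [ "$nodes" = "$2" ] && [ "$ppn" = "$3" ]
}
```

In the launch job one could call save_alloc_shape alloc_shape.txt before dmtcp_launch, and later submit the restart with something like sbatch $(restart_args alloc_shape.txt) slurm_restart.job, where alloc_shape.txt is an assumed file name. This only keeps the shape the same; the node names themselves may differ.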

2015-10-09 17:01 GMT+03:00 Artem Polyakov <[email protected]>:

> You don't need the "exact" allocation in terms of node names, but you do need to
> remember how many nodes and how many procs per node you had in the original
> allocation.
>
> 2015-10-09 16:39 GMT+03:00 MR.AB <[email protected]>:
>
>> Hey
>> Thank you for the email. Is there a way to make it work, or do I have to
>> keep variables to "remember" the exact allocations?
>>
>>
>>
>> On Friday, October 9, 2015 4:34 AM, Artem Polyakov <[email protected]>
>> wrote:
>>
>>
>> Hello,
>> Please note that one of the reasons may be non-equivalent allocations.
>> DMTCP cannot restore processes that were originally running on the same node
>> onto different nodes. This means that if you originally requested the
>> following allocation: cn[0-1], ppn = 4
>> and try to restart on cn[0-3], ppn = 2,
>> this won't work even though the allocations are logically equivalent.
>>
>> 2015-10-08 16:00 GMT+03:00 abderrahmane <[email protected]>:
>>
>> Hello
>>
>> I did that and still got the restart error: cannot map initial resources into
>> the restart allocation.
>>
>> Also, I used Open MPI 1.8.8 and got the same error message.
>>
>>
>> On 10/06/2015 07:06 PM, Jiajun Cao wrote:
>>
>> Hi,
>>
>> Could you replace
>>
>> dmtcp_launch --rm mpirun --mca btl self,tcp ./<your binary>
>>
>> with the following:
>>
>> srun dmtcp_launch --rm ./<your binary>
>>
>> Also, add the following env vars to the script:
>>
>> export OMPI_MCA_mtl=^psm
>> export OMPI_MCA_btl=self,tcp
>>
>> and try again?
>>
>> On Tue, Oct 6, 2015 at 4:41 PM, abderrahmane <[email protected]>
>> wrote:
>>
>> Hello
>> Thanks for the response.
>>
>>
>> On 10/06/2015 02:18 PM, Jiajun Cao wrote:
>>
>> Hi,
>>
>>
>> 1. What kind of application are you running? Is there an integration of
>> matlab and mpi? I'm asking because I haven't run any mpi-based matlab
>> applications before.
>>
>> I just created a script that calculates a Fibonacci number and prints it out.
>>
>> 2. What kind of environment are you using? Specifically, I'd like to know
>> the MPI version, interconnect network type (Ethernet or InfiniBand), and
>> how MPI and Slurm are integrated (i.e., in the cluster, what command do you
>> use to run the application, srun or mpirun).
>>
>> I am using RHEL 7 and Open MPI 1.8 with InfiniBand. Slurm is
>> integrated in a cluster environment; I used the script here:
>>
>> https://github.com/dmtcp/dmtcp/blob/master/plugin/batch-queue/job_examples/slurm_launch.job
>>
>> 3. Do you get a valid checkpoint image(s)? Also, please attach your job
>> scripts.
>>
>> I get the needed checkpoint, but when I restart I get the error I sent.
>>
>> Thanks
>>
>>
>> On Tue, Oct 6, 2015 at 1:29 PM, Kapil Arya <[email protected]> wrote:
>>
>> Jiajun, Artem,
>>
>> Can one of you take a look at this one?
>>
>> Kapil
>>
>> On Tue, Oct 6, 2015 at 12:31 PM, abderrahmane <[email protected]> wrote:
>>
>> Hello
>>
>> Thank you for the effort and work on DMTCP. I do have some questions
>> (P.S.: I run my MATLAB code using --rm mpirun and Slurm):
>>
>> 1- Is there a good way to run MATLAB code? I created a bash file and
>> added the following:
>>      matlab -nojvm < file.m
>>
>> 2- Running the code above with DMTCP and MATLAB worked fine, but when I
>> tried to restart it using the slurm_restart.job script from your GitHub
>> repository with --rm mpirun, I received the following error:
>>
>> restart error: cannot map initial resources into the restart allocation.
>> Allocated resources : *nodex:4  nodey:4
>>
>> Any ideas? Please feel free to ask me more questions.
>>
>> Best regards,
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Dmtcp-forum mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>>
>> --
>> Best regards, Artem Y. Polyakov