Hello, Sven,
Thank you very much for your message.
I tried your workaround and it works!
Best,
Xiaoge
________________________________
From: Sven Willner <[email protected]>
Sent: September 7, 2018 7:39:29
To: Wang, Xiaoge
Cc: [email protected]
Subject: Re: [Dmtcp-forum] dmtcp restart problem with slurm cgroup
Hey Xiaoge,
I had the same problem, and in my case it turned out that the open
cgroup files were inherited file descriptors used for watching
events. The pathvirt plugin did not help as it only handles newly
opened files.
I worked around the problem by preventing bash (which starts my
program) from passing the cgroup file descriptors on to my
program. Using lsof I found their numbers were 10 (memory) and 11
(cpu) inherited from the slurm starting process itself. Thus, I
used the call
dmtcp_launch my_program 10>&- 11>&-
which works fine. I hope that helps you, too.
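In case the descriptor numbers differ on another system, the lookup could be scripted instead of read off lsof by hand. The following is only a rough sketch, not something from my actual setup: the helper names cgroup_fds and close_redirs are made up for illustration, and it assumes Linux with /proc mounted and the cgroup files living under /sys/fs/cgroup (pass a different prefix as the first argument if yours are elsewhere).

```shell
# Print the numbers of the calling shell's open file descriptors whose
# targets lie under a given directory.
# Assumption: cgroup files live under /sys/fs/cgroup unless a prefix is given.
cgroup_fds() {
    prefix=${1:-/sys/fs/cgroup}
    # BASHPID (bash only) is the pid of the current (sub)shell; fall back to $$.
    for entry in /proc/"${BASHPID:-$$}"/fd/*; do
        # Descriptors that vanish while we iterate are simply skipped.
        target=$(readlink "$entry" 2>/dev/null) || continue
        case "$target" in
            "$prefix"/*) echo "${entry##*/}" ;;
        esac
    done
}

# Turn those numbers into redirections of the form "10>&- 11>&- ".
close_redirs() {
    for fd in $(cgroup_fds "$@"); do
        printf '%s>&- ' "$fd"
    done
}

# Hypothetical usage (bash; my_program is a placeholder as above):
#   eval "dmtcp_launch my_program $(close_redirs)"
```

The eval is there because the shell parses redirections before it expands $(...), so the generated "N>&-" words would otherwise be passed to dmtcp_launch as ordinary arguments.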
Sven
Wang, Xiaoge <[email protected]> writes:
> Hello,
>
>
> I have been trying to run a batch job (using slurm) with
> checkpointing. I ran into the same issue as already reported in
> this forum, see
> https://sourceforge.net/p/dmtcp/mailman/message/36347021/ .
>
>
> I am wondering if it is resolved. If it is resolved, what is the
> solution? I would like to try on my end. Thanks.
>
>
> -Xiaoge
_______________________________________________
Dmtcp-forum mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum