Artem,
Could you also comment, in case I have some of the details wrong?
Hi Levi,
My understanding is that SLURM has special hooks to integrate BLCR.
At this time, the SLURM developers have not extended those hooks to
include DMTCP. We are hoping to talk with them in the future. So,
DMTCP will not yet integrate with SLURM's own requeue and checkpoint options.
Having said that, the scripts that you'll find in the plugin directory
should allow an application to be checkpointed and restarted under
SLURM, even though SLURM sees only the script itself. Internal to
our script, we write the checkpoint, and look for a checkpoint
image on restart. When we have more experience with this from users,
we will propose a tighter integration to the SLURM developers.
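
To make the "write the checkpoint, and look for a checkpoint image on restart" idea concrete, here is a minimal sketch of the decision such a wrapper script makes. It is not the actual script from the plugin directory (the README there is the authority). The helper name `build_command`, the `ckpt_*.dmtcp` image naming pattern, and the use of the `--ckptdir` and `--interval` options of `dmtcp_launch` are assumptions for illustration. The function only builds the command line and does not run it.

```python
import glob
import os


def build_command(ckpt_dir, app_argv, interval_s=3600):
    """Hypothetical helper: pick the DMTCP command a batch script would run."""
    # Assumption: checkpoint images in ckpt_dir are named ckpt_*.dmtcp.
    images = sorted(glob.glob(os.path.join(ckpt_dir, "ckpt_*.dmtcp")))
    if images:
        # An earlier run of this job left images behind: restart from them.
        return ["dmtcp_restart"] + images
    # No image found: first run, so launch the application under DMTCP
    # and ask for a checkpoint every interval_s seconds (assumed options).
    return ["dmtcp_launch", "--ckptdir", ckpt_dir,
            "--interval", str(interval_s)] + list(app_argv)
```

A SLURM batch script could call this helper and pass the result to `subprocess.run`. SLURM would still see only the batch script, as described above, so resubmitting the same script after a failure is what triggers the restart branch.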
Hope this helps,
- Gene
On Mon, Nov 03, 2014 at 03:18:39PM -0700, Levi Morrison wrote:
> Gene,
>
> You answered the "how" I integrate, but I'm after what "integration"
> means. How do I know if it "works" or not if I'm not even sure what it
> does?
>
> For instance, does it integrate with Slurm's requeue and checkpoint
> options so I can automatically checkpoint and restart jobs?
>
> Levi Morrison
>
> On Mon, Nov 3, 2014 at 3:10 PM, Gene Cooperman <[email protected]> wrote:
> > Hi Levi,
> > In order to integrate with SLURM, you will want to use the plugin:
> > DMTCP_ROOT/plugin/batch-queue/
> > Be sure to read the README file there. There are example scripts that
> > you can use in conjunction with SLURM. If you have any trouble,
> > please write to the full DMTCP team, in addition to Artem. Artem Polyakov
> > is taking primary responsibility for the SLURM integration.
> > As for integration of DMTCP with MPI, this seems to work well
> > with most common dialects of MPI. But there are some known bugs in using
> > DMTCP with MVAPICH2. If you encounter bugs in the use of MVAPICH2 or
> > any MPI, please write back to us. We also have some bug fixes and
> > workarounds that may help you.
> >
> > Best,
> > - Gene
> >
> > On Mon, Nov 03, 2014 at 04:36:07PM -0500, Kapil Arya wrote:
> >> Artem/Jiajun,
> >>
> >> Can one of you help Levi with Slurm?
> >>
> >> Kapil
> >>
> >> On Mon, Nov 3, 2014 at 4:33 PM, Levi Morrison <[email protected]>
> >> wrote:
> >>
> >> > I have been using DMTCP and BLCR for a few applications and want to
> >> > try out scheduler integration with Slurm. However, I haven't found any
> >> > documentation that says what "integration" means; any pointers on
> >> > where I could find the documentation for it?
> >> >
> >> >
> >> > ------------------------------------------------------------------------------
> >> > _______________________________________________
> >> > Dmtcp-forum mailing list
> >> > [email protected]
> >> > https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
> >> >