Re: [slurm-users] Running an MPI job across two partitions

2020-03-24 Thread Chris Samuel

On 23/3/20 8:32 am, CB wrote:

I've looked at the heterogeneous job support, but it creates two separate 
jobs.


Yes, but the web page does say:

# By default, the applications launched by a single execution of
# the srun command (even for different components of the
# heterogeneous job) are combined into one MPI_COMM_WORLD with
# non-overlapping task IDs.

So it _should_ work.

I know there are issues with Cray systems & hetjobs at the moment, but I 
suspect that's not likely to concern you.
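As a concrete illustration of what the quoted documentation describes, a single srun with two heterogeneous components (separated by ":"), one per partition, should place all tasks in one MPI_COMM_WORLD. The partition names, node/task counts, and binary below are placeholders, not from the thread:

```
# Hypothetical sketch: one srun, two het-job components, one per partition.
# By default all launched tasks share a single MPI_COMM_WORLD with
# non-overlapping task IDs.
srun --partition=part_a --nodes=2 --ntasks=8 : \
     --partition=part_b --nodes=2 --ntasks=8 ./my_mpi_app
```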


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Running an MPI job across two partitions

2020-03-24 Thread CB
Hi Michael,

Thanks for the comment.

I was just checking whether there is any other way to do this before
introducing another partition.
It appears that creating a new partition is the way to go.

Thanks,
Chansup

On Mon, Mar 23, 2020 at 1:25 PM Renfro, Michael  wrote:

> Others might have more ideas, but anything I can think of would require a
> lot of manual steps to avoid mutual interference with jobs in the other
> partitions (allocating resources for a dummy job in the other partition,
> modifying the MPI host list to include nodes in the other partition, etc.).
>
> So why not make another partition encompassing both sets of nodes?


Re: [slurm-users] Running an MPI job across two partitions

2020-03-23 Thread Renfro, Michael
Others might have more ideas, but anything I can think of would require a lot 
of manual steps to avoid mutual interference with jobs in the other partitions 
(allocating resources for a dummy job in the other partition, modifying the MPI 
host list to include nodes in the other partition, etc.).

So why not make another partition encompassing both sets of nodes?
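Since a Slurm node may belong to more than one partition, the encompassing partition can be added without disturbing the existing two. A hypothetical slurm.conf sketch (partition names and node ranges invented for illustration):

```
# Existing partitions with disjoint node sets
PartitionName=part_a Nodes=node[01-16] State=UP
PartitionName=part_b Nodes=node[17-32] State=UP
# New partition spanning both sets; nodes may appear in several partitions
PartitionName=combined Nodes=node[01-32] State=UP
```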

> On Mar 23, 2020, at 10:58 AM, CB  wrote:
> 
> Hi Andy,
> 
> Yes, they are on the same network fabric.
> 
> Sure, creating another partition that encompasses all of the nodes of the two 
> or more partitions would solve the problem.
> I am wondering whether there is any other way besides creating a new 
> partition?
> 
> Thanks,
> Chansup



Re: [slurm-users] Running an MPI job across two partitions

2020-03-23 Thread CB
Hi Andy,

Yes, they are on the same network fabric.

Sure, creating another partition that encompasses all of the nodes of the two
or more partitions would solve the problem.
I am wondering whether there is any other way besides creating a new
partition?

Thanks,
Chansup


On Mon, Mar 23, 2020 at 11:51 AM Riebs, Andy  wrote:

> When you say “distinct compute nodes,” are they at least on the same
> network fabric?
>
>
>
> If so, the first thing I’d try would be to create a new partition that
> encompasses all of the nodes of the other two partitions.
>
>
>
> Andy


Re: [slurm-users] Running an MPI job across two partitions

2020-03-23 Thread Riebs, Andy
When you say “distinct compute nodes,” are they at least on the same network 
fabric?

If so, the first thing I’d try would be to create a new partition that 
encompasses all of the nodes of the other two partitions.
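If editing slurm.conf is inconvenient, an overlapping partition can also be created at runtime with scontrol. A hedged sketch (partition and node names are placeholders; a partition created this way does not persist across a slurmctld restart unless it is also added to slurm.conf):

```
# Create a partition spanning the nodes of both existing partitions
scontrol create PartitionName=combined Nodes=node[01-32]
# Verify the new partition
scontrol show partition combined
```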

Andy



[slurm-users] Running an MPI job across two partitions

2020-03-23 Thread CB
Hi,

I'm running Slurm version 19.05.

Is there any way to launch an MPI job on a group of distributed nodes from
two or more partitions, where each partition has distinct compute nodes?

I've looked at the heterogeneous job support, but it creates two separate
jobs.

If there is no such capability with the current Slurm, I'd like to hear any
recommendations or suggestions.

Thanks,
Chansup