Re: [slurm-users] Running an MPI job across two partitions
On 23/3/20 8:32 am, CB wrote:

> I've looked at the heterogeneous job support but it creates two separate jobs.

Yes, but the web page does say:

# By default, the applications launched by a single execution of
# the srun command (even for different components of the
# heterogeneous job) are combined into one MPI_COMM_WORLD with
# non-overlapping task IDs.

So it _should_ work. I know there are issues with Cray systems & hetjobs at the moment, but I suspect that's not likely to concern you.

All the best,
Chris

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
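The behaviour Chris quotes can be sketched as a heterogeneous job submission. This is a sketch only, not verified on the poster's cluster: the partition names `part_a`/`part_b`, the node counts, and `./mpi_app` are all hypothetical, and on Slurm 19.05 the batch-script separator was spelled `packjob` (later versions renamed it `hetjob`).

```shell
# Interactive form: one srun with ":"-separated components, one per
# partition. By default the components share a single MPI_COMM_WORLD
# with non-overlapping task IDs.
srun --partition=part_a --nodes=2 --ntasks-per-node=4 : \
     --partition=part_b --nodes=2 --ntasks-per-node=4 ./mpi_app

# Equivalent batch form for 19.05 (separator is "packjob" there):
#   #SBATCH --partition=part_a --nodes=2
#   #SBATCH packjob
#   #SBATCH --partition=part_b --nodes=2
#   srun ./mpi_app
```

Whether the ranks can actually communicate still depends on the two partitions sharing a network fabric, which is the point Andy raises below in the thread.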
Re: [slurm-users] Running an MPI job across two partitions
Hi Michael,

Thanks for the comment. I was just checking whether there is any other way to do the job before introducing another partition. So it appears to me that creating a new partition is the way to go.

Thanks,
Chansup

On Mon, Mar 23, 2020 at 1:25 PM Renfro, Michael wrote:

> Others might have more ideas, but anything I can think of would require a
> lot of manual steps to avoid mutual interference with jobs in the other
> partitions (allocating resources for a dummy job in the other partition,
> modifying the MPI host list to include nodes in the other partition, etc.).
>
> So why not make another partition encompassing both sets of nodes?
Re: [slurm-users] Running an MPI job across two partitions
Others might have more ideas, but anything I can think of would require a lot of manual steps to avoid mutual interference with jobs in the other partitions (allocating resources for a dummy job in the other partition, modifying the MPI host list to include nodes in the other partition, etc.).

So why not make another partition encompassing both sets of nodes?

> On Mar 23, 2020, at 10:58 AM, CB wrote:
Re: [slurm-users] Running an MPI job across two partitions
Hi Andy,

Yes, they are on the same network fabric.

Sure, creating another partition that encompasses all of the nodes of the two or more partitions would solve the problem. I am wondering if there are any other ways instead of creating a new partition?

Thanks,
Chansup

On Mon, Mar 23, 2020 at 11:51 AM Riebs, Andy wrote:
Re: [slurm-users] Running an MPI job across two partitions
When you say “distinct compute nodes,” are they at least on the same network fabric?

If so, the first thing I’d try would be to create a new partition that encompasses all of the nodes of the other two partitions.

Andy

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of CB
Sent: Monday, March 23, 2020 11:32 AM
To: Slurm User Community List
Subject: [slurm-users] Running an MPI job across two partitions
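Andy's suggestion amounts to a small slurm.conf change. A minimal sketch, assuming hypothetical node and partition names (the existing two partitions are imagined here as `parta` and `partb`):

```
# slurm.conf fragment (node names and partition names are hypothetical).
# The two existing partitions are left untouched; a third, non-default
# partition simply spans both node sets.
PartitionName=parta    Nodes=node[001-016] Default=YES MaxTime=INFINITE State=UP
PartitionName=partb    Nodes=node[101-116] Default=NO  MaxTime=INFINITE State=UP
PartitionName=combined Nodes=node[001-016],node[101-116] Default=NO MaxTime=INFINITE State=UP
```

After editing slurm.conf on all nodes, `scontrol reconfigure` picks up the new partition, and a job can then target it with `sbatch --partition=combined`. Since a node may belong to multiple partitions, this does not disturb jobs submitted to the original two.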
[slurm-users] Running an MPI job across two partitions
Hi,

I'm running Slurm version 19.05.

Is there any way to launch an MPI job on a group of distributed nodes from two or more partitions, where each partition has distinct compute nodes?

I've looked at the heterogeneous job support but it creates two separate jobs.

If there is no such capability with the current Slurm, I'd like to hear any recommendations or suggestions.

Thanks,
Chansup