[slurm-users] Burst buffer plugin roadmap

2019-11-01 Thread Cao, Lei
Hi,


We have been using the burst buffer plugin to build our own staging layer 
at LANL, and we are wondering whether any big changes to the burst buffer 
plugin are planned for the future.



Thanks


Lei


Re: [slurm-users] job priority keeping resources from being used?

2019-11-01 Thread c b
I see - yes, to clarify, we are specifying memory for each of these jobs,
and there is enough memory on the nodes for both types of jobs to be
running simultaneously.

On Fri, Nov 1, 2019 at 1:59 PM Brian Andrus  wrote:

> I ask if you are specifying it, because if not, slurm will assume a job
> will use all the memory available.
>
> So without specifying, your big job gets allocated 100% of the memory so
> nothing could be sent to the node. Same if you don't specify for the little
> jobs. It would want 100%, but if anything is running there, 100% is not
> available as far as slurm is concerned.
>
> Brian
> On 11/1/2019 10:52 AM, c b wrote:
>
> yes, there is enough memory for each of these jobs, and there is enough
> memory to run the high resource and low resource jobs at the same time.
>
> On Fri, Nov 1, 2019 at 1:37 PM Brian Andrus  wrote:
>
>> Are you specifying memory for each of the jobs?
>>
>> Can't run a small job if there isn't enough memory available for it.
>>
>> Brian Andrus
>> On 11/1/2019 7:42 AM, c b wrote:
>>
>> I have:
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CPU_Memory
>>
>> On Fri, Nov 1, 2019 at 10:39 AM Mark Hahn  wrote:
>>
>>> > In theory, these small jobs could slip in and run alongside the large
>>> jobs,
>>>
>>> what are your SelectType and SelectTypeParameters settings?
>>> ExclusiveUser=YES on partitions?
>>>
>>> regards, mark hahn.
>>>
>>>


Re: [slurm-users] job priority keeping resources from being used?

2019-11-01 Thread Burian, John
I don’t know from experience whether Slurm behaves unexpectedly with “unlimited” 
versus some large number, like 30 days; but barring something like that, time 
limits shouldn’t be the problem.

John


From: slurm-users  on behalf of c b 

Reply-To: Slurm User Community List 
Date: Friday, November 1, 2019 at 11:09 AM
To: Slurm User Community List 
Subject: Re: [slurm-users] job priority keeping resources from being used?

On my low resource jobs I'm setting the time to 1 hour, and on my large ones 
I'm setting time=unlimited.

Is the unlimited part the problem?  I have that setting because in my cluster 
there are some machines that come in and out during the day via reservations, 
and I want to keep these larger jobs from running on those machines.





On Fri, Nov 1, 2019 at 10:56 AM Burian, John 
<john.bur...@nationwidechildrens.org> wrote:
Are you setting realistic job run times (sbatch –t )?

Slurm won’t backfill low priority jobs (with low resource requirements) in 
front of a high priority job (blocked waiting on high resource requirements) if 
it thinks the low priority jobs will delay the eventual start of the high 
priority job. If all jobs are submitted with the same job run time, then Slurm 
will never backfill, because as far as Slurm knows, the low priority jobs will 
take longer to finish than just waiting for the current running jobs to finish.

John


From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of c b 
<breedthoughts@gmail.com>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
Date: Friday, November 1, 2019 at 10:30 AM
To: "slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
Subject: [slurm-users] job priority keeping resources from being used?

[WARNING: External Email - Use Caution]


Hi,

Apologies for the weird subject line...I don't know how else to describe what 
I'm seeing.

Suppose my cluster has machines with 8 cores each.  I have many large high 
priority jobs that each require 6 cores, so each machine in my cluster runs one 
of each of these jobs at a time.  However, I also have lots of small jobs that 
each require one core, and these jobs have low priority so in my queue they are 
behind all my large jobs.

In theory, these small jobs could slip in and run alongside the large jobs, but 
I'm not seeing that happen.  So my machines have two cores sitting idle when 
they could be doing work.  How do I configure slurm to run these jobs better?

thanks for any help.



Re: [slurm-users] RHEL8 support - Missing Symbols in SelectType libraries

2019-11-01 Thread Michael Jennings
On Friday, 01 November 2019, at 10:41:26 (-0700),
Brian Andrus wrote:

> That's pretty much how I did it too.
> 
> But...
> 
> When you try to run slurmd, it chokes on the missing symbols issue.

I don't yet have a full RHEL8 cluster to test on, and this isn't
really my area of expertise, but have you tried disabling "-Wl,-z,now"
from $LDFLAGS during the RPM build?  Since the powercap symbols are
defined in slurmctld but not slurmd, I suspect that the symbol
problems are related to the disabling of lazy symbol bindings.

I could be completely wrong, of course, but that's what I'd try. :-)
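
One untested way to try that (a sketch only: on EL8 the -Wl,-z,now flag is
normally injected by the hardened-build machinery in redhat-rpm-config, so
undefining that macro for the build may be enough):

  rpmbuild -ba --undefine _hardened_build ~/rpmbuild/SPECS/slurm.spec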

Michael

-- 
Michael E. Jennings 
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-2327, Rm. 2341 W: +1 (505) 606-0605



Re: [slurm-users] job priority keeping resources from being used?

2019-11-01 Thread Brian Andrus
I ask if you are specifying it, because if not, slurm will assume a job 
will use all the memory available.


So without specifying, your big job gets allocated 100% of the memory so 
nothing could be sent to the node. Same if you don't specify for the 
little jobs. It would want 100%, but if anything is running there, 100% 
is not available as far as slurm is concerned.
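
As an illustrative sketch (core counts and memory values here are made up), 
explicit requests along these lines would let both job sizes share an 8-core 
node:

  sbatch -c 6 --mem=12G big_job.sh     # big job no longer claims all node memory
  sbatch -c 1 --mem=1G  small_job.sh   # fits in the remaining cores and memory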


Brian

On 11/1/2019 10:52 AM, c b wrote:
yes, there is enough memory for each of these jobs, and there is 
enough memory to run the high resource and low resource jobs at the 
same time.


On Fri, Nov 1, 2019 at 1:37 PM Brian Andrus wrote:


Are you specifying memory for each of the jobs?

Can't run a small job if there isn't enough memory available for it.

Brian Andrus

On 11/1/2019 7:42 AM, c b wrote:

I have:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory

On Fri, Nov 1, 2019 at 10:39 AM Mark Hahn <h...@mcmaster.ca> wrote:

> In theory, these small jobs could slip in and run alongside
the large jobs,

what are your SelectType and SelectTypeParameters settings?
ExclusiveUser=YES on partitions?

regards, mark hahn.



Re: [slurm-users] job priority keeping resources from being used?

2019-11-01 Thread c b
yes, there is enough memory for each of these jobs, and there is enough
memory to run the high resource and low resource jobs at the same time.

On Fri, Nov 1, 2019 at 1:37 PM Brian Andrus  wrote:

> Are you specifying memory for each of the jobs?
>
> Can't run a small job if there isn't enough memory available for it.
>
> Brian Andrus
> On 11/1/2019 7:42 AM, c b wrote:
>
> I have:
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory
>
> On Fri, Nov 1, 2019 at 10:39 AM Mark Hahn  wrote:
>
>> > In theory, these small jobs could slip in and run alongside the large
>> jobs,
>>
>> what are your SelectType and SelectTypeParameters settings?
>> ExclusiveUser=YES on partitions?
>>
>> regards, mark hahn.
>>
>>


Re: [slurm-users] RHEL8 support - Missing Symbols in SelectType libraries

2019-11-01 Thread Michael Jennings
On Friday, 01 November 2019, at 11:37:37 (-0600),
Michael Jennings wrote:

> I build with Mezzanine, but the equivalent would roughly be this:
> 
>   rpmbuild -ts slurm-19.05.3-2.tar.bz2
>   cat the_above_diff.patch | (cd ~/rpmbuild/SPECS ; patch -p0)
>   rpmbuild --with x11 --with lua --with pmix ~/rpmbuild/SPECS/slurm.spec

Sorry; that last line should read:
  rpmbuild -ba --with x11 --with lua --with pmix ~/rpmbuild/SPECS/slurm.spec

Oopsie-daisy!
Michael

-- 
Michael E. Jennings 
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-2327, Rm. 2341 W: +1 (505) 606-0605



Re: [slurm-users] RHEL8 support - Missing Symbols in SelectType libraries

2019-11-01 Thread Michael Jennings
On Tuesday, 29 October 2019, at 15:11:38 (+),
Christopher Benjamin Coffey wrote:

> Brian, I've actually just started attempting to build slurm 19 on
> centos 8 yesterday. As you say, there are packages missing now from
> repos like:

They're not missing; they're just harder to get at now, for some
reason.

I have successfully built SLURM 19.05.3 on CentOS 8.  I'm still
verifying that everything works correctly, but here's my spec file
diff so far (safe to apply upstream, if desired...won't break other
distros):

--- slurm.spec.orig 2019-10-31 13:55:28.077658869 -0600
+++ slurm.spec  2019-11-01 10:57:28.921423048 -0600
@@ -62,14 +62,14 @@
 %{?systemd_requires}
 BuildRequires: systemd
 BuildRequires: munge-devel munge-libs
-BuildRequires: python
+BuildRequires: python%{?el8:2}
 BuildRequires: readline-devel
 Obsoletes: slurm-lua slurm-munge slurm-plugins
 
 # fake systemd support when building rpms on other platforms
 %{!?_unitdir: %global _unitdir /lib/systemd/systemd}
 
-%define use_mysql_devel %(perl -e '`rpm -q mariadb-devel`; print $?;')
+%{expand:%%global use_mysql_devel %(perl -e '`rpm -q mariadb-devel`; print $?;')}
 
 %if %{with mysql}
 %if %{use_mysql_devel}
@@ -124,12 +124,12 @@
 
 %if %{with pmix}
 BuildRequires: pmix
-%global pmix %(rpm -q pmix --qf "%{VERSION}")
+%{expand:%%global pmix %(rpm -q pmix --qf "%{VERSION}")}
 %endif
 
 %if %{with ucx}
 BuildRequires: ucx-devel
-%global ucx_version %(rpm -q ucx-devel --qf "%{VERSION}")
+%{expand:%%global ucx_version %(rpm -q ucx-devel --qf "%{VERSION}")}
 %endif
 
 #  Allow override of sysconfdir via _slurm_sysconfdir.
@@ -158,7 +158,7 @@
 #
 # Should unpackaged files in a build root terminate a build?
 # Uncomment if needed again.
-#%define _unpackaged_files_terminate_build  0
+#define _unpackaged_files_terminate_build  0
 
 # Slurm may intentionally include empty manifest files, which will
 # cause errors with rpm 4.13 and on. Turn that check off.
@@ -265,7 +265,7 @@
 Obsoletes: slurm-sjobexit slurm-sjstat slurm-seff
 %description contribs
 seff is a mail program used directly by the Slurm daemons. On completion of a
-job, wait for it's accounting information to be available and include that
+job, wait for its accounting information to be available and include that
 information in the email body.
 sjobexit is a slurm job exit code management tool. It enables users to alter
 job exit code information for completed jobs
@@ -305,6 +305,7 @@
 %prep
 # when the rel number is one, the tarball filename does not include it
 %setup -n %{slurm_source_dir}
+%{__sed} -i -e 's!env python$!env python2!' $(%{__grep} -Frl 'env python' .)
 
 %build
 %configure \

I build with Mezzanine, but the equivalent would roughly be this:

  rpmbuild -ts slurm-19.05.3-2.tar.bz2
  cat the_above_diff.patch | (cd ~/rpmbuild/SPECS ; patch -p0)
  rpmbuild --with x11 --with lua --with pmix ~/rpmbuild/SPECS/slurm.spec

In order for that to work, though, you need to configure your yum/dnf
setup, like so:

  dnf -y install dnf-utils                     # pulls in dnf config-manager
  dnf config-manager --set-enabled cr          # CentOS Continuous Release repo
  dnf config-manager --set-enabled PowerTools  # where the -devel packages now live
  dnf -y install munge-devel lua-devel json-c-devel lz4-devel

Hope that helps!
Michael

-- 
Michael E. Jennings 
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-2327, Rm. 2341 W: +1 (505) 606-0605



Re: [slurm-users] job priority keeping resources from being used?

2019-11-01 Thread Brian Andrus

Are you specifying memory for each of the jobs?

Can't run a small job if there isn't enough memory available for it.

Brian Andrus

On 11/1/2019 7:42 AM, c b wrote:

I have:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory

On Fri, Nov 1, 2019 at 10:39 AM Mark Hahn wrote:


> In theory, these small jobs could slip in and run alongside the
large jobs,

what are your SelectType and SelectTypeParameters settings?
ExclusiveUser=YES on partitions?

regards, mark hahn.



Re: [slurm-users] job priority keeping resources from being used?

2019-11-01 Thread c b
I tried setting a 5 minute time limit on some low resource jobs, and one
hour on high resource jobs, but my 5 minute jobs are still waiting behind
the hourlong jobs.

Can you suggest some combination of time limits that would work here?



On Fri, Nov 1, 2019 at 11:08 AM c b  wrote:

> On my low resource jobs I'm setting the time to 1 hour, and on my large
> ones I'm setting time=unlimited.
>
> Is the unlimited part the problem?  I have that setting because in my
> cluster there are some machines that come in and out during the day via
> reservations, and I want to keep these larger jobs from running on those
> machines.
>
>
>
>
>
> On Fri, Nov 1, 2019 at 10:56 AM Burian, John <
> john.bur...@nationwidechildrens.org> wrote:
>
>> Are you setting realistic job run times (sbatch –t )?
>>
>>
>>
>> Slurm won’t backfill low priority jobs (with low resource requirements)
>> in front of a high priority job (blocked waiting on high resource
>> requirements) if it thinks the low priority jobs will delay the eventual
>> start of the high priority job. If all jobs are submitted with the same job
>> run time, then Slurm will never backfill, because as far as Slurm knows,
>> the low priority jobs will take longer to finish than just waiting for the
>> current running jobs to finish.
>>
>>
>>
>> John
>>
>>
>>
>>
>>
>> *From: *slurm-users  on behalf of
>> c b 
>> *Reply-To: *Slurm User Community List 
>> *Date: *Friday, November 1, 2019 at 10:30 AM
>> *To: *"slurm-users@lists.schedmd.com" 
>> *Subject: *[slurm-users] job priority keeping resources from being used?
>>
>>
>>
>> [WARNING: External Email - Use Caution]
>>
>>
>>
>> Hi,
>>
>>
>>
>> Apologies for the weird subject line...I don't know how else to describe
>> what I'm seeing.
>>
>>
>>
>> Suppose my cluster has machines with 8 cores each.  I have many large
>> high priority jobs that each require 6 cores, so each machine in my cluster
>> runs one of each of these jobs at a time.  However, I also have lots of
>> small jobs that each require one core, and these jobs have low priority so
>> in my queue they are behind all my large jobs.
>>
>>
>>
>> In theory, these small jobs could slip in and run alongside the large
>> jobs, but I'm not seeing that happen.  So my machines have two cores
>> sitting idle when they could be doing work.  How do I configure slurm to
>> run these jobs better?
>>
>>
>>
>> thanks for any help.
>>
>>
>>
>


Re: [slurm-users] job priority keeping resources from being used?

2019-11-01 Thread c b
On my low resource jobs I'm setting the time to 1 hour, and on my large
ones I'm setting time=unlimited.

Is the unlimited part the problem?  I have that setting because in my
cluster there are some machines that come in and out during the day via
reservations, and I want to keep these larger jobs from running on those
machines.





On Fri, Nov 1, 2019 at 10:56 AM Burian, John <
john.bur...@nationwidechildrens.org> wrote:

> Are you setting realistic job run times (sbatch –t )?
>
>
>
> Slurm won’t backfill low priority jobs (with low resource requirements) in
> front of a high priority job (blocked waiting on high resource
> requirements) if it thinks the low priority jobs will delay the eventual
> start of the high priority job. If all jobs are submitted with the same job
> run time, then Slurm will never backfill, because as far as Slurm knows,
> the low priority jobs will take longer to finish than just waiting for the
> current running jobs to finish.
>
>
>
> John
>
>
>
>
>
> *From: *slurm-users  on behalf of
> c b 
> *Reply-To: *Slurm User Community List 
> *Date: *Friday, November 1, 2019 at 10:30 AM
> *To: *"slurm-users@lists.schedmd.com" 
> *Subject: *[slurm-users] job priority keeping resources from being used?
>
>
>
> [WARNING: External Email - Use Caution]
>
>
>
> Hi,
>
>
>
> Apologies for the weird subject line...I don't know how else to describe
> what I'm seeing.
>
>
>
> Suppose my cluster has machines with 8 cores each.  I have many large high
> priority jobs that each require 6 cores, so each machine in my cluster runs
> one of each of these jobs at a time.  However, I also have lots of small
> jobs that each require one core, and these jobs have low priority so in my
> queue they are behind all my large jobs.
>
>
>
> In theory, these small jobs could slip in and run alongside the large
> jobs, but I'm not seeing that happen.  So my machines have two cores
> sitting idle when they could be doing work.  How do I configure slurm to
> run these jobs better?
>
>
>
> thanks for any help.
>
>
>


Re: [slurm-users] job priority keeping resources from being used?

2019-11-01 Thread Burian, John
Are you setting realistic job run times (sbatch –t )?

Slurm won’t backfill low priority jobs (with low resource requirements) in 
front of a high priority job (blocked waiting on high resource requirements) if 
it thinks the low priority jobs will delay the eventual start of the high 
priority job. If all jobs are submitted with the same job run time, then Slurm 
will never backfill, because as far as Slurm knows, the low priority jobs will 
take longer to finish than just waiting for the current running jobs to finish.
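
As an illustrative sketch (the limits themselves are arbitrary), giving each 
class of job a realistic limit gives the backfill scheduler something to work 
with:

  sbatch -t 04:00:00 -c 6 big_job.sh    # accurate limit for the large job
  sbatch -t 00:15:00 -c 1 small_job.sh  # short enough to backfill onto idle cores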

John


From: slurm-users  on behalf of c b 

Reply-To: Slurm User Community List 
Date: Friday, November 1, 2019 at 10:30 AM
To: "slurm-users@lists.schedmd.com" 
Subject: [slurm-users] job priority keeping resources from being used?

[WARNING: External Email - Use Caution]


Hi,

Apologies for the weird subject line...I don't know how else to describe what 
I'm seeing.

Suppose my cluster has machines with 8 cores each.  I have many large high 
priority jobs that each require 6 cores, so each machine in my cluster runs one 
of each of these jobs at a time.  However, I also have lots of small jobs that 
each require one core, and these jobs have low priority so in my queue they are 
behind all my large jobs.

In theory, these small jobs could slip in and run alongside the large jobs, but 
I'm not seeing that happen.  So my machines have two cores sitting idle when 
they could be doing work.  How do I configure slurm to run these jobs better?

thanks for any help.



Re: [slurm-users] job priority keeping resources from being used?

2019-11-01 Thread c b
I have:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
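
(A related slurm.conf sketch, with an illustrative value: since CR_CPU_Memory 
makes memory a consumable resource, a per-CPU default covers jobs that omit an 
explicit memory request.)

  DefMemPerCPU=2048   # MB per allocated CPU when a job does not request memory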

On Fri, Nov 1, 2019 at 10:39 AM Mark Hahn  wrote:

> > In theory, these small jobs could slip in and run alongside the large
> jobs,
>
> what are your SelectType and SelectTypeParameters settings?
> ExclusiveUser=YES on partitions?
>
> regards, mark hahn.
>
>


[slurm-users] job priority keeping resources from being used?

2019-11-01 Thread c b
Hi,

Apologies for the weird subject line...I don't know how else to describe
what I'm seeing.

Suppose my cluster has machines with 8 cores each.  I have many large high
priority jobs that each require 6 cores, so each machine in my cluster runs
one of each of these jobs at a time.  However, I also have lots of small
jobs that each require one core, and these jobs have low priority so in my
queue they are behind all my large jobs.

In theory, these small jobs could slip in and run alongside the large jobs,
but I'm not seeing that happen.  So my machines have two cores sitting idle
when they could be doing work.  How do I configure slurm to run these jobs
better?

thanks for any help.