Re: [slurm-users] enabling job script archival

2023-10-05 Thread Davide DelVento
Okay, so perhaps this is another bug. At each reconfigure, users lose
access to the jobs they submitted before the reconfigure itself and start
with a "clean slate". Newly submitted jobs can be queried normally. The
Slurm administrator can query everything at all times, so the data is not
lost, but this is really unfortunate.

Has anybody experienced this issue, or can you try querying some of your
old jobs that completed before a reconfigure and confirm whether this is
happening for you too?
Does anybody know whether this is already a known bug, and/or can you
suggest whether I should report it?

Thanks!

Re: [slurm-users] enabling job script archival

2023-10-04 Thread Davide DelVento
And weirdly enough, it has now stopped working again after I did the
power-save experimentation described in the other thread.
That is really strange. At the highest verbosity level, the logs just say

slurmdbd: debug:  REQUEST_PERSIST_INIT: CLUSTER:cluster VERSION:9984
UID:1457 IP:192.168.2.254 CONN:13

I reconfigured and reverted things, with no change. Does anybody have any clue?

Re: [slurm-users] enabling job script archival

2023-10-03 Thread Davide DelVento
For others finding this via a mailing-list search: yes, I needed that,
which of course required creating an account to charge to, which I hadn't
been using. So I ran

sacctmgr add account default_account
sacctmgr add -i user $user Accounts=default_account

with an appropriate loop over $user (sketched below), and everything is
working fine now.
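
For reference, a minimal sketch of that loop (the getent/UID filter is an
assumption, just one way to enumerate usernames; adapt the selection to
your own LDAP setup):

# the account only needs to be created once
sacctmgr add account default_account
# add every regular user to it (UID >= 1000 is an assumption)
for user in $(getent passwd | awk -F: '$3 >= 1000 {print $1}'); do
    sacctmgr add -i user "$user" Accounts=default_account
done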

Thanks everybody!

Re: [slurm-users] enabling job script archival

2023-10-03 Thread Paul Edmon

You will probably need to.

The way we handle it is that we add users when they first submit a job
via the job_submit.lua script. This way the database autopopulates with
active users.
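
For anyone curious, a rough sketch of the idea (illustrative only, not our
actual script; the account name and the direct sacctmgr call are
assumptions):

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- add the submitting user to the accounting DB on first submission;
    -- with -i, sacctmgr runs non-interactively and a repeat add is harmless
    local user = job_desc.user_name
    if user ~= nil then
        os.execute("sacctmgr -i add user " .. user ..
                   " Accounts=default_account >/dev/null 2>&1")
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end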


-Paul Edmon-

Re: [slurm-users] enabling job script archival

2023-10-03 Thread Davide DelVento
By increasing the slurmdbd verbosity level, I got additional information,
namely the following:

slurmdbd: error: couldn't get information for this user (null)(xx)
slurmdbd: debug: accounting_storage/as_mysql:
as_mysql_jobacct_process_get_jobs: User  xx  has no associations, and
is not admin, so not returning any jobs.

again, where xx is the POSIX ID of the user running the query, as it
appears in the slurmdbd logs.

I suspect this is due to the fact that our userbase is small enough (we
are a department HPC) that we don't need allocations and the like, so I
have not configured any associations (and have not even studied their
configuration, since at my previous site, which did use associations,
someone else took care of Slurm administration).

Anyway, I read the fantastic document by our own list member at
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_accounting/#associations
and in fact I have not configured any Slurm users:

# sacctmgr show user
      User   Def Acct     Admin
---------- ---------- ---------
      root       root Administ+
#
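
(The associations themselves can be listed with, e.g.,

sacctmgr show assoc format=Cluster,Account,User

which makes it easy to spot users that have none.)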

So is that the issue? Should I just add all users? Any suggestions on the
minimal (but robust) way to do that?

Thanks!


Re: [slurm-users] enabling job script archival

2023-10-02 Thread Davide DelVento
Thanks Paul, this helps.

I don't have any PrivateData line in either config file. According to the
docs, "By default, all information is visible to all users," so this
should not be an issue. I tried adding a "PrivateData=jobs" line to the
conf files, just in case, but that didn't change the behavior.

Re: [slurm-users] enabling job script archival

2023-10-02 Thread Paul Edmon
At least in our setup, users can see their own scripts by doing sacct -B 
-j JOBID


I would check that the scripts are actually being stored, and how you
have PrivateData set.


-Paul Edmon-

Re: [slurm-users] enabling job script archival

2023-10-02 Thread Davide DelVento
I deployed job script archival and it is working; however, it can be
queried only by root.

A regular user can run sacct -lj against any job (even those of other
users, which is okay in our setup) with no problem. However, if they run
sacct -j job_id --batch-script, even against a job they own, nothing is
returned and I get

slurmdbd: error: couldn't get information for this user (null)(xx)

where xx is the POSIX ID of the user running the query, in the slurmdbd
logs.

Neither configuration file (slurmdbd.conf or slurm.conf) has any
"permission" setting. FWIW, we use LDAP.

Is that the expected behavior, in that by default only root can see the job
scripts? I was assuming the users themselves should be able to debug their
own jobs... Any hint on what could be changed to achieve this?

Thanks!



Re: [slurm-users] enabling job script archival

2023-09-29 Thread Davide DelVento
Fantastic, this is really helpful, thanks!

Re: [slurm-users] enabling job script archival

2023-09-28 Thread Paul Edmon
Yes, it was later than that. If you are on 23.02 you are good. We've been
running with storing job_scripts on for years at this point, and that
part of the database only uses up 8.4G. Our entire database takes up 29G
on disk, so it's about 1/3 of the database. We also have database
compression, which helps with the on-disk size; raw and uncompressed, our
database is about 90G. We keep 6 months of data in our active database.
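
(If you want to see where the space goes in your own installation, a
generic information_schema query works; "slurm_acct_db" is the default
database name, so adjust it if your slurmdbd.conf StorageLoc differs:

mysql -e "SELECT table_name,
                 ROUND((data_length+index_length)/1024/1024/1024,2) AS gb
          FROM information_schema.tables
          WHERE table_schema='slurm_acct_db'
          ORDER BY data_length+index_length DESC LIMIT 10;"
)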


-Paul Edmon-

Re: [slurm-users] enabling job script archival

2023-09-28 Thread Paul Edmon
No, all the archiving does is remove the pointer. What Slurm does right
now is create a hash of the job_script/job_env and then check whether
that hash matches one on record. If not, it adds it to the record; if it
does match, it adds a pointer to the appropriate record. So you can think
of the job_script/job_env storage as an internal database of all the
various scripts and envs that Slurm has ever seen, and what ends up in
the job record is a pointer into that database. This way Slurm can
deduplicate scripts/envs that are the same. This works great for
job_scripts, as they are functionally the same and thus you have many
jobs pointing to the same script, but less so for job_envs.
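
A toy illustration of the content-addressing idea (conceptual only, not
Slurm's actual schema):

# identical script content hashes to the same key, so the body is stored
# once and each job record keeps only the key
key_a=$(sha256sum job_a.sh | awk '{print $1}')
key_b=$(sha256sum job_b.sh | awk '{print $1}')
[ "$key_a" = "$key_b" ] && echo "same script: stored once, referenced twice"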


-Paul Edmon-

Re: [slurm-users] enabling job script archival

2023-09-28 Thread Ryan Novosielski
Sorry for the duplicate e-mail in a short time: do you (or anyone) know
when the hashing was added? We were planning to enable this on 21.08, but
we then had to delay our upgrade to it. I'm assuming later than that, as
I believe that's when the feature was added.

Re: [slurm-users] enabling job script archival

2023-09-28 Thread Ryan Novosielski
Thank you; we’ll put in a feature request for improvements in that area, and 
also thanks for the warning? I thought of that in passing, but the real world 
experience is really useful. I could easily see wanting that stuff to be 
retained less often than the main records, which is what I’d ask for.

I assume that archiving, in general, would also remove this stuff, since old 
jobs themselves will be removed?

Re: [slurm-users] enabling job script archival

2023-09-28 Thread Paul Edmon

Slurm should take care of it when you add it.
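
(A note on the syntax quoted below: in slurm.conf, AccountingStoreFlags
takes a comma-separated list, so the options end up on a single line,

AccountingStoreFlags=job_comment,job_script,job_env

rather than as separate entries.)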

As far as horror stories go: under previous versions, our database
ballooned to be so massive that it actually prevented us from upgrading,
and we had to drop the columns containing the job_script and job_env.
This was back before Slurm started hashing the scripts so that it would
only store one copy of duplicate scripts. After that point, we found that
the job_script storage stayed at a fairly reasonable size, as most users
run functionally the same script each time. However, the job_env data
continued to grow like crazy, as there are variables in our environment
that change fairly consistently depending on where the user is. Thus
job_envs ended up being too massive to keep around, and we had to drop
them. Frankly, we never really used them for debugging. The job_scripts,
though, are super useful and not much overhead.


In summary, my recommendation is to store only job_scripts. job_envs add
too much storage for little gain, unless your job_envs are basically the
same for each user in each location.


Also, it should be noted that there is no way to prune out job_scripts or
job_envs right now, so the only way to get rid of them if they get large
is to zero out the column in the table. You can ask SchedMD for the mysql
command to do this, as we had to do it here for our job_envs.


-Paul Edmon-

On 9/28/2023 1:40 PM, Davide DelVento wrote:
In my current Slurm installation (recently upgraded to Slurm v23.02.3), I
only have


AccountingStoreFlags=job_comment

I now intend to add both

AccountingStoreFlags=job_script
AccountingStoreFlags=job_env

leaving the default 4MB value for max_script_size

Do I need to do anything on the DB myself, or will slurm take care of 
the additional tables if needed?


Any comments/suggestions/gotchas/pitfalls/horror stories to share? I know
about the additional disk space and potential load, and with our
resources and typical workload I should be okay with that.


Thanks!