[slurm-dev] job with different resources

2016-10-11 Thread Danny Marc Rotscher
Hello everybody,

One of our users wants to run a job across many nodes that need different 
resources:
Nodes 1 - 2: 2 CPUs and 30 GB RAM each
Nodes 3 - 4: 24 CPUs and 64 GB RAM each
Nodes 5 - 10: 8 CPUs and 40 GB RAM each
Is it possible to start such a job without wasting resources, given that most 
of our nodes have 24 CPUs and 64 GB RAM?
It would also be possible to start separate jobs, for example one per resource 
group, but they would have to start at the same time.
Can anybody suggest a solution?

Kind regards,
Danny Rotscher



[slurm-dev] Re: Draining, Maint or ?

2016-10-11 Thread Lachlan Musicman
Ok, I think what I want is to set the state of the partitions to down:

http://slurm.schedmd.com/scontrol.html#OPT_SPECIFICATIONS-FOR-CREATE,-UPDATE,-AND-DELETE-COMMANDS,-PARTITIONS

ie,
 - no newly queued jobs will be started on that partition
 - slurm will continue to accept jobs for that partition
 - jobs running on that partition will continue to do so
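
For reference, a minimal sketch of how the partition state could be toggled with scontrol. The partition name "batch" is a placeholder, and the run helper is an assumption of this sketch rather than a Slurm tool: it only prints each command unless APPLY=1 is set, so nothing changes by accident.

```shell
#!/bin/sh
# Dry-run helper (an assumption of this sketch, not part of Slurm):
# print the command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Stop starting new jobs on a partition; submissions are still accepted
# and jobs that are already running keep running.
partition_down() { run scontrol update PartitionName="$1" State=DOWN; }

# Let the scheduler start jobs on the partition again.
partition_up() { run scontrol update PartitionName="$1" State=UP; }

partition_down batch   # "batch" is a hypothetical partition name
partition_up batch
```

Running it as-is prints the two scontrol commands; running it with APPLY=1 on a host with admin rights would execute them.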

cheers
L.



--
The most dangerous phrase in the language is, "We've always done it this
way."

- Grace Hopper



[slurm-dev] Draining, Maint or ?

2016-10-11 Thread Lachlan Musicman
Hola,

For reasons, our IT team needs some downtime on our authentication server
(FreeIPA/sssd).

We would like to minimize the disruption, but also not lose any work.

The current plan is for the nodes to be set to DRAIN on Friday afternoon
and on Monday morning we will suspend any running jobs, make the changes,
then resume nodes then jobs when complete.
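
A rough sketch of that plan as scontrol commands, in the order described above. The node list, the reason text and the job IDs are made up, and the run helper is an assumption of this sketch: it only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper (not part of Slurm): print unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Friday: drain the nodes so that no new jobs start on them.
# Slurm expects a Reason when a node is drained.
drain_nodes() { run scontrol update NodeName="$1" State=DRAIN Reason="$2"; }

# Monday: suspend the jobs that are still running, one ID per call.
suspend_jobs() { for id in "$@"; do run scontrol suspend "$id"; done; }

# After the change: return the nodes to service, then resume the jobs.
resume_nodes() { run scontrol update NodeName="$1" State=RESUME; }
resume_jobs() { for id in "$@"; do run scontrol resume "$id"; done; }

# Hypothetical node range and job IDs, for illustration only.
drain_nodes 'node[01-10]' auth-maintenance
suspend_jobs 1001 1002
resume_nodes 'node[01-10]'
resume_jobs 1001 1002
```

On a real cluster the job IDs would come from squeue rather than being typed in by hand.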

If we set all nodes to drain, will the partitions still accept jobs on the
queue, but just line them up waiting for the resources to come back online,
or does setting all resources to drain prevent people from putting jobs in
a queue?

Is MAINT what I'm after?

Cheers
L.



[slurm-dev] slurmdbd.log gets created with 600 file perm - Any way to change this to 755 by default?

2016-10-11 Thread Balaji Deivam
Hello,

I see that all the Slurm log files get created with restricted file permissions.
Is there any way to change this by default so that the group can read the files?

-rw------- 1 sassrv sas    372 Sep 27 14:36 slurmdbd.27Sep2016.log
-rw------- 1 sassrv sas 281841 Sep 27 14:36 slurmctld.27Sep2016.log
-rw------- 1 sassrv sas    505 Oct  3 14:25 slurmdbd.03Oct2016.log
-rw------- 1 sassrv sas 391403 Oct  3 14:25 slurmctld.03Oct2016.log
-rw------- 1 sassrv sas    300 Oct  3 14:26 slurmdbd.log
-rw------- 1 sassrv sas 386379 Oct 11 16:59 slurmctld.log


Thanks & Regards,
Balaji


[slurm-dev] SLURM 15.08.12; disable sview?

2016-10-11 Thread Ryan Novosielski
Hi there,

I built SLURM 15.08.4 without the libraries required to build sview. That was 
fine, but someone later asked us for sview, so we added the dependencies and 
rebuilt. Now, upgrading to 15.08.12, we’re seeing that the slurm-15.08.12 RPM, 
which will need to go on all compute nodes, will require a whole bunch of X11 
stuff that we don’t have on our compute nodes. I’m contemplating just 
installing the stuff on the compute node image, but considering these are 
stateless nodes, I consider such things carefully. Is there an easy way to 
disable building of sview somehow through the RPM process, even if I do have 
the dependencies available? I don’t see any config switch in the specfile, or 
any other obvious way to do it.

TIA,
--

|| \\UTGERS, |---*O*---
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
`'



[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-10-11 Thread Douglas Jacobsen
FYI, sending SIGHUP to slurmctld is sufficient for rotating the
slurmctld.log file.  There is no need to actually restart it all the way.
It is good to know the cause behind the deleted jobs.
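
A minimal sketch of such a rotation step, assuming the log lives at /var/log/slurm/slurmctld.log (the path and the date suffix are assumptions). The run helper is not part of Slurm; it only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run helper (not part of Slurm): print unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Move the current log aside, then send SIGHUP to the running slurmctld
# so that it reopens its log file. The daemon itself is not restarted.
rotate_slurmctld_log() {
    log=$1
    run mv "$log" "$log.$(date +%Y%m%d)"
    run pkill -HUP -x slurmctld
}

rotate_slurmctld_log /var/log/slurm/slurmctld.log   # assumed log path
```

In a logrotate setup the pkill line would typically sit in the postrotate script, in place of any command that restarts the daemon.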

Doug



[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-10-11 Thread Ryan Novosielski

Thanks for clearing that up. I was pretty sure there was no problem at all in 
using logrotate, and I know that restarting slurmctld does not ordinarily lose 
jobs.




[slurm-dev] Re: Slurmctld auto restart and kill running job, why ?

2016-10-11 Thread Philippe
Hello all,
Sorry for the long delay since my first post.
Thanks for all the answers; they helped me run some tests, and before long
I realized that I use a personal script to launch the daemons and that I
was still using my "debug" start line, which contains the startclean
argument ...
So it's all my fault: Slurm did its job and started clean when logrotate
triggered it.

Sorry about that!

On Thu, Sep 29, 2016 at 2:05 PM, Janne Blomqvist 
wrote:

> On 2016-09-27 10:39, Philippe wrote:
> > If I can't use logrotate, what must I use ?
>
> You can log via syslog, and let your syslog daemon handle the rotation
> (and rate limiting, disk full, logging to a central log host and all the
> other nice things that syslog can do for you).
>
>
> --
> Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
> Aalto University School of Science, PHYS & NBE
> +358503841576 || janne.blomqv...@aalto.fi
>
>


[slurm-dev] Re: Accounting needs slurm daemon restart to apply changes

2016-10-11 Thread Ryan Novosielski

I suspect that you, like me, ended up with an incorrect "ControlHost" in 
"sacctmgr list clusters". This is the address that will be notified that a 
change has been made in the accounting database.
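
One way to spot this is to compare the recorded ControlHost with the address slurmctld is expected to be reachable at. The sketch below is only an illustration: the function name, the sample cluster name and the addresses are made up, and it assumes the input comes from sacctmgr in its pipe-separated, header-less output format.

```shell
#!/bin/sh
# Compare the ControlHost stored in the accounting database with an
# expected address.
# Input on stdin: lines of "cluster|controlhost", for example from
#   sacctmgr -nP list clusters format=Cluster,ControlHost
# $1 is the expected address. Exit status is 1 if any cluster differs.
check_control_host() {
    awk -F'|' -v want="$1" '
        $2 == want { printf "%s: ok (%s)\n", $1, $2; next }
        { printf "%s: MISMATCH (recorded %s, expected %s)\n", $1, $2, want; bad = 1 }
        END { exit bad }
    '
}

# Made-up sample data; on a real system, pipe sacctmgr into the function.
printf 'cluster1|10.0.0.5\n' | check_control_host 10.0.0.1 ||
    echo "ControlHost differs from the expected address"
```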

I still haven't gotten a suggestion on how to fix it without losing my 
accounting data, though. :-\


From: Eneko Anasagasti 
Sent: Tuesday, October 11, 2016 2:59 AM
To: slurm-dev
Subject: [slurm-dev] Accounting needs slurm daemon restart to apply changes

Hi,

Following up on an issue we had with sreport, where a user wasn't being reported 
(thread below), we discovered that changes made to the accounting database are 
not communicated to the Slurm daemon, so they do not take effect until 
slurmctld is restarted.

The docs (http://slurm.schedmd.com/accounting.html) say: Once an entity has 
been added, modified or removed, the change is sent to the appropriate Slurm 
daemons and will be available for use instantly.

We are using Slurm 15.08.7

Thanks,

Eneko Anasagasti
IT Engineer
BCAM -  Basque Center for Applied Mathematics
Alameda de Mazarredo, 14
E-48009 Bilbao, Basque Country - Spain
Tel. +34 946 567 842
eanasaga...@bcamath.org | 
www.bcamath.org/eanasagasti

( matematika mugaz bestalde )


-------- Forwarded Message --------
Subject: Re: [slurm-dev] Re: Missing user in sreport
Date: Tue, 13 Sep 2016 08:48:18 +0200
From: Eneko Anasagasti
To: slurm-dev@schedmd.com



Hi,

Thanks for your answer, Paddy. You were right that the user was not in 
a Slurm account.

So I added it with the following command:

sacctmgr -i add user testuser DefaultAccount=testgroup FairShare=100

It now shows up when listing associations, but it still doesn't show up in 
sreport...

I'm trying this:

sreport  cluster AccountUtilizationByUser Start=2016-03-01-00:00:00 
End=2016-09-12-23:59:59 -t percent

(I submitted Slurm jobs myself as this user to try to make them show up in the 
report.)

Thanks,

Eneko Anasagasti

On 09/09/16 10:17, Paddy Doyle wrote:

> Hi Eneko,
>
> On Fri, Sep 09, 2016 at 12:12:32AM -0700, Eneko Anasagasti wrote:
>
> > Hi,
> >
> > I just realized a user is missing from sreport, although it is perfectly
> > visible with sacct.
> >
> > We are using slurm 15.8.7-1.1
> >
> > After a short analysis I found out that this user wasn't a member of
> > the group object it should be in in OpenLDAP.
> >
> > So I added it where it belonged. But apparently this still wasn't enough.
>
> I don't think that slurm would interact directly with LDAP. It's more likely
> that the user is not in a slurm Account, as managed by sacctmgr. If your sreport
> is something like "sreport cluster AccountUtilizationByUser..." then I think
> users need to be in a slurm Account/Association before they are visible in that
> report. Does the user show up in "sacctmgr list associations cluster=XX"?
>
> As a wild guess, do you have a process that looks at a user's group membership
> in LDAP and then adds them to a slurm Account? Perhaps that hadn't run yet for
> the user?
>
> > I also tried using slurmreport but I am getting the following error
> >
> > /Can't locate Slurm.pm in @INC/
>
> I don't know what 'slurmreport' is, but that looks like a Perl include path
> error. Maybe you need to install the slurm-perlapi package?
>
> Thanks,
> Paddy
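
Paddy's question about whether the user shows up in the association list can be checked with a small filter. This is only a sketch: the function name is hypothetical, the sample data is made up (reusing the testuser and testgroup names from this thread), and it assumes pipe-separated, header-less sacctmgr output.

```shell
#!/bin/sh
# Report the Slurm associations of one user.
# Input on stdin: lines of "cluster|account|user", for example from
#   sacctmgr -nP list associations format=Cluster,Account,User
# $1 is the user name. Exit status is 1 if no association is found.
user_associations() {
    awk -F'|' -v u="$1" '
        $3 == u { printf "%s is in account %s on cluster %s\n", $3, $2, $1; found = 1 }
        END { exit !found }
    '
}

# Made-up sample data; on a real system, pipe sacctmgr into the function.
printf 'cluster1|root|\ncluster1|testgroup|testuser\n' | user_associations testuser
```

A non-zero exit status here would point to the missing association Paddy describes, rather than to anything on the LDAP side.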