Re: [slurm-users] Python and R installation in a SLURM cluster

2018-05-12 Thread John Hearns
Eric, I'm sorry to be a little prickly here.
Each node has an independent home directory for the user?
How then do applications update dot files?
How then, for instance, would the users edit the .bashrc file to
bring Anaconda into their paths?

Before anyone says it, a proper Modules system is the way forward.
But I know that when you install Anaconda as a user it adds its path to
your .bashrc, which fouls up Gnome's dbus daemon - but that is another tale.
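For illustration only, the installer typically appends something like the
following to ~/.bashrc (the exact line varies by Anaconda version and install
prefix), and that is what would have to reach every node's copy of the file:

    # appended by the Anaconda installer (prefix is just an example)
    export PATH="$HOME/anaconda3/bin:$PATH"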







On 12 May 2018 at 07:09, Eric F. Alemany  wrote:

> Hi Chris,
>
> Thank you for your comments. I will look at Easybuild. There are quite a
> few options to automate the creation of software modules.
>
> I will be doing lots of reading this weekend.
>
> By the way, I signed up to the Beowulf mailing list.
>
> Thank you,
>
> Eric
> _________________________________
>
> * Eric F.  Alemany *
> *System Administrator for Research*
>
> Division of Radiation & Cancer  Biology
> Department of Radiation Oncology
>
> Stanford University School of Medicine
> Stanford, California 94305
>
> Tel:1-650-498-7969  No Texting
> Fax:1-650-723-7382
>
>
>
> On May 11, 2018, at 12:56 AM, Chris Samuel  wrote:
>
> On Friday, 11 May 2018 5:11:38 PM AEST John Hearns wrote:
>
> Eric, my advice would be to definitely learn the Modules system and
> implement modules for your users.
>
>
> I will echo that, and the suggestion of shared storage (we use our Lustre
> filesystem for that).  I would also suggest looking at a system to help
> you
> automate building of software packages.   Not only does this help
> replicate
> builds, but it also gives you access to the community who write the
> recipes
> for them - and that itself can be very valuable.
>
> We use Easybuild (which also automates the creation of software modules -
> and
> I would suggest using the Lmod system for that):
>
> https://easybuilders.github.io/easybuild/
>
> But there's also Spack too:
>
> https://spack.io/
>
> As another resource (as we are going off topic from Slurm here), I would
> suggest the Beowulf list as a mailing list that deals with Linux based HPC
> systems of many different scales.  Disclosure: I now caretake the list,
> but
> it's been going since the 1990s.
>
> http://beowulf.org/
>
> All the best!
> Chris
> --
> Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
>
>
>
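As an aside, the user-facing side of the Easybuild-plus-Lmod approach Chris
describes looks roughly like the sketch below; the module tree path and the
module name are only examples of EasyBuild's naming scheme, not taken from
this thread:

    # point Lmod at the EasyBuild-generated module tree (example path)
    module use /apps/easybuild/modules/all
    # load a particular toolchain-versioned build, then use it
    module load Python/3.6.4-foss-2018a
    python --version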


[slurm-users] Taking a break from slurm-users

2018-05-12 Thread Chris Samuel
Hey folks,

I'm going to be unsubscribing from slurm-users for a while as I'll be 
travelling to the US & UK for a number of weeks & I don't want to drown in 
email. 

I'll be back...

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: [slurm-users] Issue with salloc

2018-05-12 Thread Mahmood Naderan
I tried the following command, but it failed:

[mahmood@rocks7 ~]$ srun --pty -u /bin/bash -l -a em1 -p IACTIVE --mem=4GB
srun: error: Unable to allocate resources: Invalid account or
account/partition combination specified
[mahmood@rocks7 ~]$ scontrol show partition IACTIVE
PartitionName=IACTIVE
   AllowGroups=ALL AllowAccounts=em1 AllowQos=ALL
   AllocNodes=rocks7 Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO
MaxCPUsPerNode=UNLIMITED
   Nodes=compute-0-[4-6]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=144 TotalNodes=3 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED



Regards,
Mahmood
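One thing worth checking, although it is not confirmed in this thread: srun
stops parsing its own options at the command name, so everything after
/bin/bash above (-l, -a em1, -p IACTIVE, --mem=4GB) is handed to bash rather
than to srun, and the allocation is presumably attempted against the default
account and partition. Assuming the user really is in the em1 account, a
corrected ordering would look like this (note -A/--account for the account):

    srun -A em1 -p IACTIVE --mem=4GB --pty -u /bin/bash -l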




On Sat, May 12, 2018 at 11:10 AM, Chris Samuel  wrote:
> salloc doesn't do that.
>
> We use a 2-line script called "sinteractive" to do this; it's really simple.
>
> #!/bin/bash
> exec srun "$@" --pty -u ${SHELL} -l
>
> That's it..
>
> Hope that helps!
> Chris
> --
>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
>
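For reference, with a wrapper like the one Chris shows above, an interactive
shell on a given account and partition would be requested as follows (the
account, partition and memory values are only examples):

    sinteractive -A em1 -p IACTIVE --mem=4GB
    # which expands to: srun -A em1 -p IACTIVE --mem=4GB --pty -u /bin/bash -l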



Re: [slurm-users] Python and R installation in a SLURM cluster

2018-05-12 Thread Eric F. Alemany


_________________________________

Eric F.  Alemany
System Administrator for Research

Division of Radiation & Cancer  Biology
Department of Radiation Oncology

Stanford University School of Medicine
Stanford, California 94305

Tel:1-650-498-7969  No Texting
Fax:1-650-723-7382

On May 12, 2018, at 00:08, John Hearns <hear...@googlemail.com> wrote:

Eric, I'm sorry to be a little prickly here.
Each node has an independent home directory for the user?
How then do applications update dot files?
How then, for instance, would the users edit the .bashrc file to bring
Anaconda into their paths?

Before anyone says it, a proper Modules system is the way forward.
But I know that when you install Anaconda as a user it adds its path to your
.bashrc, which fouls up Gnome's dbus daemon - but that is another tale.







On 12 May 2018 at 07:09, Eric F. Alemany <ealem...@stanford.edu> wrote:
Hi Chris,

Thank you for your comments. I will look at Easybuild. There are quite a few 
options to automate the creation of software modules.

I will be doing lots of reading this weekend.

By the way, I signed up to the Beowulf mailing list.

Thank you,

Eric
_

Eric F.  Alemany
System Administrator for Research

Division of Radiation & Cancer  Biology
Department of Radiation Oncology

Stanford University School of Medicine
Stanford, California 94305

Tel:1-650-498-7969  No Texting
Fax:1-650-723-7382



On May 11, 2018, at 12:56 AM, Chris Samuel <ch...@csamuel.org> wrote:

On Friday, 11 May 2018 5:11:38 PM AEST John Hearns wrote:

Eric, my advice would be to definitely learn the Modules system and
implement modules for your users.

I will echo that, and the suggestion of shared storage (we use our Lustre
filesystem for that).  I would also suggest looking at a system to help you
automate building of software packages.   Not only does this help replicate
builds, but it also gives you access to the community who write the recipes
for them - and that itself can be very valuable.

We use Easybuild (which also automates the creation of software modules - and
I would suggest using the Lmod system for that):

https://easybuilders.github.io/easybuild/

But there's also Spack too:

https://spack.io/

As another resource (as we are going off topic from Slurm here), I would
suggest the Beowulf list as a mailing list that deals with Linux based HPC
systems of many different scales.  Disclosure: I now caretake the list, but
it's been going since the 1990s.

http://beowulf.org/

All the best!
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC






Re: [slurm-users] Python and R installation in a SLURM cluster

2018-05-12 Thread John Hearns
Well I DID say that you need 'what looks like a home directory'.
So yes indeed you prove, correctly, that this works just fine!

On 12 May 2018 at 20:17, Eric F. Alemany  wrote:

>
> Hi John,
>
> No worries at all. I take all ideas, comments and advice with the greatest
> respect.
> I know that my questions and knowledge of SLURM/clusters are very basic. I
> have built a very small and simple cluster. This is an opportunity for me to
> learn at a bigger scale.
>
> Each node has a user home directory because I thought that all users must
> have the same uid and gid across the nodes.
> The two users (post-docs) who will use the cluster will only log in to the
> headnode and install their programs and run their jobs from the headnode.
>
> Thank you for your help.
>
> Best,
> Eric
>
> _________________________________
>
> * Eric F.  Alemany *
> *System Administrator for Research*
>
> Division of Radiation & Cancer  Biology
> Department of Radiation Oncology
>
> Stanford University School of Medicine
> Stanford, California 94305
>
> Tel:1-650-498-7969  No Texting
> Fax:1-650-723-7382
>
> On May 12, 2018, at 00:08, John Hearns  wrote:
>
> Eric, I'm sorry to be a little prickly here.
> Each node has an independent home directory for the user?
> How then do applications update dot files?
> How then, for instance, would the users edit the .bashrc file to
> bring Anaconda into their paths?
>
> Before anyone says it, a proper Modules system is the way forward.
> But I know that when you install Anaconda as a user it adds its path to
> your .bashrc, which fouls up Gnome's dbus daemon - but that is another tale.
>
>
>
>
>
>
>
> On 12 May 2018 at 07:09, Eric F. Alemany  wrote:
>
>> Hi Chris,
>>
>> Thank you for your comments. I will look at Easybuild. There are quite a
>> few options to automate the creation of software modules.
>>
>> I will be doing lots of reading this weekend.
>>
>> By the way, I signed up to the Beowulf mailing list.
>>
>> Thank you,
>>
>> Eric
>> _________________________________
>>
>> * Eric F.  Alemany *
>> *System Administrator for Research*
>>
>> Division of Radiation & Cancer  Biology
>> Department of Radiation Oncology
>>
>> Stanford University School of Medicine
>> Stanford, California 94305
>>
>> Tel:1-650-498-7969  No Texting
>> Fax:1-650-723-7382
>>
>>
>>
>> On May 11, 2018, at 12:56 AM, Chris Samuel  wrote:
>>
>> On Friday, 11 May 2018 5:11:38 PM AEST John Hearns wrote:
>>
>> Eric, my advice would be to definitely learn the Modules system and
>> implement modules for your users.
>>
>>
>> I will echo that, and the suggestion of shared storage (we use our Lustre
>> filesystem for that).  I would also suggest looking at a system to help
>> you
>> automate building of software packages.   Not only does this help
>> replicate
>> builds, but it also gives you access to the community who write the
>> recipes
>> for them - and that itself can be very valuable.
>>
>> We use Easybuild (which also automates the creation of software modules -
>> and
>> I would suggest using the Lmod system for that):
>>
>> https://easybuilders.github.io/easybuild/
>>
>> But there's also Spack too:
>>
>> https://spack.io/
>>
>> As another resource (as we are going off topic from Slurm here), I would
>> suggest the Beowulf list as a mailing list that deals with Linux based
>> HPC
>> systems of many different scales.  Disclosure: I now caretake the list,
>> but
>> it's been going since the 1990s.
>>
>> http://beowulf.org/
>>
>> All the best!
>> Chris
>> --
>> Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>>
>>
>>
>>
>


Re: [slurm-users] Python and R installation in a SLURM cluster

2018-05-12 Thread Miguel Gutiérrez Páez
Hi,

Home directories are also shared among all nodes (and the head node as well).
The base system (that is, the local drive, including OS, system files, etc.)
is identical and cloned among all the hosts. If I need to install some light
package (the dev version of a library, for example, or similar) I install it
on all the nodes with, for example, Ansible, but this is not usual; 99% of
apps, libraries or whatever are installed in a shared location and managed
with Lmod.
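As a rough illustration of that Ansible step (the inventory group and package
name are placeholders, and apt-based nodes are assumed):

    # install a small dev package on every compute node in one go
    ansible compute -b -m apt -a "name=libxml2-dev state=present"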

Regards

On Sat., 12 May 2018 at 6:06, Eric F. Alemany wrote:

> Hi Miguel,
>
> Thank you for your comment. That sounds pretty straightforward.
> You never had issues with programs relying on the system files or on the
> home directory location?
>
> Thanks
> Eric
>
> _
>
> * Eric F.  Alemany *
> *System Administrator for Research*
>
> Division of Radiation & Cancer  Biology
> Department of Radiation Oncology
>
> Stanford University School of Medicine
> Stanford, California 94305
>
> Tel:1-650-498-7969  No Texting
> Fax:1-650-723-7382
>
>
>
> On May 10, 2018, at 11:55 PM, Miguel Gutiérrez Páez 
> wrote:
>
> Hi,
>
> I install all my apps on shared storage, and change environment
> variables (PATH, etc.) with Lmod. It's very useful.
>
> Regards.
>
> On Fri., 11 May 2018 at 6:19, Eric F. Alemany () wrote:
>
>> Hi Lachlan,
>>
>> Thank you for sharing your environment. Everyone has their own set of
>> rules and I appreciate everyone’s input.
>> It seems as if the NFS share is a great place to start.
>>
>> Best,
>> Eric
>>
>> _
>>
>> * Eric F.  Alemany *
>> *System Administrator for Research*
>>
>> Division of Radiation & Cancer  Biology
>> Department of Radiation Oncology
>>
>> Stanford University School of Medicine
>> Stanford, California 94305
>>
>> Tel:1-650-498-7969  No Texting
>> Fax:1-650-723-7382
>>
>>
>>
>> On May 10, 2018, at 4:23 PM, Lachlan Musicman  wrote:
>>
>> On 11 May 2018 at 01:35, Eric F. Alemany  wrote:
>>
>>> Hi All,
>>>
>>> I know this might sound like a very basic question: where in the cluster
>>> should I install Python and R?
>>> Headnode?
>>> Execute nodes?
>>>
>>> And is there a particular directory (path) where I need to install Python
>>> and R?
>>>
>>> Background:
>>> SLURM on Ubuntu 18.04
>>> 1 headnode
>>> 4 execute nodes
>>> NFS shared drive among all nodes.
>>>
>>
>>
>> Eric,
>>
>> To echo the others: we have a /binaries nfs share that utilises the
>> standard Environment Modules software so that researchers can manipulate
>> their $PATH on the fly with module load/module unload. That share is
>> mounted on all the nodes.
>>
>> For Python, I use virtualenvs, but instead of activating them, the path is
>> changed by the Module file. Personally, I find conda doesn't work very well
>> in a shared environment. It's fine on a personal level.
>>
>> For R, we have resorted to only installing the main point release because
>> we have >700 libraries installed within R and I don't want to reinstall
>> them every time. We do also have packrat installed so researchers can
>> install their own libraries locally as well.
>>
>>
>> Cheers
>> L.
>>
>>
>>
>>
>>
>
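A minimal sketch of the virtualenv-plus-modules pattern Lachlan describes
above, assuming a /binaries NFS share mounted on all nodes (paths and
versions are examples only):

    # build the environment once on the shared mount
    python3 -m venv /binaries/python/venv-3.6
    /binaries/python/venv-3.6/bin/pip install numpy pandas
    # a module file then prepends /binaries/python/venv-3.6/bin to PATH,
    # so users just run something like:  module load python/venv-3.6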


Re: [slurm-users] Python and R installation in a SLURM cluster

2018-05-12 Thread John Hearns
Completely as an aside, the next question then is 'Aha - but what happens
when you have new users on the cluster?'
I am currently working with sssd authentication and with the pam_mkhomedir
plugin.
I guess if an MPI job is launched using ssh then pam_mkhomedir would
automatically create the home directory too!
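For anyone following along, pam_mkhomedir is typically enabled with a single
session line in the PAM stack, something like the following (the file name
and 'optional' vs 'required' vary by distribution; this is the usual
Debian/Ubuntu form):

    # /etc/pam.d/common-session
    session optional        pam_mkhomedir.so skel=/etc/skel umask=0022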


On 12 May 2018 at 22:02, John Hearns  wrote:

> Well I DID say that you need 'what looks like a home directory'.
> So yes indeed you prove, correctly, that this works just fine!
>
> On 12 May 2018 at 20:17, Eric F. Alemany  wrote:
>
>>
>> Hi John,
>>
>> No worries at all. I take all ideas, comments and advice with the
>> greatest respect.
>> I know that my questions and knowledge of SLURM/clusters are very basic.
>> I have built a very small and simple cluster. This is an opportunity for me
>> to learn at a bigger scale.
>>
>> Each node has a user home directory because I thought that all users must
>> have the same uid and gid across the nodes.
>> The two users (post-docs) who will use the cluster will only log in to
>> the headnode and install their programs and run their jobs from the
>> headnode.
>>
>> Thank you for your help.
>>
>> Best,
>> Eric
>>
>> _________________________________
>>
>> * Eric F.  Alemany *
>> *System Administrator for Research*
>>
>> Division of Radiation & Cancer  Biology
>> Department of Radiation Oncology
>>
>> Stanford University School of Medicine
>> Stanford, California 94305
>>
>> Tel:1-650-498-7969  No Texting
>> Fax:1-650-723-7382
>>
>> On May 12, 2018, at 00:08, John Hearns  wrote:
>>
>> Eric, I'm sorry to be a little prickly here.
>> Each node has an independent home directory for the user?
>> How then do applications update dot files?
>> How then, for instance, would the users edit the .bashrc file to
>> bring Anaconda into their paths?
>>
>> Before anyone says it, a proper Modules system is the way forward.
>> But I know that when you install Anaconda as a user it adds its path to
>> your .bashrc, which fouls up Gnome's dbus daemon - but that is another tale.
>>
>>
>>
>>
>>
>>
>>
>> On 12 May 2018 at 07:09, Eric F. Alemany  wrote:
>>
>>> Hi Chris,
>>>
>>> Thank you for your comments. I will look at Easybuild. There are quite a
>>> few options to automate the creation of software modules.
>>>
>>> I will be doing lots of reading this weekend.
>>>
>>> By the way, I signed up to the Beowulf mailing list.
>>>
>>> Thank you,
>>>
>>> Eric
>>> _________________________________
>>>
>>> * Eric F.  Alemany *
>>> *System Administrator for Research*
>>>
>>> Division of Radiation & Cancer  Biology
>>> Department of Radiation Oncology
>>>
>>> Stanford University School of Medicine
>>> Stanford, California 94305
>>>
>>> Tel:1-650-498-7969  No Texting
>>> Fax:1-650-723-7382
>>>
>>>
>>>
>>> On May 11, 2018, at 12:56 AM, Chris Samuel  wrote:
>>>
>>> On Friday, 11 May 2018 5:11:38 PM AEST John Hearns wrote:
>>>
>>> Eric, my advice would be to definitely learn the Modules system and
>>> implement modules for your users.
>>>
>>>
>>> I will echo that, and the suggestion of shared storage (we use our
>>> Lustre
>>> filesystem for that).  I would also suggest looking at a system to help
>>> you
>>> automate building of software packages.   Not only does this help
>>> replicate
>>> builds, but it also gives you access to the community who write the
>>> recipes
>>> for them - and that itself can be very valuable.
>>>
>>> We use Easybuild (which also automates the creation of software modules
>>> - and
>>> I would suggest using the Lmod system for that):
>>>
>>> https://easybuilders.github.io/easybuild/
>>>
>>> But there's also Spack too:
>>>
>>> https://spack.io/
>>>
>>> As another resource (as we are going off topic from Slurm here), I would
>>> suggest the Beowulf list as a mailing list that deals with Linux based
>>> HPC
>>> systems of many different scales.  Disclosure: I now caretake the list,
>>> but
>>> it's been going since the 1990s.
>>>
>>> http://beowulf.org/
>>>
>>> All the best!
>>> Chris
>>> --
>>> Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>>>
>>>
>>>
>>>
>>
>


[slurm-users] Why GPU do not become available when job suspended?

2018-05-12 Thread Zheng Gong
Hi,

I'm reading the GRES Scheduling document. It says:

> If the job is suspended, those resources do not become available for use
> by other jobs.


I think this explains why GPU jobs cannot be preempted.

But I don't know why the GRES cannot be released after the job is suspended.
Even if I manually suspend a GPU job, the GPU node cannot be allocated to a
new job.

-- 
Gong, Zheng (龚正)
Doctoral Candidate in Physical Chemistry
School of Chemistry and Chemical Engineering
Shanghai Jiao Tong University
http://sun.sjtu.edu.cn