[gridengine users] deciding spool directory location

2012-01-12 Thread Wolf, Dale
We are in the planning phase for our first installation of Grid Engine. The
initial configuration is a single cluster of 30 SLES 11 machines. This number
may grow to as many as 100 SLES 11 servers.

The Oracle N1 Grid Engine 6 Installation Guide, under "sge-root Installation
Directory", indicates that placing the spool directory under sge-root may be
avoided for efficiency reasons. Later on, under "Spool Directories Under the
Root Directory", it states:

"You do not need to export these directories to other machines. However,
exporting the entire sge-root tree and making it write-accessible for the
master host and all executable hosts makes administration easier."

We are trying to decide where the spool directory should reside, weighing
performance against ease of administration. Can somebody explain how exporting
the entire tree makes administration easier?

Thanks in advance.

Dale




___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


Re: [gridengine users] deciding spool directory location

2012-01-12 Thread Rayson Ho
You can reference this HOWTO:

http://gridscheduler.sourceforge.net/howto/nfsreduce.html

You can put everything on NFS, and if the NFS server can't handle the
load, then change the configuration to local spooling instead later
on.

Rayson





Re: [gridengine users] deciding spool directory location

2012-01-12 Thread Reuti
Hi,

On 12.01.2012 at 18:17, Wolf, Dale wrote:

> We are in the planning phase for the initial installation of grid engine.
> The initial configuration is a single cluster with 30 SLES 11 machines.
> This number may grow to as many as 100 SLES 11 servers.
>
> The Oracle N1 Grid Engine 6 Installation Guide, under sge-root Installation
> Directory, indicates placing the spool directory under sge-root may be
> avoided for efficiency reasons.  Later on, under Spool Directories Under the
> Root Directory, it states
>
> "You do not need to export these directories to other machines. However,
> exporting the entire sge-root tree and making it write-accessible for the
> master host and all executable hosts makes administration easier."

Well, by default the spool directory is inside $SGE_ROOT/default/spool, but the
best way for me is the middle ground: export $SGE_ROOT to all machines, while
redirecting the spool directory to a location like /var/spool/sge, which only
needs to be writable by the SGE admin user. (The "/var/spool/sge/qmaster"
directory needs to be created beforehand, while the spool directories for the
nodes will be created automatically when sge_execd starts.)

http://arc.liv.ac.uk/SGE/howto/nfsreduce.html
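
The directory preparation described above could be sketched roughly as follows. This is only an illustration: the function name is made up, and "sgeadmin" stands in for whatever account is the SGE admin user at your site.

```shell
#!/bin/sh
# Sketch: pre-create the qmaster spool directory under a spool base directory.
# prepare_qmaster_spool and "sgeadmin" are placeholders, not part of SGE.

prepare_qmaster_spool() {
    spool_base=$1     # e.g. /var/spool/sge
    admin_user=$2     # the SGE admin user (name assumed here)

    mkdir -p "$spool_base/qmaster" || return 1
    # Only root can hand the tree over to another account.
    if [ "$(id -u)" -eq 0 ]; then
        chown -R "$admin_user" "$spool_base" || return 1
    fi
    # Writable by the owner only, readable by everyone.
    chmod 755 "$spool_base" "$spool_base/qmaster"
}

# Example call (as root on the qmaster host):
# prepare_qmaster_spool /var/spool/sge sgeadmin
```

On the execution hosts only the base directory would need to exist and be writable by the admin user, since the per-node directories are created when the execution daemon starts.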

-- Reuti


Re: [gridengine users] deciding spool directory location

2012-01-12 Thread Chris Dagdigian

Hi Dale,


> We are trying to determine where the spool directory should reside based on
> performance versus ease of administration.  Can somebody explain how ease of
> administration would be made easier?


Here is a short answer:

When the spool directory is shared, it is far easier for an administrator 
to troubleshoot node-specific job issues, because you can see and access 
the whole spool location without having to hop to a specific machine.


When spool is not shared your spool data and messages are on local disk 
on the compute nodes. This means that you have to connect to that node 
in order to read or examine the files.
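
To make the difference concrete, here is a small sketch. It assumes the usual layout of one subdirectory per execution host, each holding a "messages" file; the helper name, the host name node01 and the local path /var/spool/sge are placeholders.

```shell
#!/bin/sh
# Sketch: print the path of an execution host's "messages" file under a spool base.
# Assumes a <spool_base>/<host>/messages layout; names below are illustrative.
execd_messages_path() {
    spool_base=$1   # shared: $SGE_ROOT/$SGE_CELL/spool ; local: e.g. /var/spool/sge
    node=$2         # execution host name, e.g. node01
    printf '%s/%s/messages\n' "$spool_base" "$node"
}

# Shared spool: read the file from any host that mounts the share.
# tail -n 50 "$(execd_messages_path "$SGE_ROOT/${SGE_CELL:-default}/spool" node01)"

# Local spool: the file lives on the node itself, so hop there first.
# ssh node01 tail -n 50 "$(execd_messages_path /var/spool/sge node01)"
```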


More detail ...


The decision to do shared or not-shared generally revolves around the 
power of your NFS server, what else is talking on that same 
network/subnet/vlan/wire and probably more importantly how many jobs you 
might be running through your system during a day. The number of jobs 
entering and exiting the system is the real factor in how often and 
how hard your spool share is getting hit. Some of my pharma clusters run 
hours-long jobs and might only do a few hundred or thousand jobs per 
day. Another biotech cluster of similar size might be doing 150,000 jobs 
per day running short chemical simulations.
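
As a rough back-of-the-envelope illustration of those two workloads, the daily job counts can be turned into an average rate. The helper name is made up, and a flat average ignores bursts, so treat the result as a lower bound on peak load.

```shell
#!/bin/sh
# Convert a jobs-per-day figure into an average jobs-per-second rate.
jobs_per_second() {
    # 86400 = seconds in a day
    awk -v jobs="$1" 'BEGIN { printf "%.2f\n", jobs / 86400 }'
}

jobs_per_second 150000   # the short-simulation cluster
jobs_per_second 1000     # a cluster running hours-long jobs
```

That works out to about 1.74 jobs per second for the busy cluster versus about 0.01 for the other one, and every one of those jobs touches the spool area when it starts and when it finishes.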


My gut answer is usually to do shared-spool first and only move away 
from that if performance demands it. Changing the spooling location 
post-install is not a huge deal.


I'm also a classic spooling zealot. I hate Berkeley DB spooling and even 
on the 2000-core cluster that does 150,000 jobs per day we still use 
classic spooling on an NFS-shared SGE root and spool. We are, however, 
using Isilon scale-out NAS for the NFS and that means we have no real 
performance issues at all.


My $.02

-Chris





Re: [gridengine users] deciding spool directory location

2012-01-13 Thread Chris Dagdigian


Whoa. If there is a tool out there that gives users access to debug and 
info from the spool area I'd love to hear about it and get it out into 
the community.  One of the downsides to spool locations is that they are 
usually only accessible to admins.


One of my minor gripes about Grid Engine is the lack of 
debug/troubleshooting stuff that is available to non-admin users who 
don't have sudo or root access. One of the last good systems providing data 
to regular users about "why is my job not scheduled" is now losing 
ground since "schedd_job_info=false" started being deployed on 
high-volume clusters.
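
For anyone wanting to check how their own cluster is set, a sketch along these lines might do. It assumes the scheduler configuration is printed as one "name value" pair per line (as "qconf -ssconf" does); the helper name is made up.

```shell
#!/bin/sh
# Sketch: extract the schedd_job_info setting from scheduler-configuration text
# supplied on stdin. Exits non-zero if the parameter is not present.
schedd_job_info_value() {
    awk '$1 == "schedd_job_info" { print $2; found = 1 } END { exit !found }'
}

# On a live cluster (requires the SGE client tools):
# qconf -ssconf | schedd_job_info_value
```

When this prints "false", users should not expect "qstat -j" to explain why a pending job has not been scheduled.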


Even if there is a tool out there that can't be shared it would be great 
if someone could talk about the methods used -- maybe we can gin up an 
equiv utility for the community...


dag



Dave Love wrote:

> Not just the administrator, actually.  There's stuff which isn't
> accessible via qacct but can be useful for users to get post mortem
> information about failures.  Mark Dixon has a tool which grovels it
> (unpublished?, hint).



Re: [gridengine users] deciding spool directory location

2012-01-13 Thread Reuti
On 13.01.2012 at 17:33, Chris Dagdigian wrote:

> Whoa. If there is a tool out there that gives users access to debug and info 
> from the spool area I'd love to hear about it and get it out into the 
> community.  One of the downsides to spool locations is that they are usually 
> only accessible to admins.

Because it is on a different machine like a node? The default permissions allow 
everyone to read it. As a small epilog:

#!/bin/bash
# Archive this job's spool directory next to its stdout file.
tar -C "${SGE_JOB_SPOOL_DIR%/*}" -czf \
    "${SGE_STDOUT_PATH%/*}/${SGE_JOB_SPOOL_DIR##*/}.tgz" "${SGE_JOB_SPOOL_DIR##*/}"

and you get an archive in the directory where stdout is written.
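
The parameter expansions in that one-liner can be traced with made-up values. The two paths below are purely illustrative; in a real epilog SGE sets both variables.

```shell
#!/bin/sh
# Hypothetical values for illustration only.
SGE_JOB_SPOOL_DIR=/var/spool/sge/node01/active_jobs/4711.1
SGE_STDOUT_PATH=/home/user/run/sim.o4711

spool_parent=${SGE_JOB_SPOOL_DIR%/*}     # strip the last path component
spool_name=${SGE_JOB_SPOOL_DIR##*/}      # keep only the last path component
stdout_dir=${SGE_STDOUT_PATH%/*}         # directory holding the stdout file

# Show the tar command the epilog would effectively run.
echo "tar -C $spool_parent -czf $stdout_dir/$spool_name.tgz $spool_name"
```

With these values the archive would be named 4711.1.tgz and land in /home/user/run, containing a single top-level directory 4711.1.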

-- Reuti


Re: [gridengine users] deciding spool directory location

2012-01-13 Thread Chris Dagdigian


That's an awesome epilog script Reuti! I might modify it so that a user 
can trigger a request for the archive but it's disabled by default. That 
would be a pretty excellent debug tool...


Thanks again!

-dag
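
One way the opt-in idea could look, as a sketch only: ARCHIVE_SPOOL is a hypothetical variable name, not an SGE feature, and this assumes the epilog inherits the job's environment so that a user can set it at submission time.

```shell
#!/bin/bash
# Sketch of an opt-in epilog: archive the job spool directory only on request.
# ARCHIVE_SPOOL is a made-up variable name chosen for this example.

archive_job_spool() {
    # Disabled by default: do nothing unless the user asked for it.
    [ "${ARCHIVE_SPOOL:-0}" = "1" ] || return 0
    # Nothing to do if the SGE-provided paths are missing.
    [ -n "${SGE_JOB_SPOOL_DIR:-}" ] && [ -n "${SGE_STDOUT_PATH:-}" ] || return 0

    # Same archive as the one-line epilog: spool dir, placed next to stdout.
    tar -C "${SGE_JOB_SPOOL_DIR%/*}" \
        -czf "${SGE_STDOUT_PATH%/*}/${SGE_JOB_SPOOL_DIR##*/}.tgz" \
        "${SGE_JOB_SPOOL_DIR##*/}"
}

archive_job_spool
```

Under that assumption a user would request the archive with something like "qsub -v ARCHIVE_SPOOL=1 job.sh", and every other job would run without the extra tar step.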

