[gridengine users] deciding spool directory location
We are in the planning phase for the initial installation of Grid Engine. The initial configuration is a single cluster with 30 SLES 11 machines. This number may grow to as many as 100 SLES 11 servers.

The Oracle N1 Grid Engine 6 Installation Guide, under "sge-root Installation Directory", indicates that placing the spool directory under sge-root may be avoided for efficiency reasons. Later on, under "Spool Directories Under the Root Directory", it states:

"You do not need to export these directories to other machines. However, exporting the entire sge-root tree and making it write-accessible for the master host and all executable hosts makes administration easier."

We are trying to determine where the spool directory should reside based on performance versus ease of administration. Can somebody explain how exporting the tree would make administration easier?

Thanks in advance.

Dale
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
Re: [gridengine users] deciding spool directory location
You can reference this HOWTO:

http://gridscheduler.sourceforge.net/howto/nfsreduce.html

You can put everything on NFS, and if the NFS server can't handle the load, change the configuration to local spooling later on.

Rayson

On Thu, Jan 12, 2012 at 12:17 PM, Wolf, Dale wrote:
> We are trying to determine where the spool directory should reside based on
> performance versus ease of administration. Can somebody explain how ease of
> administration would be made easier?
Re: [gridengine users] deciding spool directory location
Hi,

On 12.01.2012 at 18:17, Wolf, Dale wrote:

> The Oracle N1 Grid Engine 6 Installation Guide, under sge-root Installation
> Directory, indicates placing the spool directory under sge-root may be
> avoided for efficiency reasons. Later on, under Spool Directories Under the
> Root Directory, it states
>
> "You do not need to export these directories to other machines. However,
> exporting the entire sge-root tree and making it write-accessible for the
> master host and all executable hosts makes administration easier."

Well, by default the spool directory is $SGE_ROOT/default/spool, but for me the best way is a middle course: export $SGE_ROOT to all machines while redirecting the spool directory to a location like /var/spool/sge, which only needs to be writable by the SGE admin user. (The directory /var/spool/sge/qmaster needs to be created beforehand, while the spool directories for the nodes will be created automatically when sge_execd starts.)

http://arc.liv.ac.uk/SGE/howto/nfsreduce.html

-- Reuti
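The preparation step Reuti describes could be sketched roughly as follows. This is only an illustration, not part of any Grid Engine installer: the function name prepare_qmaster_spool and the admin user name sgeadmin in the usage comment are assumptions, and /var/spool/sge is simply the example location from the message above.

```shell
#!/bin/bash
# Sketch: create the qmaster spool directory before installation.
# The function name and the example admin user are hypothetical.

prepare_qmaster_spool() {
    local spool_base=$1   # e.g. /var/spool/sge
    local admin_user=$2   # the SGE admin user (site-specific)

    # Create the base directory and the qmaster subdirectory.
    mkdir -p "$spool_base/qmaster" || return 1

    # Only the SGE admin user needs write access. Changing the owner
    # requires root unless the invoking user already is the admin user.
    if [ "$(id -un)" != "$admin_user" ]; then
        chown -R "$admin_user" "$spool_base" || return 1
    fi
    chmod 755 "$spool_base" "$spool_base/qmaster"
}

# Example (as root on the qmaster host):
#   prepare_qmaster_spool /var/spool/sge sgeadmin
```

On the execution hosts the base directory presumably has to exist with the same ownership so that sge_execd can create its per-node directory; check the HOWTO linked above for the exact procedure.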
Re: [gridengine users] deciding spool directory location
Hi Dale,

> We are trying to determine where the spool directory should reside based on
> performance versus ease of administration. Can somebody explain how ease of
> administration would be made easier?

Here is a short answer:

When the spool directory is shared, it is far easier for an administrator to troubleshoot node-specific job issues, because you can see and access the whole spool location without having to hop to a specific machine.

When spool is not shared, your spool data and messages are on local disk on the compute nodes. This means that you have to connect to that node in order to read or examine the files.

More detail ...

The decision to do shared or not-shared generally revolves around the power of your NFS server, what else is talking on that same network/subnet/vlan/wire and, probably more importantly, how many jobs you might be running through your system during a day.

The number of jobs entering and exiting the system is the real factor in how often and how hard your spool share is getting hit. Some of my pharma clusters run hours-long jobs and might only do a few hundred or a few thousand jobs per day. Another biotech cluster of similar size might be doing 150,000 jobs per day running short chemical simulations.

My gut answer is usually to do shared spool first and only move away from that if performance demands it. Changing the spooling location post-install is not a huge deal.

I'm also a classic spooling zealot. I hate Berkeley DB spooling, and even on the 2000-core cluster that does 150,000 jobs per day we still use classic spooling on an NFS-shared SGE root and spool. We are, however, using Isilon scale-out NAS for the NFS, and that means we have no real performance issues at all.

My $.02

-Chris
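To get a feel for the jobs-per-day figure discussed above on an existing cluster, something like the following sketch could be run against the accounting file (normally $SGE_ROOT/$SGE_CELL/common/accounting). It assumes the classic colon-separated accounting format in which field 11 is the job end time in epoch seconds; check accounting(5) for your version, since some later forks record times differently. The function name jobs_per_day is made up for this example.

```shell
#!/bin/bash
# Sketch: count finished jobs per UTC day from an SGE accounting file.
# Assumption: field 11 of each record is end_time in epoch seconds.

jobs_per_day() {
    awk -F: '
        /^#/ { next }                  # skip comment lines
        $11 > 0 {                      # ignore records without an end time
            day = int($11 / 86400)     # whole days since the epoch (UTC)
            count[day]++
        }
        END {
            # print "<epoch seconds at start of day> <number of jobs>"
            for (day in count)
                printf "%d %d\n", day * 86400, count[day]
        }
    ' "$1" | sort -n
}

# Example:
#   jobs_per_day "$SGE_ROOT/$SGE_CELL/common/accounting"
```

Each output line holds the start of a day in epoch seconds and the number of jobs that ended on that day. A cluster in the hundreds-per-day range is a very different NFS load from one doing 150,000.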
Re: [gridengine users] deciding spool directory location
Whoa. If there is a tool out there that gives users access to debug and info from the spool area, I'd love to hear about it and get it out into the community. One of the downsides to spool locations is that they are usually only accessible to admins.

One of my minor gripes about Grid Engine is the lack of debug/troubleshooting stuff that is available to non-admin users who don't have sudo or root access. One of the last good systems providing data to regular users about "why is my job not scheduled" is now losing ground since "schedd_job_info=false" started being deployed on high-volume clusters.

Even if there is a tool out there that can't be shared, it would be great if someone could talk about the methods used; maybe we can gin up an equivalent utility for the community...

dag

Dave Love wrote:
> Not just the administrator, actually. There's stuff which isn't
> accessible via qacct but can be useful for users to get post mortem
> information about failures. Mark Dixon has a tool which grovels it
> (unpublished?, hint).
Re: [gridengine users] deciding spool directory location
On 13.01.2012 at 17:33, Chris Dagdigian wrote:

> Whoa. If there is a tool out there that gives users access to debug and info
> from the spool area I'd love to hear about it and get it out into the
> community. One of the downsides to spool locations is that they are usually
> only accessible to admins.

Because it is on a different machine, like a node? The default permissions allow everyone to read it. As a small epilog:

    #!/bin/bash
    # Pack the job's spool directory into <stdout dir>/<spool dir name>.tgz
    tar -C "${SGE_JOB_SPOOL_DIR%/*}" -czf "${SGE_STDOUT_PATH%/*}/${SGE_JOB_SPOOL_DIR##*/}.tgz" "${SGE_JOB_SPOOL_DIR##*/}"

and you get an archive in the directory where stdout is written.

-- Reuti
Re: [gridengine users] deciding spool directory location
That's an awesome epilog script, Reuti!

I might modify it so that it's disabled by default but a user can trigger a request for the archive. That would be a pretty excellent debug tool...

Thanks again!

-dag

Reuti wrote:
> Because it is on a different machine like a node? The default permissions
> allow everyone to read it. As small epilog: [...]
> and you get an archive where stdout is set to.
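The opt-in variant suggested above might look roughly like the sketch below. It is an untested idea rather than anything posted to the list: KEEP_SPOOL is a hypothetical variable name, and it assumes the epilog runs with the job's environment (as sge_conf(5) describes), so that a user could opt in with something like `qsub -v KEEP_SPOOL=1 job.sh`.

```shell
#!/bin/bash
# Sketch: opt-in version of the spool-archiving epilog.
# KEEP_SPOOL is a hypothetical variable name chosen for this example.

archive_job_spool() {
    # Disabled by default: do nothing unless the user asked for the archive.
    [ "${KEEP_SPOOL:-0}" = "1" ] || return 0
    [ -d "$SGE_JOB_SPOOL_DIR" ] || return 0

    local parent=${SGE_JOB_SPOOL_DIR%/*}   # directory holding the job's spool dir
    local name=${SGE_JOB_SPOOL_DIR##*/}    # name of the job's spool dir
    local dest=${SGE_STDOUT_PATH%/*}       # directory where stdout is written

    # Same tar call as the original epilog, with quoting added.
    tar -C "$parent" -czf "$dest/$name.tgz" "$name"
}

archive_job_spool
```

Jobs submitted without the variable are unaffected, which keeps the extra I/O off high-volume clusters unless someone is actively debugging.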