Re: [Toolserver-l] Grid Engine config change

2012-03-04 Thread Dr. Trigon

On 04.03.2012 01:06, Merlissimo wrote:
> In both cases the old behavior was without -m a -b y, so
>
> 'cronsub [jobname] [command]' has become 'qcronsub -l h_rt=06:00:00
> -l virtual_free=100M -N [jobname] [command]'
>
> 'cronsub -l [jobname] [command]' has become 'qcronsub -l
> h_rt=INFINITY -l virtual_free=100M -N [jobname] [command]'
>
> The -b y option is mostly useful for binaries, e.g. if you don't
> submit the python script itself, but call the binary interpreter
> (python) with an argument. It only controls whether the submitted
> script file is copied to a local filesystem on the execution
> server (which increases performance, makes NFS errors impossible and
> was always the default setting) or executed directly from your home
> (if you use -b y). In most cases this option isn't needed, and
> copying is best for most shell scripts.

Thanks for that info! I seem to remember there once was a comment
about cronsub copying (like -b y does) the script...?!? Anyway, I had
to use '-b y', otherwise my script exited immediately after it was
started. What am I doing wrong here? (My script reads and writes files
in my home, e.g. config and log files; maybe this is related?)

Greetings
DrTrigon

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette


Re: [Toolserver-l] Grid Engine config change

2012-03-04 Thread Dr. Trigon

On 04.03.2012 01:21, Platonides wrote:
> On 04/03/12 00:17, Dr. Trigon wrote:
>> And here my key question arises; you mentioned 'qacct' to get
>> more info (thanks for this hint), and this is one of the biggest
>> problems I had with the whole SGE stuff: I was not able to find
>> complete documentation, neither on the toolserver nor elsewhere.
>> At the moment, commands like 'qstat' or 'qdel' are not covered
>> on the toolserver anymore. I (we) would like to know more about
>> this great system.
>
> ? They are documented in the server man pages. Just run man qacct,
> or PAGER=less man qacct, as I find it a nicer pager.

No, this was not my point. If I know there is a command like e.g. 'qacct',
I know how to get help. My question was: where do I get a list of all
SGE commands, options and background info? (It does not need to be a
reference book, but a user manual would be nice.)

I searched the net several times for SGE info and found some in several
places, but those were only small parts of the whole documentation. But I
assume there has to be something... ;)

Greetings
DrTrigon



Re: [Toolserver-l] Grid Engine config change

2012-03-04 Thread Dr. Trigon

On 04.03.2012 01:06, Merlissimo wrote:
> Some weeks ago I installed a script that removes empty log files
> for the standard error/output streams after job execution. Many people
> used this option to keep their home directory from filling up with
> empty error logs.

Could you maybe explain how to use this script you installed?

Another question: is there a way to stop and restart a script without
having to wait until a queue becomes free, by re-using the old one?
E.g. if I want to restart my ircbot (running for INFINITY time), I have
to wait some time until the needed queue becomes available, but what
about just re-using the queue it already had before?

Or is there an alternative way to keep such an ircbot script running
continuously without having to wait (e.g. more than 1 minute) during a
restart? Can I trigger a restart (reading the new script from disk)
while keeping it in the queue?

Thanks again! Greetings
DrTrigon



Re: [Toolserver-l] Grid Engine config change

2012-03-04 Thread Platonides
On 04/03/12 01:06, Merlissimo wrote:
> In both cases the old behavior was without -m a -b y, so
>
(...)
> The -b y option is mostly useful for binaries, e.g. if you don't submit
> the python script itself, but call the binary interpreter (python) with
> an argument. (...)

Actually, cronsub works as if -b y were given. It doesn't pass -b y,
but submits a new script which then calls your original script, so the
effect is like submitting the script with -b y.
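
The wrapper behaviour described above can be pictured with a small shell
sketch. This is not cronsub's actual source; make_wrapper and its two
arguments are hypothetical names used only for illustration. The idea is
that only the tiny generated wrapper is submitted (and presumably copied
by the grid), while the real script is still executed from its original
location.

```shell
#!/bin/sh
# Hypothetical helper (NOT the real cronsub code): write a tiny wrapper
# script that runs the user's script in place, from its original path.
make_wrapper() {
    target=$1     # absolute path of the script that should really run
    wrapper=$2    # where to write the generated wrapper
    {
        printf '#!/bin/sh\n'
        # exec replaces the wrapper process with the real script
        printf 'exec "%s" "$@"\n' "$target"
    } > "$wrapper"
    chmod +x "$wrapper"
}
```

Submitting such a wrapper instead of the script itself would give the
same effect as -b y for the real script: it is read from the home
directory at run time rather than from a copy.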



Re: [Toolserver-l] Grid Engine config change

2012-03-04 Thread Dr. Trigon

On 04.03.2012 16:25, Platonides wrote:
> On 04/03/12 01:06, Merlissimo wrote:
>> In both cases the old behavior was without -m a -b y, so
>>
> (...)
>> The -b y option is mostly useful for binaries, e.g. if you don't
>> submit the python script itself, but call the binary interpreter
>> (python) with an argument. (...)
>
> Actually, cronsub works as if providing -b y. It doesn't provide -b
> y, but submits a new script which then calls the calling script, so
> it's like submitting the script with -b y.

Aaa-ha! That explains a lot! Thanks for the hint!
Greetings



[Toolserver-l] Grid Engine config change

2012-03-03 Thread Merlissimo

Hello toolserver users,

as you may know, there have been some bigger problems with Sun Grid
Engine (SGE) since November 2011. I asked DaB. to make me an SGE manager
so that I could help solve these problems.
During the last months I silently reconfigured SGE in small steps, so
that it was always possible to use it as before and no downtime was
needed. This took some time because I am only a volunteer and I had to
change nearly everything. In addition, Nosy and DaB. changed some Solaris
settings that I proposed.


All scripts that used Grid Engine before can continue to run without
changes. But you may be able to improve your script's performance by
supplying additional information.

In the past you were asked to choose a suitable queue (all.q or longrun)
for your job. Many people chose a queue that was not the best fit for
their task, so I changed this procedure.

Now you have to specify all resources that your job needs during runtime
when you submit it. SGE will then choose the queue and host that best fit
your requirements, so you no longer have to care about the different
queues (you may have seen that there are many more queues than before).


All jobs should at least specify the maximum runtime (h_rt) and the peak
memory usage (virtual_free). This information may become mandatory in the
future; currently only a warning message is shown.
You also have to request other resources such as SQL connections, free
temp space, etc. if your job needs them. Please read the documentation on
the Toolserver wiki, which I updated today:
https://wiki.toolserver.org/view/Job_scheduling

This currently covers the main things you need to know, but I may add
some more examples later.

I have also added a new script called qcronsub. It replaces cronsub,
which most of you used before. Unlike cronsub, it accepts the same
arguments as Grid Engine's original qsub command, so it is now possible
to pass all resource values on the command line.


Please note that you should always use cronie on submit.toolserver.org
to submit jobs to SGE from cron. These cron tasks will always be
executed, even if one host (e.g. clematis or willow) is down. This has
been the recommended setup for about 17 months. Many people have migrated
their cron jobs from nightshade to willow during the last weeks, but they
will have the same problem again if willow has to be shut down for a
longer time (which hopefully never happens).

--
Example:

This morning Dr. Trigon complained that his job mainbot did not run
immediately and was queued for a long time. I would guess he submitted
his job from cron using
cronsub mainbot -l /home/drtrigon/pywikipedia/mainbot.py

This indicates that the job runs forever (longrun) with unknown memory
usage, so Grid Engine was only able to start it on willow.
It is not possible to run infinite jobs on the web servers (only shorter
jobs are allowed there, so that most jobs have finished before the high
web server load expected in the evening). Nor was it possible to run it
on the server running the mail transfer agent, which has less than 500 MB
of free memory but plenty of CPU power (the job's expected memory usage
is unknown). Other servers such as nightshade and yarrow aren't currently
available.


According to its last run, this job takes about 2 hours and 30 minutes
and had a peak usage of 370 MB of memory. I got these values by asking
Grid Engine for the usage statistics of the last ten days:
qacct -j mainbot -d 10.
To make sure that the job always gets enough resources, I would suggest
raising the values to 4 hours and 500 MB of memory. It is not a problem
if you request more resources than really needed, but a job that needs
more resources than requested may be killed. So the new submit command
would be:


qcronsub -N mainbot -l h_rt=4:00:00 -l virtual_free=500M \
    /home/drtrigon/pywikipedia/mainbot.py

This job could run on both web servers during low load, and on willow.
Grid Engine also knows that it cannot run on the mail server because of
its high memory usage.
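
The sizing step above (take the measured values from qacct and round them
up with some headroom) could be sketched in shell roughly as follows. The
function names and the 30 % margin are assumptions made for illustration
only; they are not part of qcronsub or Grid Engine. The script only
prints a command line, it does not submit anything.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only (not part of SGE or qcronsub).

# Measured runtime in seconds + margin in percent
# -> h_rt value, rounded up to full hours.
h_rt_with_margin() {
    padded=$(( $1 * (100 + $2) / 100 ))
    hours=$(( (padded + 3599) / 3600 ))
    printf '%d:00:00\n' "$hours"
}

# Measured peak memory in MB + margin in percent
# -> virtual_free value, rounded up to the next 100M.
mem_with_margin() {
    padded=$(( $1 * (100 + $2) / 100 ))
    rounded=$(( (padded + 99) / 100 * 100 ))
    printf '%dM\n' "$rounded"
}

# Figures from the mainbot example: 2 h 30 min (9000 s) and 370 MB,
# with an assumed 30 % safety margin.
echo "qcronsub -N mainbot -l h_rt=$(h_rt_with_margin 9000 30)" \
     "-l virtual_free=$(mem_with_margin 370 30)" \
     "/home/drtrigon/pywikipedia/mainbot.py"
```

With these inputs the printed command requests h_rt=4:00:00 and
virtual_free=500M, i.e. the 4 hours and 500 MB suggested above.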

The job ircbot by drtrigon was started on the mail server last night.
This job really needs an infinite runtime (-l h_rt=INFINITY), but uses
only little memory (40M).

Jobs that have a limited runtime should not be submitted with an infinite
runtime value, even if the expected runtime is several days or weeks.
E.g. pywikipedia scripts should be updated regularly from svn, so they
must be stopped after some days and restarted. For example,
qcronsub -l h_rt=120:0:0 scriptname submits a job with a maximum runtime
of five days.
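
The five-day figure is simply 5 x 24 = 120 hours. As a small sketch of
that conversion (h_rt_from_days is a hypothetical helper name, not an
existing command):

```shell
#!/bin/sh
# Hypothetical helper: convert a runtime limit in whole days
# into an h_rt value in hours:minutes:seconds.
h_rt_from_days() {
    printf '%d:00:00\n' $(( $1 * 24 ))
}

h_rt_from_days 5    # 5 days * 24 h = 120 h
```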

--

If you have any questions about Grid Engine usage, feel free to ask me or
the toolserver admins on IRC or on the mailing list.

The Toolserver grid currently uses four servers and still has plenty of
CPU power and memory available. Only willow is currently very busy.
Please do not run processes on servers other than the login servers
(willow and nightshade) without SGE resource control (except cronie on
host submit, for submitting jobs to Grid Engine).



Re: [Toolserver-l] Grid Engine config change

2012-03-03 Thread Platonides
On 03/03/12 22:46, Merlissimo wrote:
> All jobs must at least contain information about maximum runtime (h_rt)
> and peak memory usage (virtual_free). This information may become
> obligatory in the future. Currently only a warning message is shown.
> You also have to request other resources like SQL connections, free temp
> space, etc. if these are needed by your job. Please read the
> documentation on the Toolserver wiki, which I have updated today:
> https://wiki.toolserver.org/view/Job_scheduling
> This currently contains the main information you need to know, but
> maybe I will add some more examples later.

Thanks a lot Merlissimo. The new options look very good.


> I also have added a new script called qcronsub. This is the
> replacement for cronsub most of you used before. Unlike cronsub,
> it accepts the same arguments as the original qsub command by
> Grid Engine. So now it is possible to add all resource values on
> the command line.

I don't think qcronsub is an appropriate name for a qsub script which
doesn't enqueue tasks twice. The name recalls the old cronsub, but the
interface is completely different. I'd call it qsub-unique or similar.


As there are now many more options to configure, and SGE's argument
names are unintuitive, I think it would be appropriate to have a wizard
which generates the appropriate command line for you.
(Bonus if it's a curses program which edits the submit crontab directly.)
