Re: [Toolserver-l] Grid Engine config change
On 04.03.2012 01:06, Merlissimo wrote:
> In both cases the old behavior was without -m a -b y, so
> 'cronsub [jobname] [command]' has become
> 'qcronsub -l h_rt=06:00:00 -l virtual_free=100M -N [jobname] [command]'
> 'cronsub -l [jobname] [command]' has become
> 'qcronsub -l h_rt=INFINITY -l virtual_free=100M -N [jobname] [command]'
>
> The -b y option is mostly useful for binaries, e.g. if you don't submit the Python script itself but call the binary interpreter (python) with an argument. It only decides whether the submitted script file is copied to a local filesystem on the execution server (which increases performance, makes NFS errors impossible and was always the default setting) or executed directly from your home (if you use -b y). In most cases this option isn't needed, and copying is best for most shell scripts.

Thanks for that info! I seem to remember there was once a comment about cronsub copying the script (like -b y does)...?!?

Anyway, I had to use '-b y'; otherwise my script terminated immediately after being started. What am I doing wrong here? (My script reads and writes files in my home, e.g. config and log files. Maybe this is related?)

Greetings
DrTrigon

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Re: [Toolserver-l] Grid Engine config change
On 04.03.2012 01:21, Platonides wrote:
> On 04/03/12 00:17, Dr. Trigon wrote:
>> And here my key question arises: you mentioned 'qacct' to get more info (thanks for this hint), and this is one of the biggest problems I had with the whole SGE stuff: I was not able to find complete documentation, neither on the toolserver nor elsewhere. At the moment, commands like 'qstat' or 'qdel' are no longer covered on the toolserver. I (we) would like to know more about this great system.
>
> ? They are documented in the server man pages. Just run
> man qacct
> or
> PAGER=less man qacct
> as I find it a nicer one.

No, this was not my point. If I know that a command such as 'qacct' exists, I know how to get help for it. My question was: where do I get a list of all SGE commands, options and background information? (It does not need to be a reference book, but a user manual would be nice.) I have searched the net several times for SGE information and found some in several places, but only small parts of the whole documentation. But I assume there has to be something... ;)

Greetings
DrTrigon
Re: [Toolserver-l] Grid Engine config change
On 04.03.2012 01:06, Merlissimo wrote:
> Some weeks ago I installed a script that removes empty log files for the standard error/output streams after job execution. Many people used this option to keep their home directory from filling up with empty error logs.

Could you maybe explain how to use this script you installed?

Another question I have: is there a way to stop and start a script without having to wait until a queue becomes free, by re-using the old one? E.g. if I want to restart my ircbot (running with infinite runtime), I have to wait some time until the needed queue becomes available, but what about just re-using the queue it already had before? Or is there an alternative way to keep such an ircbot script running continuously without having to wait (e.g. more than 1 minute) during a restart? Can I trigger a restart (reading the new script from disk) while keeping it in the queue?

Thanks again!

Greetings
DrTrigon
Re: [Toolserver-l] Grid Engine config change
On 04/03/12 01:06, Merlissimo wrote:
> In both cases the old behavior was without -m a -b y, so (...)
> The -b y option is mostly useful for binaries, e.g. if you don't submit the python script itself, but call the binary interpreter (python) with an argument. (...)

Actually, cronsub works as if -b y were provided. It doesn't pass -b y itself, but submits a new script which then calls the calling script, so the effect is like submitting the script with -b y.
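The behaviour described in this message can be sketched roughly as follows. This is only an illustration of the idea, not cronsub's actual code: the helper name `make_wrapper` and the wrapper's contents are assumptions. Grid Engine would copy the small wrapper to the execution host, while the real script is still run from its original location, which is why the effect resembles -b y.

```python
import shlex


def make_wrapper(script_path: str) -> str:
    """Return the text of a tiny shell wrapper that runs script_path in place.

    Hypothetical helper: it only illustrates the behaviour described above
    and is not taken from cronsub itself.
    """
    # Quote the path so that spaces or shell metacharacters are harmless.
    quoted = shlex.quote(script_path)
    # The wrapper is what would be submitted (and copied); the real script
    # is executed from where it lives, e.g. the user's home directory.
    return f'#!/bin/sh\nexec {quoted} "$@"\n'


wrapper = make_wrapper("/home/drtrigon/pywikipedia/mainbot.py")
print(wrapper, end="")
```

Because only the wrapper is copied, relative paths and files next to the real script keep working, which may be why a script that reads config and log files from the home directory behaves differently with and without -b y.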
Re: [Toolserver-l] Grid Engine config change
On 04.03.2012 16:25, Platonides wrote:
> On 04/03/12 01:06, Merlissimo wrote:
>> In both cases the old behavior was without -m a -b y, so (...)
>> The -b y option is mostly useful for binaries, e.g. if you don't submit the python script itself, but call the binary interpreter (python) with an argument. (...)
>
> Actually, cronsub works as if providing -b y. It doesn't provide -b y, but submits a new script which then calls the calling script, so it's like submitting the script with -b y.

Aaa-ha! That explains a lot! Thanks for the hint!

Greetings
[Toolserver-l] Grid Engine config change
Hello toolserver users,

as you may know, there were some bigger problems related to Sun Grid Engine starting in November 2011. I asked DaB. to make me an SGE manager so that I could help solve these problems. During the last months I silently started reconfiguring SGE in small steps, so that it was always possible to use it as before and no downtime was needed. This took some time because I am only a volunteer and I had to change nearly everything. Additionally, Nosy and DaB. changed some Solaris configurations that I proposed.

All scripts that used Grid Engine before can continue to run without changes. But maybe you can increase your script's performance by adding additional information.

In the past you were requested to choose a suitable queue (all.q or longrun) for your job. Many people chose a queue that did not fit their task well, so I changed this procedure. Now you have to specify all resources that your job needs at runtime when you submit the job. SGE will then choose the queue and host that best fit your requirements. So you don't have to care about different queues anymore (you may have seen that there are many more queues than before).

All jobs must at least contain information about maximum runtime (h_rt) and peak memory usage (virtual_free). This information may become obligatory in the future. Currently only a warning message is shown. You also have to request other resources like SQL connections, free temp space, etc. if your job needs them. Please read the documentation on the toolserver wiki, which I have updated today: https://wiki.toolserver.org/view/Job_scheduling
This currently contains the main information you need to know, but I may add some more examples later.

I have also added a new script called qcronsub. This is the replacement for cronsub, which most of you used before. Unlike cronsub, it accepts the same arguments as Grid Engine's original qsub command, so it is now possible to add all resource values on the command line.
Please note that you should always use cronie at submit.toolserver.org for submitting jobs to SGE by cron. These cron tasks will always be executed even if one host (e.g. clematis or willow) is down. This has been the recommended setup for about 17 months. Many people have migrated their cron jobs from nightshade to willow during the last weeks, but they will have the same problem again if willow must be shut down for a longer time (which hopefully never happens).

--
Example:
This morning Dr. Trigon complained that his job mainbot did not run immediately and was queued for a long time. I would guess he submitted his job from cron using
cronsub mainbot -l /home/drtrigon/pywikipedia/mainbot.py
This indicates that the job runs forever (longrun) with unknown memory usage, so Grid Engine was only able to start this job on willow. It is not possible to run infinite jobs on the webservers (only shorter jobs are allowed, so that most jobs have finished before high webserver usage is expected during the evening). Nor was it possible to run it on the server running the mail transfer agent, which has less than 500MB memory free but plenty of CPU power (the expected memory usage is unknown). Other servers like nightshade and yarrow aren't currently available.

According to the last run of this job, it takes about 2 hours and 30 minutes of runtime and has a peak usage of 370 MB memory. I got these values by asking Grid Engine for the usage statistics of the last ten days: qacct -j mainbot -d 10. To make sure that the job always gets enough resources, I would suggest raising the values to 4 hours and 500MB memory. It is not a problem if you request more resources than really needed, but a job that needs more resources than requested may be killed.

So the new submit command would be:
qcronsub -N mainbot -l h_rt=4:00:00 -l virtual_free=500M /home/drtrigon/pywikipedia/mainbot.py
This job could run on both webservers during low load and on willow.
Grid Engine also knows that it cannot run on the mailserver because of its high memory usage.

The job ircbot by drtrigon was started on the mailserver last night. This job really needs an infinite runtime (-l h_rt=INFINITY), but only uses little memory (40M).

Jobs that have a limited runtime should not be submitted with an infinite runtime value, even if the expected runtime is some days or weeks. For example, pywikipedia scripts should be updated regularly from svn, so they must be stopped after some days and restarted. E.g.
qcronsub -l h_rt=120:0:0 scriptname
submits a job with a maximum runtime of five days.
--

If you have any questions about Grid Engine usage, feel free to ask me or the toolserver admins on IRC or on the mailing list. The toolserver grid currently uses four servers and still has plenty of CPU power and memory available. Only willow is currently very busy.

Please do not run processes on servers other than the login servers (willow and nightshade) without SGE resource control (the exception is cronie on the host submit, which is used for submitting jobs to Grid Engine).
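The sizing step in the mainbot example (take the figures reported by qacct, add some headroom, and write them as h_rt and virtual_free values) can be sketched in a few lines. This is a minimal sketch, not an official tool: the function names and the margins (50% extra runtime rounded up to a full hour, 25% extra memory rounded up to the next 100 MB) are assumptions chosen so that the example's numbers come out the same.

```python
import math


def format_h_rt(seconds: int) -> str:
    """Format a runtime in seconds as the h:mm:ss form used for h_rt."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def suggest_request(runtime_s, peak_mb, time_margin=1.5, mem_margin=1.25):
    """Pad observed usage and return (h_rt, virtual_free) strings.

    The margins are assumptions for illustration, not toolserver policy.
    """
    # Round the padded runtime up to a whole hour.
    padded_s = math.ceil(runtime_s * time_margin / 3600) * 3600
    # Round the padded memory up to the next multiple of 100 MB.
    padded_mb = math.ceil(peak_mb * mem_margin / 100) * 100
    return format_h_rt(padded_s), f"{padded_mb}M"


# Observed values from the example: 2 h 30 min runtime, 370 MB peak memory.
h_rt, virtual_free = suggest_request(runtime_s=2 * 3600 + 30 * 60, peak_mb=370)
print(f"-l h_rt={h_rt} -l virtual_free={virtual_free}")

# A job that should be restarted after five days.
print(f"-l h_rt={format_h_rt(5 * 24 * 3600)}")
```

With the example's figures this yields h_rt=4:00:00 and virtual_free=500M, and five days comes out as 120:00:00, i.e. 120 hours.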
Re: [Toolserver-l] Grid Engine config change
On 03/03/12 22:46, Merlissimo wrote:
> All jobs must at least contain information about maximum runtime (h_rt) and peak memory usage (virtual_free). This information may become obligatory in the future. Currently only a warning message is shown. You also have to request other resources like SQL connections, free temp space, etc. if your job needs them. Please read the documentation on the toolserver wiki, which I have updated today: https://wiki.toolserver.org/view/Job_scheduling
> This currently contains the main information you need to know, but I may add some more examples later.

Thanks a lot, Merlissimo. The new options look very good.

> I have also added a new script called qcronsub. This is the replacement for cronsub, which most of you used before. Unlike cronsub, it accepts the same arguments as Grid Engine's original qsub command, so it is now possible to add all resource values on the command line.

I don't think qcronsub is an appropriate name for a qsub script which doesn't enqueue tasks twice. The name recalls the old cronsub, but the interface is completely different. I'd call it qsub-unique or similar.

Since there are now many more options to configure, and SGE's argument names are unintuitive, I think it would be appropriate to have a wizard which generates the appropriate command line for you. (Bonus if it's a curses program which edits the submit crontab directly.)
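The core of such a wizard could be as small as the following sketch, which only assembles the command line and does not submit anything or touch a crontab. The function name `build_qcronsub` is hypothetical, and the runtime check is deliberately narrow (it accepts only INFINITY or an h:m:s value, an assumption made for this example); the options -N, -l h_rt and -l virtual_free are the ones used earlier in this thread.

```python
import re
import shlex


def build_qcronsub(job_name, command, h_rt, virtual_free_mb):
    """Assemble a qcronsub command line from plain parameters.

    Hypothetical helper; it returns a string and never runs anything.
    """
    # Accept only "INFINITY" or an h:m:s value such as "4:00:00".
    if h_rt != "INFINITY" and not re.fullmatch(r"\d+:\d{1,2}:\d{1,2}", h_rt):
        raise ValueError(f"unexpected runtime value: {h_rt!r}")
    if virtual_free_mb <= 0:
        raise ValueError("memory request must be positive")
    args = [
        "qcronsub",
        "-N", job_name,
        "-l", f"h_rt={h_rt}",
        "-l", f"virtual_free={virtual_free_mb}M",
        *command,
    ]
    # shlex.join quotes arguments where needed so the line can go into a crontab.
    return shlex.join(args)


line = build_qcronsub(
    job_name="mainbot",
    command=["/home/drtrigon/pywikipedia/mainbot.py"],
    h_rt="4:00:00",
    virtual_free_mb=500,
)
print(line)
```

An interactive front end (curses or otherwise) would then only need to collect the four answers and call this function.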