Re: [Toolserver-l] Is the wikidata replication out of sync?
On 02.03.2014 at 12:58, Kolossos wrote:

Is there any idea for a workaround to get the Wikidata Q-number for 300,000 Wikipedia articles? [1]

  SELECT `ips_item_id` FROM `wb_items_per_site` WHERE `ips_site_id` = 'dewiki' AND `ips_site_page` = 'Bundesanstalt_für_Verwaltungsdienstleistungen';

dewiki and wikidatawiki are on the same database cluster, s5, so there is no difference, and replication is OK. Only commonswiki has been missing on s5 for two days.

Your query should not return any result on either database, because ips_site_page uses spaces instead of underscores. Because of the "ü", your connection could also be using the wrong character encoding.

For me,

  SELECT @@hostname, `ips_item_id` FROM wikidatawiki_p.`wb_items_per_site` WHERE `ips_site_id` = 'dewiki' AND `ips_site_page` = 'Bundesanstalt für Verwaltungsdienstleistungen';

returns the correct result on both Toolserver and Labs:

  +------------+-------------+
  | @@hostname | ips_item_id |
  +------------+-------------+
  | z-dat-s5-b |    15793045 |
  +------------+-------------+
  1 row in set (0.00 sec)

  +------------+-------------+
  | @@hostname | ips_item_id |
  +------------+-------------+
  | labsdb1002 |    15793045 |
  +------------+-------------+
  1 row in set (0.03 sec)

But you could also rewrite your query to ask dewiki instead of wikidatawiki:

  SELECT TRIM(LEADING 'Q' FROM TRIM(LEADING 'q' FROM pp_value)) AS ips_item_id
  FROM dewiki_p.page
  INNER JOIN dewiki_p.page_props ON page_id=pp_page
  WHERE page_namespace=0
    AND page_title='Bundesanstalt_für_Verwaltungsdienstleistungen'
    AND pp_propname='wikibase_item';

  +-------------+
  | ips_item_id |
  +-------------+
  |    15793045 |
  +-------------+
  1 row in set (0.04 sec)

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
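For the 300,000-title case asked about above, one possible approach is to convert underscores to spaces and look the titles up in batches rather than one query per page. The following is only a sketch: the helper names `normalize_title` and `batched_queries`, the batch size of 500, and the `%s` placeholder style (as used by common Python MySQL drivers) are assumptions, not something from this thread. It only builds the statements and parameters; executing them is left to whatever database driver is in use.

```python
from typing import Iterable, Iterator, List, Tuple


def normalize_title(title: str) -> str:
    """Turn a page_title-style name (underscores) into the ips_site_page form (spaces)."""
    return title.replace("_", " ")


def _build(site_id: str, batch: List[str]) -> Tuple[str, List[str]]:
    """Build one parameterised SELECT covering every title in the batch."""
    placeholders = ", ".join(["%s"] * len(batch))
    sql = (
        "SELECT ips_site_page, ips_item_id "
        "FROM wikidatawiki_p.wb_items_per_site "
        "WHERE ips_site_id = %s AND ips_site_page IN (" + placeholders + ")"
    )
    return sql, [site_id] + batch


def batched_queries(
    titles: Iterable[str], site_id: str = "dewiki", batch_size: int = 500
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (sql, params) pairs, each covering at most batch_size titles."""
    batch: List[str] = []
    for title in titles:
        batch.append(normalize_title(title))
        if len(batch) == batch_size:
            yield _build(site_id, batch)
            batch = []
    if batch:  # remaining titles that did not fill a whole batch
        yield _build(site_id, batch)
```

Passing the titles as parameters instead of pasting them into the SQL string also leaves the encoding of characters such as "ü" to the driver and the connection settings.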
[Toolserver-l] broken database s5-user (dewiki/wikidata) because of full disk
On Wednesday I reported https://jira.toolserver.org/browse/TS-1693:

  z-dat-s5-b: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet'

No TS admin took care of this initial problem. On Thursday there was no space left on /sql (according to munin). Since then many expected rows have been missing from the dewiki tables. My bot scans dewiki for pages with missing categories or pagelinks and has found many wrong results in the last 48 hours.

Is there an estimated time when s5-user will be usable again? I think a reimport is needed because of corrupted data (dewiki on sql-s5-rr (cassia) seems to be OK). s5 is growing fast because of Wikidata.

This week I also reported replication problems with other database servers:

* TS-1687: wikidatawiki replication on cassia (sql-s5-rr) stopped at Sept 30th 2013
* TS-1688: commonswiki replication on cassia (sql-s5-rr) stopped at Sept 30th 2013
* TS-1689: commonswiki replication on z-dat-s5-b (sql-s5-user) stopped at Oct 8th 2013
* TS-1690: wikidatawiki replication on z-dat-s6-a (sql-s6-user/rr) stopped at Aug 10th 2013
* TS-1691: wikidatawiki replication on z-dat-s7-a (sql-s7-user/rr) stopped at Aug 10th 2013
* TS-1694: toolserver.servermapping wrong for s5

Merlissimo
Re: [Toolserver-l] When is the best time of day to run programs?
On 10.04.2013 at 20:54, Marc A. Pelletier wrote:

On 04/10/2013 02:41 PM, Byrial Jensen wrote:

unless there is some option I can use to tell that.

What Tim means is that, by default, SGE will schedule your job when sufficient resources are effectively available, rather than trying to predict when that will happen. That said, you /can/ specify both a minimal starting time (with -a) and a deadline (with -dl), creating a window during which SGE will try to run your job, but in general it's easier and more reliable to let the gridengine pick the time. If your objective is to have your job run only when few others are trying to use the resources, you can also lower its priority (with -p) so that it will only execute your job when there isn't anything better to run.

-- Marc

If you are using SGE, you do not really have to care about this. If you can use the whole cluster (Linux and Solaris), we mostly have enough capacity. It is only important that you specify which resources (memory, runtime) you need. If you need user database access on s3, you simply add -l sql-s3-user=1. If you raise the number of DB resources, replag must be lower for your job to get scheduled (e.g. -l sql-s3-user=3 currently only gets scheduled if replag is below 1 hour).

The deadline option is not available on the Toolserver. -p mainly changes the priority relative to your own other jobs. For the global scheduling order, the job's waiting time and the server resources used by your user account in the last hours are more important.

Webserver requests, which also cause many database queries, are high at 14-23 UTC on workdays. Most SGE jobs are submitted between 0 and 3 UTC.

Merlissimo
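The advice above comes down to declaring resources with `-l name=value` and, optionally, a priority with `-p`. As a rough sketch of how a submit wrapper might assemble such a command line, the function below only builds the argument list. The name `build_qsub_args` is hypothetical and the example values are arbitrary; only the `-l` and `-p` options and the resource names are taken from the messages in this archive.

```python
from typing import Dict, List, Optional


def build_qsub_args(
    script: str,
    resources: Dict[str, str],
    priority: Optional[int] = None,
) -> List[str]:
    """Build a qsub argument list with one -l option per requested resource."""
    args = ["qsub"]
    # Sort by resource name so the generated command line is reproducible.
    for name, value in sorted(resources.items()):
        args += ["-l", f"{name}={value}"]
    if priority is not None:
        args += ["-p", str(priority)]
    args.append(script)
    return args


if __name__ == "__main__":
    # Arbitrary example values; print the command instead of running it.
    print(" ".join(build_qsub_args(
        "job.sh",
        {"h_rt": "01:00:00", "virtual_free": "100M", "sql-s3-user": "1"},
        priority=-100,
    )))
```

The resulting list could be handed to `subprocess.run` on a submit host; that step is left out here because it depends on the local installation.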
Re: [Toolserver-l] Ipv6 issues
On 06.04.2013 at 21:27, DaB. wrote:

Hello,

At Saturday 06 April 2013 21:22:53 DaB. wrote:

Looks like IPv6 is broken at Willow and maybe at more servers.

That probably explains some of the problems we seem to be having right now.

AFAIS willow cannot be reached by IPv6 and can't reach anything with IPv6 itself. AFAIS it is not a firewall issue. My experience with IPv6 on Solaris is very limited, so I would prefer that Nosy takes a look first. Use one of the Linux hosts for bots as a workaround if possible. I think nosy needs some sleep because she hasn't slept last night.

getent ipnodes willow does not return the IPv6 address configured in /etc/hostname6.bnx0, so the record is missing in DNS. If DNS cannot be changed, the IPv6 address must be added to /etc/inet/ipnodes (which is always a good idea if DNS is not 100% reliable).

/usr/lib/inet/in.ndpd is running. ifconfig -a6 shows that the IP address is configured three times as a local interface for the same physical interface. Is this expected? And the physical interface bnx0 is associated with a link-local address only. This shows that the router is not propagating the site prefix, so you must change the router config or add the site prefix locally. Maybe the interfaces were not plumbed.

If you cannot find the reason, you could create a 6to4 tunnel ;-).

Just some ideas.

Merlissimo
Re: [Toolserver-l] SGE queues stalled
Server sql-s1-rr was unavailable during the night, so the resource sql-s1-rr was 0. Because I am not a TS admin, I could not check whether you requested this resource for this job. But just now nosy had a look and confirmed my suspicion. The job was started after the resource sql-s1-rr was available again.

Merlissimo

On 04.12.2012 at 16:44, Morten Wang wrote:

Looks like the issue got resolved around 09:00 UTC, as seen in the qacct output:

  jobname      opentasks
  jobnumber    873860
  [...]
  qsub_time    Mon Dec  3 22:19:03 2012
  start_time   Tue Dec  4 09:06:32 2012
  end_time     Tue Dec  4 09:21:18 2012

If you want to look into it more closely, this job was submitted by me (user: nettrom) through my crontab on the submit servers.

Cheers,
Morten
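The qacct timestamps quoted above are enough to work out how long a job sat in the queue. A small sketch, assuming the timestamp layout shown in that output and an English (C) locale for day and month names; the function name `queue_wait` is made up for this example.

```python
from datetime import datetime, timedelta

# Layout of the qsub_time / start_time / end_time values in the qacct output above.
QACCT_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def _parse(stamp: str) -> datetime:
    # Collapse runs of spaces ("Dec  3") before parsing.
    return datetime.strptime(" ".join(stamp.split()), QACCT_TIME_FORMAT)


def queue_wait(qsub_time: str, start_time: str) -> timedelta:
    """Return how long a job waited between submission and start."""
    return _parse(start_time) - _parse(qsub_time)


if __name__ == "__main__":
    print(queue_wait("Mon Dec  3 22:19:03 2012", "Tue Dec  4 09:06:32 2012"))
```

For job 873860 this gives a wait of 10 hours, 47 minutes and 29 seconds, which matches the overnight outage of sql-s1-rr described in the reply.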
Re: [Toolserver-l] SGE queues stalled
On 05.12.2012 at 16:21, Morten Wang wrote:

Is there a way for me to find that out myself, e.g. using qstat? I had a look at the qstat man-page, but judging by the descriptions it looks like something I'd have to fiddle around with if/when a job gets queued for a long time at some point in the future to figure out how to do.

qstat -j jobnumber lists a "scheduling info" section. Example:

  qstat -j 799111

  scheduling info:
    queue instance short-...@ortelius.toolserver.org dropped because it is overloaded: np_load_short=1.252930 (= 1.252930 + 0.8 * 0.00 with nproc=4) >= 1.1
    queue instance longrun-...@willow.toolserver.org dropped because it is overloaded: np_load_short=2.528320 (= 2.528320 + 0.8 * 0.00 with nproc=8) >= 2.0
    queue instance medium-...@ortelius.toolserver.org dropped because it is overloaded: np_load_short=1.252930 (= 1.252930 + 0.8 * 0.00 with nproc=4) >= 0.8
    queue instance longrun2-...@clematis.toolserver.org dropped because it is disabled
    queue instance longrun2-...@hawthorn.toolserver.org dropped because it is disabled
    (-l h_rt=57600,mem_free=890M,sql=1,sql-s7-rr=3,sqlprocs-s7=3,tmp_free=20M,user_slot=2,virtual_free=890M) cannot run globally because it offers only gc:sql-s7-rr=0.00

As you can see, the job cannot run on clematis and hawthorn because these queues are disabled. The queues on willow and ortelius temporarily have a high load. wolfsbane, nightshade and yarrow are missing from this list, so the bot could start on these servers. But the last line,

  cannot run globally because it offers only gc:sql-s7-rr=0.00

shows that the resource sql-s7-rr is not available on any server at the moment. That is why the job stays queued until the s7 database is usable again.

Merlissimo
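The "cannot run globally" line is the part of the scheduling info that names the missing resource, so it can be picked out mechanically. The sketch below assumes the message wording is exactly as in the example above; the function name `unavailable_global_resources` is invented for illustration, and real qstat output may wrap lines differently.

```python
import re
from typing import Dict

# Matches e.g. "cannot run globally because it offers only gc:sql-s7-rr=0.00"
_GLOBAL_SHORTAGE = re.compile(
    r"cannot run globally because it offers only gc:([\w-]+)=(\d+(?:\.\d+)?)"
)


def unavailable_global_resources(qstat_output: str) -> Dict[str, float]:
    """Map each globally short resource named in the text to the amount on offer."""
    return {
        match.group(1): float(match.group(2))
        for match in _GLOBAL_SHORTAGE.finditer(qstat_output)
    }


if __name__ == "__main__":
    sample = (
        "queue instance longrun2 dropped because it is disabled\n"
        "(-l h_rt=57600,sql-s7-rr=3) cannot run globally because it offers "
        "only gc:sql-s7-rr=0.00\n"
    )
    print(unavailable_global_resources(sample))
```

The text to feed in would be the output of `qstat -j <jobnumber>`, captured however is convenient.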
Re: [Toolserver-l] SGE queue waiting forever?
On 24.11.2012 at 20:43, Marlen Caemmerer wrote:

Hello,

a broken NFS mount was the source of the slow login. I don't know if it affected SGE as well, but I tried to mount the user-store and got the error "Out of stream resources". There might be something fishy with the local disks too, since cat /etc/vfstab took ages two times and ls resulted in "no such file or directory" twice too. But the IPMI logs and the RAID utility from Solaris showed no faults. I rebooted and the system now seems to be running OK. Do you still see any issue?

Cheers
nosy

At 20:32 on Nov 23rd, SGE on turnera stopped and was started on damiana. The qmaster thread started successfully, because it responds to pings and so on. But the scheduler thread does not seem to work: qconf -tsm does not show any status information (which would be written to the logs when I send this command). That is why no new jobs are sent to the execution clients. So the switch-over on the HA cluster failed.

Merlissimo

@All: If you are working on big files, please copy them to local temp first (on SGE, $TMP contains an individual temp dir for the job). E.g. piping big files to other slow programs causes much NFS load, because the data must be read in small packets, which causes high load on the servers. That is why SGE has not been able to schedule new jobs on nightshade for days.
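The request to copy big files to local temp first could look roughly like this in a job script. It is a sketch only: it assumes `$TMP` points at the per-job temp directory as described above, and it falls back to the system temp directory when run outside SGE. The function name `stage_to_local_tmp` is hypothetical.

```python
import os
import shutil
import tempfile


def stage_to_local_tmp(source_path: str) -> str:
    """Copy a file from shared storage to local temp and return the local path."""
    # Inside an SGE job, $TMP is the job's own temp dir; otherwise use the system default.
    tmp_dir = os.environ.get("TMP") or tempfile.gettempdir()
    local_path = os.path.join(tmp_dir, os.path.basename(source_path))
    # One sequential copy over NFS, instead of many small reads while processing.
    shutil.copyfile(source_path, local_path)
    return local_path
```

A job would then open the returned path for all further processing, so that slow downstream steps read from local disk rather than from the NFS share.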
Re: [Toolserver-l] How to have qsub mail output?
On 01.10.2012 at 17:33, Tim Landscheidt wrote:

(anonymous) wrote:

Thanks. I don't want to fiddle too much with SGE's intestines, so I will probably either use | mail timl in my script or have my MUA insert the log in the status mail. I looked if I could submit this totally fascinating and innovative idea of mailing the output as an RFE upstream, but amazingly I didn't see a bugtracker at Oracle :-). I would even have had another idea: impromptu jobs à la echo true | at now :-).

Oracle closed-sourced it. There are a number of forks. Quick link: http://gridengine.org/blog/2011/11/23/what-now/

If these two issues are the only things missing in SGE, I think we can stay with it :-).

Tim

No, we are using an open source version based on SGE 6.2u5 patch 2, which was the last open source version by Oracle (so for documentation refer to this version). I used Grid Engine 2011.11p1, but I also added some additional bug patches and special modifications for our Toolserver version.

But the mail feature you requested could be implemented by our epilog script without modifying any source code. Just open a JIRA ticket and I will think about this feature.

Merlissimo
Re: [Toolserver-l] Future of the toolserver
I think the way the migration is being done is wrong. Currently the plan is to disable the Toolserver at the same time as Tool Labs becomes fully available. I am running very complex tools and queries which are highly optimized for the Toolserver infrastructure so that results are returned in an acceptable time. Migrating these tools to a new environment would take very much time. So to run these tools without an outage, there needs to be a long period during which both projects are available.

Why is WMF not helping to maintain parts of the Toolserver? My impression is that most of the load problems on the Toolserver are database server problems. Many queries are very complex for the MySQL database to handle because they are not key based (and they cannot be rewritten to be key based). Why can WMF not maintain only these database replication servers in the short term and make them accessible to Toolserver users? Even if these were only rr servers in the first step, this would be a big benefit. Yesterday I learned that WMF employs 90+ people, who should have much experience in administering servers. Once the SQL servers are maintained by WMF admins on WMF hardware, the current Toolserver database servers could be reused for other purposes (maybe as webservers).

Btw: On Sunday I submitted a critical bug to Bugzilla, because since Saturday my interwiki bot has shown that there must be some misconfigured API squids (perhaps because they are out of sync). None of these 90 WMF admins has taken care of this bug so far. Maybe solving this is not explicitly contained in the job description of most of these admins, and so they do not get a point for their yearly goals. The Toolserver also had a problem on Sunday, and volunteer admin DaB. solved this problem within the same red-letter day.

Merlissimo

On 25.09.2012 at 15:20, Thehelpfulone wrote:

On 25 September 2012 14:15, Ariel T. Glenn ar...@wikimedia.org wrote:

It might be helpful to put together a list of functions that the toolserver supports but that labs currently does not; such a list could serve as a basis for talks with the WMF. Perhaps the labs folks could make some guesses at when those functions would be available and stable there, which would give everyone a better idea about how long the transition would realistically take. If I am not mistaken, one of the big items is the ability to run expensive db queries without impacting production. I don't believe this is possible from labs right now, and I'm not sure what their plans are for that.

Ariel

p.s. this post is by me as a former toolserver user, having nothing to do with my status as a wmf staff member etc.

There is a partial list at http://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_wanted. According to the milestones at http://www.mediawiki.org/wiki/Wikimedia_Engineering/2012-13_Goals#Milestones_by_quarter_2, we should be expecting database replication from production and user databases in January-March 2013.
Re: [Toolserver-l] Future of the toolserver
On 25.09.2012 at 20:48, Erik Moeller wrote:

Toolserver is in fact hosted by the Wikimedia Foundation today, in our Amsterdam data-center. [..] We also maintain the database replication on our end which enables tools to function.

I don't know all the internal systems, but I think by "maintaining" you mean: granting access to the MySQL binlogs, paying traffic costs and sometimes creating a dump.

Why can WMF not administer the whole set of database replication servers for Toolserver users in the short term, if WMDE should not spend money on this anymore? Setting up new replication servers in the production system is done quite often. Adding views for hiding private data and adding access control based on the Toolserver LDAP should be possible. The rest of the Toolserver infrastructure would not be touched by this change.

Currently the replication of database clusters s3/s6/s7 (all on server hyacinth) is lagging by more than an hour, performance is very low and complex queries are taking 10 times longer than normal, so that some of my queries cannot finish within the maximum allowed runtime (which has been breaking some of my tools for about five days). To solve this problem new hardware is needed. As a Toolserver user I don't care whether support comes from WMDE or WMF, as long as this problem is fixed. I am the one with the oversized user talk page because other authors ask me why my tools are not working.

Merlissimo
Re: [Toolserver-l] qcronsub warning: Please add the os this job can run on by adding parameter -l arch='*'|sol|lx
Hi,

I added this warning today (by prior agreement with DaB.); it is shown when a job is submitted without an arch resource. There are two reasons for this:

# First, next week the default setting will change from Solaris only to all servers. This was announced in July (http://lists.wikimedia.org/pipermail/toolserver-l/2012-July/005110.html).
# Secondly, due to some server problems in the last two days, many jobs need a longer runtime, which leads to higher load on willow. Last night some jobs waited for up to several hours until willow was available again, although other servers had unused CPU and memory at the same time.

In most cases you can simply add -l arch='*' as an argument to qcronsub/qsub without any problems. Most scripts should run on both Solaris and Linux, but perhaps you should test it beforehand to be sure. If your job is currently only executable on Solaris, you must add -l arch=sol before the default setting changes next week. For more information check https://wiki.toolserver.org/view/Job_scheduling.

I also noticed that during the user-store outage on Sunday only one job was waiting for some hours because of the missing resource fs-user-store, but many people complained about their failed jobs. When your job needs a special resource, check whether it is requestable on https://wiki.toolserver.org/view/Job_scheduling#Optional_resources. SGE will execute your job only when the requested resource is available. If your job is already running and a needed resource is gone, you can also exit your script with code 99. This requeues your job, and it runs again when the resource is available.

@Krinkle: You got the message while I was hacking the live JSV script. I simply copied the runtime warning message and then changed it. It was such an easy change that I did not bother disabling the JSV while rewriting it.

Currently there is enough CPU and memory free in total for all user scripts. SGE jobs are executed on five different servers, and more servers could be added easily. The main problem is the load distribution, because many users do not use SGE, which is bad on a shared system and leads to overload on a few servers. So please use cronie on host submit and qsub/qcronsub to submit jobs to SGE instead of running them directly on a specific server. The Toolserver hardware is getting older and servers may go away suddenly because of problems. With SGE you do not have to care about that.

Merlissimo

P.S.: I want to thank DaB. for his commitment to getting more money for hardware for the Toolserver cluster next year. I also think this is really needed, especially for the database servers. You can follow the discussion on http://meta.wikimedia.org/wiki/Talk:Wikimedia_Deutschland/2013_annual_plan_draft/de#Toolserver.

On 24.09.2012 at 18:31, Krinkle wrote:

On Sep 24, 2012, at 6:20 PM, Platonides platoni...@gmail.com wrote:

On 24/09/12 18:07, Krinkle wrote:

Can someone decode this? What is this?

-- Krinkle

Begin forwarded message:

*From: *r...@toolserver.org (Cron Daemon)
*Subject: **Cron krinkle@hawthorn qcronsub -N dbbot_wm -m n -j y -b y -l h_rt=INFINITY -l virtual_free=90M $HOME/bots/dbbot-wm-start.sh*
*Date: *September 24, 2012 6:05:07 PM GMT+02:00
*To: *krin...@toolserver.org

warning: Please add maximum runtime by adding parameter [33m-l arch=[0msol|lx

The text asks you to place a time limit. The parameter (embedded in posix colors despite not being output to a terminal) is to specify whether it needs a Linux or Solaris server.

However, if I try to execute it, I get a much saner message:

  $ qcronsub -N dbbot_wm -m n -j y -b y -l h_rt=INFINITY -l virtual_free=90M /home/krinkle/bots/dbbot-wm-start.sh
  Unable to run job: Script not executable: /home/krinkle/bots/dbbot-wm-start.sh.
  Exiting.
  warning: Please add the os this job can run on by adding parameter -l arch='*'|sol|lx
  For more information read documentation at https://wiki.toolserver.org/view/Job_scheduling

As this is a php script, your parameter would be «-l arch='*'»

Yes, I've added `-l arch='*'` to it already a minute ago. Warnings are gone; not sure why it nagged about maximum runtime, it already has INFINITY. I'm not sure why arch=x isn't the default though, or maybe it is but outputs the warning anyway?

A warning like that may be useful, but do consider that cronie from submit will send e-mails for it.

-- Krinkle
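The exit-code-99 behaviour mentioned in this message (a running job that loses a needed resource exits with 99 and is requeued) can be wrapped in a small check at the start of a script. This is a sketch under assumptions: the function names are invented, and testing for the existence of a path is only one possible way to detect that something like user-store has gone away.

```python
import os
import sys
from typing import Iterable

# Exit status that, according to the message above, makes SGE requeue the job.
REQUEUE_EXIT_CODE = 99


def exit_code_for(required_paths: Iterable[str]) -> int:
    """Return 99 if any required path is missing, otherwise 0."""
    for path in required_paths:
        if not os.path.exists(path):
            return REQUEUE_EXIT_CODE
    return 0


def ensure_resources_or_requeue(required_paths: Iterable[str]) -> None:
    """Exit with the requeue status when a needed path has disappeared."""
    code = exit_code_for(required_paths)
    if code != 0:
        sys.exit(code)
```

A bot would call `ensure_resources_or_requeue([...])` with the directories it depends on before starting its real work, and possibly again between long-running steps.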
Re: [Toolserver-l] When to execute cron-tasks
On 14.09.2012 at 14:57, Tim Landscheidt wrote:

DaB. wrote:

In an ideal world that would be no problem, but in the real world that CAN be a problem. Why? Because many users have the same idea and our submit hosts then fail with

  (CRON) CAN'T FORK (child_process): Not enough space

Last night 41 tasks were successfully started at midnight; an unknown number failed. Of course we could just hit the problem with buying new hardware, but most of the day these hosts are idle.

With Solaris cron, fixing this problem is easy, because you can change the queue config using /etc/cron.d/queuedefs (see man queuedefs for more info). There you could define e.g.

  c.35j3n17w

which means that only 35 jobs are started in parallel and the rest are rescheduled after 17 seconds if there are free slots. The standard Solaris config

  c.100j2n60w

would be bad, because it starts more than 41 jobs and the rest are rescheduled after 60 seconds, when all the next cron jobs are starting, too.

Does anybody know if vixie cron (= cronie on TS) supports something similar? That would solve the problem.

Btw.: This bug only exists because many people on this mailing list did not like the Solaris crontab format and requested that vixie cron be installed as an alternative cron some years ago.

Merlissimo
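To make the queuedefs notation above easier to read, here is a small parser sketch. It follows the interpretation given in the message (job limit before `j`, reschedule wait in seconds before `w`) and treats the number before `n` as the nice value, which is how the Solaris queuedefs man page describes it as far as I know. The class and function names are made up, and the defaults are simply the 100/2/60 values from the "standard" entry quoted above.

```python
import re
from typing import NamedTuple, Tuple


class QueueDef(NamedTuple):
    queue: str         # queue letter, e.g. "c" for cron
    max_jobs: int      # jobs allowed to run at the same time
    nice: int          # nice value applied to the jobs
    wait_seconds: int  # delay before retrying when the limit is reached


_ENTRY = re.compile(r"^([a-z])\.(?:(\d+)j)?(?:(\d+)n)?(?:(\d+)w)?$")


def parse_queuedef(entry: str, defaults: Tuple[int, int, int] = (100, 2, 60)) -> QueueDef:
    """Parse one queuedefs entry such as 'c.35j3n17w'."""
    match = _ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"not a queuedefs entry: {entry!r}")
    queue, njob, nice, nwait = match.groups()
    return QueueDef(
        queue,
        int(njob) if njob else defaults[0],
        int(nice) if nice else defaults[1],
        int(nwait) if nwait else defaults[2],
    )
```

With this, `parse_queuedef("c.35j3n17w")` yields a limit of 35 jobs, nice 3 and a 17-second wait, which is the configuration suggested in the message.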
Re: [Toolserver-l] Anoter SGE question
All jobs named subster_en were executed successfully by SGE (mostly on wolfsbane). The return code of your Python script was 0, but the runtime was only about a minute. You can check this by executing e.g.

  qacct -j subster_en -d 10

So you should check your log files ~drtrigon/subster_en.o2160088, ~drtrigon/subster_en.o2156743, ... to see if something is wrong with your Python script.

On your other questions: stderr and stdout are buffered by SGE because they are sent over the network. In the Toolserver configuration they are sent to localhost by default, because all execution servers have the same filesystems mounted. In a standard cluster configuration the out/err files are written on the submit host instead.

Merlissimo

PS: qcronsub does not output anything if the job was submitted successfully and all resources are requested correctly. So there is no need to send the output to /dev/null.

On 16.06.2012 23:33, Dr. Trigon wrote:

Hello nosy!

Thanks for your reply! It is this one:

  0 1 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en >/dev/null

Greetings
DrTrigon

On 15.06.2012 17:51, Marlen Caemmerer wrote:

Hello,

which one is it exactly?

Cheers
nosy

On Fri, 15 Jun 2012, Dr. Trigon wrote:

Date: Fri, 15 Jun 2012 10:57:43
From: Dr. Trigon dr.tri...@surfeu.ch
Reply-To: toolserver-l@lists.wikimedia.org
To: Toolserver-l@lists.wikimedia.org
Subject: [Toolserver-l] Anoter SGE question

Hello all!

I have a bunch of cronie jobs calling qcronsub several times with very similar settings (just the language of the wiki used changes). In total there are 5 jobs. Regularly (about every 2nd day) one of those jobs does not get executed, and it is always the same one. I do not get any error mail. Any idea?

Thanks a lot and greetings
DrTrigon
Re: [Toolserver-l] How to silence qsub/qcronsub?
Hi,

I have changed qcronsub so that the long output about a successful submit, and the output about a rejection because the job is already queued, are now suppressed. All other output (e.g. warnings) is still shown. Using -verbose enables all output as it was before my change. If you are using -terse (maybe used by some script experts), the job number is still returned.

On 08.03.2012 23:13, Platonides wrote:

BTW, why is qcronsub at /sge62/bin/sol-amd64/qcronsub? Wouldn't /opt/local/bin/qcronsub (like cronsub) be more appropriate?

I do not have access to other folders, and using this path makes it easier for me to update the scripts for different platforms.

On 08.03.2012 09:32, Simon Kågedal Reimer wrote:

Ah, sorry, that didn't work - can't run qsub directly from cronie since we need to set some environment variables etc.

You can add the needed environment variables by adding the -v option to qsub/qcronsub, e.g.:

  qcronsub -v MYARG1=myvalue,MYARG2=myvalue script.sh

There are also other possibilities, like adding variables to the job context (-ac). More detailed information is available in the qsub manpage (man qsub).

Merlissimo
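When the variables passed with `-v` come from a script rather than being typed by hand, the comma-separated list can be generated. A minimal sketch, assuming values that contain no commas (the separator in the example above); the function name `format_env_option` is hypothetical.

```python
from typing import Dict, List


def format_env_option(variables: Dict[str, str]) -> List[str]:
    """Build the '-v NAME=value,NAME2=value2' arguments for qsub/qcronsub."""
    if not variables:
        return []
    for name, value in variables.items():
        # A comma would be read as the start of the next variable.
        if "," in name or "," in value:
            raise ValueError(f"comma not allowed in {name!r}={value!r}")
    pairs = [f"{name}={value}" for name, value in variables.items()]
    return ["-v", ",".join(pairs)]


if __name__ == "__main__":
    print(["qcronsub"] + format_env_option({"MYARG1": "myvalue", "MYARG2": "myvalue"}) + ["script.sh"])
```

Rejecting commas outright is a deliberately cautious choice for this sketch; it avoids guessing how such values would have to be escaped.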
Re: [Toolserver-l] About cron
Platonides wrote:

On 04/03/12 17:42, محمد الجداوي wrote:

Hi there. I made a cron for operating /clean_sandbox.py/ every 6 hours. I made a modified copy of /clean_sandbox.py/ for me and uploaded it to my account on toolserver (outside the pywikipedia folder). The problem I face is that I can't write the proper code. I made this code:

  #!/bin/sh
  #$ -j y
  #$ -o /dev/null
  $HOME python clean_sandbox.py -lang:ar -family:wikipedia

But it didn't work.

In which folder? I suspect you are getting that run in the wrong folder. Moreover, that $HOME there seems useless. So, let's assume it's at /home/name/local_clean/clean_sandbox.py

First step, check manually that it works:

  cd /home/name/local_clean/
  python clean_sandbox.py -lang:ar -family:wikipedia

Does it run? Do you have any problems, e.g. not finding the rest of pywikipediabot?

Then, when creating the script, make it run in that folder:

  #!/bin/sh
  #$ -j y
  #$ -o /dev/null
  #$ -l h_rt=00:10:00
  #$ -l virtual_free=20M
  #$ -wd /home/name/local_clean/clean_sandbox.py
  python clean_sandbox.py -lang:ar -family:wikipedia

I'd also recommend you to not run it with -o /dev/null the first time, so you can see the output files if something were wrong. (I also added there a time limit of 10 minutes to clean the sandbox, and an arbitrary memory size of 20M, in line with Merlissimo guidelines)

-wd specifies the working _directory_ and not a file, so it should be /home/name/local_clean/ here. It is needed if you use relative path names, as your script does.

Joining the error stream into the standard output stream and writing both to /dev/null isn't a good idea if you are searching for the cause of an error.

Merlissimo
[Toolserver-l] Grid Engine config change
). Sincerely, Merlissimo