
Re: [Toolserver-l] Is the wikidata replication out of sync?

2014-03-02 Thread Merlissimo

On 02.03.2014 12:58, Kolossos wrote:
 Is there any idea for a workaround to get the Wikidata Q-number for
 300,000 Wikipedia articles?
 
 [1]   SELECT `ips_item_id` FROM `wb_items_per_site`
 WHERE `ips_site_id` = 'dewiki'
 AND `ips_site_page` = 'Bundesanstalt_für_Verwaltungsdienstleistungen';

dewiki and wikidatawiki are on the same database, s5, so there is no
difference. And replication is OK. Only commonswiki has been missing on s5
for two days.
Your query should not return any result on either database, because
ips_site_page uses spaces instead of underscores. Because of the ü,
your connection might also be using the wrong character encoding.
For me

 SELECT @@hostname, `ips_item_id` FROM
wikidatawiki_p.`wb_items_per_site` WHERE `ips_site_id` = 'dewiki' AND
`ips_site_page` = 'Bundesanstalt für Verwaltungsdienstleistungen';

returns the correct result on toolserver and labs.

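+------------+-------------+
| @@hostname | ips_item_id |
+------------+-------------+
| z-dat-s5-b |    15793045 |
+------------+-------------+
1 row in set (0.00 sec)

+------------+-------------+
| @@hostname | ips_item_id |
+------------+-------------+
| labsdb1002 |    15793045 |
+------------+-------------+
1 row in set (0.03 sec)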


But you could also rewrite your query to use dewiki instead of
wikidatawiki:

SELECT TRIM(LEADING 'Q' FROM TRIM(LEADING 'q' FROM pp_value)) AS ips_item_id
 FROM dewiki_p.page
  INNER JOIN dewiki_p.page_props ON page_id=pp_page
 WHERE page_namespace=0 AND
page_title='Bundesanstalt_für_Verwaltungsdienstleistungen'
 AND pp_propname='wikibase_item';

+-------------+
| ips_item_id |
+-------------+
| 15793045    |
+-------------+
1 row in set (0.04 sec)
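
For a long list such as the 300,000 articles asked about, the first lookup could be done in batches. The following is only a minimal Python sketch under some assumptions: the titles arrive in the underscore form used by page_title, the helper names and the batch size of 500 are made up for illustration, and only the SQL text and parameter lists are built (the database connection is left out).

```python
def to_site_page(title):
    """Convert a page_title-style name (underscores) into the
    form stored in ips_site_page (spaces)."""
    return title.replace("_", " ")


def batched_queries(titles, site_id="dewiki", batch_size=500):
    """Yield (sql, params) pairs that together cover all titles."""
    for start in range(0, len(titles), batch_size):
        batch = [to_site_page(t) for t in titles[start:start + batch_size]]
        placeholders = ", ".join(["%s"] * len(batch))
        sql = (
            "SELECT ips_site_page, ips_item_id "
            "FROM wikidatawiki_p.wb_items_per_site "
            "WHERE ips_site_id = %s "
            "AND ips_site_page IN (" + placeholders + ")"
        )
        # The first parameter is the site id, the rest are the titles.
        yield sql, [site_id] + batch
```

The %s placeholders follow the "format" parameter style used by common Python MySQL drivers; with a different driver the placeholder would have to be adapted. The connection should use UTF-8 so that titles with characters like ü match.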


___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette

[Toolserver-l] broken database s5-user (dewiki/wikidata) because of full disk

2013-10-19 Thread Merlissimo

On Wednesday I reported
https://jira.toolserver.org/browse/TS-1693 z-dat-s5-b: ERROR 2013
(HY000): Lost connection to MySQL server at 'reading initial
communication packet'

No TS admin took care of this initial problem. On Thursday there
was no space left on /sql (according to munin).
Since then many expected rows have been missing from the dewiki tables. My bot
scans dewiki for pages with missing categories or pagelinks and has found
many wrong results in the last 48 hours.

Is there any estimated time when s5-user will be usable again? I think a
reimport is needed because of corrupted data (dewiki on sql-s5-rr
(cassia) seems to be OK). s5 is growing fast because of wikidata.

This week I also reported replication problems with other database servers:
* TS-1687: wikidatawiki replication on cassia (sql-s5-rr) stopped on
Sept 30th 2013
* TS-1688: commonswiki replication on cassia (sql-s5-rr) stopped on Sept
30th 2013
* TS-1689: commonswiki replication on z-dat-s5-b (sql-s5-user) stopped
on Oct 8th 2013
* TS-1690: wikidatawiki replication on z-dat-s6-a (sql-s6-user/rr)
stopped on Aug 10th 2013
* TS-1691: wikidatawiki replication on z-dat-s7-a (sql-s7-user/rr)
stopped on Aug 10th 2013
* TS-1694: toolserver.servermapping wrong for s5

Merlissimo


Re: [Toolserver-l] When is the best time of day to run programs?

2013-04-10 Thread Merlissimo


On 10.04.2013 20:54, Marc A. Pelletier wrote:

On 04/10/2013 02:41 PM, Byrial Jensen wrote:

unless there is some option I can use to tell that.


What Tim means is that, by default, SGE will schedule your job when
sufficient resources are effectively available, rather than trying to
predict when that will happen.

That said, you /can/ specify both a minimal starting time (with -a) and
a deadline (with -dl) creating a window during which SGE will try to
run your job but, in general, it's easier and more reliable to let the
gridengine pick the time.

If your objective is to have your job run only when few others are
trying to use the resources, you can also lower its priority (with -p)
so that it will only execute your job when there isn't anything better
to run.

-- Marc


If you are using SGE you do not really have to care about this. If you can
use the whole cluster (Linux and Solaris), we mostly have enough capacity. It
is only important that you specify which resources (memory, runtime)
you need.


If you need user database access on s3, you simply add -l sql-s3-user=1.
If you raise the number of DB resources, replag must be lower for your
job to get scheduled (e.g. -l sql-s3-user=3 currently only gets scheduled if
replag is below 1 hour).
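
As an illustration of how such resource requests end up on the command line, here is a small Python sketch that assembles a qsub call. The function name, the script name mybot.sh and the concrete values are made up for this example; only the -N and -l options and the resource names h_rt, virtual_free and sql-s3-user come from this list.

```python
def qsub_command(script, resources, name=None):
    """Return an argument list for qsub with one -l option per resource."""
    cmd = ["qsub"]
    if name:
        cmd += ["-N", name]
    for key, value in resources.items():
        cmd += ["-l", "{}={}".format(key, value)]
    cmd.append(script)
    return cmd


cmd = qsub_command(
    "mybot.sh",  # hypothetical script name
    {"h_rt": "01:00:00", "virtual_free": "100M", "sql-s3-user": 1},
    name="mybot",
)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run on a submit host; building it as a list avoids quoting problems with the shell.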


The deadline option is not available on the toolserver. -p mainly changes the
priority relative to your own other jobs. For the global scheduling
order, the job waiting time and the server resources used by your user account
in the last hours are more important.


Web server requests, which also cause many database queries, are high
at 14-23 UTC on workdays. Most SGE jobs are submitted between 0 and 3 UTC.


Merlissimo


Re: [Toolserver-l] Ipv6 issues

2013-04-06 Thread Merlissimo


On 06.04.2013 21:27, DaB. wrote:

Hello,
At Saturday 06 April 2013 21:22:53 DaB. wrote:

Looks like ipv6 is broken at Willow and maybe at more servers. That
probably explains some of the problems we seem to be having right now.


AFAIS willow cannot be reached by IPv6 and can't reach anything with IPv6
itself. AFAIS it is not a firewall issue. My experience with IPv6 on Solaris is
very limited, so I would prefer that Nosy takes a look first.
Use one of the Linux hosts for bots as a workaround if possible.


I think nosy needs some sleep because she hasn't slept last night.

getent ipnodes willow does not return the IPv6 address configured in
/etc/hostname6.bnx0.
So the record is missing in DNS. If DNS cannot be changed, the IPv6 address
must be added to /etc/inet/ipnodes (which is always a good idea if DNS is not
100% reliable). /usr/lib/inet/in.ndpd is running.


ifconfig -a6 shows that the IP address is configured three times as a
local interface for the same physical interface. Is this expected? And the
physical interface bnx0 is associated with a link-local address only.
This shows that the router is not propagating the site prefix. So you
must change the router config or add the site prefix locally. Maybe the
interfaces were not plumbed.


If you cannot find the reason, you could create a 6to4 tunnel ;-).

Just some ideas.

Merlissimo


Re: [Toolserver-l] SGE queues stalled

2012-12-05 Thread Merlissimo

Server sql-s1-rr was unavailable during the night, so the resource sql-s1-rr
was 0.


Because I am not a TS admin, I could not check whether you requested this
resource for this job. But just now nosy had a look and confirmed my
suspicion. The job was started after the resource sql-s1-rr became available
again.


Merlissimo

On 04.12.2012 16:44, Morten Wang wrote:

Looks like the issue got resolved around 09:00UTC, as from the qacct output:

jobname opentasks
jobnumber 873860
[...]
qsub_time Mon Dec 3 22:19:03 2012
start_time Tue Dec 4 09:06:32 2012
end_time Tue Dec 4 09:21:18 2012

If you want to look into it more closely, this job was submitted by me
(user: nettrom) through my crontab on the submit servers.


Cheers,
Morten




Re: [Toolserver-l] SGE queues stalled

2012-12-05 Thread Merlissimo


On 05.12.2012 16:21, Morten Wang wrote:


Is there a way for me to find that out myself, e.g. using qstat?  I had a
look at the qstat man-page, but judging by the descriptions it looks like
something I'd have to fiddle around with if/when a job gets queued for a
long time at some point in the future to figure out how to do.


qstat -j jobnumber

lists a scheduling info section.

Example:
qstat -j 799111

scheduling info:

queue instance short-...@ortelius.toolserver.org dropped because it is overloaded: np_load_short=1.252930 (= 1.252930 + 0.8 * 0.00 with nproc=4) >= 1.1
queue instance longrun-...@willow.toolserver.org dropped because it is overloaded: np_load_short=2.528320 (= 2.528320 + 0.8 * 0.00 with nproc=8) >= 2.0
queue instance medium-...@ortelius.toolserver.org dropped because it is overloaded: np_load_short=1.252930 (= 1.252930 + 0.8 * 0.00 with nproc=4) >= 0.8
queue instance longrun2-...@clematis.toolserver.org dropped because it is disabled
queue instance longrun2-...@hawthorn.toolserver.org dropped because it is disabled
(-l h_rt=57600,mem_free=890M,sql=1,sql-s7-rr=3,sqlprocs-s7=3,tmp_free=20M,user_slot=2,virtual_free=890M) cannot run globally because it offers only gc:sql-s7-rr=0.00


As you can see, the job cannot run on clematis and hawthorn, because
these queues are disabled. The queues on willow and ortelius have temporarily
high load. wolfsbane, nightshade and yarrow are missing from this list, so
the bot could start on these servers. But the last line, "cannot run
globally because it offers only gc:sql-s7-rr=0.00", shows that the
resource sql-s7-rr is not available on any server at the moment. That's
why the job is queued until the s7 database is usable again.
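
If you want to check this from a script, the "offers only" lines can be picked out of the qstat -j text. This is just a rough Python sketch: the function name is invented, the sample text is a shortened version of the output above, and the regular expression is only a guess at the general shape of such lines, based on this single example.

```python
import re


def unavailable_resources(scheduling_info):
    """Return {resource: offered_value} for every 'offers only' line."""
    pattern = re.compile(r"offers only \w+:([\w-]+)=([\d.]+)")
    return {
        match.group(1): float(match.group(2))
        for match in pattern.finditer(scheduling_info)
    }


# Shortened sample based on the scheduling info section shown above.
sample = (
    "queue instance longrun2 dropped because it is disabled\n"
    "(-l h_rt=57600,sql-s7-rr=3) cannot run globally because it "
    "offers only gc:sql-s7-rr=0.00\n"
)
print(unavailable_resources(sample))
```

For the sample this reports sql-s7-rr with an offered value of 0, which is the resource the job is waiting for.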


Merlissimo


Re: [Toolserver-l] SGE queue waiting forever?

2012-11-24 Thread Merlissimo


On 24.11.2012 20:43, Marlen Caemmerer wrote:

Hello,

a broken NFS mount was the source of the slow login.
Don't know if it affected SGE as well, but I tried to mount the user-store
and I got the error "Out of stream resources".
There might be something fishy with the local disks too, since cat
/etc/vfstab took ages two times and ls resulted in "no such file or
directory" twice too.
But ipmi logs and the raid utility from solaris showed no faults.
I rebooted and the system now seems to be running ok.
Do you still see any issue?

Cheers
 nosy



At 20:32 on Nov 23rd, SGE on turnera stopped and was started on damiana.
The qmaster thread started successfully, because it responds to pings and
so on. But the scheduler thread does not seem to work. qconf -tsm does not
show any status information (which would be written to the logs when I
send this command). That's why no new jobs are sent to the execution clients.


So the switchover on the HA cluster failed.

Merlissimo

@All: If you are working on big files, please copy them to local temp
first (on SGE, $TMP contains an individual temp dir for the job). E.g.
piping big files to other slow programs causes much NFS load, because
data must be read in small packets, which causes high load on the servers.
That's why SGE has not been able to schedule new jobs on nightshade for days.
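
A job could do the copying itself before it starts reading. The following Python sketch shows one possible way; the function name is made up, and falling back to the system temp directory when $TMP is not set is an assumption for running the script outside of SGE.

```python
import os
import shutil
import tempfile


def stage_locally(source_path):
    """Copy source_path into the job's local temp dir and return the new path."""
    # Inside an SGE job $TMP points to the per-job temp dir; outside of SGE
    # this sketch falls back to the system default temp directory.
    tmp_dir = os.environ.get("TMP") or tempfile.gettempdir()
    local_path = os.path.join(tmp_dir, os.path.basename(source_path))
    shutil.copy(source_path, local_path)
    return local_path
```

The slow processing then reads from the returned local path, so the NFS server only sees one sequential copy instead of many small reads.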



Re: [Toolserver-l] How to have qsub mail output?

2012-10-04 Thread Merlissimo


On 01.10.2012 17:33, Tim Landscheidt wrote:

(anonymous) wrote:


Thanks.  I don't want to fiddle too much with SGE's intestines,
so I will probably either use | mail timl in my
script or have my MUA insert the log in the status mail.



   I looked if I could submit this totally fascinating and
innovative idea of mailing the output as a RFE upstream, but
amazingly I didn't see a bugtracker at Oracle :-).  I would
even have had another idea: Impromptu jobs à la echo true |
at now :-).



Oracle closed-sourced it. There are a number of forks.
Quick link: http://gridengine.org/blog/2011/11/23/what-now/


If these two issues are the only things missing in SGE, I
think we can stay with it :-).

Tim


No, we are using an open source version based on SGE 6.2u5 patch 2, which
was the last open source version by Oracle (so for documentation refer
to this version). I used Grid Engine 2011.11p1, but I also added some
additional bug patches and special modifications for our toolserver version.


But the mail feature you requested could be implemented without
modifying any source code, by our epilog script. Just open a JIRA ticket
and I will think about this feature.


Merlissimo


Re: [Toolserver-l] Future of the toolserver

2012-09-25 Thread Merlissimo

I think the way the migration is planned is wrong. Currently the
plan is to disable the toolserver at the same time as Tool Labs becomes fully
available.

I am running very complex tools and queries which are highly optimized
for the toolserver infrastructure so that results are returned in an
acceptable time. Migrating these tools to a new environment would take
very much time. So to run these tools without an outage, there needs to be
a long period during which both projects are available.

Why is WMF not helping to maintain parts of the toolserver? My
impression is that most of the load problems on the toolserver are
database server problems. Many queries are very complex for the MySQL
database to handle because they are not key based (and they cannot be
rewritten to be key based). Why can WMF not maintain only these database
replication servers in the short term and make them accessible to
toolserver users? Even if these are only rr servers as a first step, this
would be a big benefit. Yesterday I learned that WMF employs 90+
people who should have much experience in administering servers.
Once the SQL servers are run by WMF admins on WMF hardware, the current
toolserver database servers could be reused for other purposes (maybe as
web servers).

Btw: On Sunday I submitted a critical bug to Bugzilla, because since
Saturday my interwiki bot has shown that there must be some misconfigured
API squids (perhaps because they are out of sync). None of these 90
WMF admins has taken care of this bug until now. Maybe solving this is
not explicitly contained in the job description of most of these admins,
so they do not get a point for their yearly goals. Toolserver also had
a problem on Sunday, and volunteer admin DaB. solved this problem on
that same day, a day off.

Merlissimo

On 25.09.2012 15:20, Thehelpfulone wrote:

On 25 September 2012 14:15, Ariel T. Glenn ar...@wikimedia.org wrote:

It might be helpful to put together a list of functions that the
toolserver supports but that labs currently does not; such a list could
serve as a basis for talks with the WMF. Perhaps the labs folks could
make some guesses at when those functions would be available and stable
there, which would give everyone a better idea about how long the
transition would realistically take.

If I am not mistaken, one of the big items is the ability to run
expensive db queries without impacting production. I don't believe this
is possible from labs right now, and I'm not sure what their plans are
for that.

Ariel

p.s. this post is by me as a former toolserver user, having nothing to
do with my status as a wmf staff member etc.

There is a partial list at
http://www.mediawiki.org/wiki/Wikimedia_Labs/Toolserver_features_wanted.
According to the milestones at
http://www.mediawiki.org/wiki/Wikimedia_Engineering/2012-13_Goals#Milestones_by_quarter_2,
we should be expecting database replication from production and user
databases in January-March 2013.


Re: [Toolserver-l] Future of the toolserver

2012-09-25 Thread Merlissimo


On 25.09.2012 20:48, Erik Moeller wrote:

Toolserver is in fact hosted by the Wikimedia Foundation today, in our
Amsterdam data-center. [..] We also maintain the database replication
on our end which enables tools to function.

I don't know all the internal systems, but I think by "maintaining" you mean:
granting access to MySQL binlogs, traffic costs and sometimes creating a dump.


Why can WMF not administer the whole set of database replication servers for
toolserver users in the short term, if WMDE should not spend money on this
anymore? Setting up new replication servers in the production system is done
quite often. Adding views for hiding private data and adding access
control based on the toolserver LDAP should be possible. The rest of the
toolserver infrastructure would not be touched by this change.


Currently the replication of the database clusters s3/s6/s7 (all on server
hyacinth) is lagging by more than an hour, performance is very low and
complex queries are taking 10 times longer than normal, so that some of
my queries cannot finish within the maximum allowed runtime (which has broken
some of my tools for about five days). To solve this problem, new
hardware is needed. As a toolserver user, I don't care whether support comes
from WMDE or WMF as long as this problem is fixed. I am the one with the
oversized user talk page, because other authors ask me why my tools are
not working.


Merlissimo


Re: [Toolserver-l] qcronsub warning: Please add the os this job can run on by adding parameter -l arch='*'|sol|lx

2012-09-24 Thread Merlissimo

Hi,
I added this warning today (by prior agreement with DaB.); it appears when
a job is submitted without an arch resource. There are two reasons for this:

# First, next week the default setting will change from Solaris only to
all servers. This was announced in July
(http://lists.wikimedia.org/pipermail/toolserver-l/2012-July/005110.html)

# Secondly, due to some server problems over the last two days, many jobs
needed a longer runtime, which led to higher load on willow. Last night
some jobs waited for up to several hours until willow was available again,
although other servers had unused CPU and memory at the same time.

In most cases you can simply add -l arch='*' as an argument to
qcronsub/qsub without any problems. Most scripts should run on both Solaris
and Linux, but you should perhaps test this beforehand to be sure. If your job
is currently only executable on Solaris, you must add -l arch=sol
before the default setting changes next week. For more information
check https://wiki.toolserver.org/view/Job_scheduling.

I also noticed that during the user-store outage on Sunday only one job was
waiting for some hours because of the missing resource fs-user-store, but
many people complained about their failed jobs. When your job needs a
special resource, check whether it can be requested at
https://wiki.toolserver.org/view/Job_scheduling#Optional_resources.
SGE will execute your job only when the requested resource is available.
If your job is already running and a needed resource is gone, you can also
exit your script with code 99. This requeues your job, which then runs once
the resource is available again.
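
In a Python job, the exit with code 99 could look like the sketch below. Checking for a path is only an assumed way of detecting that a resource such as the user-store has gone away, and the function names are invented for this example; only the meaning of exit code 99 comes from the text above.

```python
import os
import sys

REQUEUE_EXIT_CODE = 99  # as described above, SGE requeues the job on this code


def exit_code_for(required_path):
    """Return 99 if the required path has disappeared, otherwise 0."""
    return 0 if os.path.exists(required_path) else REQUEUE_EXIT_CODE


def main(required_path):
    code = exit_code_for(required_path)
    if code != 0:
        # Leave with 99 so that the job goes back into the queue.
        sys.exit(code)
    # ... the real work of the job would continue here ...
```

A job would call main() with the mount point it depends on, and repeat the check before each larger step if it runs for a long time.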

@Krinkle You got the message while I was hacking the live JSV script. I
simply copied the runtime warning message and then changed it. This was
so easy that I spared myself the trouble of disabling the JSV while rewriting
it.

Currently there is, in total, enough free CPU and memory for all user
scripts. SGE jobs are executed on five different servers, and more servers
could be added easily. The main problem is the load distribution, because
many users do not use SGE, which is bad on a shared system and leads to
overload on a few servers. So please use cronie on host submit and
qsub/qcronsub to submit jobs to SGE instead of running them on a particular
server directly. The toolserver hardware is getting older, and servers may go
away suddenly because of problems. With SGE you do not have to care
about that.

Merlissimo

P.S.: I want to thank DaB. for his efforts to get more money for
hardware for the toolserver cluster next year. I also think this is really
needed, especially for the database servers. You can follow the
discussion at
http://meta.wikimedia.org/wiki/Talk:Wikimedia_Deutschland/2013_annual_plan_draft/de#Toolserver.

On 24.09.2012 18:31, Krinkle wrote:

On Sep 24, 2012, at 6:20 PM, Platonides platoni...@gmail.com wrote:

On 24/09/12 18:07, Krinkle wrote:

Can someone decode this? What is this?

-- Krinkle

Begin forwarded message:

From: r...@toolserver.org (Cron Daemon)
Subject: Cron krinkle@hawthorn qcronsub -N dbbot_wm -m n -j y -b y -l h_rt=INFINITY -l virtual_free=90M $HOME/bots/dbbot-wm-start.sh
Date: September 24, 2012 6:05:07 PM GMT+02:00
To: krin...@toolserver.org

warning: Please add maximum runtime by adding parameter [33m-l
arch=[0msol|lx

The text asks you to set a time limit. The parameter (embedded in
POSIX colors despite not being output to a terminal) specifies whether it
needs a Linux or Solaris server.

However, if I try to execute it, I get a much saner message:
$ qcronsub -N dbbot_wm -m n -j y -b y -l h_rt=INFINITY -l
virtual_free=90M /home/krinkle/bots/dbbot-wm-start.sh

Unable to run job: Script not executable: /home/krinkle/bots/dbbot-wm-start.sh.
Exiting.
warning: Please add the os this job can run on by adding parameter -l
arch='*'|sol|lx
For more information read documentation at
https://wiki.toolserver.org/view/Job_scheduling

As this is a php script, your parameter would be «-l arch='*'»

Yes, I've added `-l arch='*'` to it already a minute ago.

Warnings are gone, not sure why it nagged about maximum runtime, it already has
INFINITY.

I'm not sure why arch=x isn't the default though, or maybe it is but outputs
the warning anyway?
A warning like that may be useful, but do consider that cronie from submit will
send e-mails for it.

-- Krinkle

Re: [Toolserver-l] When to execute cron-tasks

2012-09-14 Thread Merlissimo


On 14.09.2012 14:57, Tim Landscheidt wrote:

DaB. wrote:

In an ideal world that would be no problem, but in the real world that CAN be a
problem. Why? Because many users have the same idea, and our submit hosts then
fail with

(CRON) CAN'T FORK (child_process): Not enough space.

Last night 41 tasks were successfully started at midnight; an unknown number
failed.
Of course we could just hit the problem with new hardware, but most
of the day these hosts sit idle.


With Solaris cron, fixing this problem is easy, because you can change the
queue config using /etc/cron.d/queuedefs (see man queuedefs for more info).


There you could define e.g. c.35j3n17w, which means that only 35 jobs
are started in parallel and the rest are rescheduled after 17 seconds if
there are free slots. The standard Solaris config c.100j2n60w would be
bad, because it starts more than 41 jobs and the rest are rescheduled
after 60 seconds, when all the next cron jobs are starting, too.
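
To make the entry format a bit more concrete, here is a small Python sketch that splits such an entry into its parts. It assumes the layout queue.[njobs]j[nice]n[nwait]w as I understand it from the queuedefs manpage; the function name and the dictionary keys are made up for this example.

```python
import re

# One queuedefs entry: queue letter, then optional job limit (j),
# nice value (n) and wait time in seconds (w).
QUEUEDEF = re.compile(r"^([a-z])\.(?:(\d+)j)?(?:(\d+)n)?(?:(\d+)w)?$")


def parse_queuedef(entry):
    """Split an entry like 'c.35j3n17w' into its fields."""
    match = QUEUEDEF.match(entry.strip())
    if match is None:
        raise ValueError("not a queuedefs entry: %r" % entry)
    queue, njobs, nice, nwait = match.groups()
    return {
        "queue": queue,
        "njobs": int(njobs) if njobs else None,
        "nice": int(nice) if nice else None,
        "nwait": int(nwait) if nwait else None,
    }


print(parse_queuedef("c.35j3n17w"))
```

For c.35j3n17w this yields queue c with a limit of 35 jobs and a wait of 17 seconds, matching the explanation above; the middle field is the nice value the jobs run with.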


Does anybody know if vixie cron (= cronie on TS) supports something similar?
That would solve the problem.


Btw.: This bug only exists because many people on this mailing list did
not like the Solaris crontab format and requested vixie cron to be installed
as an alternative cron some years ago.


Merlissimo


Re: [Toolserver-l] Anoter SGE question

2012-06-16 Thread Merlissimo

All jobs named subster_en were executed successfully by SGE (mostly on
wolfsbane). The return code of your Python script was 0, but the runtime
was only about a minute.

You can check this by executing e.g. qacct -j subster_en -d 10
So you should check your log files ~drtrigon/subster_en.o2160088,
~drtrigon/subster_en.o2156743, to see whether something is wrong with your
Python script.


For your other questions: stderr and stdout are buffered by SGE because
they are sent over the network. In the toolserver configuration they are sent
to localhost by default, because all execution servers have the same
filesystems mounted. In a standard cluster configuration the out/err files
are written on the submit host instead.


Merlissimo

PS: qcronsub does not output anything if the job was submitted
successfully and all resources are requested correctly. So there is no need to
send the output to /dev/null.


On 16.06.2012 23:33, Dr. Trigon wrote:


Hello nosy!

Thanks for your reply!

It is this one:

0 1 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en >/dev/null

Greetings
DrTrigon

On 15.06.2012 17:51, Marlen Caemmerer wrote:
   

Hello,

which one is it exactly?

Cheers nosy

On Fri, 15 Jun 2012, Dr. Trigon wrote:

 

Date: Fri, 15 Jun 2012 10:57:43
From: Dr. Trigon dr.tri...@surfeu.ch
Reply-To: toolserver-l@lists.wikimedia.org
To: Toolserver-l@lists.wikimedia.org
Subject: [Toolserver-l] Anoter SGE question

   

Hello all!

I have a bunch of cronie jobs calling qcronsub several times
with very similar settings (just the language of the wiki used
changes). In total there are 5 jobs. Regularly (about every 2nd
day) one of those jobs does not get executed, and it is always the
same one. I do not get any error mail. Any idea?

Thanks a lot and greetings DrTrigon
 





Re: [Toolserver-l] How to silence qsub/qcronsub?

2012-03-11 Thread Merlissimo


Hi,
I have changed qcronsub so that the long output about a successful submit and
the output about a rejection because the job is already queued are now
suppressed. All other output (e.g. warnings) is still shown.
Using -verbose enables all output as it was before my change. If you are using
-terse (maybe used by some script experts), the job number is still returned.

On 08.03.2012 23:13, Platonides wrote:


BTW, why is qcronsub at /sge62/bin/sol-amd64/qcronsub ?
Wouldn't /opt/local/bin/qcronsub (like cronsub) be more appropriate?


I do not have access to other folders and using this path is easier for me to 
update scripts for different platforms.

On 08.03.2012 09:32, Simon Kågedal Reimer wrote:
 Ah, sorry, that didn't work - can't run qsub directly from cronie
 since we need to set some environment variables etc.

You can add the needed environment variables by passing the -v option to
qsub/qcronsub, e.g.:
qcronsub -v MYARG1=myvalue,MYARG2=myvalue script.sh
There are also other possibilities, like adding variables to the job context
(-ac). More detailed information is available in the qsub manpage (man
qsub).
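
If the variables come from a script, the value for -v can be assembled from a dictionary. The following is only a Python sketch with a made-up function name; rejecting commas in values is my own precaution, because the list itself is comma separated, and not something the manpage is quoted on here.

```python
def env_option(variables):
    """Format a dict as the comma-separated value for the -v option."""
    parts = []
    for name, value in variables.items():
        value = str(value)
        if "," in value or "=" in name:
            # Assumption of this sketch: such entries would be ambiguous
            # inside a comma-separated name=value list.
            raise ValueError("cannot pass %r safely via -v" % name)
        parts.append("{}={}".format(name, value))
    return ",".join(parts)


args = [
    "qcronsub",
    "-v", env_option({"MYARG1": "myvalue", "MYARG2": "myvalue"}),
    "script.sh",
]
print(" ".join(args))
```

This produces the same command line as the example above.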

Merlissimo


Re: [Toolserver-l] About cron

2012-03-04 Thread Merlissimo


Platonides wrote:

On 04/03/12 17:42, محمد الجداوي wrote:

Hi there. I made a cron job for running /clean_sandbox.py/ every 6 hours.
I made a modified copy of /clean_sandbox.py/ for myself and uploaded it to
my account on the toolserver (outside the pywikipedia folder).
The problem I face is that I can't write the proper code. I made this code:

#!/bin/sh
#$ -j y
#$ -o /dev/null
$HOME
python clean_sandbox.py -lang:ar -family:wikipedia

But it didn't work.


In which folder?
I suspect you are getting that run in the wrong folder.
Moreover, that $HOME there seems useless.

So, let's assume it's at /home/name/local_clean/clean_sandbox.py

First step, check manually that it works:
cd /home/name/local_clean/
python clean_sandbox.py -lang:ar -family:wikipedia

Does it run? Do you have any problems for eg. not finding the rest of
pywikipediabot?

Then, when creating the script, make it run in that folder:
#!/bin/sh
#$ -j y
#$ -o /dev/null
#$ -l h_rt=00:10:00
#$ -l virtual_free=20M
#$ -wd /home/name/local_clean/clean_sandbox.py
python clean_sandbox.py -lang:ar -family:wikipedia

I'd also recommend you to not run it with -o /dev/null the first time,
so you can see the output files if something were wrong.

(I also added there a time limit of 10 minutes to clean the sandbox, and
an arbitrary memory size of 20M, in line with Merlissimo guidelines)


-wd specifies the working _directory_ and not a file. It's needed if you use 
relative path names as your script does.
Joining the error stream into the standard output stream and writing both to 
/dev/null isn't a good idea if you are searching for an error cause.

Merlissimo


[Toolserver-l] Grid Engine config change

2012-03-03 Thread Merlissimo

).


Sincerely,
Merlissimo
