Re: [Toolserver-l] Another SGE question

2012-07-15 Thread Dr. Trigon

Hello all!

Here is a final update:

- on the one hand, the situation is better now than 2 weeks ago
  (I don't know whether this could be related to the SGE
  update/maintenance)
- on the other hand, I was not able to find out why these issues
  came up (and this is bad)

But after all, it works at the moment, and I just wanted to
thank everyone involved here for your help and work!

Greetings
DrTrigon

On 15.06.2012 10:57, Dr. Trigon wrote:
> Hello all!
> 
> I have a bunch of cronie jobs calling qcronsub several times
> with very similar settings (just the language of the wiki used
> changes). In total there are 5 jobs - regularly (about every 2nd
> day) one of those jobs does not get executed and it is always the
> same one. I do not get any error mail. Any idea?
> 
> Thanks a lot and greetings DrTrigon

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette


Re: [Toolserver-l] Another SGE question

2012-06-24 Thread Dr. Trigon

So I spent some hours investigating this on my side and summarized
everything in:

https://jira.toolserver.org/browse/TS-1402#comment-21415

As it looks to me as if this issue is getting worse, I am really
grateful for every hint in any direction (OK, any useful one ;)...!

Thanks a lot and greetings
DrTrigon


On 20.06.2012 02:04, Russell Blau wrote:
> On Sun, Jun 17, 2012, at 02:29 PM, Dr. Trigon wrote:
>> 
>> So why are some of my cronie-jobs or qcronsub calls (typically 1
>> per day) silently dropped?
>> 
> I have been having similar experiences lately, and opened a JIRA
> bug [1] to report it.  So far, after more than a week, there have
> been no other comments on the bug.  If other users are also having
> problems with cron jobs not running, perhaps you could add your
> reports to this bug and maybe this information will help the admins
> to diagnose the problem.
> 
> [1] https://jira.toolserver.org/browse/TS-1402


Re: [Toolserver-l] Another SGE question

2012-06-19 Thread Russell Blau
On Sun, Jun 17, 2012, at 02:29 PM, Dr. Trigon wrote:
> 
> So why are some of my cronie-jobs or qcronsub calls (typically 1 per
> day) silently dropped?
> 
I have been having similar experiences lately, and opened a JIRA bug [1]
to report it.  So far, after more than a week, there have been no other
comments on the bug.  If other users are also having problems with cron
jobs not running, perhaps you could add your reports to this bug and
maybe this information will help the admins to diagnose the problem.

[1] https://jira.toolserver.org/browse/TS-1402


Re: [Toolserver-l] Another SGE question

2012-06-19 Thread Dr. Trigon

Today 2 jobs did not execute! Does this mean it is getting worse?
Is the toolserver about to die?

Greetings
DrTrigon


On 15.06.2012 10:57, Dr. Trigon wrote:
> Hello all!
> 
> I have a bunch of cronie jobs calling qcronsub several times
> with very similar settings (just the language of the wiki used
> changes). In total there are 5 jobs - regularly (about every 2nd
> day) one of those jobs does not get executed and it is always the
> same one. I do not get any error mail. Any idea?
> 
> Thanks a lot and greetings DrTrigon


Re: [Toolserver-l] Another SGE question

2012-06-17 Thread Dr. Trigon

> For your other questions: stderr and stdout are buffered by SGE
> because they are sent over the network. In the Toolserver
> configuration they are sent to localhost by default because all
> execution servers have the same filesystems mounted. In a standard
> cluster configuration the out/err files are written on the submit
> host instead.

So would the hint Platonides gave earlier (btw: thanks a lot for
this!) be of any help?

On 12.06.2012 22:48, Platonides wrote:
> That looks like line buffering in stdio. You can try prepending the
> python command with: stdbuf -e0
> 
> (despite the fact that stderr should be unbuffered by default...)
> 
> I'm unsure if it's being buffered at python or if SGE is doing
> caching there, though. It _should_ be simply passing the file
> descriptor but, who knows?

Either to get SGE to behave like Python, or vice versa?

> PS: qcronsub does not output anything if the job was submitted
> successfully and all resources are requested correctly. So no need
> to send the output to /dev/null.

;)) Thanks for the hint; I cannot remember, but there HAS TO BE a reason
why I finally decided to add '> /dev/null' ... might be just 1 line
dropped by SGE or something like that... will check that sometime!! Thanks! :)
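
One way to find out what, if anything, a submission prints would be to record the output instead of discarding it. A minimal sketch, not specific to qcronsub: `run_and_record` is a hypothetical helper that runs an arbitrary command and appends a timestamp, the exit status and the combined stdout/stderr to a log file. The log path is an assumption and would have to be chosen by the user.

```python
import subprocess
import time


def run_and_record(command, logfile):
    """Run command, append its exit status and output to logfile, return the status."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # decode the captured output as text
    )
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(logfile, "a") as handle:
        handle.write(f"{stamp} exit={result.returncode} cmd={' '.join(command)}\n")
        handle.write(result.stdout)
    return result.returncode
```

A cron entry could then call a small script built around this helper, so that every attempt leaves a line in the log even when nothing is printed.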

Greetings and thanks for all the hints!
DrTrigon


Re: [Toolserver-l] Another SGE question

2012-06-17 Thread Dr. Trigon

> Is it by chance your last cron entry? Remember that there must be a
> newline finalising your crontab or the last command won't get
> executed.
> 
> You should get this output:
>  $ crontab -l | tail -c1 | od -c
> 0000000  \n
> 0000001

No it is not - but thanks a lot for this hint! I also checked:

drtrigon@clematis:~$ cronie -l | tail -c1 | od -c
0000000  \n
0000001

...looks ok to me. Below is the full output of 'cronie -l' for the sake
of completeness:

drtrigon@clematis:~$ cronie -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.ji5t0i/crontab installed on Wed Dec 23 11:02:53 2009)
# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
# m h  dom mon dow   command

#0 2 * * * cronsub -sl mainbot $HOME/pywikipedia/bot_control.py -default -cron
0 2 * * * qcronsub -l h_rt=12:00:00 -l virtual_free=500M -m as -j y -b y -N mainbot $HOME/pywikipedia/bot_control.py -default -cron >/dev/null
#0 0 */14 * * cronsub -s compbot $HOME/pywikipedia/bot_control.py -compress_history:[] -cron
0 0 */14 * * qcronsub -l h_rt=02:00:00 -l virtual_free=100M -m as -j y -b y -N compbot $HOME/pywikipedia/bot_control.py -compress_history:[] -cron >/dev/null
##0 6 * * * cronsub -s substerbot $HOME/pywikipedia/subster_beta.py 2>> $HOME/public_html/DrTrigonBot/subster.html
#0 0 * * * cronsub -sl ircbot $HOME/pywikipedia/bot_control.py -subster_irc -cron
0 0 * * * qcronsub -l h_rt=INFINITY -l virtual_free=200M -m as -j y -b y -N ircbot $HOME/pywikipedia/bot_control.py -subster_irc -cron >/dev/null
#30 0 * * * cronsub -s subster_frr $HOME/pywikipedia/bot_control.py -subster -cron -lang:frr
30 0 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_frr $HOME/pywikipedia/bot_control.py -subster -cron -lang:frr >/dev/null
#0 1 * * * cronsub -s subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en
0 1 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en >/dev/null
30 1 * * * cronsub -s subster_nl $HOME/pywikipedia/bot_control.py -subster -cron -lang:nl
##30 1 * * * cronsub -s subster_ar $HOME/pywikipedia/bot_control.py -subster -cron -lang:ar
#0 * * * * cronsub -s subster_ar $HOME/pywikipedia/bot_control.py -subster -cron -lang:ar
0 * * * * qcronsub -l h_rt=02:00:00 -l virtual_free=200M -m as -j y -b y -N subster_ar $HOME/pywikipedia/bot_control.py -subster -cron -lang:ar >/dev/null
0 0 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_meta $HOME/pywikipedia/bot_control.py -subster -cron -family:meta -lang: >/dev/null

#0 0 * * * cronsub -s maintenance $HOME/warnuserquota.py
0 0 * * * qcronsub -l h_rt=00:05:00 -l virtual_free=50M -m as -j y -b y -N maintenance $HOME/warnuserquota.py >/dev/null
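
For anyone comparing this listing against the jobs SGE actually accounted for, the active entries can be pulled out mechanically. A rough sketch, assuming each crontab entry sits on one physical line and the job name follows `-N` as in the qcronsub calls above; `active_job_names` is a hypothetical helper, not an existing tool:

```python
import shlex


def active_job_names(crontab_text):
    """Return the -N job names of all uncommented crontab lines."""
    names = []
    for line in crontab_text.splitlines():
        line = line.strip()
        # Skip blank lines and commented-out entries.
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)
        # Only entries that pass a name via -N are reported.
        if "-N" in tokens[:-1]:
            names.append(tokens[tokens.index("-N") + 1])
    return names
```

Entries that do not use `-N` (such as the remaining cronsub line) are skipped by this sketch and would need to be checked by hand.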

Greetings
DrTrigon


Re: [Toolserver-l] Another SGE question

2012-06-17 Thread Dr. Trigon

On 17.06.2012 00:07, Merlissimo wrote:
> All jobs named "subster_en" were executed successfully by sge
> (mostly on wolfsbane). The return code of your python script was 0,
> but the runtime was only about a minute. You can check it by
> executing e.g. "qacct -j subster_en -d 10" So you should check your
> log files ~drtrigon/subster_en.o2160088, 
> ~drtrigon/subster_en.o2156743,  if something is wrong with
> your python script.

Thanks for the hint - I bet I would have forgotten to write the 'q' in
'qacct'... ;)

So, as you advised, I entered:

qacct -j subster_en -d 10

and was surprised to find that it was not always the same (this) job
that did not execute. But when looking more closely I found e.g.:

* subster_en was NOT executed on: Fri Jun 15
* mainbot was NOT executed on: Sun Jun 17

(I got the wrong impression that it was always the same one because it
was always/mostly 1 script that went missing... The ones in between
those dates are harder to judge, because I started some of them manually...)
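
Spotting such gaps by eye is error-prone. A minimal sketch of how the missing days could be listed, assuming the run dates of one daily job have already been collected by hand (for example from the `qacct` output). `missing_days` is a hypothetical helper; no `qacct` parsing is attempted here.

```python
from datetime import date, timedelta


def missing_days(run_dates, first, last):
    """Return the days in [first, last] on which no run was recorded."""
    seen = set(run_dates)
    days = (last - first).days
    return [first + timedelta(days=n) for n in range(days + 1)
            if first + timedelta(days=n) not in seen]
```

With made-up example input, `missing_days([date(2012, 6, 14), date(2012, 6, 16)], date(2012, 6, 14), date(2012, 6, 16))` returns a list containing only `date(2012, 6, 15)`.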

I additionally checked all the logs you mentioned, but did not find
anything related.

So why are some of my cronie-jobs or qcronsub calls (typically 1 per
day) silently dropped?

Thanks and greetings
DrTrigon


Re: [Toolserver-l] Another SGE question

2012-06-16 Thread Platonides
On 16/06/12 23:33, Dr. Trigon wrote:
>> I have a bunch of cronie jobs calling qcronsub several times
>> with very similar settings (just the language of the wiki used
>> changes). In total there are 5 jobs - regularly (about every 2nd
>> day) one of those jobs does not get executed and it is always the
>> same one. I do not get any error mail. Any idea?

Is it by chance your last cron entry?
Remember that there must be a newline finalising your crontab or the
last command won't get executed.

You should get this output:
 $ crontab -l | tail -c1 | od -c
0000000  \n
0000001
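
The same check can be sketched in Python. This is only an illustration: `ends_with_newline` and `crontab_ends_with_newline` are hypothetical helper names, and the second one assumes a command such as `crontab -l` that prints the installed crontab.

```python
import subprocess


def ends_with_newline(data: bytes) -> bool:
    """Return True if data is non-empty and its last byte is a newline."""
    return data.endswith(b"\n")


def crontab_ends_with_newline(command=("crontab", "-l")) -> bool:
    """Run the listing command (assumed to print the crontab) and check its last byte."""
    listing = subprocess.run(command, stdout=subprocess.PIPE, check=True).stdout
    return ends_with_newline(listing)
```

An empty listing counts as "no trailing newline" here, which seems the safer reading for this purpose.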



Re: [Toolserver-l] Another SGE question

2012-06-16 Thread Merlissimo
All jobs named "subster_en" were executed successfully by sge (mostly on 
wolfsbane). The return code of your python script was 0, but the runtime 
was only about a minute.

You can check it by executing e.g. "qacct -j subster_en -d 10"
So you should check your log files ~drtrigon/subster_en.o2160088, 
~drtrigon/subster_en.o2156743,  if something is wrong with your 
python script.


For your other questions: stderr and stdout are buffered by SGE because 
they are sent over the network. In the Toolserver configuration they are 
sent to localhost by default because all execution servers have the same 
filesystems mounted. In a standard cluster configuration the out/err 
files are written on the submit host instead.
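
If delayed log lines are a concern, one option on the Python side is to flush after every message so that nothing sits in Python's own buffers. A minimal sketch: `log` is a hypothetical helper, and it only addresses buffering inside the Python process, not anything SGE may do afterwards.

```python
import sys
import time


def log(message, stream=None):
    """Write one timestamped line to the stream and flush it immediately."""
    if stream is None:
        stream = sys.stderr
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    stream.write(f"{stamp} {message}\n")
    stream.flush()
```

Starting the interpreter with `python -u` is another way to force unbuffered stdout and stderr without touching the script.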


Merlissimo

PS: qcronsub does not output anything if the job was submitted 
successfully and all resources are requested correctly. So no need to 
send the output to /dev/null.


On 16.06.2012 23:33, Dr. Trigon wrote:
> Hello nosy!
> 
> Thanks for your reply!
> 
> It is this one:
> 
> 0 1 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en >/dev/null
> 
> Greetings
> DrTrigon
> 
> On 15.06.2012 17:51, Marlen Caemmerer wrote:
>> Hello,
>> 
>> which one is it exactly?
>> 
>> Cheers nosy
>> 
>> On Fri, 15 Jun 2012, Dr. Trigon wrote:
>> 
>>> Date: Fri, 15 Jun 2012 10:57:43
>>> From: Dr. Trigon
>>> Reply-To: toolserver-l@lists.wikimedia.org
>>> To: Toolserver-l@lists.wikimedia.org
>>> Subject: [Toolserver-l] Another SGE question
>>> 
>>> Hello all!
>>> 
>>> I have a bunch of cronie jobs calling qcronsub several times
>>> with very similar settings (just the language of the wiki used
>>> changes). In total there are 5 jobs - regularly (about every 2nd
>>> day) one of those jobs does not get executed and it is always the
>>> same one. I do not get any error mail. Any idea?
>>> 
>>> Thanks a lot and greetings DrTrigon



Re: [Toolserver-l] Another SGE question

2012-06-16 Thread Dr. Trigon

Hello nosy!

Thanks for your reply!

It is this one:

0 1 * * * qcronsub -l h_rt=06:00:00 -l virtual_free=200M -m as -j y -b y -N subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en >/dev/null

Greetings
DrTrigon

On 15.06.2012 17:51, Marlen Caemmerer wrote:
> Hello,
> 
> which one is it exactly?
> 
> Cheers nosy
> 
> On Fri, 15 Jun 2012, Dr. Trigon wrote:
> 
>> Date: Fri, 15 Jun 2012 10:57:43
>> From: Dr. Trigon
>> Reply-To: toolserver-l@lists.wikimedia.org
>> To: Toolserver-l@lists.wikimedia.org
>> Subject: [Toolserver-l] Another SGE question
>> 
>> Hello all!
>> 
>> I have a bunch of cronie jobs calling qcronsub several times
>> with very similar settings (just the language of the wiki used
>> changes). In total there are 5 jobs - regularly (about every 2nd
>> day) one of those jobs does not get executed and it is always the
>> same one. I do not get any error mail. Any idea?
>> 
>> Thanks a lot and greetings DrTrigon



Re: [Toolserver-l] Another SGE question

2012-06-15 Thread Marlen Caemmerer

Hello,

which one is it exactly?

Cheers
nosy

On Fri, 15 Jun 2012, Dr. Trigon wrote:

> Date: Fri, 15 Jun 2012 10:57:43
> From: Dr. Trigon
> Reply-To: toolserver-l@lists.wikimedia.org
> To: Toolserver-l@lists.wikimedia.org
> Subject: [Toolserver-l] Another SGE question
> 
> Hello all!
> 
> I have a bunch of cronie jobs calling qcronsub several times
> with very similar settings (just the language of the wiki used
> changes). In total there are 5 jobs - regularly (about every 2nd
> day) one of those jobs does not get executed and it is always the
> same one. I do not get any error mail. Any idea?
> 
> Thanks a lot and greetings
> DrTrigon


[Toolserver-l] Another SGE question

2012-06-15 Thread Dr. Trigon

Hello all!

I have a bunch of cronie jobs calling qcronsub several times
with very similar settings (just the language of the wiki used
changes). In total there are 5 jobs - regularly (about every 2nd
day) one of those jobs does not get executed and it is always the
same one. I do not get any error mail. Any idea?

Thanks a lot and greetings
DrTrigon