Re: [Toolserver-l] SGE queues stalled

2012-12-05 Thread Merlissimo
Server sql-s1-rr was unavailable during the night, so the resource
sql-s1-rr was 0.

Because I am not a TS admin, I could not check whether you requested this
resource for this job. But nosy just had a look and confirmed my
suspicion: the job was started once resource sql-s1-rr was available again.


Merlissimo

On 04.12.2012 16:44, Morten Wang wrote:

Looks like the issue got resolved around 09:00UTC, as from the qacct output:

jobname opentasks
jobnumber 873860
[...]
qsub_time Mon Dec 3 22:19:03 2012
start_time Tue Dec 4 09:06:32 2012
end_time Tue Dec 4 09:21:18 2012

If you want to look into it more closely, this job was submitted by me
(user: nettrom) through my crontab on the submit servers.


Cheers,
Morten



___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette


Re: [Toolserver-l] SGE queues stalled

2012-12-05 Thread Morten Wang
Ah, I didn't think of that; of course, that's the obvious explanation.
Thanks for looking into it!

Is there a way for me to find that out myself, e.g. using qstat?  I had a
look at the qstat man page, but judging by the descriptions, it looks like
something I'd have to experiment with the next time a job sits in the
queue for a long time.


Regards,
Morten


On 5 December 2012 07:11, Merlissimo m...@toolserver.org wrote:

 Server sql-s1-rr was unavailable during the night, so the resource
 sql-s1-rr was 0.

 Because I am not a TS admin, I could not check whether you requested this
 resource for this job. But nosy just had a look and confirmed my
 suspicion: the job was started once resource sql-s1-rr was available again.

 Merlissimo

 On 04.12.2012 16:44, Morten Wang wrote:

  Looks like the issue got resolved around 09:00UTC, as from the qacct
 output:

 jobname opentasks
 jobnumber 873860
 [...]
 qsub_time Mon Dec 3 22:19:03 2012
 start_time Tue Dec 4 09:06:32 2012
 end_time Tue Dec 4 09:21:18 2012

 If you want to look into it more closely, this job was submitted by me
 (user: nettrom) through my crontab on the submit servers.


 Cheers,
 Morten



Re: [Toolserver-l] SGE queues stalled

2012-12-05 Thread Merlissimo

On 05.12.2012 16:21, Morten Wang wrote:


Is there a way for me to find that out myself, e.g. using qstat?  I had a
look at the qstat man page, but judging by the descriptions, it looks like
something I'd have to experiment with the next time a job sits in the
queue for a long time.


qstat -j <jobnumber>

prints a "scheduling info" section that lists why the job has not been
scheduled yet.

Example:
qstat -j 799111

scheduling info:

queue instance short-...@ortelius.toolserver.org dropped because it is 
overloaded: np_load_short=1.252930 (= 1.252930 + 0.8 * 0.00 with 
nproc=4) >= 1.1
queue instance longrun-...@willow.toolserver.org dropped because it is 
overloaded: np_load_short=2.528320 (= 2.528320 + 0.8 * 0.00 with 
nproc=8) >= 2.0
queue instance medium-...@ortelius.toolserver.org dropped because it 
is overloaded: np_load_short=1.252930 (= 1.252930 + 0.8 * 0.00 with 
nproc=4) >= 0.8
queue instance longrun2-...@clematis.toolserver.org dropped because it 
is disabled
queue instance longrun2-...@hawthorn.toolserver.org dropped because it 
is disabled
(-l 
h_rt=57600,mem_free=890M,sql=1,sql-s7-rr=3,sqlprocs-s7=3,tmp_free=20M,user_slot=2,virtual_free=890M) 
cannot run globally because it offers only gc:sql-s7-rr=0.00


As you can see, the job cannot run on clematis and hawthorn because 
these queues are disabled. The queues on willow and ortelius are 
temporarily under high load. wolfsbane, nightshade and yarrow are missing 
from this list, so the bot could start on these servers. But the last 
line, "cannot run globally because it offers only gc:sql-s7-rr=0.00", 
shows that resource sql-s7-rr is not available on any server at the 
moment. That's why the job stays queued until the s7 database is usable 
again.


Merlissimo
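
To make the reading of the "scheduling info" section above more concrete, here is a minimal Python sketch that pulls the two kinds of reasons out of such output: queue instances that were dropped, and resources that are unavailable globally. The function name blocking_reasons and the regular expressions are assumptions based only on the message wording shown in this thread; other SGE versions may phrase these lines differently.

```python
import re

def blocking_reasons(scheduling_info: str):
    """Return (dropped queue instances, globally unavailable resources).

    Hypothetical helper: the patterns only cover the message wording
    quoted in this thread.
    """
    # Long lines may be wrapped, so collapse all whitespace first.
    text = " ".join(scheduling_info.split())
    dropped = re.findall(
        r"queue instance (\S+) dropped because it is (overloaded|disabled)",
        text)
    missing = re.findall(
        r"cannot run globally because it offers only gc:([\w-]+)=([\d.]+)",
        text)
    return dropped, missing

# Shortened sample taken from the qstat -j output quoted above.
sample = """
queue instance longrun2-...@clematis.toolserver.org dropped because it
is disabled
queue instance short-...@ortelius.toolserver.org dropped because it is
overloaded: np_load_short=1.252930 (= 1.252930 + 0.8 * 0.00 with
nproc=4) >= 1.1
(-l h_rt=57600,sql=1,sql-s7-rr=3)
cannot run globally because it offers only gc:sql-s7-rr=0.00
"""

dropped, missing = blocking_reasons(sample)
for queue, reason in dropped:
    print(f"{queue}: {reason}")
for resource, value in missing:
    print(f"resource {resource} offers only {value}")
```

A non-empty second list is the interesting case here: dropped queue instances only rule out individual hosts, while a globally missing resource keeps the job in qw no matter which host is free.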



Re: [Toolserver-l] SGE queues stalled

2012-12-04 Thread Marlen Caemmerer

Hello,

I can't really see any trouble with SGE.
Can you please tell me which user runs which command on which host, so I 
can have a closer look?

Regards
nosy

On Mon, 3 Dec 2012, Morten Wang wrote:


Date: Tue, 4 Dec 2012 05:50:36
From: Morten Wang nett...@gmail.com
Reply-To: Wikimedia Toolserver toolserver-l@lists.wikimedia.org
To: Wikimedia Toolserver toolserver-l@lists.wikimedia.org
Subject: [Toolserver-l] SGE queues stalled

I've noticed that one of SuggestBot's hourly jobs has stalled for the past
7 hours, stuck in the qw state. Usually it runs like clockwork. Is there
a problem with the SGE queues?


Regards,
Morten






Re: [Toolserver-l] SGE queues stalled

2012-12-04 Thread Morten Wang
Looks like the issue got resolved around 09:00UTC, as from the qacct output:

jobname opentasks
jobnumber 873860
[...]
qsub_time Mon Dec 3 22:19:03 2012
start_time Tue Dec 4 09:06:32 2012
end_time Tue Dec 4 09:21:18 2012

If you want to look into it more closely, this job was submitted by me
(user: nettrom) through my crontab on the submit servers.


Cheers,
Morten
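
As a rough illustration of how long the job actually waited, the qacct timestamps above can be subtracted from each other. This is only a sketch: the format string is an assumption derived from the three timestamps shown in this message, and parse_qacct_time is a hypothetical helper name.

```python
from datetime import datetime

# Assumed layout of qacct timestamps, e.g. "Mon Dec 3 22:19:03 2012".
QACCT_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"

def parse_qacct_time(value: str) -> datetime:
    """Parse one qacct timestamp (hypothetical helper)."""
    return datetime.strptime(value, QACCT_TIME_FORMAT)

# Values copied from the qacct output for job 873860.
qsub_time = parse_qacct_time("Mon Dec 3 22:19:03 2012")
start_time = parse_qacct_time("Tue Dec 4 09:06:32 2012")
end_time = parse_qacct_time("Tue Dec 4 09:21:18 2012")

wait = start_time - qsub_time      # time spent pending in the queue
runtime = end_time - start_time    # time the job actually ran

print("waited in queue:", wait)
print("ran for:", runtime)
```

For this job the wait comes out at 10:47:29 against a runtime of 0:14:46, which matches the impression of an hourly job stuck in qw overnight.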



On 4 December 2012 03:39, Marlen Caemmerer marlen.caemme...@wikimedia.de wrote:

 Hello,

 I can't really see any trouble with SGE.
 Can you please tell me which user runs which command on which host, so I
 can have a closer look?

 Regards
 nosy

 On Mon, 3 Dec 2012, Morten Wang wrote:

  Date: Tue, 4 Dec 2012 05:50:36
 From: Morten Wang nett...@gmail.com
 Reply-To: Wikimedia Toolserver toolserver-l@lists.wikimedia.org
 To: Wikimedia Toolserver toolserver-l@lists.wikimedia.org
 Subject: [Toolserver-l] SGE queues stalled


 I've noticed that one of SuggestBot's hourly jobs has stalled for the past
 7 hours, stuck in the qw state. Usually it runs like clockwork. Is there
 a problem with the SGE queues?


 Regards,
 Morten




[Toolserver-l] SGE queues stalled

2012-12-03 Thread Morten Wang
I've noticed that one of SuggestBot's hourly jobs has stalled for the past
7 hours, stuck in the qw state. Usually it runs like clockwork. Is there
a problem with the SGE queues?


Regards,
Morten