Re: [Toolserver-l] Maintenance: Rebooting ortelius web server

2013-05-09 Thread Tim Landscheidt
Marlen Caemmerer marlen.caemme...@wikimedia.de wrote:

 I would like to reboot ortelius, one of the web servers,
 tomorrow, Tuesday, at 18:30 UTC.

Apparently, wolfsbane rebooted today as well:

| timl@wolfsbane:~$ uptime
|  16:49pm  up   5:00,  2 users,  load average: 1.16, 1.24, 1.47
| timl@wolfsbane:~$

Perhaps related to that, SGE queues on ortelius and
wolfsbane are in state au (alarm, unknown):

| timl@wolfsbane:~$ qstat -f -explain a | sed -ne '1,2p' -e '/ortelius\|wolfsbane/,/^-/p'
| queuename                      qtype resv/used/tot. load_avg arch          states
| ---------------------------------------------------------------------------------
| short-sol@ortelius.toolserver. B     0/0/8          -NA-     sol-amd64     au
| error: no value for np_load_short because execd is in unknown state
| error: no value for np_load_avg because execd is in unknown state
| error: no value for cpu because execd is in unknown state
| error: no value for mem_free because execd is in unknown state
| alarm gf:tmp_free=100G load-threshold=200M
| alarm gf:available=1 load-threshold=0
| ---------------------------------------------------------------------------------
| short-sol@wolfsbane.toolserver B     0/10/12        -NA-     sol-amd64     au
| error: no value for np_load_short because execd is in unknown state
| error: no value for np_load_avg because execd is in unknown state
| error: no value for cpu because execd is in unknown state
| error: no value for mem_free because execd is in unknown state
| alarm gf:tmp_free=100G load-threshold=200M
| alarm gf:available=1 load-threshold=0
| ---------------------------------------------------------------------------------
| medium-sol@ortelius.toolserver B     0/0/4          -NA-     sol-amd64     au
| error: no value for np_load_short because execd is in unknown state
| error: no value for np_load_avg because execd is in unknown state
| error: no value for np_load_long because execd is in unknown state
| error: no value for cpu because execd is in unknown state
| error: no value for mem_free because execd is in unknown state
| alarm gf:tmp_free=100G load-threshold=100M
| alarm gf:available=1 load-threshold=0
| ---------------------------------------------------------------------------------
| medium-sol@wolfsbane.toolserve B     0/3/4          -NA-     sol-amd64     au
| error: no value for np_load_short because execd is in unknown state
| error: no value for np_load_avg because execd is in unknown state
| error: no value for np_load_long because execd is in unknown state
| error: no value for cpu because execd is in unknown state
| error: no value for mem_free because execd is in unknown state
| alarm gf:tmp_free=100G load-threshold=100M
| alarm gf:available=1 load-threshold=0
| ---------------------------------------------------------------------------------
| timl@wolfsbane:~$

Tim


___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette

Re: [Toolserver-l] Maintenance: Rebooting ortelius web server

2013-05-09 Thread Platonides
On 09/05/13 18:57, Tim Landscheidt wrote:
 Marlen Caemmerer marlen.caemme...@wikimedia.de wrote:
 
 I would like to reboot ortelius, one of the web servers,
 tomorrow, Tuesday, at 18:30 UTC.
 
 Apparently, wolfsbane rebooted today as well:
 
 | timl@wolfsbane:~$ uptime
 |  16:49pm  up   5:00,  2 users,  load average: 1.16, 1.24, 1.47
 | timl@wolfsbane:~$
 
 Perhaps related to that, SGE queues on ortelius and
 wolfsbane are in state au (alarm, unknown):

Yes, sge_execd does not seem to be running on them.

In addition, the medium and longrun queues on yarrow are in error
state. I tried clearing them, but they failed again.


[Toolserver-l] Maintenance: Rebooting ortelius web server

2013-05-06 Thread Marlen Caemmerer

Hey,

I would like to reboot ortelius, one of the web servers,
tomorrow, Tuesday, at 18:30 UTC.

Cheers
Marlen/nosy

