Re: [Toolserver-l] New Rule: SGE-constraint for bots

2013-02-11 Thread Platonides
On 11/01/13 23:01, DaB. wrote:
> c.) if you start a bot by hand for testing
> (no screen, no cron, no while).
Does this forbid running a bot by hand with an attached screen?
I.e. you run screen because you are afraid your local
computer/connection could reset, not to ‘run & forget’. You would be
attached e.g. 90% of the time, and pay (some) attention to the screen output...

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette

[Toolserver-l] Postmortem: Partial Toolserver-outage

2013-02-11 Thread DaB.
Hello all,

Large parts of the toolserver cluster were down or very slow over the last few
hours. AFAICS it was a problem with the user-store or with rosemary (to which the
user-store is physically connected). I rebooted rosemary, but the reboot revealed
problems with its IPv6 address. I tried to fix that, which caused several more
reboots. Rosemary is now up and running, but the user-store is not available
(it looks like Nosy just mounted it without updating the fstab file). So I was
forced to remove the user-store everywhere (except on willow, because that needs a
reboot, and a reboot is already scheduled for later today).
I will try to find the partition for the user-store and mount it, but I do not have
much hope (there are way too many devices to try). Just to be clear: no data has
been lost. Munin will also be unavailable, because its data is mounted on
that host as well. I fear that we have to wait for Nosy to recover before we get the
user-store back.

tl;dr: TS had problems; the user-store is unavailable for now.

Sincerely,
DaB.


-- 
Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885



Re: [Toolserver-l] Postmortem: Partial Toolserver-outage

2013-02-11 Thread Tim Landscheidt
(anonymous) wrote:

> Large parts of the toolserver cluster were down or very slow over the last few
> hours. AFAICS it was a problem with the user-store or with rosemary (to which the
> user-store is physically connected). I rebooted rosemary, but the reboot revealed
> problems with its IPv6 address. I tried to fix that, which caused several more
> reboots. Rosemary is now up and running, but the user-store is not available
> (it looks like Nosy just mounted it without updating the fstab file). So I was
> forced to remove the user-store everywhere (except on willow, because that needs a
> reboot, and a reboot is already scheduled for later today).
> I will try to find the partition for the user-store and mount it, but I do not have
> much hope (there are way too many devices to try). Just to be clear: no data has
> been lost. Munin will also be unavailable, because its data is mounted on
> that host as well. I fear that we have to wait for Nosy to recover before we get the
> user-store back.

> [...]

Couldn't the search be automated, à la (untested):

| mkdir t
| for DEVICE in part1 part2; do
|   # mount read-only; skip devices that cannot be mounted
|   mount -o ro "$DEVICE" t || continue
|   [ -e t/sge ] && echo "Found ./sge on $DEVICE"
|   umount t
| done

(with sge being just the first example of a directory
beneath /mnt/user-store that I can remember).

Tim



Re: [Toolserver-l] [Toolserver-announce] Reboot of willow Monday

2013-02-11 Thread DaB.
Hello all,

The reboot went through without problems; please check whether everything of
yours is running normally.
Please note again that all bots now have to be run via SGE (for details see
[1]).

Sincerely,
DaB.

[1] http://lists.wikimedia.org/pipermail/toolserver-announce/2013-January/000557.html

Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885



Re: [Toolserver-l] Postmortem: Partial Toolserver-outage

2013-02-11 Thread Tim Landscheidt
DaB. w...@daniel.baur4.info wrote:

>> Couldn't the search be automated, à la (untested):
>
> Several important partitions (for example database partitions) live on this
> SAN. I have no idea what happens if a partition is mounted on 2 hosts, so I
> would like to avoid blind mounting.

On Linux and ext3/ext4, you should be pretty safe with -o
ro,noload, but given the risk compared to the possible
gain, I think waiting for nosy's (family's) convalescence is
more prudent :-).

Tim
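
For illustration only, the two suggestions in this thread (probing candidate partitions for a known directory, and mounting with -o ro,noload) could be combined roughly as below. This is an untested-on-the-SAN sketch, not something anyone in the thread ran: the function name find_store and the MOUNT/UMOUNT override variables are hypothetical, introduced so the loop can be dry-run without root. noload is an ext3/ext4 option; on other filesystems the mount is expected to fail and the device is simply skipped. It does not remove the concern raised above about a partition being mounted on two hosts.

```shell
#!/bin/sh
# Hypothetical helper: probe candidate devices read-only for a marker directory.
# MOUNT and UMOUNT can be overridden (assumption, for dry runs without root).
MOUNT=${MOUNT:-mount}
UMOUNT=${UMOUNT:-umount}

find_store() {
    marker=$1; shift
    mnt=$(mktemp -d) || return 1
    found=1
    for dev in "$@"; do
        # ro: never write to the device; noload: skip ext3/ext4 journal replay.
        # Devices that cannot be mounted this way are skipped.
        "$MOUNT" -o ro,noload "$dev" "$mnt" 2>/dev/null || continue
        if [ -e "$mnt/$marker" ]; then
            echo "Found $marker on $dev"
            found=0
        fi
        "$UMOUNT" "$mnt"
    done
    rmdir "$mnt"
    # Exit status 0 if the marker was seen on at least one device, else 1.
    return "$found"
}
```

It would be called as root with the marker directory followed by the candidate devices, e.g. find_store sge part1 part2, where part1 and part2 stand in for real device paths as in the loop quoted earlier.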

