Re: [Toolserver-l] New Rule: SGE-constraint for bots
On 11/01/13 23:01, DaB. wrote:
> c.) if you start a bot by hand for testing (no screen, no cron, no while).

Does this forbid running a bot by hand with an attached screen? I.e. you run screen because you are afraid your local computer or connection could reset, not to ‘run and forget’. You would be attached e.g. 90% of the time and pay (some) attention to the screen output...

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
[Toolserver-l] Postmortem: Partial Toolserver-outage
Hello all,

large parts of the toolserver-cluster were down or very slow during the last few hours. AFAIS it was a problem with the user-store or with rosemary (to which the user-store is physically connected). I rebooted rosemary, but the reboot revealed problems with its IPv6-address. I tried to fix that, which caused several more reboots.

Rosemary is now up and running, but the user-store is not available (it looks like Nosy just mounted it without updating the fstab-file). So I was forced to remove the user-store everywhere (except on willow, because it needs a reboot to do that and a reboot is already scheduled for later today). I will try to find the partition for the user-store and mount it, but I do not have much hope (there are way too many devices to try). Just to be clear: no data has been lost.

Munin will also be away, because its data is mounted on that host as well. I fear that we have to wait for Nosy to recover before we get the user-store back.

tl;dr: TS had problems, user-store is away.

Sincerely,
DaB.

-- 
Userpage: [[:w:de:User:DaB.]]
PGP: 0x2d3ee2d42b255885
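The remark about the fstab-file can be illustrated with a small check. This is only a minimal sketch, assuming a Linux-style fstab(5) layout in which the second field is the mount point; the function name `in_fstab` is hypothetical, and hosts with a different table format (for example a vfstab) would need a different field number.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): succeed if MOUNTPOINT
# has a non-comment entry in the given fstab-style file.
# Assumption: Linux fstab(5) layout, where field 2 is the mount point.
in_fstab() {
    mountpoint=$1
    fstab=${2:-/etc/fstab}
    awk -v mp="$mountpoint" '
        /^[[:space:]]*#/ { next }      # skip comment lines
        $2 == mp { found = 1 }         # field 2 holds the mount point
        END { exit(found ? 0 : 1) }
    ' "$fstab"
}

# Example (path taken from the thread):
#   in_fstab /mnt/user-store || echo "user-store is not listed in fstab"
```

A mount that works now but is missing from this file will not come back after a reboot, which matches the symptom described above.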
Re: [Toolserver-l] Postmortem: Partial Toolserver-outage
(anonymous) wrote:
> [...] Rosemary is now up and running but the user-store is not available (looks like Nosy just mounted it without updating the fstab-file). [...] I will try if I can find the partition for user-store and mount it but I have not much hope (there are way too many devices to try) [...]

Couldn't the search be automated à la (untested):

| mkdir t
| for DEVICE in part1 part2; do
|   mount -o ro "$DEVICE" t || continue
|   [ -e t/sge ] && echo "Found ./sge on $DEVICE"
|   umount t
| done

(with sge being just the first example of a directory beneath /mnt/user-store that I can remember).

Tim
Re: [Toolserver-l] [Toolserver-announce] Reboot of willow Monday
Hello all,

the reboot went through without a problem; please check that all of your tools are running normally. Please note again that all bots now have to be run via SGE (for details see [1]).

Sincerely,
DaB.

[1] http://lists.wikimedia.org/pipermail/toolserver-announce/2013-January/000557.html
Re: [Toolserver-l] Postmortem: Partial Toolserver-outage
DaB. w...@daniel.baur4.info wrote:
> > Couldn't the search be automated à la (untested):
>
> several important partitions (for example database-partitions) live on this SAN. I have no idea what happens if a partition is mounted on 2 hosts, so I would like to avoid blind mounting.

On Linux and ext3/ext4, you should be pretty safe with -o ro,noload, but given the risk compared to the possible gain, I think waiting for nosy's (family's) convalescence is more prudent :-).

Tim
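For illustration, the read-only probe discussed in this thread could be combined with the `ro,noload` options roughly as follows. This is an untested-on-the-SAN sketch, not what was actually run: the names `probe_devices`, `run` and `DRY_RUN` are hypothetical, it assumes Linux with ext3/ext4 as stated above, and it defaults to a dry run that only prints the commands, given the concern about mounting a partition on two hosts.

```shell
#!/bin/sh
# Sketch only: probe candidate devices read-only for a marker directory.
# DRY_RUN=1 (the default) prints the mount/umount commands instead of
# executing them; set DRY_RUN=0 only if blind mounting is acceptable.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Usage: probe_devices MARKER DEVICE...
probe_devices() {
    marker=$1
    shift
    mnt=$(mktemp -d) || return 1
    for device in "$@"; do
        # ro,noload: mount read-only and do not load the ext3/ext4 journal
        run mount -o ro,noload "$device" "$mnt" || continue
        [ -e "$mnt/$marker" ] && echo "Found ./$marker on $device"
        run umount "$mnt"
    done
    rmdir "$mnt"
}

# Example with the placeholder device names used earlier in the thread:
#   probe_devices sge part1 part2
```

In dry-run mode nothing is mounted, so no "Found" line can appear; the output is only a list of the commands that would be issued, which can be reviewed before deciding whether the risk is worth it.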