[Cloud] [Cloud-announce] Toolforge: need to stop webservice and start it again following changes to labels

2021-10-13 Thread Brooke Storm
n a deployment retroactively. I apologize for any inconvenience. In summary: If you haven’t run a webservice stop since 2021-09-29 on your Kubernetes web service, it would be a good idea to stop and start your webservice now to prevent any confusing behavior from webservice in the future. -- Brooke S

[Cloud-announce] Toolforge: need to stop webservice and start it again following changes to labels

2021-10-13 Thread Brooke Storm
n a deployment retroactively. I apologize for any inconvenience. In summary: If you haven’t run a webservice stop since 2021-09-29 on your Kubernetes web service, it would be a good idea to stop and start your webservice now to prevent any confusing behavior from webservice in the future. -- Brooke S

[Cloud] [Cloud-announce] Rebooting login.toolforge.org in 10 minutes

2021-07-28 Thread Brooke Storm
Since there seems to be some error with sssd (LDAP and name services daemon) on the main Toolforge bastion, I am going to reboot it at 21:33 UTC today. Sorry for the inconvenience. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud-announce] Rebooting login.toolforge.org in 10 minutes

2021-07-28 Thread Brooke Storm
Since there seems to be some error with sssd (LDAP and name services daemon) on the main Toolforge bastion, I am going to reboot it at 21:33 UTC today. Sorry for the inconvenience. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud] [Cloud-announce] 2021-07-26@1530 UTC Toolforge Kubernetes Upgrade

2021-07-23 Thread Brooke Storm
Tools admins will be upgrading Toolforge Kubernetes to version 1.19 on Monday, July 26th at 1530 UTC to catch up to the upstream release cycle. This should be mostly invisible to end users, apart from the occasional pod restarting. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud-announce] 2021-07-26@1530 UTC Toolforge Kubernetes Upgrade

2021-07-23 Thread Brooke Storm
Tools admins will be upgrading Toolforge Kubernetes to version 1.19 on Monday, July 26th at 1530 UTC to catch up to the upstream release cycle. This should be mostly invisible to end users, apart from the occasional pod restarting. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud] [Cloud-announce] 2021-07-21@1500 UTC PAWS Kubernetes upgrade

2021-07-20 Thread Brooke Storm
user impact at all, so that should also be quiet and require no user action. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org ___ Cloud-announce mailing list -- cloud-annou...@lists.wikimedia.org List information: https

[Cloud-announce] 2021-07-21@1500 UTC PAWS Kubernetes upgrade

2021-07-20 Thread Brooke Storm
user impact at all, so that should also be quiet and require no user action. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud] [Cloud-announce] 2021-07-20@1500 UTC Maps and Scratch NFS briefly unavailable

2021-07-16 Thread Brooke Storm
econds, not minutes). Therefore, we will not be failing them over. WMCS will keep an eye on the impact to client VMs and will remediate problems where necessary. If all goes well, most services won’t notice. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimed

[Cloud-announce] 2021-07-20@1500 UTC Maps and Scratch NFS briefly unavailable

2021-07-16 Thread Brooke Storm
econds, not minutes). Therefore, we will not be failing them over. WMCS will keep an eye on the impact to client VMs and will remediate problems where necessary. If all goes well, most services won’t notice. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimed

[Cloud] [Cloud-announce] Re: Webservice release changes coming this week

2021-07-01 Thread Brooke Storm
These changes have been released. Please report issues in the #wikimedia-cloud channel on Libera.Chat IRC. Thanks! > On Jun 29, 2021, at 4:25 PM, Brooke Storm wrote: > > Hello cloud users, > We will be releasing webservice version 0.75 this week to Toolforge. Most of > the

[Cloud-announce] Re: Webservice release changes coming this week

2021-07-01 Thread Brooke Storm
These changes have been released. Please report issues in the #wikimedia-cloud channel on Libera.Chat IRC. Thanks! > On Jun 29, 2021, at 4:25 PM, Brooke Storm wrote: > > Hello cloud users, > We will be releasing webservice version 0.75 this week to Toolforge. Most of > the

[Cloud] [Cloud-announce] Re: 2021-07-01 scratch and maps NFS maintenance

2021-07-01 Thread Brooke Storm
The user-facing portion of this should be complete now. > On Jul 1, 2021, at 9:07 AM, Brooke Storm wrote: > > This is starting. > > Brooke Storm > Staff SRE > Wikimedia Cloud Services > bst...@wikimedia.org <mailto:bst...@wikimedia.org> > > > >

[Cloud-announce] Re: 2021-07-01 scratch and maps NFS maintenance

2021-07-01 Thread Brooke Storm
The user-facing portion of this should be complete now. > On Jul 1, 2021, at 9:07 AM, Brooke Storm wrote: > > This is starting. > > Brooke Storm > Staff SRE > Wikimedia Cloud Services > bst...@wikimedia.org <mailto:bst...@wikimedia.org> > > > >

[Cloud] [Cloud-announce] Re: 2021-07-01 scratch and maps NFS maintenance

2021-07-01 Thread Brooke Storm
This is starting. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org > On Jun 30, 2021, at 9:34 AM, Brooke Storm wrote: > > The NFS servers used for scratch and maps mounts (/data/project and /home in > the maps project and /data/scratch in other projects) w

[Cloud-announce] Re: 2021-07-01 scratch and maps NFS maintenance

2021-07-01 Thread Brooke Storm
This is starting. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org > On Jun 30, 2021, at 9:34 AM, Brooke Storm wrote: > > The NFS servers used for scratch and maps mounts (/data/project and /home in > the maps project and /data/scratch in other projects) w

[Cloud] [Cloud-announce] 2021-07-01 scratch and maps NFS maintenance

2021-06-30 Thread Brooke Storm
systems. The process could start later than 1600 UTC if there are sync issues initially as I try to get as much of the data as possible transferred. More details here https://phabricator.wikimedia.org/T224747 <https://phabricator.wikimedia.org/T224747> Brooke Storm Staff SRE Wikimedia

[Cloud-announce] 2021-07-01 scratch and maps NFS maintenance

2021-06-30 Thread Brooke Storm
systems. The process could start later than 1600 UTC if there are sync issues initially as I try to get as much of the data as possible transferred. More details here https://phabricator.wikimedia.org/T224747 <https://phabricator.wikimedia.org/T224747> Brooke Storm Staff SRE Wikimedia

[Cloud] [Cloud-announce] Webservice release changes coming this week

2021-06-29 Thread Brooke Storm
d a followup email and record it in SAL. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud-announce] Webservice release changes coming this week

2021-06-29 Thread Brooke Storm
d a followup email and record it in SAL. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud] [Cloud-announce] Mounts for Scratch NFS changing a bit tomorrow (2021-05-27 around 20:00 UTC)

2021-05-26 Thread Brooke Storm
se let us know in Libera.chat: #wikimedia-cloud Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud-announce] Mounts for Scratch NFS changing a bit tomorrow (2021-05-27 around 20:00 UTC)

2021-05-26 Thread Brooke Storm
se let us know in Libera.chat: #wikimedia-cloud Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud] Re: Porting the output of qstat to a web page

2021-05-11 Thread Brooke Storm
/dist/util/resources/schemas/qstat/qstat.xsd;> So maybe that will help. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org > On May 11, 2021, at 1:15 PM, Huji Lee wrote: > > Hi, > > Some of the jobs I submit to the grid take a long time (say,

[Cloud] [Cloud-announce] DNS for wikireplicas shifting to the new cluster today

2021-05-04 Thread Brooke Storm
<https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign>, please reach out on Phabricator, the #wikimedia-cloud IRC channel, or the Cloud mailing list. Thanks! Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

[Cloud-announce] DNS for wikireplicas shifting to the new cluster today

2021-05-04 Thread Brooke Storm
<https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign>, please reach out on Phabricator, the #wikimedia-cloud IRC channel, or the Cloud mailing list. Thanks! Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

Re: [Cloud-announce] [Cloud] cloud-vps maintenance at 14:00 UTC

2021-04-27 Thread Brooke Storm
Toolforge NFS services are not accessible to the VMs. Something went wrong during the upgrade and the team is working to restore services as quickly as possible. I have shut off the bastions to protect their local disks. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

Re: [Cloud] [Cloud-announce] cloud-vps maintenance at 14:00 UTC

2021-04-27 Thread Brooke Storm
Toolforge NFS services are not accessible to the VMs. Something went wrong during the upgrade and the team is working to restore services as quickly as possible. I have shut off the bastions to protect their local disks. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org

Re: [Wikitech-l] toolforge down?

2021-04-27 Thread Brooke Storm
A network issue related to the upgrade has caused NFS to be unmounted. I’ve gone and shut off the bastions for now at this point because things are getting written to the local disks while we get this resolved. Sorry. This was not a planned outage. Brooke Storm Staff SRE Wikimedia Cloud

Re: [Cloud] [Cloud-announce] Wikireplicas: old cluster migrations start in 2 weeks. Please test your code with the new cluster

2021-04-06 Thread Brooke Storm
> On Apr 6, 2021, at 11:49 AM, Huji Lee wrote: > > Pardon my potentially dumb question, but how do the shorter DB names get > resolved? > > I have python code that looks like this: > > conn = mysqldb.connect(host="enwiki.labsdb", db="enwiki_p", > read_default_file="~/replica.my.cnf") > >

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Brooke Storm
> On Mar 31, 2021, at 5:18 PM, Roy Smith wrote: > > I'm just playing around on tools-sgebastion-08. I can dump the first 1 > million image names in about half a minute: > >> tools.spi-tools-dev:xw-join$ time mysql >> --defaults-file=$HOME/replica.my.cnf -h >>

Re: [Cloud] [Cloud-announce] New Wikireplicas available, timeline update, and Quarry migration

2021-03-31 Thread Brooke Storm
omething like a month now. As a result, it should be a pretty good list. I just did a pull from that tool and can try to script a concept of who is doing cross-wiki joins and let you know. It’s possible it would be quite doable once I’ve got the list parsed out. Brooke Storm Staff SRE Wikimed

Re: [Cloud] Getting a lot of 502, 503 server errors on toolforge ???

2021-01-04 Thread Brooke Storm
oesn’t clear up with a restart, please make a Phabricator task to help coordinate. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org > On Jan 4, 2021, at 3:27 PM, Arthur Smith wrote: > > My toolforge service (https://author-disambiguator.toolforge.org/ >

Re: [Cloud] [Cloud-announce] Toolforge and Cloud VPS NFS maintenance today

2020-12-22 Thread Brooke Storm
Work on this is done. Thanks to this maintenance, NFS performance (especially read performance) should be much improved. Failover will also be quicker and more reliable from now on. We appreciate your patience. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org > On

Re: [Cloud-announce] Toolforge and Cloud VPS NFS maintenance today

2020-12-22 Thread Brooke Storm
Work on this is done. Thanks to this maintenance, NFS performance (especially read performance) should be much improved. Failover will also be quicker and more reliable from now on. We appreciate your patience. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org > On

Re: [Cloud] [Cloud-announce] Toolforge and Cloud VPS NFS maintenance today

2020-12-22 Thread Brooke Storm
Unfortunately, the failover has not gone well due to some upgrades to the secondary it seems. I apologize for the service disruptions, and we are trying to bring everything back as fast as possible. Brooke Storm > On Dec 22, 2020, at 8:56 AM, Brooke Storm wrote: > > We will b

Re: [Cloud-announce] Toolforge and Cloud VPS NFS maintenance today

2020-12-22 Thread Brooke Storm
Unfortunately, the failover has not gone well due to some upgrades to the secondary it seems. I apologize for the service disruptions, and we are trying to bring everything back as fast as possible. Brooke Storm > On Dec 22, 2020, at 8:56 AM, Brooke Storm wrote: > > We will b

[Cloud] [Cloud-announce] Toolforge and Cloud VPS NFS maintenance today

2020-12-22 Thread Brooke Storm
We will be failing over the Toolforge and Project NFS in 10 minutes to move the main interface to 10Gb Ethernet. The previous work should make this fairly non-disruptive, but that was believed in the past as well. Brooke Storm Cloud Service Team

[Cloud-announce] Toolforge and Cloud VPS NFS maintenance today

2020-12-22 Thread Brooke Storm
We will be failing over the Toolforge and Project NFS in 10 minutes to move the main interface to 10Gb Ethernet. The previous work should make this fairly non-disruptive, but that was believed in the past as well. Brooke Storm Cloud Service Team

Re: [Cloud] [Cloud-announce] ToolsDB Maintenance - 2020-12-16 @ 1700 UTC

2020-12-16 Thread Brooke Storm
Toolsdb is finally back AND replicated —Brooke > On Dec 16, 2020, at 9:04 AM, Brooke Storm wrote: > > This is happening in about an hour. We will be taking ToolsDB down for > maintenance. > > Brooke Storm > Staff SRE > Wikimedia Cloud Services > bst..

Re: [Cloud-announce] ToolsDB Maintenance - 2020-12-16 @ 1700 UTC

2020-12-16 Thread Brooke Storm
Toolsdb is finally back AND replicated —Brooke > On Dec 16, 2020, at 9:04 AM, Brooke Storm wrote: > > This is happening in about an hour. We will be taking ToolsDB down for > maintenance. > > Brooke Storm > Staff SRE > Wikimedia Cloud Services > bst..

Re: [Cloud] [Cloud-announce] ToolsDB Maintenance - 2020-12-16 @ 1700 UTC

2020-12-16 Thread Brooke Storm
This is happening in about an hour. We will be taking ToolsDB down for maintenance. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ > On Dec 8, 2020, at 4:05 PM, Brooke Storm wrote: > > In yet another effort to restore replication and preserve the r

Re: [Cloud-announce] ToolsDB Maintenance - 2020-12-16 @ 1700 UTC

2020-12-16 Thread Brooke Storm
This is happening in about an hour. We will be taking ToolsDB down for maintenance. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ > On Dec 8, 2020, at 4:05 PM, Brooke Storm wrote: > > In yet another effort to restore replication and preserve the r

Re: [Cloud-announce] Toolforge kubernetes maintenance today 2020-12-10 @ 15:30 UTC

2020-12-11 Thread Brooke Storm
This upgrade is now complete. The cluster should be stable now. Thank you for your patience if there were any noticeable interruptions. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ > On Dec 10, 2020, at 8:36 AM, Arturo Borrero Gonzalez >

[Cloud] [Cloud-announce] ToolsDB Maintenance - 2020-12-16 @ 1700 UTC

2020-12-08 Thread Brooke Storm
period. That should be an additional hour or so. We appreciate your patience with this process. It is very important that we establish a second copy of this database, especially in light of recent crashes (https://phabricator.wikimedia.org/T253738 <https://phabricator.wikimedia.org/T253738>

[Cloud-announce] ToolsDB Maintenance - 2020-12-16 @ 1700 UTC

2020-12-08 Thread Brooke Storm
period. That should be an additional hour or so. We appreciate your patience with this process. It is very important that we establish a second copy of this database, especially in light of recent crashes (https://phabricator.wikimedia.org/T253738 <https://phabricator.wikimedia.org/T253738>

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-17 Thread Brooke Storm
ACN: Thanks! We’ve created a ticket for that one to help collaborate and surface the process here: https://phabricator.wikimedia.org/T267992 <https://phabricator.wikimedia.org/T267992> Anybody working on that, please add info there. Brooke Storm Staff SRE Wikimedia Cloud Servic

Re: [Cloud] [Cloud-announce] 2020-11-10 ToolsDB (User databases in Toolforge) read-only downtime

2020-11-11 Thread Brooke Storm
. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm > On Nov 11, 2020, at 3:23 PM, Brooke Storm wrote: > > Update: I don’t think it is going to be done in 10 minutes. I’m surprised, > but the process is still runnin

Re: [Cloud-announce] 2020-11-10 ToolsDB (User databases in Toolforge) read-only downtime

2020-11-11 Thread Brooke Storm
. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm > On Nov 11, 2020, at 3:23 PM, Brooke Storm wrote: > > Update: I don’t think it is going to be done in 10 minutes. I’m surprised, > but the process is still runnin

Re: [Cloud-announce] 2020-11-10 ToolsDB (User databases in Toolforge) read-only downtime

2020-11-11 Thread Brooke Storm
Update: I don’t think it is going to be done in 10 minutes. I’m surprised, but the process is still running. I still believe it will complete today. > On Nov 11, 2020, at 8:31 AM, Brooke Storm wrote: > > Update: > ToolsDB remains in read-only mode while the data loads on

Re: [Cloud-announce] 2020-11-10 ToolsDB (User databases in Toolforge) read-only downtime

2020-11-11 Thread Brooke Storm
://phabricator.wikimedia.org/T266587 <https://phabricator.wikimedia.org/T266587> > On Nov 10, 2020, at 8:51 AM, Brooke Storm wrote: > > This will be happening in around 10 minutes. ToolsDB will be read-only until > we can get a consistent dump to rebuild replication. > > Brooke Storm >

Re: [Cloud] [Cloud-announce] Wiki Replicas 2020 Redesign

2020-11-10 Thread Brooke Storm
ly is quite tricky. The closest I’ve seen ready-made tools come to that is ProxySQL, and that focuses on sharding, which is not exactly the same thing. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm > On Nov 10, 2020, at

Re: [Cloud] [Cloud-announce] 2020-11-10 ToolsDB (User databases in Toolforge) read-only downtime

2020-11-10 Thread Brooke Storm
This will be happening in around 10 minutes. ToolsDB will be read-only until we can get a consistent dump to rebuild replication. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm > On Nov 6, 2020, at 12:12 PM, Brooke Sto

Re: [Cloud-announce] 2020-11-10 ToolsDB (User databases in Toolforge) read-only downtime

2020-11-10 Thread Brooke Storm
This will be happening in around 10 minutes. ToolsDB will be read-only until we can get a consistent dump to rebuild replication. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm > On Nov 6, 2020, at 12:12 PM, Brooke Sto

[Cloud-announce] 2020-11-10 ToolsDB (User databases in Toolforge) read-only downtime

2020-11-06 Thread Brooke Storm
mode, I hope the backup will not take terribly long. Please see https://phabricator.wikimedia.org/T266587 <https://phabricator.wikimedia.org/T266587> for additional information. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org>

[Cloud] [Cloud-announce] 2020-11-10 ToolsDB (User databases in Toolforge) read-only downtime

2020-11-06 Thread Brooke Storm
mode, I hope the backup will not take terribly long. Please see https://phabricator.wikimedia.org/T266587 <https://phabricator.wikimedia.org/T266587> for additional information. Brooke Storm Staff SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org>

Re: [Cloud] [Cloud-announce] 2020-10-27@1600 UTC Database server maintenance for ToolsDB (user-writeable database service)

2020-10-27 Thread Brooke Storm
Reminder that this is happening today in around 30 minutes. > On Oct 20, 2020, at 1:13 PM, Brooke Storm wrote: > > On Tuesday 2020-10-27 at 1600 UTC, ToolsDB, the user database service > provided by Toolforge > (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database

Re: [Cloud-announce] 2020-10-27@1600 UTC Database server maintenance for ToolsDB (user-writeable database service)

2020-10-27 Thread Brooke Storm
Reminder that this is happening today in around 30 minutes. > On Oct 20, 2020, at 1:13 PM, Brooke Storm wrote: > > On Tuesday 2020-10-27 at 1600 UTC, ToolsDB, the user database service > provided by Toolforge > (https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database

[Cloud] [Cloud-announce] 2020-10-27@1600 UTC Database server maintenance for ToolsDB (user-writeable database service)

2020-10-20 Thread Brooke Storm
hour in read-only mode, but it could take more or less time depending on the volume of data to copy and any issues encountered during the process. Details are on Phabricator at https://phabricator.wikimedia.org/T263679 <https://phabricator.wikimedia.org/T263679> Brooke Storm Staff SRE Wikime

[Cloud-announce] New PAWS cluster ready for testing

2020-07-31 Thread Brooke Storm
sqlite database for now, and that may reduce the performance for things like launching notebooks under load. It will be using Toolsdb like the existing PAWS cluster after the final cut over. For further details, please see: https://wikitech.wikimedia.org/wiki/News/2020_PAWS_migration -- Brooke

Re: [Cloud] [Cloud-announce] NFS maintenance tomorrow 2020-06-11

2020-06-11 Thread Brooke Storm
This is done! Sorry it was a bit more "freeze/hang" prone than hoped, but it was smoother than in the past and improvements will be made based on our findings. On 6/11/20 8:52 AM, Brooke Storm wrote: > This is starting in around 10 minutes. > > On 6/10/20 2:03 PM, Brooke Storm

Re: [Cloud-announce] NFS maintenance tomorrow 2020-06-11

2020-06-11 Thread Brooke Storm
This is done! Sorry it was a bit more "freeze/hang" prone than hoped, but it was smoother than in the past and improvements will be made based on our findings. On 6/11/20 8:52 AM, Brooke Storm wrote: > This is starting in around 10 minutes. > > On 6/10/20 2:03 PM, Brooke Storm

Re: [Cloud-announce] NFS maintenance tomorrow 2020-06-11

2020-06-11 Thread Brooke Storm
This is starting in around 10 minutes. On 6/10/20 2:03 PM, Brooke Storm wrote: > Tomorrow (June 11th) at 1600 UTC, we will be failing over the primary > NFS server to do maintenance and upgrades on it. The secondary partner > in the cluster is already upgraded and ready, and recen

[Cloud] [Cloud-announce] NFS maintenance tomorrow 2020-06-11

2020-06-10 Thread Brooke Storm
. If it doesn't proceed smoothly, it will be a slightly longer period of high load and NFS lockup as failover completes (10-20 min or so). After maintenance it will be failed back, which will also, hopefully, be quick and painless. -- Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC

[Cloud-announce] NFS maintenance tomorrow 2020-06-11

2020-06-10 Thread Brooke Storm
. If it doesn't proceed smoothly, it will be a slightly longer period of high load and NFS lockup as failover completes (10-20 min or so). After maintenance it will be failed back, which will also, hopefully, be quick and painless. -- Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC

[Cloud-announce] Planned changes to the views on the wiki-replicas

2020-05-11 Thread Brooke Storm
. Progress on this action will be tracked on this Phabricator task - https://phabricator.wikimedia.org/T252219. -- Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_

Re: [Cloud] User tables

2020-05-08 Thread Brooke Storm
VPS instance or ToolsDB? Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ On 5/8/20 12:31 PM, Huji Lee wrote: > Hi all, > Is it possible to store data into user tables through queries on > Wikireplica DBs? Or is it only possible by mysqldump'ing from the >

Re: [Cloud] [Cloud-announce] Planned NFS maintenance 2020-02-26@1800 UTC

2020-02-27 Thread Brooke Storm
NFS should be ok again. The iptables rules are fixed. Sorry for the noise and issues. Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ On 2/27/20 10:12 AM, Bryan Davis wrote: > On Thu, Feb 27, 2020 at 10:10 AM Maarten Dammers wrote: >> Hi Brooke, >>

Re: [Cloud] [Cloud-announce] Planned NFS maintenance 2020-02-26@1800 UTC

2020-02-26 Thread Brooke Storm
The maintenance is finished. Cron jobs likely failed during the maintenance, and there may have been other issues. Please report new issues as they show up. Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ On 2/26/20 10:47 AM, Brooke Storm wrote: > This is just a remin

Re: [Cloud-announce] Planned NFS maintenance 2020-02-26@1800 UTC

2020-02-26 Thread Brooke Storm
The maintenance is finished. Cron jobs likely failed during the maintenance, and there may have been other issues. Please report new issues as they show up. Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ On 2/26/20 10:47 AM, Brooke Storm wrote: > This is just a remin

Re: [Cloud] [Cloud-announce] Planned NFS maintenance 2020-02-26@1800 UTC

2020-02-26 Thread Brooke Storm
This is just a reminder of this maintenance and that it is starting in 15 min.  Interruptions will be likely for an hour or so. Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ On 2/20/20 3:14 PM, Brooke Storm wrote: > On Wed 26-Feb, we are making a large change to

Re: [Cloud-announce] Planned NFS maintenance 2020-02-26@1800 UTC

2020-02-26 Thread Brooke Storm
This is just a reminder of this maintenance and that it is starting in 15 min.  Interruptions will be likely for an hour or so. Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ On 2/20/20 3:14 PM, Brooke Storm wrote: > On Wed 26-Feb, we are making a large change to

Re: [Cloud] Workflow for kubernetes?

2020-02-21 Thread Brooke Storm
k is: > > Terminal window on MacOS >   ssh -t dev.tools.wmflabs.org <http://dev.tools.wmflabs.org> tmux new > -As spi-tools-dev >     become spi-tools-dev >       webservice --backend=kubernetes python3.7 shell > > >> On Feb 21, 2020, at 12:35 PM, Brooke Storm >

Re: [Cloud] Workflow for kubernetes?

2020-02-21 Thread Brooke Storm
with Kubernetes. In some shells, I know you need to use SHIFT-CTRL-P. I have no issues on my Mac. What client are you using to connect via SSH? I might be able to recreate the issue and figure out a fix. -- Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ On 2/21/20 7:40 AM

[Cloud] [Cloud-announce] Planned NFS maintenance 2020-02-26@1800 UTC

2020-02-20 Thread Brooke Storm
On Wed 26-Feb, we are making a large change to how NFS is mounted in Cloud Services https://gerrit.wikimedia.org/r/c/operations/puppet/+/571821. This will impact any Cloud VPS projects that mount NFS for home directories, project directories and scratch, including Toolforge.  During this change,

[Cloud-announce] Planned NFS maintenance 2020-02-26@1800 UTC

2020-02-20 Thread Brooke Storm
On Wed 26-Feb, we are making a large change to how NFS is mounted in Cloud Services https://gerrit.wikimedia.org/r/c/operations/puppet/+/571821. This will impact any Cloud VPS projects that mount NFS for home directories, project directories and scratch, including Toolforge.  During this change,

[Wikitech-l] PyCon Financial Assistance and Development Sprints Info

2020-01-23 Thread Brooke Storm
sprint around some Wikimedia Cloud Services and Toolforge code. Brooke Storm SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm_ - Forwarded message I wanted to mention - feel free to pass this on publicly and in personal invitations - that

Re: [Cloud] etcd has no leader?

2020-01-16 Thread Brooke Storm
That is not a good thing! The algorithm does allow for that to happen temporarily without issue, but I appreciate the report. We will keep an eye out for further issues. Brooke Storm Senior SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm_ >

Re: [Cloud] [Cloud-announce] [Toolforge] New Kubernetes cluster open for beta testers

2020-01-13 Thread Brooke Storm
Per that ticket, I no longer think there is any issue with the images for python at least. There was an issue with some nodes (that is being/mostly fixed). I’ll take a look at glamtools Brooke Storm Senior SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org>

Re: [Cloud] [Cloud-announce] [Toolforge] New Kubernetes cluster open for beta testers

2020-01-13 Thread Brooke Storm
I’ve created Phabricator task T242632 to track this. Please coordinate there as well for anyone with information and time. Brooke Storm Senior SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm_ > On Jan 13, 2020, at 9:10 AM, Brooke Storm wrot

Re: [Cloud] [Cloud-announce] [Toolforge] New Kubernetes cluster open for beta testers

2020-01-13 Thread Brooke Storm
I suspect this will affect containers that run Debian Buster packages. I see php7.3 and python3.7. I’d suggest not even restarting web services on those runtimes until we have it fixed. For anyone who has done so, we are working on it. Any logs could be helpful. Brooke Storm Senior SRE

Re: [Cloud] [Cloud-announce] [Toolforge] New Kubernetes cluster open for beta testers

2020-01-13 Thread Brooke Storm
I see a problem with at least one container image (which has nothing to do with the new cluster, I can see it on the old cluster as well). It looks like I’m going to be trying to fix that now. (Magnus, this is probably what you are seeing as well). Brooke Storm Senior SRE Wikimedia Cloud

Re: [Cloud] [Cloud-announce] [Toolforge] New Kubernetes cluster open for beta testers

2020-01-12 Thread Brooke Storm
> been created on Wikitech [0] outlining the self-service migration >>>>>>>>>> process. >>>>>>>>>> >>>>>>>>>> Timeline: >>>>>>>>>> * 2020-01-09: 2020 Kubernetes clu

Re: [Cloud] [Cloud-announce] [Toolforge] New Kubernetes cluster open for beta testers

2020-01-12 Thread Brooke Storm
Hi Nux, I took a look, and I see you have DNA running on Grid Engine. Has it ever run ok on either Kubernetes backend (the old “default” or the new “toolforge”)? Brooke Storm Senior SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm_ > On Jan

[Cloud] [Cloud-announce] Brief ToolsDB Outage - Thursday 10/24 @11am UTC

2019-10-21 Thread Brooke Storm
loudvirt1019 hypervisor, which is why it is in scope. We sincerely apologize for the short notice. Brooke Storm Senior SRE Wikimedia Cloud Services bst...@wikimedia.org <mailto:bst...@wikimedia.org> IRC: bstorm_

[Cloud-announce] Brief ToolsDB Outage - Thursday 10/24 @11am UTC

2019-10-21 Thread Brooke Storm
loudvirt1019 hypervisor, which is why it is in scope. We sincerely apologize for the short notice. Brooke Storm Senior SRE Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_

[Cloud-announce] 20190726 - afl_log_id field to be removed from wiki replica views

2019-07-26 Thread Brooke Storm
: https://phabricator.wikimedia.org/T226851 Some of the reasons why and a bit more context: https://phabricator.wikimedia.org/T214592 Brooke Storm Operations Engineer Wikimedia Cloud

[issue37369] Issue with pip in venv on Powershell in Windows

2019-06-26 Thread Brooke Storm
Brooke Storm added the comment: To answer Steve's question, the ver command gives me: Microsoft Windows [Version 10.0.18917.1000] Thank you for looking into this.

[issue37369] Issue with pip in venv on Powershell in Windows

2019-06-21 Thread Brooke Storm
Brooke Storm added the comment: I should add that, after testing a bit, it isn't actually working in cmd.exe. That is simply using the overarching python install. It's not using the virtualenv at all. The virtualenv that was created by the venv module appears to be non-functional

[issue37369] Issue with pip in venv on Powershell in Windows

2019-06-21 Thread Brooke Storm
New submission from Brooke Storm: I am finding that, using Powershell on Windows 10 and the current version of Python 3.7.3 installed from the Microsoft Store, when I create a virtualenv via "python -m venv " and activate it in Powershell with the Activate.ps1 script that is gene

[Cloud-announce] New views on Wiki Replicas to help with slow actor and comment queries

2019-06-10 Thread Brooke Storm
related to the actor and comment tables. A Phabricator task is already open to update the MediaWiki documentation related (https://phabricator.wikimedia.org/T225007), but it is likely that there are bits around wikitech to update as well. Brooke Storm Operations Engineer Wikimedia Cloud Services bst

Re: [Cloud-announce] Dropping user text columns from replica views 2019-05-27 now 2019-06-03

2019-06-04 Thread Brooke Storm
iki/News/Actor_storage_changes_on_the_Wiki_Replicas> I’m hoping to collect more useful information and tips there as we go. Brooke Storm Operations Engineer Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ > On Jun 3, 2019, at 8:27 AM, Broo

Re: [Cloud-announce] Dropping user text columns from replica views 2019-05-27 now 2019-06-03

2019-06-03 Thread Brooke Storm
Work on this is beginning. Brooke Storm Operations Engineer Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ > On May 20, 2019, at 12:38 PM, Brooke Storm wrote: > > I’d like to announce that after taking considerable user

Re: [Cloud] [Cloud-announce] NFS scratch mount changes 2019-05-28@1800 UTC

2019-05-28 Thread Brooke Storm
As puppet runs propagate, this is being rolled out. When it appears complete, I’ll be unmounting the labstore1003 mount point. Brooke Storm Operations Engineer Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ > On May 28, 2019, at 10:49 AM, Bro

Re: [Cloud-announce] NFS scratch mount changes 2019-05-28@1800 UTC

2019-05-28 Thread Brooke Storm
As puppet runs propagate, this is being rolled out. When it appears complete, I’ll be unmounting the labstore1003 mount point. Brooke Storm Operations Engineer Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ > On May 28, 2019, at 10:49 AM, Bro

[Cloud-announce] NFS scratch mount changes 2019-05-28@1800 UTC

2019-05-22 Thread Brooke Storm
-impact since it isn’t affecting /data/project or /home. Brooke Storm Operations Engineer Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_

Re: [Cloud-announce] Dropping user text columns from replica views 2019-05-27 now 2019-06-03

2019-05-20 Thread Brooke Storm
hat we are extending the date to Monday, June 3rd instead to begin dropping the fields in the views. This should give additional time to fix things up as well as find issues in existing views if any more arise. Brooke Storm Operations Engineer Wikimedia Cloud Services bst...@wikimedia.org

[Cloud-announce] Dropping user text columns from replica views 2019-05-27

2019-05-17 Thread Brooke Storm
table, which won’t be changing in a user-visible way at this time. Brooke Storm Operations Engineer Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_

Re: [Cloud] [Cloud-announce] OSMDB migration to new server 20190404@1700 UTC

2019-04-04 Thread Brooke Storm
not impact users of the server at all. Apologies for the less smooth transition. I’ll add docs and/or technical fixes to make future failover more smooth. Brooke Storm Operations Engineer Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ > On Apr 4, 20

Re: [Cloud-announce] OSMDB migration to new server 20190404@1700 UTC

2019-04-04 Thread Brooke Storm
not impact users of the server at all. Apologies for the less smooth transition. I’ll add docs and/or technical fixes to make future failover more smooth. Brooke Storm Operations Engineer Wikimedia Cloud Services bst...@wikimedia.org IRC: bstorm_ > On Apr 4, 20

Re: [Cloud] [Cloud-announce] OSMDB migration to new server 20190404@1700 UTC

2019-04-04 Thread Brooke Storm
Sadly, there was a permissions issue that caused a brief crash during the promotion of the server to master. The server should be up in read-write mode now on the new server. Adjusting things to get the rsync jobs moved and working as well at this point. Brooke Storm Operations Engineer
