Follow-up on this:

The WMCS team is pretty sure that all user-facing services have been restored. If you still encounter any unexpected breakage, please email me directly or use !help on IRC.

There's still a fair bit of less-urgent cleanup left to do.  Puppet will remain disabled on most VMs until that's finished, which may take a day or two.

-Andrew + the WMCS team.


On 6/4/20 10:18 AM, Bryan Davis wrote:
At 2020-06-04T11:12 UTC a change was merged to the
operations/puppet.git repository which resulted in data loss for Cloud
VPS projects using a local Puppetmaster
(role::puppetmaster::standalone). The specific data loss is the removal
of any commits local to the Puppetmaster instance that were overlaid on
the upstream labs/private.git repository. These patches would have
contained passwords, ssh keys, TLS certificates, and similar
authentication information for Puppet managed configuration.

The majority of Cloud VPS projects are not affected by this
configuration data loss. Several highly used and visible projects,
including Toolforge (tools) and Beta Cluster (deployment-prep), have
some impact. We have disabled Puppet across all Cloud VPS instances
that were reachable by our central command and control service (cumin)
and are currently evaluating impact and recovering data from
/var/log/puppet.log change logs where available.

More information will be collected at
<https://phabricator.wikimedia.org/T254491> and an incident report
will also be prepared once the initial response is complete.

Bryan



_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-annou...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
