Re: [openstack-dev] [TripleO] Fixing Swift rings when upscaling/replacing nodes in TripleO deployments

Arkady.Kanevsky Thu, 05 Jan 2017 06:59:28 -0800

I am concerned about relying on the undercloud for overcloud Swift.
The undercloud is not HA (yet), so it may not be operational when a disk fails
or a Swift overcloud node is added or deleted.

-----Original Message-----
From: Christian Schwede [mailto:cschw...@redhat.com] 
Sent: Thursday, January 05, 2017 6:14 AM
To: OpenStack Development Mailing List <openstack-dev@lists.openstack.org>
Subject: [openstack-dev] [TripleO] Fixing Swift rings when upscaling/replacing 
nodes in TripleO deployments

Hello everyone,

There was an earlier discussion on $subject last year [1] regarding a bug when
upscaling or replacing nodes in TripleO [2].

In short: Swift rings are built on each node separately, so adding or replacing
nodes (or disks) breaks the rings, because they are no longer consistent across
the nodes. What's needed is for every node to have the previous ring builder
files before changing the rings.
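
To illustrate why the previous builder files matter, here is a toy model (not
Swift's RingBuilder, whose balancing algorithm is far more involved). Its
hypothetical rebalance() keeps partitions on devices that still exist and
assigns the rest pseudo-randomly. A node that rebalances from the previous
builder state and a node that starts from scratch can end up with different
partition-to-device mappings for the same device list.

```python
import random
from typing import Dict, List, Optional

# Toy model: a "ring" maps partition number -> device name.
Ring = Dict[int, str]


def rebalance(devices: List[str], part_power: int, seed: int,
              previous: Optional[Ring] = None) -> Ring:
    """Hypothetical toy rebalance; NOT Swift's RingBuilder algorithm.

    Partitions whose device is still present keep it (minimal movement).
    Every other partition, or every partition when no previous builder
    state exists, is assigned to a pseudo-randomly chosen device.
    """
    rng = random.Random(seed)
    ring: Ring = {}
    for part in range(2 ** part_power):
        if previous is not None and previous.get(part) in devices:
            ring[part] = previous[part]
        else:
            ring[part] = rng.choice(devices)
    return ring


# Initial deployment: assume all existing nodes share this ring state.
initial = rebalance(["sdb", "sdc", "sdd"], part_power=4, seed=1)

# Disk "sdd" is replaced by "sde".
new_devices = ["sdb", "sdc", "sde"]

# Node A still has the previous builder state and rebalances from it.
node_a = rebalance(new_devices, part_power=4, seed=2, previous=initial)
# Node B (e.g. a replacement node) has no builder file and starts from scratch.
node_b = rebalance(new_devices, part_power=4, seed=2)

differing = sum(node_a[p] != node_b[p] for p in node_a)
print(f"partitions assigned differently on node A and node B: {differing}")
```

If both nodes instead start from the same previous builder state, which is
what fetching it from a shared location provides, the same inputs produce the
same ring on every node.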

My earlier idea in [1] was to build the rings in advance on the undercloud, and
to use introspection data to gather the set of disks on each node for the
rings.

However, this changes the current way of deploying significantly, and also
requires more work in TripleO and Mistral (for example, to trigger a ring build
on the undercloud after the nodes have been started but before the deployment
triggers the Puppet run).

I prefer smaller steps to keep everything stable for now, and therefore I 
changed my patches quite a bit. This is my updated proposal:

1. Two temporary undercloud Swift URLs (one PUT, one GET) will be computed
before Mistral starts the deployment. A new Mistral action to create such URLs
is required for this [3].
2. Using the temporary GET URL, each overcloud node will try to fetch the rings
from the undercloud Swift deployment before updating its set of rings locally.
This guarantees that each node uses the same source set of builder files.
This happens in deployment step 2 [4].
3. puppet-swift runs as it does today, updating the rings if required.
4. Finally, at the end of the deployment (in step 5), the nodes will upload
their modified rings to the undercloud using the temporary PUT URL.
swift-recon will run before this, ensuring that all rings across all nodes are
consistent.
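
Steps 1 and 4 rest on two standard Swift building blocks. The sketch below
computes a temporary URL following the signature scheme documented for Swift's
TempURL middleware (HMAC-SHA1 over method, expiry time and object path), and
compares ring data across nodes by MD5, in the spirit of the ring check that
swift-recon performs. It is not the Mistral action from [3]; the undercloud
endpoint, account, object name and key are hypothetical placeholders.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional
from urllib.parse import urlencode


def make_temp_url(method: str, host: str, path: str, key: str,
                  ttl: int = 86400, now: Optional[int] = None) -> str:
    """Build a Swift TempURL signed with HMAC-SHA1 of 'METHOD\\nEXPIRES\\nPATH'."""
    expires = int(now if now is not None else time.time()) + ttl
    hmac_body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), hmac_body.encode(), hashlib.sha1).hexdigest()
    query = urlencode({"temp_url_sig": sig, "temp_url_expires": expires})
    return f"{host}{path}?{query}"


def rings_consistent(rings_by_node: Dict[str, bytes]) -> bool:
    """True if every node reports byte-identical ring data (same MD5 digest)."""
    digests = {hashlib.md5(data).hexdigest() for data in rings_by_node.values()}
    return len(digests) <= 1


# Hypothetical undercloud endpoint, account/container/object path and key.
HOST = "http://192.0.2.1:8080"
PATH = "/v1/AUTH_example/overcloud-swift-rings/swift-rings.tar.gz"
KEY = "example-temp-url-key"

# Step 1: one GET URL for fetching the rings, one PUT URL for uploading them.
get_url = make_temp_url("GET", HOST, PATH, KEY, now=1483600000)
put_url = make_temp_url("PUT", HOST, PATH, KEY, now=1483600000)
```

Because the method is part of the signed string, the GET URL cannot be used
for uploads and vice versa; the upload in step 4 would only proceed once
rings_consistent() (or swift-recon itself) reports matching rings.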

The two required patches [3][4] are not overly complex IMO, but they solve the
problem of adding or replacing nodes without significantly changing the current
workflow. They should even be easy to backport if needed.

I'll continue working on an improved way of deploying Swift rings (using
introspection data), but with this approach it could even be done within
today's workflow, by feeding data into puppet-swift (probably with some updates
to puppet-swift/tripleo-heat-templates to support regions, zones, different
disk layouts and the like). However, all of this could be built on top of these
two patches.
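
As a rough illustration of "feeding data into" the ring build, the sketch
below turns per-node disk data into device strings in the format accepted by
`swift-ring-builder <builder_file> add` (r<region>z<zone>-<ip>:<port>/<device>
followed by a weight). The NodeInventory structure, the skipped root disk, port
6000 and the GiB-based weight are all assumptions for illustration; how such
data would reach puppet-swift or tripleo-heat-templates is left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    name: str        # kernel device name, e.g. "sdb"
    size_bytes: int  # raw capacity (assumed to come from introspection)


@dataclass
class NodeInventory:
    """Hypothetical per-node data derived from introspection."""
    ip: str
    region: int
    zone: int
    disks: List[Disk]
    root_disk: str = "sda"  # assumed OS disk, excluded from the rings


def ring_devices(node: NodeInventory, port: int = 6000) -> List[str]:
    """Return 'add' arguments for swift-ring-builder, one per data disk.

    Weight is capacity in whole GiB (an assumption; any consistent scale works).
    """
    devices = []
    for disk in node.disks:
        if disk.name == node.root_disk:
            continue  # never place Swift data on the OS disk
        weight = disk.size_bytes // 2**30
        devices.append(
            f"r{node.region}z{node.zone}-{node.ip}:{port}/{disk.name} {weight}")
    return devices


node = NodeInventory(ip="192.0.2.10", region=1, zone=2,
                     disks=[Disk("sda", 500 * 2**30),
                            Disk("sdb", 4000 * 2**30),
                            Disk("sdc", 4000 * 2**30)])
for dev in ring_devices(node):
    # e.g. swift-ring-builder object.builder add r1z2-192.0.2.10:6000/sdb 4000
    print("swift-ring-builder object.builder add", dev)
```

Keeping region and zone as explicit per-node inputs is what would allow
different layouts per site without changing the fetch/upload steps above.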

I'm curious about your thoughts and welcome any feedback or reviews!

Thanks,

-- Christian

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-August/100720.html
[2] https://bugs.launchpad.net/tripleo/+bug/1609421
[3] https://review.openstack.org/#/c/413229/
[4] https://review.openstack.org/#/c/414460/

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
