Thanks Steven for your feedback! Please see my answers inline.

On 02.08.16 23:46, Steven Hardy wrote:
> On Tue, Aug 02, 2016 at 09:36:45PM +0200, Christian Schwede wrote:
>> Hello everyone,
>>
>> I'd like to improve the Swift deployments done by TripleO. There are a
>> few problems today when deployed with the current defaults:
>
> Thanks for digging into this, I'm aware this has been something of a
> known issue for some time, so it's great to see it getting addressed :)
>
> Some comments inline;
>
>> 1. Adding new nodes (or replacing existing nodes) is not possible,
>> because the rings are built locally on each host and a new node doesn't
>> know about the "history" of the rings. Therefore rings might become
>> different on the nodes, and that results in an unusable state eventually.
>>
>> 2. The rings are only using a single device, and it seems that this is
>> just a directory and not a mountpoint with a real device. Therefore data
>> is stored on the root device - even if you have 100TB disk space in the
>> background. If not fixed manually, your root device will run out of
>> space eventually.
>>
>> 3. Even if a real disk is mounted in /srv/node, replacing a faulty disk
>> is much more troublesome. Normally you would simply unmount a disk and
>> then replace it sometime later. But because mount_check is set to
>> False in the storage servers, data will be written to the root device in
>> the meantime; and when you finally mount the disk again, you can't
>> simply clean up.
>>
>> 4. In general, it's not possible to change the cluster layout (using
>> different zones/regions/partition power/device weight, slowly adding new
>> devices to avoid 25% of the data being moved immediately when adding
>> new nodes to a small cluster, ...). You could manually manage your
>> rings, but they will eventually be overwritten when updating your
>> overcloud.
>>
>> 5. Missing erasure coding support (or storage policies in general)
>>
>> This sounds bad; however, most of the current issues can be fixed using
>> customized templates and some tooling to create the rings in advance on
>> the undercloud node.
>>
>> The information about all the devices can be collected from the
>> introspection data, and by using node placement the node names in the
>> rings are known in advance, even if the nodes are not yet powered on.
>> This ensures a consistent ring state, and an operator can modify the
>> rings if needed to customize the cluster layout.
>>
>> Using some customized templates we can already do the following:
>> - disable ringbuilding on the nodes
>> - create filesystems on the extra blockdevices
>> - copy ringfiles from the undercloud, using pre-built rings
>> - enable mount_check by default
>> - (define storage policies if needed)
>>
>> I started working on a POC using tripleo-quickstart, some custom
>> templates and a small Python tool to build rings based on the
>> introspection data:
>>
>> https://github.com/cschwede/tripleo-swift-ring-tool
>>
>> I'd like to get some feedback on the tool and templates.
>>
>> - Does this make sense to you?
>
> Yes, I think the basic workflow described should work, and it's good to see
> that you're passing the ring data via swift, as this is consistent with how
> we already pass some data to nodes via our DeployArtifacts interface:
>
> https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/deploy-artifacts.yaml
>
> Note however that there are no credentials to access the undercloud swift
> on the nodes, so you'll need to pass a tempurl reference in (which is what
> we do for deploy artifacts; obviously you will have credentials to create
> the container & tempurl on the undercloud).
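(For readers unfamiliar with the tempurl mechanism Steven mentions: a Swift
tempurl is an HMAC-SHA1 signature over the request method, an expiry
timestamp and the object path, so nodes can fetch the object without
credentials until the URL expires. A minimal sketch of how such a URL query
is built - the key and account/container path below are made-up examples,
not anything from the undercloud:)

```python
import hmac
import time
from hashlib import sha1

def make_tempurl(method, path, key, ttl):
    """Build a Swift tempurl query suffix for `path`, valid for `ttl` seconds."""
    expires = int(time.time() + ttl)
    # Swift signs exactly these three fields, newline-separated.
    hmac_body = "%s\n%s\n%s" % (method, expires, path)
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    return "%s?temp_url_sig=%s&temp_url_expires=%s" % (path, sig, expires)

# Hypothetical example: let overcloud nodes GET a ring tarball from the
# undercloud Swift without credentials.
url = make_tempurl("GET", "/v1/AUTH_test/overcloud/swift-rings.tar.gz",
                   "secret-tempurl-key", 3600)
```

The matching key would have to be set on the account via the
`X-Account-Meta-Temp-URL-Key` header for the proxy to accept the signature.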
Ah, that's very useful! I updated my POC; that's one less customized
template and less code to support in the Python tool. Works as expected!

> One slight concern I have is mandating the use of predictable placement -
> it'd be nice to think about ways we might avoid that, but the undercloud
> centric approach seems OK for a first pass (in either case I think the
> delivery via swift will be the same).

Do you mean the predictable artifact filename? We could just add a
randomized prefix to the filename IMO.

>> - How (and where) could we integrate this upstream?
>
> So I think the DeployArtifacts interface may work for this, and we have a
> helper script that can upload data to swift:
>
> https://github.com/openstack/tripleo-common/blob/master/scripts/upload-swift-artifacts
>
> This basically pushes a tarball to swift, creates a tempurl, then creates a
> file ($HOME/.tripleo/environments/deployment-artifacts.yaml) which is
> automatically read by tripleoclient on deployment.
>
> DeployArtifactURLs is already a list, but we'll need to test and confirm we
> can pass both e.g. swift ring data and updated puppet modules at the same
> time.

If I see this correctly, the artifacts are deployed just before Puppet
runs, and the Swift rings don't affect the Puppet modules, so that should
be fine? At least it's working in my tests this morning.

> The part that actually builds the rings on the undercloud will probably
> need to be created as a custom mistral action:
>
> https://github.com/openstack/tripleo-common/tree/master/tripleo_common/actions
>
> These are then driven as part of the deployment workflow (although the
> final workflow where this will wire in hasn't yet landed):
>
> https://review.openstack.org/#/c/298732/

Alright, I'll have a look at how to integrate this.

>> - Templates might be included in tripleo-heat-templates?
>
> Yes, although by the look of it there may be few template changes required.
>
> If you want to remove the current ringbuilder puppet step completely, you
> can simply remove OS::TripleO::Services::SwiftRingBuilder from the
> ControllerServices/ObjectStorageServices list:
>
> https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.yaml#L393
> https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.yaml#L492
>
> Or, map the current implementation to OS::Heat::None:
>
>   cat no_ringbuild_env.yaml
>
>   resource_registry:
>     OS::TripleO::Services::SwiftRingBuilder: OS::Heat::None

That would only work on current master, right? Setting RingBuild to False
would make it possible to use that on Mitaka too.

> Obviously this same approach could be used to easily map in an alternative
> template (replacing puppet/services/swift-ringbuilder.yaml), but it sounds
> like the primary integration point here will be on the undercloud?

Indeed, the action itself should happen on the undercloud.

>> IMO the most important change would be to avoid overwriting rings on the
>> overcloud. There is a good chance to mess up your cluster if the
>> template to disable ring building isn't used and you already have
>> working rings in place. Same for the mount_check option.
>>
>> I'm curious about your thoughts!
>
> This all sounds pretty good - I'd be pleased if you could raise some bugs
> (either one, or one per logical issue, your choice), and let me know asap
> if this is something you're likely to be trying to land for Newton; clearly
> time is running out and we'll have to prioritize already very overloaded
> reviewer resources, but this is clearly an important thing to fix.

I created only a single bug; I think the topics are closely tied to each
other, so IMO it makes sense to have a single reference for them:

https://bugs.launchpad.net/tripleo/+bug/1609421

I would be happy to see some improvements land in the Newton release, but I
fully understand the tight schedule for this.
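(As an aside on why mount_check matters so much here: with it enabled, a
Swift storage server only accepts writes for a device whose directory is a
real mountpoint, so an unmounted disk fails fast instead of silently
filling the root filesystem. A simplified sketch of that check - this is an
illustration with made-up paths, not Swift's actual implementation:)

```python
import os
import tempfile

def device_is_usable(devices_root, device):
    """Mimic Swift's mount_check: only accept a device that is a mountpoint.

    With mount_check disabled, a missing mount means the device path is a
    plain directory, and writes quietly land on the root filesystem.
    """
    return os.path.ismount(os.path.join(devices_root, device))

# A plain directory (like an unmounted /srv/node/sdb) fails the check,
# while "/" is by definition a mountpoint and passes.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sdb"))
    print(device_is_usable(tmp, "sdb"))
print(device_is_usable("/", ""))
```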
So my idea would be to submit a few patches to make this easily consumable:

- add a loopback device to the gate nodes to enable testing
- one tht template to partition all blockdevices except root devices
- submit a tripleo-common patch for the ring building script
- submit a tripleo-docs patch on how to use this with a customized env
- eventually make the switch to the new workflow

Does this sound doable to you? Please let me know if there is a better way
to handle this. Otherwise I'll start hacking on this :)

--
Christian

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev