On Thu, Jul 19, 2018 at 7:13 PM, Ben Nemec <openst...@nemebean.com> wrote:
>
> On 07/19/2018 03:37 PM, Emilien Macchi wrote:
>>
>> Today I played a little bit with Standalone deployment [1] to deploy a
>> single OpenStack cloud without the need of an undercloud and overcloud.
>> The use case I am testing is the following:
>> "As an operator, I want to deploy a single node OpenStack, that I can
>> extend with remote compute nodes on the edge when needed."
>>
>> We still have a bunch of things to figure out so it works out of the box,
>> but so far I was able to build something that worked, and I found it
>> useful to share it early to gather some feedback:
>> https://gitlab.com/emacchi/tripleo-standalone-edge
>>
>> Keep in mind this is a proof of concept, based on upstream documentation
>> and re-using 100% of what is in TripleO today. The only thing I'm doing
>> is changing the environment and the roles for the remote compute node.
>> I plan to work on cleaning up the manual steps I had to do to make it
>> work, like hardcoding some hiera parameters, and to figure out how to
>> override ServiceNetMap.
>>
>> Anyway, feel free to test / ask questions / provide feedback.
>
> What is the benefit of doing this over just using deployed-server to
> install a remote server from the central management system? You need to
> have connectivity back to the central location anyway. Won't this become
> unwieldy with a large number of edge nodes? I thought we told people not
> to use Packstack for multi-node deployments for exactly that reason.
>
> I guess my concern is that eliminating the undercloud makes sense for
> single-node PoCs and development work, but for what sounds like a
> production workload I feel like you're cutting off your nose to spite
> your face. In the interest of saving one VM's worth of resources, now all
> of your day 2 operations have no built-in orchestration.
> Every time you want to change a configuration it's "copy new script to
> system, ssh to system, run script, repeat for all systems." So maybe this
> is a backdoor way to make Ansible our API? ;-)
I believe Emilien was looking at this POC in part because of some input
from me, so I will attempt to address your questions constructively.

What you're looking at here is exactly that: a POC using the experimental
standalone code. I think the use case as presented by Emilien is worth
considering:

>> "As an operator, I want to deploy a single node OpenStack, that I can
>> extend with remote compute nodes on the edge when needed."

I wouldn't interpret that to mean much of anything about eliminating the
undercloud beyond what is stated in the use case; jumping straight to
"eliminate the undercloud" would be an oversimplification. The goal of the
POC isn't Packstack parity, or even necessarily a Packstack-like
architecture. One of the goals is to see whether we can deploy separate,
disconnected stacks for Control and Compute, and the standalone work
happens to be a good way to test some of the work around that. The use
case was written to describe and provide an overall picture of this
specific POC, with a focus on the edge use case.

You make some points about centralized management and connectivity back to
the central location. Those are exactly the sorts of things we are
thinking about as we consider how to address edge deployments. If you
haven't had a chance yet, check out the Edge Computing whitepaper from the
Foundation:

https://www.openstack.org/assets/edge/OpenStack-EdgeWhitepaper-v3-online.pdf

In particular, note the challenges outlined around management and
deployment tooling. For lack of anything better, I'm calling these the
3 D's:

- Decentralized
- Distributed
- Disconnected

How can TripleO address any of these?

For Decentralized, I'd like to see better separation between the planning
and application of a deployment in TripleO. TripleO has had the concept of
a plan for quite a while, and we've been using it very effectively for our
deployments, but it is somewhat hidden from the operator.
It's not entirely clear to the user that there is any separation between
the plan and the stack, or what benefit the plan even provides. I'd like
to address some of that through API improvements around plan management,
making the plan the top-level thing being managed instead of a deployment.
We're already moving in this direction with config-download and a lot of
the changes we made during Queens. For better or worse, some other tools
like Terraform call this separation out as one of their main
differentiators: https://www.terraform.io/intro/vs/cloudformation.html
(3rd paragraph).

TripleO has long separated the planning and application phases; we just
need to do a better job of building useful features around that
separation. The UI has been taking advantage of it more than anything else
at this point. I'd like to focus a bit more on what benefits we get from
the plan and how we can turn them into operator value.

Imagine a scenario where you have a deployed plan and you want to make
some changes. You upload a new plan, the plan is processed, we update a
copy of the deployed stack (or perhaps an ephemeral stack), run
config-download, and the operator gets immediate feedback about what
*would* be changed. Heat plays a role here in giving us a way to
orchestrate the plan into a deployment model. Ansible also plays a role:
we could take things a step further and run with --check to provide
further feedback before anything is ever applied or updated. Ongoing work
around new baremetal management workflows via metalsmith will give us more
insight into planning the baremetal deployment.

These tools (Heat, Ansible, metalsmith, etc.) are technology choices. They
are not architectures in and of themselves.
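For what it's worth, a rough version of that preview loop is possible with
the config-download tooling today. A hedged sketch follows; the plan name
"overcloud", the output directory layout, and the dynamic inventory path
are assumptions based on the Queens-era CLI, not something this POC
defines, so exact flags and paths may differ in your environment:

```shell
# Render the plan's Ansible playbooks locally without applying anything.
openstack overcloud config download --name overcloud \
    --config-dir ~/config-download

# Dry-run the generated playbooks against the deployed nodes: report what
# *would* change, but change nothing. (tripleo-ansible-inventory is the
# dynamic inventory script shipped with TripleO.)
cd ~/config-download/tripleo-*-config
ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
    --check deploy_steps_playbook.yaml
```

The point is that the plan, not the stack, is the artifact the operator
iterates on, and the dry run gives feedback before anything is applied.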
You have centralized management of the planning phase, whose output could
be a set of playbooks applied in a decentralized way: for example,
provided via an API and downloaded to a remote site where an operator is
sitting in an emergency-response scenario with some "hardware in a box"
that they want to deploy local compute/storage resources onto and connect
to a local network. Connectivity back to the centralized platform may or
may not be required, depending on which services are deployed.

For Distributed, I think of git. We have built-in git management of the
config-download output, and we are discussing (further) git management of
the templates and processed plan. This gives operators some ability to
manage the output in a distributed fashion and to make new changes outside
the centralized platform. Perhaps in the future we could offer an
API/interface for pulling an operator's changes back into the represented
plan: sort of like a pull request for the plan, but starting from the
output. Obviously, this needs a lot more definition and refinement than
just "use git". Again, these efforts are about experimenting with use
cases, not technology choices. To get to those experiments quickly, it may
look like we are making rash decisions about using X or Y, but that's not
the driver here.

For Disconnected, it also ties into how we'd address decentralized and
distributed. The choice of tooling helps, but it's not as simple as "use
Ansible". Part of the reason we are looking at this POC, and at how to
deploy it easily, is to investigate questions such as: what happens to the
deployed workloads if a compute node loses connectivity to the control
plane or management platform? We want to make sure TripleO can deploy
something that handles these sorts of scenarios. During periods of
disconnection at the edge or other remote sites, operators may still need
to make changes (see the points about Distributed above).
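The git-based flow above can be sketched generically. Everything here is
illustrative: the repository stands in for a config-download output
directory, and the file name and values are made up; the point is only
that a local edit at a disconnected site becomes a reviewable delta once
connectivity returns, much like a pull request against the plan's output:

```shell
# Stand in for a config-download output directory under git management.
CONFIG_DIR=$(mktemp -d)
cd "$CONFIG_DIR"
git init -q .
git config user.email edge-operator@example.com
git config user.name "Edge Operator"

# Centrally rendered output, committed as the baseline.
echo "ntp_server: 10.0.0.1" > deploy_vars.yaml
git add deploy_vars.yaml
git commit -qm "Rendered output from central plan"

# A remote operator makes a local change while disconnected...
echo "ntp_server: 192.168.24.1" > deploy_vars.yaml
git commit -qam "Local override at the edge site"

# ...and later the delta can be reviewed centrally and reconciled with
# the plan, like a pull request starting from the output.
git log --oneline
git diff HEAD~1 HEAD
```

Reconciling that diff back into the plan itself is the part that still
needs definition beyond "use git".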
Using the standalone deployment can help us quickly answer these questions
and develop a "Steel Thread"[1] to build upon.

Ultimately, these are the sorts of high-level designs and architectures we
are beginning to investigate. We are trying to let the use cases and
operator needs drive the design, even while the use cases themselves are
still being understood (see the whitepaper above). It's not about "just
use Ansible" or "rewrite the API".

[1] http://www.agiledevelopment.org/agile-talk/111-defining-acceptance-criteria-using-the-steel-thread-concept

--
James Slagle

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev