The full etherpad for cells discussions at the PTG is here [1].

We mostly talked about the limitations with multiple cells that were identified in Pike [2], and about priorities for Queens.

Top priorities for cells in Queens
----------------------------------

* Alternate hosts: with multiple cells in a tiered (super) conductor mode, we don't have reschedules happening when a server build fails on a compute. Ed Leafe has already started working on the code to build an object to pass from the scheduler to the super conductor. We'll then send that from the super conductor down to the compute service in the cell, and reschedules can happen within a cell using that provided list of alternate hosts (and the pre-determined allocation requests for Placement provided by the scheduler). We agreed that we should get this done early in Queens so that we have ample time to flush out and fix bugs. There is a rough sketch of the retry flow at the end of this list.

* Instance listing across multiple cells: this is going to involve merge-sorting the instance lists we get back from multiple cells, which today are filtered/sorted within each cell and then returned out of the API in a "barber pole" pattern. We are not going to use Searchlight for this, but will instead do it with more efficient cross-cell DB queries. Dan Smith is going to work on this. The merge idea is also sketched at the end of this list.
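
For the alternate hosts work, here is a very rough sketch of the retry flow (made-up names and interfaces, not Ed's actual object model or RPC API): the cell conductor gets an ordered list of candidate hosts, each paired with the allocation request the scheduler already computed for Placement, and walks it on a reschedule without calling back up to the scheduler.

    # Hypothetical sketch only; names and interfaces are made up.
    class NoValidHost(Exception):
        pass

    def build_with_alternates(instance, selections, placement, compute_rpc):
        # selections: ordered [(host, allocation_request), ...] handed down
        # from the scheduler via the super conductor.
        for host, alloc_req in selections:
            # Claim the pre-computed allocation in Placement for this host.
            if not placement.claim(instance.uuid, alloc_req):
                continue  # host no longer has capacity, try the next one
            try:
                compute_rpc.build_and_run_instance(host, instance)
                return host
            except Exception:
                # The build failed on this host: free the claim and retry
                # within the cell using the next alternate.
                placement.delete_allocation(instance.uuid)
        raise NoValidHost()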

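For the cross-cell listing, the rough idea (again just a sketch, not Dan's implementation) is that each cell returns results already sorted by the requested keys and the API layer merges them into a single sorted page, instead of concatenating them cell by cell:

    # Illustrative only: merge pre-sorted per-cell results into one page.
    import heapq
    import itertools
    from operator import itemgetter

    def merge_cell_results(per_cell_results, sort_key='created_at', limit=1000):
        # per_cell_results: a list of instance lists, one per cell, each
        # already sorted by sort_key in its own cell database.
        merged = heapq.merge(*per_cell_results, key=itemgetter(sort_key))
        return list(itertools.islice(merged, limit))
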
Dealing with up-calls
---------------------

In a multi-cell or tiered (super) conductor mode deployment, the cell conductor and compute services cannot reach the top-level (API) database or message queue. This breaks a few things that work today.

* Instance affinity reporting from the computes to the scheduler won't work without the MQ up-call. There is also a check that happens late in the build process on the compute which verifies that server group affinity/anti-affinity policies are maintained, and that is an up-call to the API database. Both of these will be solved long-term when we model distance in Placement, but we are deferring that from Queens. The late affinity check in the compute is not an issue if you're running a single cell (not using a tiered super conductor mode deployment). If you're running multiple cells, you can configure the cell conductors to have access to the API database as a workaround; an example is sketched at the end of this list. We wouldn't test with this workaround in CI, but it's an option for people that need it.

* There is a host aggregate up-call when performing live migration with the xen driver and letting the driver determine whether block migration should be used. We decided to just put a note in the code that this doesn't work and leave it as a limitation for that driver and scenario. Xen driver maintainers or users can fix it if they want, but we aren't going to make it a priority.

* There is a host aggregate up-call when doing boot from volume where the compute service creates the volume: it checks to see whether the instance AZ and the volume AZ match when [cinder]/cross_az_attach is False (not the default). Checking the AZ for the instance involves getting the host aggregates that the instance is in, and those are in the API database; a sketch of that check is at the end of this list. We agreed that for now, people running multiple cells and using this cross_az_attach=False setting can configure the cell conductor to reach the API database, like the late affinity check described above. Sylvain Bauza is also looking at reasons why we even do this check if the user did not request a specific AZ, so there could be other general changes in the design of this cross_az_attach check later. That is being discussed here [3].
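
The API database workaround mentioned a couple of times above amounts to giving the cell conductor a connection to the API database in its nova.conf, along these lines (the connection URL here is just an example):

    # Cell conductor nova.conf (workaround only; not something we test in CI).
    [api_database]
    connection = mysql+pymysql://nova:SECRET@api-db-host/nova_api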

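And to show why the cross_az_attach check is an up-call, here is a rough sketch of the logic (hypothetical helper names, not the actual Nova code); resolving the instance's AZ means reading the host's aggregates, which live in the API database:

    # Hypothetical sketch only.
    def check_instance_volume_az(host, volume_az, api_db, cross_az_attach=False):
        if cross_az_attach:
            return  # [cinder]/cross_az_attach=True (the default): no check
        # Up-call: host aggregates (and their AZ metadata) are in the API DB.
        aggregates = api_db.get_aggregates_for_host(host)
        instance_az = next(
            (agg.metadata['availability_zone'] for agg in aggregates
             if 'availability_zone' in agg.metadata),
            'nova')  # default AZ when the host is not in an AZ aggregate
        if instance_az != volume_az:
            raise ValueError('instance AZ %s does not match volume AZ %s'
                             % (instance_az, volume_az))
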
Other discussion
----------------

* We have a utility to concurrently run database queries against multiple cells. We are going to look at whether we can retrofit some code paths that currently query the cells serially with this utility to improve performance; the idea is sketched below this list.

* Making the consoleauth service run per-cell is going to be low priority until some large cells v2 deployments start showing up and saying that a global consoleauth service is not scaling and it needs to be fixed.

* We talked about using the "GET /usages" Placement API for counting quotas rather than gathering that information from each cell, but there are quite a few open questions about the design and about edge cases like move operations and Ironic with custom resource classes. So while this is something that should make counting quotas perform better, it's complicated and not a priority for Queens. The call in question is sketched below this list.

* Finally, we also talked about the future of cells v1 and when we can officially deprecate and remove it. We've already been putting warnings in the code, docs and config options for a long time about not using cells v1 since it is being replaced with cells v2. *We agreed that if we can get efficient multi-cell instance listing fixed in Queens, we'll remove both cells v1 and nova-network in Rocky.* At least since the Boston Pike summit, we've been asking large cells v1 deployments to start checking out cells v2 and report what issues they run into with the transition, and so far we haven't gotten any feedback, so we're hoping this timeline will spur some movement on that front. Dan Smith also called dibs on the code removal.
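
The multi-cell query utility mentioned above is basically a scatter/gather pattern, roughly like this (just a sketch of the idea, not the actual utility):

    # Sketch: run the same DB query against every cell concurrently instead
    # of looping over the cells one at a time.
    from concurrent.futures import ThreadPoolExecutor

    def scatter_gather_cells(cell_contexts, query_fn):
        with ThreadPoolExecutor(max_workers=len(cell_contexts)) as pool:
            futures = {cell: pool.submit(query_fn, cell)
                       for cell in cell_contexts}
            return {cell: future.result()
                    for cell, future in futures.items()}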

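And for reference, the Placement call being discussed for quotas is roughly shaped like this (the values are made up and the exact response depends on the microversion you request):

    GET /usages?project_id=<project uuid>

    {
        "usages": {
            "VCPU": 4,
            "MEMORY_MB": 8192,
            "DISK_GB": 80
        }
    }
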
[1] https://etherpad.openstack.org/p/nova-ptg-queens-cells
[2] https://docs.openstack.org/nova/latest/user/cellsv2_layout.html#caveats-of-a-multi-cell-deployment
[3] http://lists.openstack.org/pipermail/openstack-operators/2017-September/014200.html

--

Thanks,

Matt
