On Thu, Jun 21, 2018 at 09:36:58AM -0400, Jay Pipes wrote:
> On 06/18/2018 10:16 AM, Artom Lifshitz wrote:
> > Hey all,
> >
> > For Rocky I'm trying to get live migration to work properly for
> > instances that have a NUMA topology [1].
> >
> > A question that came up on one of the patches [2] is how to handle
> > resource claims on the destination, or indeed whether to handle that
> > at all.
> >
> > The previous attempt's approach [3] (call it A) was to use the
> > resource tracker. This is race-free and the "correct" way to do it,
> > but the code is pretty opaque and not easily reviewable, as evidenced
> > by [3] sitting in review purgatory for literally years.
> >
> > A simpler approach (call it B) is to ignore resource claims entirely
> > for now and wait for NUMA in placement to land in order to handle it
> > that way. This is obviously race-prone and not the "correct" way of
> > doing it, but the code would be relatively easy to review.
> >
> > For the longest time, live migration did not keep track of resources
> > (until it started updating placement allocations). The message to
> > operators was essentially "we're giving you this massive hammer, don't
> > break your fingers." Continuing to ignore resource claims for now is
> > just maintaining the status quo. In addition, there is value in
> > improving NUMA live migration *now*, even if the improvement is
> > incomplete because it's missing resource claims. "Best is the enemy of
> > good" and all that. Finally, making use of the resource tracker is
> > just work that we know will get thrown out once we start using
> > placement for NUMA resources.
> >
> > For all those reasons, I would favor approach B, but I wanted to ask
> > the community for their thoughts.
>
> Side question... does either approach touch PCI device management during
> live migration?
>
> I ask because the only workloads I've ever seen that pin guest vCPU threads
> to specific host processors -- or make use of huge pages consumed from a
> specific host NUMA node -- have also made use of SR-IOV and/or PCI
> passthrough. [1]
Not really. There are a lot of virtual switches that we support, like
OVS-DPDK, Contrail Virtual Router and others, which use vhostuser
interfaces; that is one such use case. (We do support live migration of
vhostuser interfaces.)

> If workloads that use PCI passthrough or SR-IOV VFs cannot be live migrated
> (due to existing complications in the lower-level virt layers) I don't see
> much of a point spending lots of developer resources trying to "fix" this
> situation when in the real world, only a mythical workload that uses CPU
> pinning or huge pages but *doesn't* use PCI passthrough or SR-IOV VFs would
> be helped by it.
>
> Best,
> -jay
>
> [1] I know I'm only one person, but every workload I've seen that requires
> pinned CPUs and/or huge pages is a VNF that has been essentially an ASIC
> that a telco OEM/vendor has converted into software and requires the same
> guarantees that the ASIC and custom hardware gave the original
> hardware-based workload. These VNFs, every single one of them, used either
> PCI passthrough or SR-IOV VFs to handle latency-sensitive network I/O.
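
For concreteness, a guest of the kind discussed above -- pinned CPUs and
hugepage-backed memory on a single NUMA node, with its datapath on a
vhostuser (e.g. OVS-DPDK) port rather than an SR-IOV VF -- is typically
requested along the following lines. The flavor name, sizes and network
name below are purely illustrative:

    # Flavor asking for CPU pinning and hugepage-backed memory on one NUMA node
    openstack flavor create --vcpus 8 --ram 8192 --disk 20 vnf.pinned
    openstack flavor set vnf.pinned \
        --property hw:cpu_policy=dedicated \
        --property hw:mem_page_size=large \
        --property hw:numa_nodes=1

    # On an OVS-DPDK compute node an ordinary port is bound as vhostuser by
    # the virt driver; no PCI passthrough or SR-IOV VF is involved.
    openstack port create --network datapath-net vhu-port
    openstack server create --flavor vnf.pinned --image <image> \
        --nic port-id=<uuid of vhu-port> vnf0

    # By contrast, an SR-IOV VF would have to be requested explicitly, e.g.:
    # openstack port create --network datapath-net --vnic-type direct vf-port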
