On Wed, 28 Jun 2017 11:34:42 +0200, Sahid Orentino Ferdjaoui <sferd...@redhat.com> wrote:
> On Tue, Jun 27, 2017 at 04:00:35PM +0200, Henning Schild wrote:
> > On Tue, 27 Jun 2017 09:44:22 +0200, Sahid Orentino Ferdjaoui
> > <sferd...@redhat.com> wrote:
> > > On Mon, Jun 26, 2017 at 10:19:12AM +0200, Henning Schild wrote:
> > > > On Sun, 25 Jun 2017 10:09:10 +0200, Sahid Orentino Ferdjaoui
> > > > <sferd...@redhat.com> wrote:
> > > > > On Fri, Jun 23, 2017 at 10:34:26AM -0600, Chris Friesen
> > > > > wrote:
> > > > > > On 06/23/2017 09:35 AM, Henning Schild wrote:
> > > > > > > On Fri, 23 Jun 2017 11:11:10 +0200, Sahid Orentino
> > > > > > > Ferdjaoui <sferd...@redhat.com> wrote:
> > > > > > >
> > > > > > > > In the Linux RT context, and as you mentioned, the non-RT
> > > > > > > > vCPU can acquire some guest kernel lock and then be
> > > > > > > > preempted by the emulator thread while holding this lock.
> > > > > > > > This situation blocks the RT vCPUs from doing their work.
> > > > > > > > That is why we have implemented [2]. For DPDK I don't
> > > > > > > > think we have such problems because it runs in userland.
> > > > > > > >
> > > > > > > > So for the DPDK context I think we could have a mask like
> > > > > > > > we have for RT, basically considering vCPU0 to handle
> > > > > > > > best-effort work (emulator threads, SSH...). I think
> > > > > > > > that is the current pattern used by DPDK users.
> > > > > > >
> > > > > > > DPDK is just a library, and one can imagine an application
> > > > > > > that has cross-core communication/synchronisation needs,
> > > > > > > where the emulator slowing down vCPU0 will also slow down
> > > > > > > vCPU1. Your DPDK application would have to know which of
> > > > > > > its cores did not get a full pCPU.
> > > > > > >
> > > > > > > I am not sure what the DPDK example is doing in this
> > > > > > > discussion; would that not just be cpu_policy=dedicated? I
> > > > > > > guess the normal behaviour of dedicated is that emulators
> > > > > > > and I/O happily share pCPUs with vCPUs, and you are looking
> > > > > > > for a way to restrict emulators/I/O to a subset of pCPUs
> > > > > > > because you can live with some of them not being at 100%.
> > > > > >
> > > > > > Yes. A typical DPDK-using VM might look something like this:
> > > > > >
> > > > > > vCPU0: non-realtime, housekeeping and I/O, handles all
> > > > > >        virtual interrupts and "normal" Linux stuff, emulator
> > > > > >        runs on the same pCPU
> > > > > > vCPU1: realtime, runs in a tight loop in userspace
> > > > > >        processing packets
> > > > > > vCPU2: realtime, runs in a tight loop in userspace
> > > > > >        processing packets
> > > > > > vCPU3: realtime, runs in a tight loop in userspace
> > > > > >        processing packets
> > > > > >
> > > > > > In this context, vCPUs 1-3 don't really ever enter the
> > > > > > kernel, and we've offloaded as much kernel work as possible
> > > > > > from them onto vCPU0. This works pretty well with the
> > > > > > current system.
> > > > > >
> > > > > > > > For RT we have to isolate the emulator threads to an
> > > > > > > > additional pCPU per guest or, as you are suggesting, to
> > > > > > > > a set of pCPUs for all the guests running.
> > > > > > > >
> > > > > > > > I think we should introduce a new option:
> > > > > > > >
> > > > > > > >   - hw:cpu_emulator_threads_mask=^1
> > > > > > > >
> > > > > > > > If set in 'nova.conf', that mask will be applied to the
> > > > > > > > set of all host CPUs (vcpu_pin_set) to basically pack
> > > > > > > > the emulator threads of all VMs running here (useful
> > > > > > > > for the RT context).
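
To make the nova.conf side of that proposal concrete, a minimal sketch
(the option name is the one proposed above; the CPU numbers are made up
for illustration, and the exact semantics are still being discussed in
this thread):

    # nova.conf on an RT compute host (illustrative values)
    [DEFAULT]
    # pCPUs that Nova may use for guest threads
    vcpu_pin_set = 2-7
    # proposed: subtract pCPU 2 from vcpu_pin_set and pack the emulator
    # threads of all instances on this host onto it
    hw:cpu_emulator_threads_mask = ^2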
> > > > > > > That would allow modelling exactly what we need.
> > > > > > > In nova.conf we are talking absolute known values, no need
> > > > > > > for a mask, and a set is much easier to read. Also, using
> > > > > > > the same name does not sound like a good idea.
> > > > > > > And the name vcpu_pin_set clearly suggests what kind of
> > > > > > > load runs here; if using a mask it should be called
> > > > > > > pin_set.
> > > > > >
> > > > > > I agree with Henning.
> > > > > >
> > > > > > In nova.conf we should just use a set, something like
> > > > > > "rt_emulator_vcpu_pin_set", which would be used for running
> > > > > > the emulator/io threads of *only* realtime instances.
> > > > >
> > > > > I don't agree with you: we have a set of pCPUs and we want to
> > > > > subtract some of them for the emulator threads. We need a
> > > > > mask. The only set we need is to select which pCPUs Nova
> > > > > can use (vcpu_pin_set).
> > > >
> > > > At that point it does not really matter whether it is a set or
> > > > a mask. They can both express the same thing, and a set is
> > > > easier to read/configure. With the same argument you could say
> > > > that vcpu_pin_set should be a mask over the host's pCPUs.
> > > >
> > > > As I said before: vcpu_pin_set should be renamed because all
> > > > sorts of threads are put here (pcpu_pin_set?). But that would
> > > > be a bigger change and should be discussed as a separate issue.
> > > >
> > > > So far we have talked about a compute node for realtime only
> > > > doing realtime. In that case vcpu_pin_set + emulator_io_mask
> > > > would work. If you want to run regular VMs on the same host,
> > > > you can run a second nova, like we do.
> > > >
> > > > We could also use vcpu_pin_set + rt_vcpu_pin_set(/mask). I
> > > > think that would allow modelling all cases in just one nova.
> > > > Having all in one nova, you could potentially repurpose RT CPUs
> > > > to best-effort and back. Some day in the future ...
> > >
> > > That is not something we should allow, or at least advertise. A
> > > compute node can't run both RT and non-RT guests, because the
> > > nodes need an RT kernel. We can't guarantee RT if both are on the
> > > same node.
> >
> > An RT-capable kernel can run best-effort applications just fine,
> > so you can run regular and RT VMs on such a host. At the moment we
> > use two novas on one host, but are still having trouble configuring
> > that for Mitaka.
>
> Sure, an RT kernel can run non-RT VMs, but you also have to configure
> the host: route the device interrupts to CPUs which do not take part
> in the RT set, change the priority of the rcuc kernel threads,
> exclude the isolated CPUs from the writeback workqueue... and a bunch
> of other things where Nova does not have the scheduling granularity
> to take that into account.

Exactly, that is complex, and all pCPUs that you configured like that
go into your vcpu_pin_set_rt or into the vcpu_pin_set of your rt-nova.
All the other pCPUs can be given to another nova, or both sets can be
configured in one. I would like to see both in one. In that case you
could even imagine doing all the tuning on demand and being dynamic
with the sets some day.
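
To illustrate the kind of host tuning meant here, a rough shell sketch
(the values are examples only, assuming housekeeping on CPUs 0-1 and RT
on CPUs 2-7; this is not a complete recipe):

    # route device interrupts onto the housekeeping CPUs (mask 0x3)
    for irq in /proc/irq/[0-9]*; do
        echo 3 > $irq/smp_affinity 2>/dev/null
    done
    # exclude the isolated CPUs from the writeback workqueue
    echo 3 > /sys/bus/workqueue/devices/writeback/cpumask
    # raise the scheduling priority of the rcuc kernel threads
    for pid in $(pgrep rcuc); do chrt -f -p 10 $pid; done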
> So even if it is possible to spawn non-RT VMs, I don't think we want
> to support such a scenario.

I think we need to support a scenario where one machine hosts both
kinds of VMs. When thinking of RT OpenStack deployments you should not
just think big but also small: a handful of compute nodes, or even
fewer. Realtime means my compute is physically close to a physical
process I need to control. So not your big datacentre that is far away
from everywhere, but smallish compute racks distributed all over the
place.

> > As far as I remember, it was not straightforward to get two novas
> > onto one host in the older release, so I am not surprised that this
> > causes trouble with the update to Mitaka. If we agree on two novas
> > and aggregates as the recommended way, we should make sure that
> > running two novas is a supported feature, covered by test cases and
> > documented. Dedicating a whole machine to either RT or non-RT would
> > IMHO be no viable option.
> >
> > > The realtime nodes should be isolated by aggregates, as you seem
> > > to do.
> >
> > Yes, with two novas on one machine. They share one libvirt using
> > different instance prefixes and have some other config options
> > set, so they do not collide on resources.
>
> That is clearly not what I was suggesting; you should have two groups
> of compute hosts: one aggregate with hosts for the non-RT VMs and
> another one with hosts for the RT VMs.

Ok, but that is what people currently need to do if they have one
machine hosting both kinds of VMs. Which - I have to stress it again -
is an important use case.

Henning

> > > > > > We may also want to have "rt_emulator_overcommit_ratio" to
> > > > > > control how many threads/instances we allow per pCPU.
> > > > >
> > > > > I'm not really sure I understand this point. If it is to
> > > > > indicate that for an isolated pCPU we want X guest emulator
> > > > > threads, the same behavior is achieved by the mask. A host
> > > > > for realtime is dedicated to realtime, with no overcommitment,
> > > > > and the operators know the number of host CPUs; they can
> > > > > easily deduce a ratio and so the corresponding mask.
> > > >
> > > > Agreed.
> > > >
> > > > > > > > If set in the flavor extra specs, it will be applied to
> > > > > > > > the vCPUs dedicated to the guest (useful for the DPDK
> > > > > > > > context).
> > > > > > >
> > > > > > > And if both are present, the flavor wins and nova.conf is
> > > > > > > ignored?
> > > > > >
> > > > > > In the flavor I'd like to see it be a full bitmask, not an
> > > > > > exclusion mask with an implicit full set. Thus the end user
> > > > > > could specify "hw:cpu_emulator_threads_mask=0" and get the
> > > > > > emulator threads to run alongside vCPU0.
> > > > >
> > > > > Same here, I don't agree: the only set is the vCPUs of the
> > > > > guest. Then we want a mask to subtract some of them.
> > > >
> > > > The current mask is fine, but using the same name in nova.conf
> > > > and in the flavor does not seem like a good idea.
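
For reference, the flavor side of this could look like the sketch
below. hw:cpu_policy, hw:cpu_realtime and hw:cpu_realtime_mask are
existing extra specs; hw:cpu_emulator_threads_mask is the one proposed
in this thread, and whether it is a full bitmask or an exclusion mask
is exactly what is being discussed above (the line below uses Chris's
full-bitmask reading):

    # flavor for the DPDK-style guest Chris describes: vCPU0 does
    # housekeeping, vCPUs 1-3 are realtime (values are illustrative)
    openstack flavor create --vcpus 4 --ram 4096 --disk 20 dpdk.rt
    openstack flavor set dpdk.rt \
        --property hw:cpu_policy=dedicated \
        --property hw:cpu_realtime=yes \
        --property hw:cpu_realtime_mask=^0 \
        --property hw:cpu_emulator_threads_mask=0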
> > > I do not see any problem with that; only operators are going to
> > > set this option in nova.conf or flavor extra specs.
> > >
> > > I think we agree on the general aspects. I'm going to update the
> > > current spec for Q and see whether we can make it.
> >
> > Cool. In the meantime we are working on an implementation as a
> > patch on Mitaka. Let's see if we hit unexpected cases we did not
> > yet consider.
> >
> > Henning
> >
> > > s.
> > >
> > > > Henning
> > > >
> > > > > > Henning, there is no conflict: the nova.conf setting and
> > > > > > the flavor setting are used for two different things.
> > > > > >
> > > > > > Chris

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev