[dpdk-dev] [PATCH v4 00/10] VM Power Management
On 12/12/2014 17:10, Thomas Monjalon wrote: > > Ok, this looks specific enough that an out-of-band solution within DPDK > > sounds like the best approach. It seems unnecessary to involve the > > hypervisor (neither KVM nor QEMU). > > Paolo, I don't understand why you don't imagine controlling frequency scaling > of a pinned vCPU transparently? Probably because I don't imagine controlling frequency scaling from the application on bare metal, either. :) It seems to me that this is just working around limitations of the kernel. Paolo > In my understanding, we currently cannot control frequency scaling without > knowing whether we are in a VM or not.
[dpdk-dev] [PATCH v4 00/10] VM Power Management
2014-12-12 15:50, Paolo Bonzini: > On 12/12/2014 14:00, Carew, Alan wrote: > > The problem is deterministic control of host CPU frequency and the DPDK > > usage > > model. > > A hands-off power governor will scale based on workload, whether this is a > > host > > application or VM, so no problems or bug there. > > > > Where this solution fits is where an application wants to control its own > > power policy, for example l3fwd_power uses librte_power library to change > > frequency via acpi_cpufreq based on application heuristics rather than > > relying on an inbuilt policy for example ondemand or performance. > > > > This ability has existed in DPDK for host usage for some time and VM power > > management allows this use case to be extended to cater for virtual machines > > by re-using the librte_power interface to encapsulate the VM->Host > > comms and provide an example means of managing such communications. > > > > I hope this clears it up a bit. > > Ok, this looks specific enough that an out-of-band solution within DPDK > sounds like the best approach. It seems unnecessary to involve the > hypervisor (neither KVM nor QEMU). Paolo, I don't understand why you don't imagine controlling frequency scaling of a pinned vCPU transparently? In my understanding, we currently cannot control frequency scaling without knowing whether we are in a VM or not. -- Thomas
[dpdk-dev] [PATCH v4 00/10] VM Power Management
On 12/12/2014 14:00, Carew, Alan wrote: > The problem is deterministic control of host CPU frequency and the DPDK usage > model. > A hands-off power governor will scale based on workload, whether this is a > host > application or VM, so no problems or bug there. > > Where this solution fits is where an application wants to control its own > power policy, for example l3fwd_power uses librte_power library to change > frequency via acpi_cpufreq based on application heuristics rather than > relying on an inbuilt policy for example ondemand or performance. > > This ability has existed in DPDK for host usage for some time and VM power > management allows this use case to be extended to cater for virtual machines > by re-using the librte_power interface to encapsulate the VM->Host > comms and provide an example means of managing such communications. > > I hope this clears it up a bit. Ok, this looks specific enough that an out-of-band solution within DPDK sounds like the best approach. It seems unnecessary to involve the hypervisor (neither KVM nor QEMU). Paolo
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Hi Paolo, > 2014-12-09 18:35, Paolo Bonzini: > > Did you make any progress in Qemu/KVM community? > > We need to be sync'ed up with them to be sure we share the same > goal. > > I want also to avoid using a solution which doesn't fit with > > their plan. > > Remember that we already had this problem with ivshmem which was > > planned to be dropped. > > >>> > > >>> Unfortunately, I have not yet received any feedback: > > >>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html > > >> > > >> Just to add to what Alan said above, this capability does not exist > > >> in qemu at the moment, and based on there having been no feedback > > >> on the qemu mailing list so far, I think it's reasonable to assume > > >> that it will not be implemented in the immediate future. The VM > > >> Power Management feature has also been designed to allow easy > > >> migration to a qemu-based solution when this is supported in > > >> future. Therefore, I'd be in favour of accepting this feature into DPDK > now. > > >> > > >> It's true that the implementation is a work-around, but there have > > >> been similar cases in DPDK in the past. One recent example that > > >> comes to mind is userspace vhost. The original implementation could > > >> also be considered a work-around, but it met the needs of many in > > >> the community. Now, with support for vhost-user in qemu 2.1, that > > >> implementation is being improved. I'd see VM Power Management > > >> following a similar path when this capability is supported in qemu. > > > > I wonder if this might be papering over a bug in the host cpufreq > > driver. If the guest is not doing much and leaving a lot of idle CPU > > time, the host should scale down the frequency of that CPU. In the > > case of pinned VCPUs this should really "just work". What is the > > problem that is being solved? > > > > Paolo > > Alan, Pablo, please could you explain your logic with VM power > management?
> > -- > Thomas The problem is deterministic control of host CPU frequency and the DPDK usage model. A hands-off power governor will scale based on workload, whether this is a host application or VM, so no problems or bug there. Where this solution fits is where an application wants to control its own power policy, for example l3fwd_power uses the librte_power library to change frequency via acpi_cpufreq based on application heuristics rather than relying on an inbuilt policy, for example ondemand or performance. This ability has existed in DPDK for host usage for some time and VM power management allows this use case to be extended to cater for virtual machines by re-using the librte_power interface to encapsulate the VM->Host comms and provide an example means of managing such communications. I hope this clears it up a bit. Thanks, Alan.
[dpdk-dev] [PATCH v4 00/10] VM Power Management
2014-12-09 18:35, Paolo Bonzini: > Did you make any progress in Qemu/KVM community? > We need to be sync'ed up with them to be sure we share the same goal. > I want also to avoid using a solution which doesn't fit with their > plan. > Remember that we already had this problem with ivshmem which was > planned to be dropped. > >>> > >>> Unfortunately, I have not yet received any feedback: > >>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html > >> > >> Just to add to what Alan said above, this capability does not exist in > >> qemu at the moment, and based on there having been no feedback on the > >> qemu mailing list so far, I think it's reasonable to assume that it > >> will not be implemented in the immediate future. The VM Power > >> Management feature has also been designed to allow easy migration to a > >> qemu-based solution when this is supported in future. Therefore, I'd > >> be in favour of accepting this feature into DPDK now. > >> > >> It's true that the implementation is a work-around, but there have > >> been similar cases in DPDK in the past. One recent example that comes > >> to mind is userspace vhost. The original implementation could also be > >> considered a work-around, but it met the needs of many in the > >> community. Now, with support for vhost-user in qemu 2.1, that > >> implementation is being improved. I'd see VM Power Management > >> following a similar path when this capability is supported in qemu. > > I wonder if this might be papering over a bug in the host cpufreq > driver. If the guest is not doing much and leaving a lot of idle CPU > time, the host should scale down the frequency of that CPU. In the case > of pinned VCPUs this should really "just work". What is the problem > that is being solved? > > Paolo Alan, Pablo, please could you explain your logic with VM power management? -- Thomas
[dpdk-dev] [PATCH v4 00/10] VM Power Management
I had replied to this message, but my reply never got to the list. Let's try again. I wonder if this might be papering over a bug in the host cpufreq driver. If the guest is not doing much and leaving a lot of idle CPU time, the host should scale down the frequency of that CPU. In the case of pinned VCPUs this should really "just work". What is the problem that is being solved? Paolo On 22/11/2014 18:17, Vincent JARDIN wrote: > Tim, > > cc-ing Paolo and qemu-devel@ again in order to get their take on it. > Did you make any progress in Qemu/KVM community? We need to be sync'ed up with them to be sure we share the same goal. I want also to avoid using a solution which doesn't fit with their plan. Remember that we already had this problem with ivshmem which was planned to be dropped. > >>> Unfortunately, I have not yet received any feedback: >>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html >> >> Just to add to what Alan said above, this capability does not exist in >> qemu at the moment, and based on there having been no feedback on the >> qemu mailing list so far, I think it's reasonable to assume that it >> will not be implemented in the immediate future. The VM Power >> Management feature has also been designed to allow easy migration to a >> qemu-based solution when this is supported in future. Therefore, I'd >> be in favour of accepting this feature into DPDK now. >> >> It's true that the implementation is a work-around, but there have >> been similar cases in DPDK in the past. One recent example that comes >> to mind is userspace vhost. The original implementation could also be >> considered a work-around, but it met the needs of many in the >> community. Now, with support for vhost-user in qemu 2.1, that >> implementation is being improved. I'd see VM Power Management >> following a similar path when this capability is supported in qemu. > > Best regards, > Vincent >
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Tim, cc-ing Paolo and qemu-devel@ again in order to get their take on it. >>> Did you make any progress in Qemu/KVM community? >>> We need to be sync'ed up with them to be sure we share the same goal. >>> I want also to avoid using a solution which doesn't fit with their plan. >>> Remember that we already had this problem with ivshmem which was >>> planned to be dropped. >>> >> Unfortunately, I have not yet received any feedback: >> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html > > Just to add to what Alan said above, this capability does not exist in qemu > at the moment, and based on there having been no feedback on the qemu mailing > list so far, I think it's reasonable to assume that it will not be > implemented in the immediate future. The VM Power Management feature has also > been designed to allow easy migration to a qemu-based solution when this is > supported in future. Therefore, I'd be in favour of accepting this feature > into DPDK now. > > It's true that the implementation is a work-around, but there have been > similar cases in DPDK in the past. One recent example that comes to mind is > userspace vhost. The original implementation could also be considered a > work-around, but it met the needs of many in the community. Now, with support > for vhost-user in qemu 2.1, that implementation is being improved. I'd see VM > Power Management following a similar path when this capability is supported > in qemu. Best regards, Vincent
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Pablo just sent a new patch set. This is a significant effort and it addresses a valid technical problem statement. I express my support for merging this feature into the DPDK mainline. IMHO, the previous *rejection* reasons are not solid. It is important to encourage real contributions like this. -----Original Message----- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of O'driscoll, Tim Sent: Monday, November 10, 2014 10:54 AM To: Carew, Alan; Thomas Monjalon Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH v4 00/10] VM Power Management > From: Carew, Alan > > > Did you make any progress in Qemu/KVM community? > > We need to be sync'ed up with them to be sure we share the same goal. > > I want also to avoid using a solution which doesn't fit with their plan. > > Remember that we already had this problem with ivshmem which was > > planned to be dropped. > > . . . > > Unfortunately, I have not yet received any feedback: > http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html Just to add to what Alan said above, this capability does not exist in qemu at the moment, and based on there having been no feedback on the qemu mailing list so far, I think it's reasonable to assume that it will not be implemented in the immediate future. The VM Power Management feature has also been designed to allow easy migration to a qemu-based solution when this is supported in future. Therefore, I'd be in favour of accepting this feature into DPDK now. It's true that the implementation is a work-around, but there have been similar cases in DPDK in the past. One recent example that comes to mind is userspace vhost. The original implementation could also be considered a work-around, but it met the needs of many in the community. Now, with support for vhost-user in qemu 2.1, that implementation is being improved. I'd see VM Power Management following a similar path when this capability is supported in qemu. Tim
[dpdk-dev] [PATCH v4 00/10] VM Power Management
> From: Carew, Alan > > > Did you make any progress in Qemu/KVM community? > > We need to be sync'ed up with them to be sure we share the same goal. > > I want also to avoid using a solution which doesn't fit with their plan. > > Remember that we already had this problem with ivshmem which was > > planned to be dropped. > > . . . > > Unfortunately, I have not yet received any feedback: > http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html Just to add to what Alan said above, this capability does not exist in qemu at the moment, and based on there having been no feedback on the qemu mailing list so far, I think it's reasonable to assume that it will not be implemented in the immediate future. The VM Power Management feature has also been designed to allow easy migration to a qemu-based solution when this is supported in future. Therefore, I'd be in favour of accepting this feature into DPDK now. It's true that the implementation is a work-around, but there have been similar cases in DPDK in the past. One recent example that comes to mind is userspace vhost. The original implementation could also be considered a work-around, but it met the needs of many in the community. Now, with support for vhost-user in qemu 2.1, that implementation is being improved. I'd see VM Power Management following a similar path when this capability is supported in qemu. Tim
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Hi Thomas, > Hi Alan, > > Did you make any progress in Qemu/KVM community? > We need to be sync'ed up with them to be sure we share the same goal. > I want also to avoid using a solution which doesn't fit with their plan. > Remember that we already had this problem with ivshmem which was > planned to be dropped. > > Thanks > -- > Thomas > > > 2014-10-16 15:21, Carew, Alan: > > Hi Thomas, > > > > > > However with a DPDK solution it would be possible to re-use the > message bus > > > > to pass information like device stats, application state, D-state > > > > requests > > > > etc. to the host and allow for management layer(e.g. OpenStack) to > make > > > > informed decisions. > > > > > > I think that management informations should be transmitted in a > management > > > channel. Such solution should exist in OpenStack. > > > > Perhaps it does, but this solution is not exclusive to OpenStack and just a > potential use case. > > > > > > > > > Also, the scope of adding power management to qemu/KVM would be > huge; > > > > while the easier path is not always the best and the problem of power > > > > management in VMs is both a DPDK problem (given that librte_power > only > > > > worked on the host) and a general virtualization problem that would be > > > > better solved by those with direct knowledge of Qemu/KVM > architecture > > > > and influence on the direction of the Qemu project. > > > > > > Being a huge effort is not an argument. > > > > I agree completely and was implied by what followed the conjunction. > > > > > Please check with Qemu community, they'll welcome it. > > > > > > > As it stands, the host backend is simply an example application that can > > > > be replaced by a VMM or Orchestration layer, by using Virtio-Serial it > has > > > > obvious leanings to Qemu, but even this could be easily swapped out > for > > > > XenBus, IVSHMEM, IP etc. 
> > > > > > > > If power management is to be eventually supported by Hypervisors > directly > > > > then we could also enable to option to switch to that environment, > currently > > > > the librte_power implementations (VM or Host) can be selected > dynamically > > > > (environment auto-detection) or explicitly via rte_power_set_env(), > adding > > > > an arbitrary number of environments is relatively easy. > > > > > > Yes, you are adding a new layer to workaround hypervisor lacks. And this > layer > > > will handle native support when it will exist. But if you implement native > > > support now, we don't need this extra layer. > > > > Indeed, but we have a solution implemented now and yes it is a > workaround, that is until Hypervisors support such functionality. It is > possible > that whatever solutions for power management present themselves in the > future may require workarounds also, us-vhost is an example of such a > workaround introduced to DPDK. > > > > > > > > > I hope this helps to clarify the approach. > > > > > > Thanks for your explanation. > > > > Thanks for the feedback. > > > > > > > > -- > > > Thomas > > > > Alan. Unfortunately, I have not yet received any feedback: http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html Alan.
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Hi Alan, Did you make any progress in Qemu/KVM community? We need to be sync'ed up with them to be sure we share the same goal. I want also to avoid using a solution which doesn't fit with their plan. Remember that we already had this problem with ivshmem which was planned to be dropped. Thanks -- Thomas 2014-10-16 15:21, Carew, Alan: > Hi Thomas, > > > > However with a DPDK solution it would be possible to re-use the message > > > bus > > > to pass information like device stats, application state, D-state requests > > > etc. to the host and allow for management layer(e.g. OpenStack) to make > > > informed decisions. > > > > I think that management informations should be transmitted in a management > > channel. Such solution should exist in OpenStack. > > Perhaps it does, but this solution is not exclusive to OpenStack and just a > potential use case. > > > > > > Also, the scope of adding power management to qemu/KVM would be huge; > > > while the easier path is not always the best and the problem of power > > > management in VMs is both a DPDK problem (given that librte_power only > > > worked on the host) and a general virtualization problem that would be > > > better solved by those with direct knowledge of Qemu/KVM architecture > > > and influence on the direction of the Qemu project. > > > > Being a huge effort is not an argument. > > I agree completely and was implied by what followed the conjunction. > > > Please check with Qemu community, they'll welcome it. > > > > > As it stands, the host backend is simply an example application that can > > > be replaced by a VMM or Orchestration layer, by using Virtio-Serial it has > > > obvious leanings to Qemu, but even this could be easily swapped out for > > > XenBus, IVSHMEM, IP etc. 
> > > > > > If power management is to be eventually supported by Hypervisors directly > > > then we could also enable to option to switch to that environment, > > > currently > > > the librte_power implementations (VM or Host) can be selected dynamically > > > (environment auto-detection) or explicitly via rte_power_set_env(), adding > > > an arbitrary number of environments is relatively easy. > > > > Yes, you are adding a new layer to workaround hypervisor lacks. And this > > layer > > will handle native support when it will exist. But if you implement native > > support now, we don't need this extra layer. > > Indeed, but we have a solution implemented now and yes it is a workaround, > that is until Hypervisors support such functionality. It is possible that > whatever solutions for power management present themselves in the future may > require workarounds also, us-vhost is an example of such a workaround > introduced to DPDK. > > > > > > I hope this helps to clarify the approach. > > > > Thanks for your explanation. > > Thanks for the feedback. > > > > > -- > > Thomas > > Alan.
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Hi Thomas, > > However with a DPDK solution it would be possible to re-use the message bus > > to pass information like device stats, application state, D-state requests > > etc. to the host and allow for management layer(e.g. OpenStack) to make > > informed decisions. > > I think that management informations should be transmitted in a management > channel. Such solution should exist in OpenStack. Perhaps it does, but this solution is not exclusive to OpenStack and just a potential use case. > > > Also, the scope of adding power management to qemu/KVM would be huge; > > while the easier path is not always the best and the problem of power > > management in VMs is both a DPDK problem (given that librte_power only > > worked on the host) and a general virtualization problem that would be > > better solved by those with direct knowledge of Qemu/KVM architecture > > and influence on the direction of the Qemu project. > > Being a huge effort is not an argument. I agree completely and was implied by what followed the conjunction. > Please check with Qemu community, they'll welcome it. > > > As it stands, the host backend is simply an example application that can > > be replaced by a VMM or Orchestration layer, by using Virtio-Serial it has > > obvious leanings to Qemu, but even this could be easily swapped out for > > XenBus, IVSHMEM, IP etc. > > > > If power management is to be eventually supported by Hypervisors directly > > then we could also enable to option to switch to that environment, currently > > the librte_power implementations (VM or Host) can be selected dynamically > > (environment auto-detection) or explicitly via rte_power_set_env(), adding > > an arbitrary number of environments is relatively easy. > > Yes, you are adding a new layer to workaround hypervisor lacks. And this layer > will handle native support when it will exist. But if you implement native > support now, we don't need this extra layer. 
Indeed, but we have a solution implemented now and yes it is a workaround, that is until Hypervisors support such functionality. It is possible that whatever solutions for power management present themselves in the future may require workarounds also, us-vhost is an example of such a workaround introduced to DPDK. > > > I hope this helps to clarify the approach. > > Thanks for your explanation. Thanks for the feedback. > > -- > Thomas Alan.
[dpdk-dev] [PATCH v4 00/10] VM Power Management
2014-10-14 12:37, Carew, Alan: > > > The following patches add two DPDK sample applications and an alternate > > > implementation of librte_power for use in virtualized environments. > > > The idea is to provide librte_power functionality from within a VM to > > > address > > > the lack of MSRs to facilitate frequency changes from within a VM. > > > It is ideally suited for Haswell which provides per core frequency > > > scaling. > > > > > > The current librte_power affects frequency changes via the acpi-cpufreq > > > 'userspace' power governor, accessed via sysfs. > > > > Something was preventing me from looking deeper in this big codebase, > > but I didn't know what sounds weird. > > Now I realize: the real problem is that virtualization transparency is > > broken for power management. So the right thing to do is to fix it in > > KVM. I think all this patchset is a huge workaround. > > > > Did you try to fix it with Qemu/KVM? > > When looking at the libvirt API it would seem to be a natural fit to have > power management sitting there, so in essence I would agree. > > However with a DPDK solution it would be possible to re-use the message bus > to pass information like device stats, application state, D-state requests > etc. to the host and allow for management layer(e.g. OpenStack) to make > informed decisions. I think that management information should be transmitted in a management channel. Such a solution should exist in OpenStack. > Also, the scope of adding power management to qemu/KVM would be huge; > while the easier path is not always the best and the problem of power > management in VMs is both a DPDK problem (given that librte_power only > worked on the host) and a general virtualization problem that would be > better solved by those with direct knowledge of Qemu/KVM architecture > and influence on the direction of the Qemu project. Being a huge effort is not an argument. Please check with the Qemu community, they'll welcome it.
> As it stands, the host backend is simply an example application that can > be replaced by a VMM or Orchestration layer, by using Virtio-Serial it has > obvious leanings to Qemu, but even this could be easily swapped out for > XenBus, IVSHMEM, IP etc. > > If power management is to be eventually supported by Hypervisors directly > then we could also enable to option to switch to that environment, currently > the librte_power implementations (VM or Host) can be selected dynamically > (environment auto-detection) or explicitly via rte_power_set_env(), adding > an arbitrary number of environments is relatively easy. Yes, you are adding a new layer to workaround hypervisor lacks. And this layer will handle native support when it will exist. But if you implement native support now, we don't need this extra layer. > I hope this helps to clarify the approach. Thanks for your explanation. -- Thomas
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Hi Thomas, > -----Original Message----- > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] > Sent: Monday, October 13, 2014 9:26 PM > To: Carew, Alan > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v4 00/10] VM Power Management > > Hi Alan, > > 2014-10-12 20:36, Alan Carew: > > The following patches add two DPDK sample applications and an alternate > > implementation of librte_power for use in virtualized environments. > > The idea is to provide librte_power functionality from within a VM to > > address > > the lack of MSRs to facilitate frequency changes from within a VM. > > It is ideally suited for Haswell which provides per core frequency scaling. > > > > The current librte_power affects frequency changes via the acpi-cpufreq > > 'userspace' power governor, accessed via sysfs. > > Something was preventing me from looking deeper in this big codebase, > but I didn't know what sounds weird. > Now I realize: the real problem is that virtualization transparency is > broken for power management. So the right thing to do is to fix it in > KVM. I think all this patchset is a huge workaround. > > Did you try to fix it with Qemu/KVM? > > -- > Thomas When looking at the libvirt API it would seem to be a natural fit to have power management sitting there, so in essence I would agree. However with a DPDK solution it would be possible to re-use the message bus to pass information like device stats, application state, D-state requests etc. to the host and allow for a management layer (e.g. OpenStack) to make informed decisions. Also, the scope of adding power management to qemu/KVM would be huge; while the easier path is not always the best and the problem of power management in VMs is both a DPDK problem (given that librte_power only worked on the host) and a general virtualization problem that would be better solved by those with direct knowledge of Qemu/KVM architecture and influence on the direction of the Qemu project.
As it stands, the host backend is simply an example application that can be replaced by a VMM or Orchestration layer, by using Virtio-Serial it has obvious leanings to Qemu, but even this could be easily swapped out for XenBus, IVSHMEM, IP etc. If power management is to be eventually supported by Hypervisors directly then we could also enable the option to switch to that environment; currently the librte_power implementations (VM or Host) can be selected dynamically (environment auto-detection) or explicitly via rte_power_set_env(), adding an arbitrary number of environments is relatively easy. I hope this helps to clarify the approach. Thanks, Alan.
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Hi Alan, 2014-10-12 20:36, Alan Carew: > The following patches add two DPDK sample applications and an alternate > implementation of librte_power for use in virtualized environments. > The idea is to provide librte_power functionality from within a VM to address > the lack of MSRs to facilitate frequency changes from within a VM. > It is ideally suited for Haswell which provides per core frequency scaling. > > The current librte_power affects frequency changes via the acpi-cpufreq > 'userspace' power governor, accessed via sysfs. Something was preventing me from looking deeper into this big codebase, but I couldn't say what seemed weird. Now I realize: the real problem is that virtualization transparency is broken for power management. So the right thing to do is to fix it in KVM. I think all this patchset is a huge workaround. Did you try to fix it with Qemu/KVM? -- Thomas
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Patch name: VM Power Management
Brief description: Verify VM power management in virtualized environments
Test Flag: Tested-by
Tester name: yong.liu at intel.com
Test environment:
  OS: Fedora20 3.11.10-301.fc20.x86_64
  GCC: gcc version 4.8.3 20140911
  CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
  NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb]
Test Tool Chain information:
  Qemu: 1.6.1
  libvirt: 1.1.3
  Guest OS: Fedora20 3.11.10-301.fc20.x86_64
  Guest GCC: gcc version 4.8.3 20140624
Commit ID: 72d3e7ad3183f42f8b9fb3bb1c12b3e1b39eef39

Detailed Testing information
DPDK SW Configuration: Default x86_64-native-linuxapp-gcc configuration
Test Result Summary: Total 7 cases, 7 passed, 0 failed

Test Case - name: VM Power Management Channel
Test Case - Description: Check VM power management communication channels can be connected successfully
Test Case - command / instruction:
  Create a folder in the system temporary filesystem for the power monitor socket:
    mkdir -p /tmp/powermonitor
  Configure the VM XML and pin VCPUs to specified CPUs
  Configure the VM XML to set up virtio serial ports
  Run the power-manager monitor in the Host:
    ./build/vm_power_mgr -c 0x3 -n 4
  Start up the VM and run guest_vm_power_mgr:
    guest_vm_power_mgr -c 0x1f -n 4 -- -i
  Add the VM in the host and check vm_power_mgr can get frequency normally:
    vmpower> add_vm
    vmpower> add_channels all
    vmpower> get_cpu_freq
  Check the vCPU/CPU mapping can be detected normally:
    vmpower> show_vm
Test Case - expected test result: VM power management communication channels can be connected successfully and the host can get VM core information

Test Case - name: VM Power Management Numa
Test Case - Description: Check VM power management supports managing cores in different sockets
Test Case - command / instruction:
  Get core and socket information with cpu_layout:
    ./tools/cpu_layout.py
  Configure the VM XML to pin VCPUs on Socket1, then repeat Case 1
  Check the vCPU/CPU mapping can be detected normally:
    vmpower> show_vm
Test Case - expected test result: VM power management communication channels can be connected successfully and show correct VM core information

Test Case - name: VM scale cpu frequency down
Test Case - Description: Check VM power management supports a VM scaling its own cores' frequency down
Test Case - command / instruction:
  Set up the VM power management environment
  Send cpu frequency down hints to the Host:
    vmpower(guest)> set_cpu_freq 0 down
  Verify the frequency of the physical CPU has been scaled down correctly:
    vmpower> get_cpu_freq 1
    Core 1 frequency: 270
  Check other CPUs' frequency is not affected by the actions above
  Check the other VMs work fine (if they use different CPUs)
  Repeat the above actions several times
Test Case - expected test result: Frequency for the VM's cores can be scaled down normally

Test Case - name: VM scale cpu frequency up
Test Case - Description: Check VM power management supports a VM scaling its own cores' frequency up
Test Case - command / instruction:
  Set up the VM power management environment
  Send cpu frequency up hints to the Host:
    vmpower(guest)> set_cpu_freq 0 up
  Verify the frequency of the physical CPU has been scaled up correctly
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Virtual Machine Power Management.

The following patches add two DPDK sample applications and an alternate implementation of librte_power for use in virtualized environments. The idea is to provide librte_power functionality from within a VM, to address the lack of MSRs to facilitate frequency changes from within a VM. It is ideally suited to Haswell, which provides per-core frequency scaling. The current librte_power effects frequency changes via the acpi-cpufreq 'userspace' power governor, accessed via sysfs.

General Overview: (more information in each patch that follows)

The VM Power Management solution provides two components:

1) VM: Allows a DPDK application in a VM to reuse the librte_power interface. Each lcore opens a Virtio-Serial endpoint channel to the host, where the re-implementation of librte_power simply forwards the requests for frequency changes to a host-based monitor. The host monitor itself uses librte_power. Each lcore channel corresponds to a serial device '/dev/virtio-ports/virtio.serial.port.poweragent.' which is opened in non-blocking mode. While each virtual CPU can be mapped to multiple physical CPUs, it is recommended that each vCPU be mapped to a single core only.

2) Host: The host monitor is managed by a CLI; it allows for adding qemu/KVM virtual machines and associated channels to the monitor, manually changing CPU frequency, inspecting the state of VMs, vCPU-to-pCPU pinning, and managing channels. Host channel endpoints are Virtio-Serial endpoints configured as AF_UNIX file sockets which follow a specific naming convention, i.e. /tmp/powermonitor/., each channel having a 1:1 mapping to a VM endpoint, i.e. /dev/virtio-ports/virtio.serial.port.poweragent. Host channel endpoints are opened in non-blocking mode and are monitored via epoll. Requests over each channel to change frequency are forwarded to the original librte_power.
Channels must be manually configured as qemu-kvm command line arguments or in the libvirt domain definition (XML). Multiple channels can be configured by specifying multiple channel elements; the (port number) should be incremented by 1 for each new channel element. More information on Virtio-Serial can be found here:
http://fedoraproject.org/wiki/Features/VirtioSerial

To enable Hypervisor creation of channels, the host endpoint directory must be created with qemu permissions:
  mkdir /tmp/powermonitor
  chown qemu:qemu /tmp/powermonitor

The host application runs on two separate lcores:
Core N)   CLI: for management of virtual machines, adding channels to the monitor thread, inspecting state, and manually setting CPU frequency [PATCH 02/09]
Core N+1) Monitor Thread: an epoll-based infinite loop that waits on channel events from VMs and calls the corresponding librte_power functions.

A sample application is also provided to run on virtual machines; this application provides a CLI to manually set the frequency of a vCPU [PATCH 08/09]. The current l3fwd-power sample application can also be run on a VM.

Changes in V4:
  Fixed double free of channel during VM shutdown.

Changes in V3:
  Fixed crash in Guest CLI when host application is not running.
  Renamed #defines to be more specific to the module they belong to.
  Added vCPU pinning via CLI.

Changes in V2:
  Runtime selection of librte_power implementations.
  Updated unit tests to cover librte_power changes.
  PATCH[0/3] was sent twice, again as PATCH[0/4].
  Miscellaneous fixes.

Alan Carew (10):
  Channel Manager and Monitor for VM Power Management(Host).
  VM Power Management CLI(Host).
  CPU Frequency Power Management(Host).
  VM Power Management application and Makefile.
  VM Power Management CLI(Guest).
  VM communication channels for VM Power Management(Guest).
  librte_power common interface for Guest and Host.
  Packet format for VM Power Management(Host and Guest).
  Build system integration for VM Power Management(Guest and Host).
  VM Power Management Unit Tests.

 app/test/Makefile                                  |   3 +-
 app/test/autotest_data.py                          |  26 +
 app/test/test_power.c                              | 445 +---
 app/test/test_power_acpi_cpufreq.c                 | 544 ++
 app/test/test_power_kvm_vm.c                       | 308
 examples/vm_power_manager/Makefile                 |  57 ++
 examples/vm_power_manager/channel_manager.c        | 804 +
 examples/vm_power_manager/channel_manager.h        | 314
 examples/vm_power_manager/channel_monitor.c        | 231 ++
 examples/vm_power_manager/channel_monitor.h        | 102 +++
 examples/vm_power_manager/guest_cli/Makefile       |  56 ++
 examples/vm_power_manager/guest_cli/main.c         |  87 +++
 examples/vm_power_manager/guest_cli/main.h         |  52 ++
 .../guest_cli/vm_power_cli_guest.c                 | 155
 .../guest_cli/vm_power_cli_guest.h                 |  55 ++
 examples/vm_power_manager/main.c