[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Paolo Bonzini


On 12/12/2014 17:10, Thomas Monjalon wrote:
> > Ok, this looks specific enough that an out-of-band solution within DPDK
> > sounds like the best approach.  It seems unnecessary to involve the
> > hypervisor (neither KVM nor QEMU).
>
> Paolo, I don't understand why you don't imagine controlling frequency scaling
> of a pinned vCPU transparently?

Probably because I don't imagine controlling frequency scaling from the
application on bare metal, either. :)  It seems to me that this is just
working around limitations of the kernel.

Paolo

> In my understanding, we currently cannot control frequency scaling without
> knowing whether we are in a VM or not.


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Thomas Monjalon
2014-12-12 15:50, Paolo Bonzini:
> On 12/12/2014 14:00, Carew, Alan wrote:
> > The problem is deterministic control of host CPU frequency and the DPDK
> > usage model.
> > A hands-off power governor will scale based on workload, whether this is
> > a host application or VM, so there is no problem or bug there.
> > 
> > This solution fits where an application wants to control its own power
> > policy; for example, l3fwd_power uses the librte_power library to change
> > frequency via acpi-cpufreq based on application heuristics rather than
> > relying on an inbuilt policy such as ondemand or performance.
> > 
> > This ability has existed in DPDK for host usage for some time, and VM
> > power management allows this use case to be extended to cater for virtual
> > machines by re-using the librte_power interface to encapsulate the
> > VM->Host comms and provide an example means of managing such
> > communications.
> > 
> > I hope this clears it up a bit.
> 
> Ok, this looks specific enough that an out-of-band solution within DPDK
> sounds like the best approach.  It seems unnecessary to involve the
> hypervisor (neither KVM nor QEMU).

Paolo, I don't understand why you don't imagine controlling frequency scaling
of a pinned vCPU transparently?
In my understanding, we currently cannot control frequency scaling without
knowing whether we are in a VM or not.

-- 
Thomas


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Paolo Bonzini


On 12/12/2014 14:00, Carew, Alan wrote:
> The problem is deterministic control of host CPU frequency and the DPDK
> usage model.
> A hands-off power governor will scale based on workload, whether this is a
> host application or VM, so there is no problem or bug there.
> 
> This solution fits where an application wants to control its own power
> policy; for example, l3fwd_power uses the librte_power library to change
> frequency via acpi-cpufreq based on application heuristics rather than
> relying on an inbuilt policy such as ondemand or performance.
> 
> This ability has existed in DPDK for host usage for some time, and VM power
> management allows this use case to be extended to cater for virtual
> machines by re-using the librte_power interface to encapsulate the VM->Host
> comms and provide an example means of managing such communications.
> 
> I hope this clears it up a bit.

Ok, this looks specific enough that an out-of-band solution within DPDK
sounds like the best approach.  It seems unnecessary to involve the
hypervisor (neither KVM nor QEMU).

Paolo


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Carew, Alan
Hi Paolo,

> 2014-12-09 18:35, Paolo Bonzini:
> >  Did you make any progress in Qemu/KVM community?
> >  We need to be sync'ed up with them to be sure we share the same goal.
> >  I want also to avoid using a solution which doesn't fit with
> >  their plan.
> >  Remember that we already had this problem with ivshmem which was
> >  planned to be dropped.
> > >>>
> > >>> Unfortunately, I have not yet received any feedback:
> > >>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html
> > >>
> > >> Just to add to what Alan said above, this capability does not exist
> > >> in qemu at the moment, and based on there having been no feedback
> > >> on the qemu mailing list so far, I think it's reasonable to assume
> > >> that it will not be implemented in the immediate future. The VM
> > >> Power Management feature has also been designed to allow easy
> > >> migration to a qemu-based solution when this is supported in
> > >> future. Therefore, I'd be in favour of accepting this feature into
> > >> DPDK now.
> > >>
> > >> It's true that the implementation is a work-around, but there have
> > >> been similar cases in DPDK in the past. One recent example that
> > >> comes to mind is userspace vhost. The original implementation could
> > >> also be considered a work-around, but it met the needs of many in
> > >> the community. Now, with support for vhost-user in qemu 2.1, that
> > >> implementation is being improved. I'd see VM Power Management
> > >> following a similar path when this capability is supported in qemu.
> >
> > I wonder if this might be papering over a bug in the host cpufreq
> > driver.  If the guest is not doing much and leaving a lot of idle CPU
> > time, the host should scale down the frequency of that CPU.  In the
> > case of pinned VCPUs this should really "just work".  What is the
> > problem that is being solved?
> >
> > Paolo
> 
> Alan, Pablo, please could you explain your logic with VM power
> management?
> 
> --
> Thomas

The problem is deterministic control of host CPU frequency and the DPDK usage
model.
A hands-off power governor will scale based on workload, whether this is a host
application or VM, so there is no problem or bug there.

This solution fits where an application wants to control its own power
policy; for example, l3fwd_power uses the librte_power library to change
frequency via acpi-cpufreq based on application heuristics rather than
relying on an inbuilt policy such as ondemand or performance.

This ability has existed in DPDK for host usage for some time, and VM power
management allows this use case to be extended to cater for virtual machines
by re-using the librte_power interface to encapsulate the VM->Host
comms and provide an example means of managing such communications.
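
For concreteness, here is a minimal sketch of such an application-driven
policy using librte_power, modelled loosely on l3fwd_power; the idle-poll
counter and IDLE_THRESHOLD below are illustrative assumptions, not part of
the library:

#include <rte_lcore.h>
#include <rte_power.h>

#define IDLE_THRESHOLD 100  /* hypothetical: empty polls before scaling down */

/* Call once per lcore: binds it to a frequency-control backend
 * (host sysfs, or the VM channel under VM Power Management). */
static int
init_power(void)
{
        return rte_power_init(rte_lcore_id());
}

/* Call after each rx burst: scale down after a run of empty polls,
 * scale back up as soon as traffic appears. */
static void
adjust_freq(unsigned lcore_id, unsigned nb_rx, unsigned *idle_polls)
{
        if (nb_rx == 0) {
                if (++(*idle_polls) > IDLE_THRESHOLD) {
                        rte_power_freq_down(lcore_id);
                        *idle_polls = 0;
                }
        } else {
                rte_power_freq_up(lcore_id);
                *idle_polls = 0;
        }
}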

 I hope this clears it up a bit.

Thanks,
Alan.


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-12 Thread Thomas Monjalon
2014-12-09 18:35, Paolo Bonzini:
>  Did you make any progress in Qemu/KVM community?
>  We need to be sync'ed up with them to be sure we share the same goal.
>  I want also to avoid using a solution which doesn't fit with their
>  plan.
>  Remember that we already had this problem with ivshmem which was
>  planned to be dropped.
> >>> 
> >>> Unfortunately, I have not yet received any feedback:
> >>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html
> >>
> >> Just to add to what Alan said above, this capability does not exist in
> >> qemu at the moment, and based on there having been no feedback on the
> >> qemu mailing list so far, I think it's reasonable to assume that it
> >> will not be implemented in the immediate future. The VM Power
> >> Management feature has also been designed to allow easy migration to a
> >> qemu-based solution when this is supported in future. Therefore, I'd
> >> be in favour of accepting this feature into DPDK now.
> >>
> >> It's true that the implementation is a work-around, but there have
> >> been similar cases in DPDK in the past. One recent example that comes
> >> to mind is userspace vhost. The original implementation could also be
> >> considered a work-around, but it met the needs of many in the
> >> community. Now, with support for vhost-user in qemu 2.1, that
> >> implementation is being improved. I'd see VM Power Management
> >> following a similar path when this capability is supported in qemu.
> 
> I wonder if this might be papering over a bug in the host cpufreq
> driver.  If the guest is not doing much and leaving a lot of idle CPU
> time, the host should scale down the frequency of that CPU.  In the case
> of pinned VCPUs this should really "just work".  What is the problem
> that is being solved?
> 
> Paolo

Alan, Pablo, please could you explain your logic with VM power management?

-- 
Thomas


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-12-09 Thread Paolo Bonzini
I had replied to this message, but my reply never got to the list.
Let's try again.

I wonder if this might be papering over a bug in the host cpufreq
driver.  If the guest is not doing much and leaving a lot of idle CPU
time, the host should scale down the frequency of that CPU.  In the case
of pinned VCPUs this should really "just work".  What is the problem
that is being solved?

Paolo

On 22/11/2014 18:17, Vincent JARDIN wrote:
> Tim,
> 
> cc-ing Paolo and qemu-devel@ again in order to get their take on it.
> 
 Did you make any progress in Qemu/KVM community?
 We need to be sync'ed up with them to be sure we share the same goal.
 I want also to avoid using a solution which doesn't fit with their
 plan.
 Remember that we already had this problem with ivshmem which was
 planned to be dropped.

> 
>>> Unfortunately, I have not yet received any feedback:
>>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html
>>
>> Just to add to what Alan said above, this capability does not exist in
>> qemu at the moment, and based on there having been no feedback on the
>> qemu mailing list so far, I think it's reasonable to assume that it
>> will not be implemented in the immediate future. The VM Power
>> Management feature has also been designed to allow easy migration to a
>> qemu-based solution when this is supported in future. Therefore, I'd
>> be in favour of accepting this feature into DPDK now.
>>
>> It's true that the implementation is a work-around, but there have
>> been similar cases in DPDK in the past. One recent example that comes
>> to mind is userspace vhost. The original implementation could also be
>> considered a work-around, but it met the needs of many in the
>> community. Now, with support for vhost-user in qemu 2.1, that
>> implementation is being improved. I'd see VM Power Management
>> following a similar path when this capability is supported in qemu.
> 
> Best regards,
>   Vincent
> 


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-11-22 Thread Vincent JARDIN
Tim,

cc-ing Paolo and qemu-devel@ again in order to get their take on it.

>>> Did you make any progress in Qemu/KVM community?
>>> We need to be sync'ed up with them to be sure we share the same goal.
>>> I want also to avoid using a solution which doesn't fit with their plan.
>>> Remember that we already had this problem with ivshmem which was
>>> planned to be dropped.
>>>

>> Unfortunately, I have not yet received any feedback:
>> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html
>
> Just to add to what Alan said above, this capability does not exist in qemu 
> at the moment, and based on there having been no feedback on the qemu mailing 
> list so far, I think it's reasonable to assume that it will not be 
> implemented in the immediate future. The VM Power Management feature has also 
> been designed to allow easy migration to a qemu-based solution when this is 
> supported in future. Therefore, I'd be in favour of accepting this feature 
> into DPDK now.
>
> It's true that the implementation is a work-around, but there have been 
> similar cases in DPDK in the past. One recent example that comes to mind is 
> userspace vhost. The original implementation could also be considered a 
> work-around, but it met the needs of many in the community. Now, with support 
> for vhost-user in qemu 2.1, that implementation is being improved. I'd see VM 
> Power Management following a similar path when this capability is supported 
> in qemu.

Best regards,
   Vincent


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-11-21 Thread Zhu, Heqing
Pablo just sent a new patch set. This is a significant effort and it addresses
a valid technical problem statement.
I support merging this feature into the DPDK mainline.

IMHO, the previous *rejection* reasons are not solid. It is important to
encourage real contributions like this.


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of O'driscoll, Tim
Sent: Monday, November 10, 2014 10:54 AM
To: Carew, Alan; Thomas Monjalon
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v4 00/10] VM Power Management

> From: Carew, Alan
> 
> > Did you make any progress in Qemu/KVM community?
> > We need to be sync'ed up with them to be sure we share the same goal.
> > I want also to avoid using a solution which doesn't fit with their plan.
> > Remember that we already had this problem with ivshmem which was 
> > planned to be dropped.
> >
. . .
> 
> Unfortunately, I have not yet received any feedback:
> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html

Just to add to what Alan said above, this capability does not exist in qemu at 
the moment, and based on there having been no feedback on the qemu mailing list 
so far, I think it's reasonable to assume that it will not be implemented in 
the immediate future. The VM Power Management feature has also been designed to 
allow easy migration to a qemu-based solution when this is supported in future. 
Therefore, I'd be in favour of accepting this feature into DPDK now.

It's true that the implementation is a work-around, but there have been similar 
cases in DPDK in the past. One recent example that comes to mind is userspace 
vhost. The original implementation could also be considered a work-around, but 
it met the needs of many in the community. Now, with support for vhost-user in 
qemu 2.1, that implementation is being improved. I'd see VM Power Management 
following a similar path when this capability is supported in qemu.


Tim


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-11-10 Thread O'driscoll, Tim
> From: Carew, Alan
> 
> > Did you make any progress in Qemu/KVM community?
> > We need to be sync'ed up with them to be sure we share the same goal.
> > I want also to avoid using a solution which doesn't fit with their plan.
> > Remember that we already had this problem with ivshmem which was
> > planned to be dropped.
> >
. . .
> 
> Unfortunately, I have not yet received any feedback:
> http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html

Just to add to what Alan said above, this capability does not exist in qemu at 
the moment, and based on there having been no feedback on the qemu mailing list 
so far, I think it's reasonable to assume that it will not be implemented in 
the immediate future. The VM Power Management feature has also been designed to 
allow easy migration to a qemu-based solution when this is supported in future. 
Therefore, I'd be in favour of accepting this feature into DPDK now.

It's true that the implementation is a work-around, but there have been similar 
cases in DPDK in the past. One recent example that comes to mind is userspace 
vhost. The original implementation could also be considered a work-around, but 
it met the needs of many in the community. Now, with support for vhost-user in 
qemu 2.1, that implementation is being improved. I'd see VM Power Management 
following a similar path when this capability is supported in qemu.


Tim


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-11-10 Thread Carew, Alan
Hi Thomas,

> Hi Alan,
> 
> Did you make any progress in Qemu/KVM community?
> We need to be sync'ed up with them to be sure we share the same goal.
> I want also to avoid using a solution which doesn't fit with their plan.
> Remember that we already had this problem with ivshmem which was
> planned to be dropped.
> 
> Thanks
> --
> Thomas
> 
> 
> 2014-10-16 15:21, Carew, Alan:
> > Hi Thomas,
> >
> > > > However with a DPDK solution it would be possible to re-use the
> > > > message bus to pass information like device stats, application state,
> > > > D-state requests etc. to the host and allow the management layer
> > > > (e.g. OpenStack) to make informed decisions.
> > >
> > > I think that management information should be transmitted in a
> > > management channel. Such a solution should exist in OpenStack.
> >
> > Perhaps it does, but this solution is not exclusive to OpenStack and is
> > just a potential use case.
> >
> > >
> > > > Also, the scope of adding power management to qemu/KVM would be huge;
> > > > while the easier path is not always the best, the problem of power
> > > > management in VMs is both a DPDK problem (given that librte_power
> > > > only worked on the host) and a general virtualization problem that
> > > > would be better solved by those with direct knowledge of Qemu/KVM
> > > > architecture and influence on the direction of the Qemu project.
> > >
> > > Being a huge effort is not an argument.
> >
> > I agree completely, as was implied by what followed the conjunction.
> >
> > > Please check with the Qemu community, they'll welcome it.
> > >
> > > > As it stands, the host backend is simply an example application that
> > > > can be replaced by a VMM or orchestration layer; by using
> > > > Virtio-Serial it has obvious leanings to Qemu, but even this could be
> > > > easily swapped out for XenBus, IVSHMEM, IP etc.
> > > >
> > > > If power management is to be eventually supported by hypervisors
> > > > directly then we could also enable the option to switch to that
> > > > environment; currently the librte_power implementations (VM or Host)
> > > > can be selected dynamically (environment auto-detection) or
> > > > explicitly via rte_power_set_env(), and adding an arbitrary number of
> > > > environments is relatively easy.
> > >
> > > Yes, you are adding a new layer to work around hypervisor shortcomings.
> > > And this layer will handle native support when it exists. But if you
> > > implement native support now, we don't need this extra layer.
> >
> > Indeed, but we have a solution implemented now, and yes, it is a
> > workaround until hypervisors support such functionality. It is possible
> > that whatever solutions for power management present themselves in the
> > future may require workarounds also; us-vhost is an example of such a
> > workaround introduced to DPDK.
> >
> > >
> > > > I hope this helps to clarify the approach.
> > >
> > > Thanks for your explanation.
> >
> > Thanks for the feedback.
> >
> > >
> > > --
> > > Thomas
> >
> > Alan.

Unfortunately, I have not yet received any feedback:
http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01103.html

Alan.




[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-10-28 Thread Thomas Monjalon
Hi Alan,

Did you make any progress in Qemu/KVM community?
We need to be sync'ed up with them to be sure we share the same goal.
I want also to avoid using a solution which doesn't fit with their plan.
Remember that we already had this problem with ivshmem which was planned
to be dropped.

Thanks
-- 
Thomas


2014-10-16 15:21, Carew, Alan:
> Hi Thomas,
> 
> > > However with a DPDK solution it would be possible to re-use the message 
> > > bus
> > > to pass information like device stats, application state, D-state requests
> > > etc. to the host and allow the management layer (e.g. OpenStack) to make
> > > informed decisions.
> > 
> > I think that management information should be transmitted in a management
> > channel. Such a solution should exist in OpenStack.
> 
> Perhaps it does, but this solution is not exclusive to OpenStack and is just
> a potential use case.
> 
> > 
> > > Also, the scope of adding power management to qemu/KVM would be huge;
> > > while the easier path is not always the best, the problem of power
> > > management in VMs is both a DPDK problem (given that librte_power only
> > > worked on the host) and a general virtualization problem that would be
> > > better solved by those with direct knowledge of Qemu/KVM architecture
> > > and influence on the direction of the Qemu project.
> > 
> > Being a huge effort is not an argument.
> 
> I agree completely, as was implied by what followed the conjunction.
> 
> > Please check with the Qemu community, they'll welcome it.
> > 
> > > As it stands, the host backend is simply an example application that can
> > > be replaced by a VMM or orchestration layer; by using Virtio-Serial it has
> > > obvious leanings to Qemu, but even this could be easily swapped out for
> > > XenBus, IVSHMEM, IP etc.
> > >
> > > If power management is to be eventually supported by hypervisors directly
> > > then we could also enable the option to switch to that environment;
> > > currently
> > > the librte_power implementations (VM or Host) can be selected dynamically
> > > (environment auto-detection) or explicitly via rte_power_set_env(), and
> > > adding an arbitrary number of environments is relatively easy.
> > 
> > Yes, you are adding a new layer to work around hypervisor shortcomings.
> > And this layer will handle native support when it exists. But if you
> > implement native support now, we don't need this extra layer.
> 
> Indeed, but we have a solution implemented now, and yes, it is a workaround
> until hypervisors support such functionality. It is possible that
> whatever solutions for power management present themselves in the future may
> require workarounds also; us-vhost is an example of such a workaround
> introduced to DPDK.
> 
> > 
> > > I hope this helps to clarify the approach.
> > 
> > Thanks for your explanation.
> 
> Thanks for the feedback.
> 
> > 
> > --
> > Thomas
> 
> Alan.



[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-10-16 Thread Carew, Alan
Hi Thomas,

> > However with a DPDK solution it would be possible to re-use the message bus
> > to pass information like device stats, application state, D-state requests
> > etc. to the host and allow the management layer (e.g. OpenStack) to make
> > informed decisions.
> 
> I think that management information should be transmitted in a management
> channel. Such a solution should exist in OpenStack.

Perhaps it does, but this solution is not exclusive to OpenStack and is just a
potential use case.

> 
> > Also, the scope of adding power management to qemu/KVM would be huge;
> > while the easier path is not always the best, the problem of power
> > management in VMs is both a DPDK problem (given that librte_power only
> > worked on the host) and a general virtualization problem that would be
> > better solved by those with direct knowledge of Qemu/KVM architecture
> > and influence on the direction of the Qemu project.
> 
> Being a huge effort is not an argument.

I agree completely, as was implied by what followed the conjunction.

> Please check with the Qemu community, they'll welcome it.
> 
> > As it stands, the host backend is simply an example application that can
> > be replaced by a VMM or orchestration layer; by using Virtio-Serial it has
> > obvious leanings to Qemu, but even this could be easily swapped out for
> > XenBus, IVSHMEM, IP etc.
> >
> > If power management is to be eventually supported by hypervisors directly
> > then we could also enable the option to switch to that environment;
> > currently the librte_power implementations (VM or Host) can be selected
> > dynamically (environment auto-detection) or explicitly via
> > rte_power_set_env(), and adding an arbitrary number of environments is
> > relatively easy.
> 
> Yes, you are adding a new layer to work around hypervisor shortcomings. And
> this layer will handle native support when it exists. But if you implement
> native support now, we don't need this extra layer.

Indeed, but we have a solution implemented now, and yes, it is a workaround
until hypervisors support such functionality. It is possible that whatever
solutions for power management present themselves in the future may require
workarounds also; us-vhost is an example of such a workaround introduced to
DPDK.

> 
> > I hope this helps to clarify the approach.
> 
> Thanks for your explanation.

Thanks for the feedback.

> 
> --
> Thomas

Alan.


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-10-14 Thread Thomas Monjalon
2014-10-14 12:37, Carew, Alan:
> > > The following patches add two DPDK sample applications and an alternate
> > > implementation of librte_power for use in virtualized environments.
> > > The idea is to provide librte_power functionality from within a VM to
> > > address the lack of MSR access needed to facilitate frequency changes
> > > from within a VM.
> > > It is ideally suited to Haswell, which provides per-core frequency
> > > scaling.
> > >
> > > The current librte_power effects frequency changes via the acpi-cpufreq
> > > 'userspace' power governor, accessed via sysfs.
> > 
> > Something was preventing me from looking deeper into this big codebase,
> > but I didn't know what felt weird.
> > Now I realize: the real problem is that virtualization transparency is
> > broken for power management. So the right thing to do is to fix it in
> > KVM. I think this whole patchset is a huge workaround.
> > 
> > Did you try to fix it with Qemu/KVM?
> 
> When looking at the libvirt API it would seem to be a natural fit to have
> power management sitting there, so in essence I would agree.
> 
> However with a DPDK solution it would be possible to re-use the message bus
> to pass information like device stats, application state, D-state requests
> etc. to the host and allow the management layer (e.g. OpenStack) to make
> informed decisions.

I think that management information should be transmitted in a management
channel. Such a solution should exist in OpenStack.

> Also, the scope of adding power management to qemu/KVM would be huge;
> while the easier path is not always the best, the problem of power
> management in VMs is both a DPDK problem (given that librte_power only
> worked on the host) and a general virtualization problem that would be
> better solved by those with direct knowledge of Qemu/KVM architecture
> and influence on the direction of the Qemu project.

Being a huge effort is not an argument.
Please check with the Qemu community, they'll welcome it.

> As it stands, the host backend is simply an example application that can
> be replaced by a VMM or orchestration layer; by using Virtio-Serial it has
> obvious leanings to Qemu, but even this could be easily swapped out for
> XenBus, IVSHMEM, IP etc.
> 
> If power management is to be eventually supported by hypervisors directly
> then we could also enable the option to switch to that environment; currently
> the librte_power implementations (VM or Host) can be selected dynamically
> (environment auto-detection) or explicitly via rte_power_set_env(), and adding
> an arbitrary number of environments is relatively easy.

Yes, you are adding a new layer to work around hypervisor shortcomings. And
this layer will handle native support when it exists. But if you implement
native support now, we don't need this extra layer.

> I hope this helps to clarify the approach.

Thanks for your explanation.

-- 
Thomas


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-10-14 Thread Carew, Alan
Hi Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, October 13, 2014 9:26 PM
> To: Carew, Alan
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 00/10] VM Power Management
> 
> Hi Alan,
> 
> 2014-10-12 20:36, Alan Carew:
> > The following patches add two DPDK sample applications and an alternate
> > implementation of librte_power for use in virtualized environments.
> > The idea is to provide librte_power functionality from within a VM to
> > address the lack of MSR access needed to facilitate frequency changes
> > from within a VM.
> > It is ideally suited to Haswell, which provides per-core frequency scaling.
> >
> > The current librte_power effects frequency changes via the acpi-cpufreq
> > 'userspace' power governor, accessed via sysfs.
> 
> Something was preventing me from looking deeper into this big codebase,
> but I didn't know what felt weird.
> Now I realize: the real problem is that virtualization transparency is
> broken for power management. So the right thing to do is to fix it in
> KVM. I think this whole patchset is a huge workaround.
> 
> Did you try to fix it with Qemu/KVM?
> 
> --
> Thomas

When looking at the libvirt API it would seem to be a natural fit to have power 
management sitting there, so in essence I would agree.

However with a DPDK solution it would be possible to re-use the message bus to 
pass information like device stats, application state, D-state requests etc. to 
the host and allow the management layer (e.g. OpenStack) to make informed 
decisions.

Also, the scope of adding power management to qemu/KVM would be huge; while the 
easier path is not always the best, the problem of power management in VMs 
is both a DPDK problem (given that librte_power only worked on the host) and a 
general virtualization problem that would be better solved by those with direct 
knowledge of Qemu/KVM architecture and influence on the direction of the Qemu 
project.

As it stands, the host backend is simply an example application that can be 
replaced by a VMM or orchestration layer; by using Virtio-Serial it has obvious 
leanings to Qemu, but even this could be easily swapped out for XenBus, 
IVSHMEM, IP etc.

If power management is to be eventually supported by hypervisors directly then
we could also enable the option to switch to that environment; currently the
librte_power implementations (VM or Host) can be selected dynamically
(environment auto-detection) or explicitly via rte_power_set_env(), and adding
an arbitrary number of environments is relatively easy.
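
As a rough illustration of that selection (a sketch assuming the environment
constants introduced by this patchset, PM_ENV_ACPI_CPUFREQ and PM_ENV_KVM_VM):

#include <rte_lcore.h>
#include <rte_power.h>

static int
select_power_env(int running_in_vm)
{
        /* Explicit selection; omitting rte_power_set_env() lets
         * rte_power_init() auto-detect the environment instead. */
        if (rte_power_set_env(running_in_vm ?
                        PM_ENV_KVM_VM : PM_ENV_ACPI_CPUFREQ) < 0)
                return -1;
        return rte_power_init(rte_lcore_id());
}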

I hope this helps to clarify the approach.


Thanks,
Alan.


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-10-13 Thread Thomas Monjalon
Hi Alan,

2014-10-12 20:36, Alan Carew:
> The following patches add two DPDK sample applications and an alternate
> implementation of librte_power for use in virtualized environments.
> The idea is to provide librte_power functionality from within a VM to address
> the lack of MSR access needed to facilitate frequency changes from within a
> VM.
> It is ideally suited to Haswell, which provides per-core frequency scaling.
> 
> The current librte_power effects frequency changes via the acpi-cpufreq
> 'userspace' power governor, accessed via sysfs.

Something was preventing me from looking deeper into this big codebase,
but I didn't know what felt weird.
Now I realize: the real problem is that virtualization transparency is
broken for power management. So the right thing to do is to fix it in
KVM. I think this whole patchset is a huge workaround.

Did you try to fix it with Qemu/KVM?

-- 
Thomas


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-10-13 Thread Liu, Yong
Patch name: VM Power Management
Brief description:  Verify VM power management in virtualized environments
Test Flag:  Tested-by 
Tester name:    yong.liu at intel.com
Test environment:
OS: Fedora20 3.11.10-301.fc20.x86_64
GCC: gcc version 4.8.3 20140911
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ 
Network Connection [8086:10fb]
Test Tool Chain information:
Qemu: 1.6.1
libvirt: 1.1.3
Guest OS: Fedora20 3.11.10-301.fc20.x86_64
Guest GCC: gcc version 4.8.3 20140624

Commit ID:  72d3e7ad3183f42f8b9fb3bb1c12b3e1b39eef39

Detailed Testing information
DPDK SW Configuration:
Default x86_64-native-linuxapp-gcc configuration
Test Result Summary:Total 7 cases, 7 passed, 0 failed

Test Case - name:
VM Power Management Channel
Test Case - Description:
Check vm power management communication channels can be
successfully connected
Test Case -command / instruction:
Create folder in system temporary filesystem for power 
monitor socket
mkdir -p /tmp/powermonitor

Configure VM XML and pin VCPUs to specified CPUs
[<vcpu>/<cputune> XML snippet lost in archiving; the VM was
configured with 5 VCPUs, each pinned to a physical CPU]

Configure VM XML to set up virtio serial ports
[virtio-serial <channel> XML snippet lost in archiving; see the
channel XML example in the cover letter at the end of this thread]
Run power-manager monitor in Host
./build/vm_power_mgr -c 0x3 -n 4

Startup VM and run guest_vm_power_mgr
guest_vm_power_mgr -c 0x1f -n 4 -- -i

Add vm in host and check vm_power_mgr can get frequency 
normally

vmpower> add_vm <vm_name>
vmpower> add_channels <vm_name> all
vmpower> get_cpu_freq <core_num>

Check vcpu/cpu mapping can be detected normally
vmpower> show_vm <vm_name>

Test Case - expected test result:
VM power management communication channels can be
successfully connected and the host can get vm core information

Test Case - name:
VM Power Management Numa
Test Case - Description:
Check vm power management supports managing cores in
different sockets
Test Case -command / instruction:
Get core and socket information by cpu_layout

./tools/cpu_layout.py

Configure VM XML to pin VCPUs on Socket1:
[<vcpupin> XML snippet lost in archiving]
Repeat Case1

Check vcpu/cpu mapping can be detected normally
vmpower> show_vm <vm_name>

Test Case - expected test result:
VM power management communication channels can be
successfully connected and show correct vm core information

Test Case - name:
VM scale cpu frequency down 
Test Case - Description:
Check vm power management supports a VM configuring its own
cores' frequency down
Test Case -command / instruction:
Setup VM power management environment

Send cpu frequency down hints to Host 

vmpower(guest)> set_cpu_freq 0 down

Verify the frequency of the physical CPU has been scaled
down correctly
vmpower> get_cpu_freq 1
Core 1 frequency: 270

Check other CPUs' frequency is not affected by actions 
above

Check if the other VM works fine (if they use different 
CPUs)

Repeat above actions several times

Test Case - expected test result:
Frequency of the VM's core can be scaled down normally

Test Case - name:
VM scale cpu frequency up 
Test Case - Description:
Check vm power management supports a VM configuring its own
cores' frequency up
Test Case -command / instruction:
Setup VM power management environment

Send cpu frequency up hints to Host 

vmpower(guest)> set_cpu_freq 0 up

Verify the frequency of the physical CPU has been scaled
up correctly

[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-10-12 Thread Alan Carew
Virtual Machine Power Management.

The following patches add two DPDK sample applications and an alternate
implementation of librte_power for use in virtualized environments.
The idea is to provide librte_power functionality from within a VM to address
the lack of MSR access needed to facilitate frequency changes from within a VM.
It is ideally suited to Haswell, which provides per-core frequency scaling.

The current librte_power effects frequency changes via the acpi-cpufreq
'userspace' power governor, accessed via sysfs.
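
For reference, the sysfs mechanism amounts to something like the sketch
below (an illustration, not the library code itself): with the 'userspace'
governor active, a frequency in kHz written to scaling_setspeed takes
effect directly.

#include <stdio.h>

static int
set_core_freq_khz(unsigned core, unsigned freq_khz)
{
        char path[128];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed",
                 core);
        f = fopen(path, "w");
        if (f == NULL)
                return -1;  /* governor not 'userspace', or no permission */
        fprintf(f, "%u", freq_khz);
        fclose(f);
        return 0;
}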

General Overview (more information in each patch that follows):
The VM Power Management solution provides two components:

 1) VM: Allows a DPDK application in a VM to reuse the librte_power
 interface. Each lcore opens a Virtio-Serial endpoint channel to the host,
 where the re-implementation of librte_power simply forwards the requests for
 frequency change to a host-based monitor. The host monitor itself uses
 librte_power.
 Each lcore channel corresponds to a serial device
 '/dev/virtio-ports/virtio.serial.port.poweragent.<lcore_num>'
 which is opened in non-blocking mode (a sketch of the guest side follows
 after this overview).
 While each virtual CPU can be mapped to multiple physical CPUs, it is
 recommended that each vCPU be mapped to a single core only.

 2) Host: The host monitor is managed by a CLI; it allows for adding qemu/KVM
 virtual machines and associated channels to the monitor, manually changing
 CPU frequency, inspecting the state of VMs, vCPU-to-pCPU pinning and
 managing channels.
 Host channel endpoints are Virtio-Serial endpoints configured as AF_UNIX
 file sockets which follow a specific naming convention,
 i.e. /tmp/powermonitor/<vm_name>.<channel_num>;
 each channel has a 1:1 mapping to a VM endpoint,
 i.e. /dev/virtio-ports/virtio.serial.port.poweragent.<channel_num>
 Host channel endpoints are opened in non-blocking mode and are monitored
 via epoll.
 Requests over each channel to change frequency are forwarded to the
 original librte_power.
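
To make the guest side described in 1) concrete, here is a minimal sketch of
opening a channel and sending a request; the freq_request struct is a
stand-in, as the real packet layout is defined in the packet-format patch of
this series:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct freq_request {           /* illustrative only */
        uint32_t lcore_id;
        uint32_t cmd;           /* e.g. scale frequency up or down */
};

static int
open_power_channel(unsigned lcore_id)
{
        char path[128];

        snprintf(path, sizeof(path),
                 "/dev/virtio-ports/virtio.serial.port.poweragent.%u",
                 lcore_id);
        /* Non-blocking, so a missing host monitor never stalls the lcore. */
        return open(path, O_RDWR | O_NONBLOCK);
}

static int
send_freq_request(int fd, unsigned lcore_id, uint32_t cmd)
{
        struct freq_request req = { .lcore_id = lcore_id, .cmd = cmd };

        return (write(fd, &req, sizeof(req)) == sizeof(req)) ? 0 : -1;
}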

Channels must be manually configured as qemu-kvm command line arguments or a
libvirt domain definition (XML), e.g.

 <channel type='unix'>
   <source mode='bind' path='/tmp/powermonitor/<vm_name>.<channel_num>'/>
   <target type='virtio' name='virtio.serial.port.poweragent.<channel_num>'/>
   <address type='virtio-serial' controller='0' bus='0' port='<N>'/>
 </channel>

Multiple channels can be configured by specifying multiple <channel>
elements, replacing <vm_name> and <channel_num>; the <address> port number
should be incremented by 1 for each new channel element.
More information on Virtio-Serial can be found here:
http://fedoraproject.org/wiki/Features/VirtioSerial
To enable hypervisor creation of channels, the host endpoint directory
must be created with qemu permissions:
mkdir /tmp/powermonitor
chown qemu:qemu /tmp/powermonitor

The host application runs on two separate lcores:
Core N) CLI: for management of virtual machines, adding channels to the
 monitor thread, inspecting state and manually setting CPU frequency
 [PATCH 02/09]
Core N+1) Monitor Thread: an epoll-based infinite loop that waits on channel
 events from VMs and calls the corresponding librte_power functions (a
 minimal sketch of this loop follows below).
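
A minimal sketch of that monitor loop, where handle_request() stands in for
reading a channel packet and dispatching into librte_power (the helper name
and epoll details are illustrative, not code from this series):

#include <sys/epoll.h>

#define MAX_EVENTS 32

/* Illustrative: read the request from the channel fd and call the
 * corresponding librte_power function. */
void handle_request(int fd);

static void
monitor_loop(int epfd)
{
        struct epoll_event events[MAX_EVENTS];
        int i, n;

        for (;;) {  /* infinite loop, as described above */
                n = epoll_wait(epfd, events, MAX_EVENTS, -1);
                for (i = 0; i < n; i++)
                        handle_request(events[i].data.fd);
        }
}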

A sample application is also provided to run on virtual machines; this
application provides a CLI to manually set the frequency of a
vCPU [PATCH 08/09].

The current l3fwd-power sample application can also be run on a VM.

Changes in V4:
 Fixed double free of channel during VM shutdown.

Changes in V3:
 Fixed crash in Guest CLI when host application is not running.
 Renamed #defines to be more specific to the module they belong to
 Added vCPU pinning via CLI

Changes in V2:
 Runtime selection of librte_power implementations.
 Updated Unit tests to cover librte_power changes.
 PATCH[0/3] was sent twice, again as PATCH[0/4]
 Miscellaneous fixes.

Alan Carew (10):
  Channel Manager and Monitor for VM Power Management(Host).
  VM Power Management CLI(Host).
  CPU Frequency Power Management(Host).
  VM Power Management application and Makefile.
  VM Power Management CLI(Guest).
  VM communication channels for VM Power Management(Guest).
  librte_power common interface for Guest and Host
  Packet format for VM Power Management(Host and Guest).
  Build system integration for VM Power Management(Guest and Host)
  VM Power Management Unit Tests

 app/test/Makefile  |   3 +-
 app/test/autotest_data.py  |  26 +
 app/test/test_power.c  | 445 +---
 app/test/test_power_acpi_cpufreq.c | 544 ++
 app/test/test_power_kvm_vm.c   | 308 
 examples/vm_power_manager/Makefile |  57 ++
 examples/vm_power_manager/channel_manager.c| 804 +
 examples/vm_power_manager/channel_manager.h| 314 
 examples/vm_power_manager/channel_monitor.c| 231 ++
 examples/vm_power_manager/channel_monitor.h| 102 +++
 examples/vm_power_manager/guest_cli/Makefile   |  56 ++
 examples/vm_power_manager/guest_cli/main.c |  87 +++
 examples/vm_power_manager/guest_cli/main.h |  52 ++
 .../guest_cli/vm_power_cli_guest.c | 155 
 .../guest_cli/vm_power_cli_guest.h |  55 ++
 examples/vm_power_manager/main.c