Re: [openstack-dev] [nova] should we allow overcommit for a single VM?

2015-08-24 Thread Daniel P. Berrange
On Mon, Aug 17, 2015 at 01:22:28PM -0600, Chris Friesen wrote:
 
 I tried bringing this up on the irc channel, but nobody took the bait.
 Hopefully this will generate some discussion.
 
 I just filed bug 1485631.  Nikola suggested one way of handling it, but
 there are some complications that I thought I should highlight so we're all
 on the same page.
 
 The basic question is, if a host has X CPUs in total for VMs, and a single
 instance wants X+1 vCPUs, should we allow it?  (Regardless of overcommit
 ratio.)  There is also an equivalent question for RAM.
 
 Currently we have two different answers depending on whether numa topology
 is involved or not.  Should we change one of them to make it consistent with
 the other?  If so, a) which one should we change, and b) how would we do
 that given that it results in a user-visible behaviour change?  (Maybe a
 microversion, even though the actual API doesn't change, just whether the
 request passes the scheduler filter or not?)

I agree with Nikola that the NUMA impl is the correct one. The existence
of overcommit is motivated by the idea that most users will not in fact
consume all the resources allocated to their VM all of the time, and thus
on average you don't need to reserve 100% of resources for every single VM.
Users will usually be able to burst up to 100% of their allocation when
needed, if some portion of other users are mostly inactive.

If you allow a single VM to overcommit against itself, though, this breaks
down. It is never possible for that single VM to burst to consume 100%
of the resources allocated to it, since the host is physically incapable
of providing that much resource.
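To put rough numbers on this, assume a hypothetical host with 4 physical
CPUs available for guests and a single VM with 8 vCPUs. Even with no other
guests running, each vCPU can get at most half of a physical CPU, so the VM
can never use more than 50% of its allocation. The sketch below is a
simplified model (it ignores hypervisor overhead and hyperthreading), and
max_burst_fraction is a hypothetical helper, not anything in Nova.

```python
def max_burst_fraction(host_pcpus: int, vm_vcpus: int) -> float:
    """Upper bound on the fraction of its vCPU allocation a VM can actually
    consume when it is the only guest on the host.

    Simplified, hypothetical model: ignores hypervisor overhead,
    hyperthreading and any other guests.
    """
    # If the VM has more vCPUs than the host has pCPUs, the pCPUs are
    # shared among the VM's own vCPUs; otherwise it can reach 100%.
    return min(1.0, host_pcpus / vm_vcpus)


print(max_burst_fraction(4, 8))  # 0.5: VM larger than the host
print(max_burst_fraction(8, 4))  # 1.0: VM fits within the host
```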

On that basis, I think the correct behaviour is to consider overcommit
to be a factor that applies only across a set of VMs: never allow a
single VM to overcommit against itself. This is what the NUMA code
in libvirt currently implements, and I think we should align the non-NUMA
codepath with it too.
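A minimal sketch of the check being proposed, assuming a simplified host
model. HostState and host_passes are hypothetical names, not Nova's filter
classes, and the ratio values are illustrative, chosen to resemble Nova's
long-standing effective defaults for the cpu_allocation_ratio and
ram_allocation_ratio options. Passing allow_self_overcommit=True
approximates the current non-NUMA behaviour (only the aggregate limit is
checked); the default False adds the per-instance limit that the NUMA path
enforces. The real NUMA fitting logic works per host NUMA cell; this sketch
collapses that to the whole host.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    # Hypothetical, simplified view of a compute host (not Nova's HostState).
    pcpus: int                  # physical CPUs available for guests
    vcpus_used: int             # vCPUs already allocated to existing instances
    ram_mb: int                 # physical RAM available for guests
    ram_used_mb: int            # RAM already allocated to existing instances
    cpu_allocation_ratio: float = 16.0  # illustrative value
    ram_allocation_ratio: float = 1.5   # illustrative value


def host_passes(host: HostState, req_vcpus: int, req_ram_mb: int,
                allow_self_overcommit: bool = False) -> bool:
    # Aggregate limit: overcommit applies across the set of instances.
    if host.vcpus_used + req_vcpus > host.pcpus * host.cpu_allocation_ratio:
        return False
    if host.ram_used_mb + req_ram_mb > host.ram_mb * host.ram_allocation_ratio:
        return False
    if not allow_self_overcommit:
        # Per-instance limit: a single VM may never be larger than the
        # physical resources of the host, whatever the ratio.
        if req_vcpus > host.pcpus or req_ram_mb > host.ram_mb:
            return False
    return True


host = HostState(pcpus=4, vcpus_used=0, ram_mb=8192, ram_used_mb=0)
print(host_passes(host, 5, 1024))                              # False
print(host_passes(host, 5, 1024, allow_self_overcommit=True))  # True
```

The 5-vCPU request on a 4-pCPU host is the X+1 case from the original
question: it fits under the aggregate limit (5 <= 64) but is rejected once
the per-instance limit applies.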

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] should we allow overcommit for a single VM?

2015-08-24 Thread Michael Still
I agree with the rationale here, and will be reviewing
https://review.openstack.org/#/c/215764/ accordingly.

Michael

On Mon, Aug 24, 2015 at 7:56 PM, Daniel P. Berrange berra...@redhat.com
wrote:





-- 
Rackspace Australia


Re: [openstack-dev] [nova] should we allow overcommit for a single VM?

2015-08-18 Thread Nikola Đipanov
On 08/17/2015 08:22 PM, Chris Friesen wrote:
 
 I tried bringing this up on the irc channel, but nobody took the bait.
 Hopefully this will generate some discussion.
 
 I just filed bug 1485631.  Nikola suggested one way of handling it, but
 there are some complications that I thought I should highlight so we're
 all on the same page.
 
 The basic question is, if a host has X CPUs in total for VMs, and a
 single instance wants X+1 vCPUs, should we allow it?  (Regardless of
 overcommit ratio.)  There is also an equivalent question for RAM.
 
 Currently we have two different answers depending on whether numa
 topology is involved or not.  Should we change one of them to make it
 consistent with the other?  If so, a) which one should we change, and b)
 how would we do that given that it results in a user-visible behaviour
 change?  (Maybe a microversion, even though the actual API doesn't
 change, just whether the request passes the scheduler filter or not?)
 

I would say that the correct behavior is what the NUMA fitting logic does,
which is not to allow an instance to over-commit against itself, and we
should fix normal (non-NUMA) over-commit to match. Allowing an instance to
over-commit against itself does not make a lot of sense; however, it is
not something that is likely to happen often in real-world usage. I would
imagine operators are unlikely to create flavors larger than compute hosts.

I am not sure that this has anything to do with the API, though. This is
mostly a Nova-internal implementation detail. Any Nova deployment can
fail to boot an instance for any number of reasons, and this does not
affect the API response of the actual boot request.

Hope it helps,
N.



Re: [openstack-dev] [nova] should we allow overcommit for a single VM?

2015-08-18 Thread Chris Friesen

On 08/18/2015 06:56 AM, Nikola Đipanov wrote:

 On 08/17/2015 08:22 PM, Chris Friesen wrote:
 
  The basic question is, if a host has X CPUs in total for VMs, and a
  single instance wants X+1 vCPUs, should we allow it?  (Regardless of
  overcommit ratio.)  There is also an equivalent question for RAM.
 
  Currently we have two different answers depending on whether numa
  topology is involved or not.  Should we change one of them to make it
  consistent with the other?  If so, a) which one should we change, and b)
  how would we do that given that it results in a user-visible behaviour
  change?  (Maybe a microversion, even though the actual API doesn't
  change, just whether the request passes the scheduler filter or not?)
 
 I would say that the correct behavior is what the NUMA fitting logic does,
 which is not to allow an instance to over-commit against itself, and we
 should fix normal (non-NUMA) over-commit to match. Allowing an instance to
 over-commit against itself does not make a lot of sense; however, it is
 not something that is likely to happen often in real-world usage. I would
 imagine operators are unlikely to create flavors larger than compute hosts.


This is a good point; in any real deployment it likely won't be an issue. We
only ran into it because we were testing on a minimal-sized compute node running
in a VM on a designer box.



 I am not sure that this has anything to do with the API, though. This is
 mostly a Nova-internal implementation detail. Any Nova deployment can
 fail to boot an instance for any number of reasons, and this does not
 affect the API response of the actual boot request.


Arguably it would be changing the behaviour of a boot request. Currently it
would pass the scheduler and boot up, and we're talking about making it fail the
scheduler filter. That's an externally-visible change in behaviour. (But as
you say, it's unlikely that it will be hit in the real world.)


Chris



[openstack-dev] [nova] should we allow overcommit for a single VM?

2015-08-17 Thread Chris Friesen


I tried bringing this up on the irc channel, but nobody took the bait. 
Hopefully this will generate some discussion.


I just filed bug 1485631.  Nikola suggested one way of handling it, but there 
are some complications that I thought I should highlight so we're all on the 
same page.


The basic question is, if a host has X CPUs in total for VMs, and a single
instance wants X+1 vCPUs, should we allow it?  (Regardless of overcommit ratio.)
There is also an equivalent question for RAM.


Currently we have two different answers depending on whether numa topology is 
involved or not.  Should we change one of them to make it consistent with the 
other?  If so, a) which one should we change, and b) how would we do that given 
that it results in a user-visible behaviour change?  (Maybe a microversion, even 
though the actual API doesn't change, just whether the request passes the 
scheduler filter or not?)


Chris
