Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-21 Thread Stephen Gran

On 21/11/13 15:49, Chris Friesen wrote:

> On 11/21/2013 02:58 AM, Soren Hansen wrote:
>> 2013/11/20 Chris Friesen :
>>> What about a hybrid solution?
>>> There is data that is only used by the scheduler--for performance reasons
>>> maybe it would make sense to store that information in RAM as described at
>>>
>>> https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
>>>
>>> For the rest of the data, perhaps it could be persisted using some alternate
>>> backend.
>>
>> What would that solve?
>
> The scheduler has performance issues. Currently the design is
> suboptimal: the compute nodes write resource information to the
> database, then the scheduler pulls a bunch of data out of the database,
> copies it over into Python, and analyzes it in Python to do the filtering.
>
> For large clusters this can lead to significant time spent scheduling.
>
> Based on the above, for performance reasons it would be beneficial for
> the scheduler to have the necessary data already available in Python
> rather than needing to pull it out of the database.
>
> For other uses of the database, people are proposing alternatives to SQL
> in order to get reliability. I don't have any experience with that, so I
> have no opinion on it. But as long as the data is sitting on disk (or
> even in a database process instead of in the scheduler process), it's
> going to slow down the scheduler.
>
> If the primary consumer of a given piece of data (free RAM, free CPU,
> free disk, etc.) is the scheduler, then I think it makes sense for the
> compute nodes to report it directly to the scheduler.


I suspect that a large performance gain could be had by 2 fairly simple 
changes:


a) Break the scheduler in two, so that the chunk of code receiving 
updates from the compute nodes can't block the chunk of code scheduling 
instances.


b) Use a memcache backend instead of SQL for compute resource information.

My fear with keeping data local to a scheduler instance is that local 
state destroys scalability.
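Change (a) might look roughly like the following sketch, assuming updates arrive on an in-process queue (in a real deployment they would come over RPC). `SplitScheduler` and its methods are hypothetical names, only free RAM is tracked, and the memcache backend from (b) is not shown.

```python
import queue
import threading
from typing import Dict, Optional


class SplitScheduler:
    """Hypothetical scheduler split into an update receiver and a decision path."""

    def __init__(self) -> None:
        self._updates: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        self._free_ram_mb: Dict[str, int] = {}

    def enqueue_update(self, host: str, free_ram_mb: int) -> None:
        # Compute-node reports only touch the queue, never the decision path.
        self._updates.put((host, free_ram_mb))

    def apply_pending_updates(self) -> None:
        # Receiver side: drain whatever is queued, holding the lock briefly per update.
        while True:
            try:
                host, free = self._updates.get_nowait()
            except queue.Empty:
                return
            with self._lock:
                self._free_ram_mb[host] = free

    def pick_host(self, ram_mb: int) -> Optional[str]:
        # Decision side: copy the view under the lock, then decide without it,
        # so a slow decision never blocks incoming updates.
        with self._lock:
            snapshot = dict(self._free_ram_mb)
        fits = [h for h, free in snapshot.items() if free >= ram_mb]
        return max(fits, key=lambda h: snapshot[h]) if fits else None


sched = SplitScheduler()
sched.enqueue_update("compute-1", 2048)
sched.enqueue_update("compute-2", 8192)
receiver = threading.Thread(target=sched.apply_pending_updates)
receiver.start()
receiver.join()
print(sched.pick_host(4096))  # compute-2
```

Copying the view under a short lock means a slow scheduling decision never holds up incoming updates, at the cost of deciding on a snapshot that may be slightly stale.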


Just a thought.

Cheers,
--
Stephen Gran
Senior Systems Integrator - theguardian.com


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-21 Thread Chris Friesen

On 11/21/2013 02:58 AM, Soren Hansen wrote:

> 2013/11/20 Chris Friesen :
>> What about a hybrid solution?
>> There is data that is only used by the scheduler--for performance reasons
>> maybe it would make sense to store that information in RAM as described at
>>
>> https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
>>
>> For the rest of the data, perhaps it could be persisted using some alternate
>> backend.
>
> What would that solve?


The scheduler has performance issues. Currently the design is
suboptimal: the compute nodes write resource information to the
database, then the scheduler pulls a bunch of data out of the database,
copies it over into Python, and analyzes it in Python to do the filtering.

For large clusters this can lead to significant time spent scheduling.

Based on the above, for performance reasons it would be beneficial for
the scheduler to have the necessary data already available in Python
rather than needing to pull it out of the database.

For other uses of the database, people are proposing alternatives to SQL
in order to get reliability. I don't have any experience with that, so I
have no opinion on it. But as long as the data is sitting on disk (or
even in a database process instead of in the scheduler process), it's
going to slow down the scheduler.

If the primary consumer of a given piece of data (free RAM, free CPU,
free disk, etc.) is the scheduler, then I think it makes sense for the
compute nodes to report it directly to the scheduler.
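A minimal Python sketch of the push model described above, assuming each compute node calls a report hook on the scheduler whenever its free resources change. `HostState`, `InMemoryHostView`, `report()` and `filter_hosts()` are hypothetical names, not Nova's actual scheduler classes, and the transport (RPC, broadcast) is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostState:
    # Free resources as last reported by a compute node (hypothetical fields).
    free_ram_mb: int
    free_vcpus: int
    free_disk_gb: int


class InMemoryHostView:
    """Scheduler-local view of compute resources, updated by push reports."""

    def __init__(self) -> None:
        self._hosts: Dict[str, HostState] = {}

    def report(self, host: str, state: HostState) -> None:
        # Called when a compute node sends an update; replaces the old entry.
        self._hosts[host] = state

    def filter_hosts(self, ram_mb: int, vcpus: int, disk_gb: int) -> List[str]:
        # Filtering is a pass over data already in the scheduler's memory,
        # so no database round trip is needed per scheduling request.
        return sorted(
            name
            for name, s in self._hosts.items()
            if s.free_ram_mb >= ram_mb
            and s.free_vcpus >= vcpus
            and s.free_disk_gb >= disk_gb
        )


view = InMemoryHostView()
view.report("compute-1", HostState(free_ram_mb=4096, free_vcpus=4, free_disk_gb=40))
view.report("compute-2", HostState(free_ram_mb=1024, free_vcpus=2, free_disk_gb=80))
print(view.filter_hosts(ram_mb=2048, vcpus=2, disk_gb=20))  # ['compute-1']
```

The trade-off is staleness: the dict is only as fresh as the last report from each node.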


Chris




Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-21 Thread Chris Friesen

On 11/21/2013 10:52 AM, Stephen Gran wrote:

> On 21/11/13 15:49, Chris Friesen wrote:
>> On 11/21/2013 02:58 AM, Soren Hansen wrote:
>>> 2013/11/20 Chris Friesen :
>>>> What about a hybrid solution?
>>>> There is data that is only used by the scheduler--for performance reasons
>>>> maybe it would make sense to store that information in RAM as described at
>>>>
>>>> https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
>
> I suspect that a large performance gain could be had by 2 fairly simple
> changes:
>
> a) Break the scheduler in two, so that the chunk of code receiving
> updates from the compute nodes can't block the chunk of code scheduling
> instances.
>
> b) Use a memcache backend instead of SQL for compute resource information.
>
> My fear with keeping data local to a scheduler instance is that local
> state destroys scalability.


"a" and "b" are basically what is described in the blueprint above.

Your fear is addressed by having the compute nodes broadcast their 
resource information to all scheduler instances.


As I see it, the scheduler could then make a tentative scheduling
decision, attempt to reserve the resources from the compute node (which
would trigger the compute node to send updated resource information to
all the scheduler instances), and, assuming it got the requested
resources, it could then proceed with bringing up the resource.
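The tentative-decision-then-reserve flow could be sketched as below, assuming the compute node is the final arbiter of its own capacity. `ComputeNode`, `try_reserve()` and `schedule()` are hypothetical names, only free RAM is tracked, and the broadcast to all scheduler instances is reduced to refreshing the caller's own view.

```python
from typing import Dict, List, Optional


class ComputeNode:
    """Hypothetical compute node holding the authoritative free-RAM count."""

    def __init__(self, name: str, free_ram_mb: int) -> None:
        self.name = name
        self.free_ram_mb = free_ram_mb

    def try_reserve(self, ram_mb: int) -> bool:
        # The node itself arbitrates, so a stale scheduler view cannot overcommit it.
        if self.free_ram_mb < ram_mb:
            return False
        self.free_ram_mb -= ram_mb
        return True


def schedule(view: Dict[str, int], nodes: Dict[str, ComputeNode], ram_mb: int) -> Optional[str]:
    # 1. Tentative decision from the scheduler's cached (possibly stale) view.
    candidates: List[str] = sorted(
        (h for h, free in view.items() if free >= ram_mb),
        key=lambda h: view[h],
        reverse=True,
    )
    for host in candidates:
        node = nodes[host]
        # 2. Ask the node to reserve; it refuses if our view was out of date.
        reserved = node.try_reserve(ram_mb)
        # 3. Stand-in for the node's broadcast: refresh our copy either way.
        view[host] = node.free_ram_mb
        if reserved:
            return host
    return None


nodes = {"a": ComputeNode("a", 1024), "b": ComputeNode("b", 4096)}
stale_view = {"a": 8192, "b": 4096}  # "a" really has only 1024 MB free
print(schedule(stale_view, nodes, 2048))  # b
```

The first choice ("a") is refused by the node, the scheduler learns the real value, and the request lands on "b".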


Chris



Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-21 Thread Soren Hansen
2013/11/20 Chris Friesen :
> What about a hybrid solution?
> There is data that is only used by the scheduler--for performance reasons
> maybe it would make sense to store that information in RAM as described at
>
> https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
>
> For the rest of the data, perhaps it could be persisted using some alternate
> backend.

What would that solve?

-- 
Soren Hansen | http://linux2go.dk/
Ubuntu Developer | http://www.ubuntu.com/
OpenStack Developer  | http://www.openstack.org/



Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-20 Thread Mike Wilson
I agree heartily with the availability and resiliency aspect.  For me, that
is the biggest reason to consider a NOSQL backend. The other potential
performance benefits are attractive to me also.

-Mike


On Wed, Nov 20, 2013 at 9:06 AM, Soren Hansen  wrote:

> 2013/11/18 Mike Spreitzer :
> > There were some concerns expressed at the summit about scheduler
> > scalability in Nova, and a little recollection of Boris' proposal to
> > keep the needed state in memory.
>
>
> > I also heard one guy say that he thinks Nova does not really need a
> > general SQL database, that a NOSQL database with a bit of
> > denormalization and/or client-maintained secondary indices could
> > suffice.
>
> I may have said something along those lines. Just to clarify -- since
> you started this post by talking about scheduler scalability -- the main
> motivation for using a non-SQL backend isn't scheduler scalability, it's
> availability and resilience. I just don't accept the failure modes that
> MySQL (and derivatives such as Galera) impose.
>
> > Has that sort of thing been considered before?
>
> It's been talked about on and off since... well, probably since we
> started this project.
>
> > What is the community's level of interest in exploring that?
>
> The session on adding a backend using a non-SQL datastore was pretty
> well attended.


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-20 Thread Chris Friesen

On 11/20/2013 10:06 AM, Soren Hansen wrote:

> 2013/11/18 Mike Spreitzer :
>> There were some concerns expressed at the summit about scheduler
>> scalability in Nova, and a little recollection of Boris' proposal to
>> keep the needed state in memory.
>
>> I also heard one guy say that he thinks Nova does not really need a
>> general SQL database, that a NOSQL database with a bit of
>> denormalization and/or client-maintained secondary indices could
>> suffice.
>
> I may have said something along those lines. Just to clarify -- since
> you started this post by talking about scheduler scalability -- the main
> motivation for using a non-SQL backend isn't scheduler scalability, it's
> availability and resilience. I just don't accept the failure modes that
> MySQL (and derivatives such as Galera) impose.
>
>> Has that sort of thing been considered before?
>
> It's been talked about on and off since... well, probably since we
> started this project.
>
>> What is the community's level of interest in exploring that?
>
> The session on adding a backend using a non-SQL datastore was pretty
> well attended.


What about a hybrid solution?

There is data that is only used by the scheduler. For performance
reasons, maybe it would make sense to store that information in RAM as
described at

https://blueprints.launchpad.net/nova/+spec/no-db-scheduler

For the rest of the data, perhaps it could be persisted using some
alternate backend.


Chris




Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-20 Thread Soren Hansen
2013/11/18 Mike Spreitzer :
> There were some concerns expressed at the summit about scheduler
> scalability in Nova, and a little recollection of Boris' proposal to
> keep the needed state in memory.


> I also heard one guy say that he thinks Nova does not really need a
> general SQL database, that a NOSQL database with a bit of
> denormalization and/or client-maintained secondary indices could
> suffice.

I may have said something along those lines. Just to clarify -- since
you started this post by talking about scheduler scalability -- the main
motivation for using a non-SQL backend isn't scheduler scalability, it's
availability and resilience. I just don't accept the failure modes that
MySQL (and derivatives such as Galera) impose.

> Has that sort of thing been considered before?

It's been talked about on and off since... well, probably since we
started this project.

> What is the community's level of interest in exploring that?

The session on adding a backend using a non-SQL datastore was pretty
well attended.


-- 
Soren Hansen | http://linux2go.dk/
Ubuntu Developer | http://www.ubuntu.com/
OpenStack Developer  | http://www.openstack.org/



Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Alex Glikson
Another possible approach could be that only part of the 50 succeeds 
(reported back to the user), and then a retry mechanism at a higher level 
would potentially approach the other partition/scheduler - similar to 
today's retries.
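A rough sketch of the partial-success idea, assuming each partition can report how many of the requested instances it managed to place and that the higher-level retry simply moves on to the next partition. `place_partially()` and `boot_with_retry()` are hypothetical helpers; capacities are plain instance counts.

```python
from typing import List, Tuple


def place_partially(free_slots: int, count: int) -> int:
    # A partition places as many of the requested instances as it can.
    return min(free_slots, count)


def boot_with_retry(partitions: List[int], count: int) -> Tuple[List[int], int]:
    """Try partitions in turn, retrying only the part that did not fit.

    Returns the number placed in each partition tried and the number left over.
    """
    placed: List[int] = []
    remaining = count
    for free_slots in partitions:
        if remaining == 0:
            break
        n = place_partially(free_slots, remaining)
        placed.append(n)  # what gets reported back to the user for this attempt
        remaining -= n
    return placed, remaining


print(boot_with_retry([40, 40, 40], 50))  # ([40, 10], 0)
```

With three partitions of 40 free slots each and a 50-instance request, the first partition takes 40 and the retry lands the remaining 10 on the second.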

Regards,
Alex




From:    Mike Wilson
To:      "OpenStack Development Mailing List (not for usage questions)"
Date:    20/11/2013 05:53 AM
Subject: Re: [openstack-dev] [Nova] Does Nova really need an SQL database?



I've been thinking about this use case for a DHT-like design, I think I 
want to do what other people have alluded to here and try and intercept 
problematic requests like this one in some sort of "pre sending to 
ring-segment" stage. In this case the "pre-stage" could decide to send 
this off to a scheduler that has a more complete view of the world. 
Alternatively, don't make a single request for 50 instances, just send 50 
requests for one? Is that a viable thing to do for this use case?

-Mike



Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Mike Wilson
I've been thinking about this use case for a DHT-like design, I think I
want to do what other people have alluded to here and try and intercept
problematic requests like this one in some sort of "pre sending to
ring-segment" stage. In this case the "pre-stage" could decide to send this
off to a scheduler that has a more complete view of the world.
Alternatively, don't make a single request for 50 instances, just send 50
requests for one? Is that a viable thing to do for this use case?
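Both options can be sketched in a few lines, assuming requests are mapped to ring segments by hashing a request id and that a separate scheduler with a global view exists for oversized requests. `ring_segment()`, `pre_stage()`, `split_request()` and the `"global-scheduler"` target are hypothetical names.

```python
import hashlib
from typing import Dict, List


def ring_segment(request_id: str, segments: List[str]) -> str:
    # Stable hash of the request id picks the owning ring segment.
    digest = int(hashlib.sha256(request_id.encode()).hexdigest(), 16)
    return segments[digest % len(segments)]


def pre_stage(request_id: str, count: int, free: Dict[str, int]) -> str:
    """Route a request to its ring segment unless that segment cannot hold it."""
    segment = ring_segment(request_id, sorted(free))
    if count <= free[segment]:
        return segment
    # Problematic request: hand it to a scheduler with a more complete view.
    return "global-scheduler"


def split_request(request_id: str, count: int) -> List[str]:
    # Alternative: N single-instance requests that can land on different segments.
    return [f"{request_id}-{i}" for i in range(count)]


free = {"seg-a": 40, "seg-b": 40, "seg-c": 40}
print(pre_stage("req-1", 50, free))  # global-scheduler
print(len(split_request("req-1", 50)))  # 50
```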

-Mike


On Tue, Nov 19, 2013 at 7:03 PM, Joshua Harlow wrote:

> At yahoo at least 50+ simultaneous will be the common case (maybe we are
> special).
>
> Think of what happens on www.yahoo.com say on the olympics, news.yahoo.com
> could need 50+ very very quickly (especially if say a gold medal is won by
> some famous person). So I wouldn't discount those being the common case
> (may not be common for some, but is common for others). In fact any
> website with spurious/spikey traffic will have the same desire; so it
> might be a target use-case for website like companies (or ones that can't
> upfront predict spikes).
>
> Overall though I think what u said about 'don't fill it up' is good
> general knowledge. Filling up stuff beyond a certain threshold is
> dangerous just in general (one should only push the limits so far before
> madness).

Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Joshua Harlow
At Yahoo, at least, 50+ simultaneous instances will be the common case
(maybe we are special).

Think of what happens on www.yahoo.com during, say, the Olympics: news.yahoo.com
could need 50+ very, very quickly (especially if, say, a gold medal is won by
some famous person). So I wouldn't discount those being the common case
(it may not be common for some, but it is common for others). In fact any
website with spurious/spiky traffic will have the same desire, so it
might be a target use case for website-like companies (or ones that can't
predict spikes up front).

Overall though, I think what you said about 'don't fill it up' is good
general knowledge. Filling things up beyond a certain threshold is
dangerous in general (one should only push the limits so far before
madness).

On 11/19/13 4:08 PM, "Clint Byrum"  wrote:

>You're assuming that all 50 come in at once. That is only one use case
>and not at all the most common.
>
>[...]
>
>This use case is real and valid, which is why I think there is room for
>multiple approaches. For instance the situation you describe can also be
>dealt with by just having the cloud stay under-utilized and accepting
>that when you get over a certain percentage utilized spurious failures
>will happen. We have a similar solution in the ext3 filesystem on Linux.
>Don't fill it up, or suffer a huge performance penalty.




Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Clint Byrum
Excerpts from Chris Friesen's message of 2013-11-19 12:18:16 -0800:
> On 11/19/2013 01:51 PM, Clint Byrum wrote:
> > Excerpts from Chris Friesen's message of 2013-11-19 11:37:02 -0800:
> >> On 11/19/2013 12:35 PM, Clint Byrum wrote:
> >>
> >>> Each scheduler process can own a different set of resources. If they
> >>> each grab instance requests in a round-robin fashion, then they will
> >>> fill their resources up in a relatively well balanced way until one
> >>> scheduler's resources are exhausted. At that time it should bow out of
> >>> taking new instances. If it can't fit a request in, it should kick the
> >>> request out for retry on another scheduler.
> >>>
> >>> In this way, they only need to be in sync in that they need a way to
> >>> agree on who owns which resources. A distributed hash table that gets
> >>> refreshed whenever schedulers come and go would be fine for that.
> >>
> >> That has some potential, but at high occupancy you could end up refusing
> >> to schedule something because no one scheduler has sufficient resources
> >> even if the cluster as a whole does.
> >>
> >
> > I'm not sure what you mean here. What resource spans multiple compute
> > hosts?
> 
> Imagine the cluster is running close to full occupancy, each scheduler 
> has room for 40 more instances.  Now I come along and issue a single 
> request to boot 50 instances.  The cluster has room for that, but none 
> of the schedulers do.
> 

You're assuming that all 50 come in at once. That is only one use case
and not at all the most common.

> >> This gets worse once you start factoring in things like heat and
> >> instance groups that will want to schedule whole sets of resources
> >> (instances, IP addresses, network links, cinder volumes, etc.) at once
> >> with constraints on where they can be placed relative to each other.
> 
> > Actually that is rather simple. Such requests have to be serialized
> > into a work-flow. So if you say "give me 2 instances in 2 different
> > locations" then you allocate 1 instance, and then another one with
> > 'not_in_location(1)' as a condition.
> 
> Actually, you don't want to serialize it, you want to hand the whole set 
> of resource requests and constraints to the scheduler all at once.
> 
> If you do them one at a time, then early decisions made with 
> less-than-complete knowledge can result in later scheduling requests 
> failing due to being unable to meet constraints, even if there are 
> actually sufficient resources in the cluster.
> 
> The "VM ensembles" document at
> https://docs.google.com/document/d/1bAMtkaIFn4ZSMqqsXjs_riXofuRvApa--qo4UTwsmhw/edit?pli=1
>  
> has a good example of how one-at-a-time scheduling can cause spurious 
> failures.
> 
> And if you're handing the whole set of requests to a scheduler all at 
> once, then you want the scheduler to have access to as many resources as 
> possible so that it has the highest likelihood of being able to satisfy 
> the request given the constraints.

This use case is real and valid, which is why I think there is room for
multiple approaches. For instance the situation you describe can also be
dealt with by just having the cloud stay under-utilized and accepting
that when you get over a certain percentage utilized spurious failures
will happen. We have a similar solution in the ext3 filesystem on Linux.
Don't fill it up, or suffer a huge performance penalty.
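One way to read the "don't fill it up" policy is as an admission check that refuses new work above a utilization ceiling. The `admit()` helper and its 85% default are hypothetical, illustrative values, not a recommendation or an existing Nova option.

```python
def admit(used_slots: int, total_slots: int, requested: int, max_percent: int = 85) -> bool:
    """Hypothetical admission check keeping the cloud below a utilization ceiling.

    Integer arithmetic avoids float rounding at the boundary; 85 is illustrative only.
    """
    return (used_slots + requested) * 100 <= total_slots * max_percent


print(admit(70, 100, 10))  # True  (80% after placement)
print(admit(80, 100, 10))  # False (90% would exceed the 85% ceiling)
```

The headroom above the ceiling is what absorbs the fragmentation that would otherwise show up as spurious scheduling failures.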



Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Chris Friesen

On 11/19/2013 01:51 PM, Clint Byrum wrote:

> Excerpts from Chris Friesen's message of 2013-11-19 11:37:02 -0800:
>> On 11/19/2013 12:35 PM, Clint Byrum wrote:
>>> Each scheduler process can own a different set of resources. If they
>>> each grab instance requests in a round-robin fashion, then they will
>>> fill their resources up in a relatively well balanced way until one
>>> scheduler's resources are exhausted. At that time it should bow out of
>>> taking new instances. If it can't fit a request in, it should kick the
>>> request out for retry on another scheduler.
>>>
>>> In this way, they only need to be in sync in that they need a way to
>>> agree on who owns which resources. A distributed hash table that gets
>>> refreshed whenever schedulers come and go would be fine for that.
>>
>> That has some potential, but at high occupancy you could end up refusing
>> to schedule something because no one scheduler has sufficient resources
>> even if the cluster as a whole does.
>
> I'm not sure what you mean here. What resource spans multiple compute
> hosts?


Imagine the cluster is running close to full occupancy and each scheduler
has room for 40 more instances. Now I come along and issue a single
request to boot 50 instances. The cluster has room for that, but none
of the schedulers do.
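The DHT ownership scheme quoted above and the fragmentation problem it runs into can be sketched together, assuming a simple consistent-hash ring with virtual nodes (one common way to build such a table; the thread does not specify one). `HashRing`, `free_by_scheduler()` and `fits()` are hypothetical names, and capacity is counted in whole instance slots.

```python
import bisect
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def _hash(key: str) -> int:
    # Stable hash so every scheduler computes the same ring.
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring mapping compute hosts to owning schedulers.

    Rebuild it whenever schedulers join or leave.
    """

    def __init__(self, schedulers: List[str], replicas: int = 50) -> None:
        # Several virtual points per scheduler smooth out the distribution.
        self._points = sorted(
            (_hash(f"{s}#{i}"), s) for s in schedulers for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def owner(self, host: str) -> str:
        # First ring point clockwise from the host's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, _hash(host)) % len(self._points)
        return self._points[idx][1]


def free_by_scheduler(ring: HashRing, free_slots: Dict[str, int]) -> Dict[str, int]:
    # Sum each host's free slots into the scheduler that owns it.
    totals: Dict[str, int] = defaultdict(int)
    for host, slots in free_slots.items():
        totals[ring.owner(host)] += slots
    return dict(totals)


def fits(per_scheduler: Dict[str, int], count: int) -> Tuple[bool, bool]:
    # (can one scheduler take it alone?, does the cluster as a whole have room?)
    return (
        any(free >= count for free in per_scheduler.values()),
        sum(per_scheduler.values()) >= count,
    )
```

With three schedulers each owning 40 free slots, `fits({"s1": 40, "s2": 40, "s3": 40}, 50)` returns `(False, True)`: no single partition can take the request even though the cluster can.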



>> This gets worse once you start factoring in things like heat and
>> instance groups that will want to schedule whole sets of resources
>> (instances, IP addresses, network links, cinder volumes, etc.) at once
>> with constraints on where they can be placed relative to each other.
>
> Actually that is rather simple. Such requests have to be serialized
> into a work-flow. So if you say "give me 2 instances in 2 different
> locations" then you allocate 1 instance, and then another one with
> 'not_in_location(1)' as a condition.


Actually, you don't want to serialize it, you want to hand the whole set 
of resource requests and constraints to the scheduler all at once.


If you do them one at a time, then early decisions made with 
less-than-complete knowledge can result in later scheduling requests 
failing due to being unable to meet constraints, even if there are 
actually sufficient resources in the cluster.
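A toy illustration of how one-at-a-time placement can fail spuriously. Everything here is hypothetical: two hosts, abstract capacity units, and an anti-affinity rule (the two instances must land on different hosts). `greedy` places instances one by one on the host with the most free capacity, in the serialized style; `joint` brute-forces the whole set at once, which is only feasible for tiny inputs but shows that a valid placement exists.

```python
from itertools import product

# Hypothetical cluster: free capacity per host, in abstract units.
HOSTS = {"host-a": 2, "host-b": 1}
# Two instances with an anti-affinity rule: they must land on different hosts.
REQUESTS = [("vm-x", 1), ("vm-y", 2)]


def greedy(hosts, requests):
    """Serialized placement: one instance at a time, most-free host first."""
    free = dict(hosts)
    placement = {}
    for name, size in requests:
        used = set(placement.values())
        candidates = [h for h in free if free[h] >= size and h not in used]
        if not candidates:
            return None  # an earlier choice has boxed us in
        best = max(candidates, key=lambda h: free[h])
        free[best] -= size
        placement[name] = best
    return placement


def joint(hosts, requests):
    """Whole-set placement by brute force (only viable for tiny inputs)."""
    for combo in product(hosts, repeat=len(requests)):
        if len(set(combo)) != len(combo):
            continue  # violates anti-affinity
        free = dict(hosts)
        for (_, size), host in zip(requests, combo):
            free[host] -= size
        if all(v >= 0 for v in free.values()):
            return {name: host for (name, _), host in zip(requests, combo)}
    return None


print(greedy(HOSTS, REQUESTS))  # None
print(joint(HOSTS, REQUESTS))   # {'vm-x': 'host-b', 'vm-y': 'host-a'}
```

`greedy` puts `vm-x` on `host-a` because it looks emptiest and is then stuck, while `joint` finds the placement that works. A real scheduler would use a smarter search than brute force, but the failure mode is the same.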


The "VM ensembles" document at
https://docs.google.com/document/d/1bAMtkaIFn4ZSMqqsXjs_riXofuRvApa--qo4UTwsmhw/edit?pli=1 
has a good example of how one-at-a-time scheduling can cause spurious 
failures.


And if you're handing the whole set of requests to a scheduler all at 
once, then you want the scheduler to have access to as many resources as 
possible so that it has the highest likelihood of being able to satisfy 
the request given the constraints.


Chris


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Clint Byrum
Excerpts from Chris Friesen's message of 2013-11-19 11:37:02 -0800:
> On 11/19/2013 12:35 PM, Clint Byrum wrote:
> 
> > Each scheduler process can own a different set of resources. If they
> > each grab instance requests in a round-robin fashion, then they will
> > fill their resources up in a relatively well balanced way until one
> > scheduler's resources are exhausted. At that time it should bow out of
> > taking new instances. If it can't fit a request in, it should kick the
> > request out for retry on another scheduler.
> >
> > In this way, they only need to be in sync in that they need a way to
> > agree on who owns which resources. A distributed hash table that gets
> > refreshed whenever schedulers come and go would be fine for that.
> 
> That has some potential, but at high occupancy you could end up refusing 
> to schedule something because no one scheduler has sufficient resources 
> even if the cluster as a whole does.
> 

I'm not sure what you mean here. What resource spans multiple compute
hosts?

> This gets worse once you start factoring in things like heat and 
> instance groups that will want to schedule whole sets of resources 
> (instances, IP addresses, network links, cinder volumes, etc.) at once 
> with constraints on where they can be placed relative to each other.
> 

Actually that is rather simple. Such requests have to be serialized
into a work-flow. So if you say "give me 2 instances in 2 different
locations" then you allocate 1 instance, and then another one with
'not_in_location(1)' as a condition.


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Chris Friesen

On 11/19/2013 12:27 PM, Joshua Harlow wrote:

Personally I would prefer #3 from the below. #2 I think will still have to
deal with consistency issues, just switching away from a DB doesn't make
magical ponies and unicorns appear (in-fact it can potentially make the
problem worse if its done incorrectly - and its pretty easy to get it
wrong IMHO). #1 could also work, but then u hit a vertical scaling limit
(works if u paid oracle for there DB or IBM for DB2 I suppose). I prefer
#3 since I think it is honestly needed under all solutions.


Personally I think we need a combination of #3 (resource reservation) 
with something else to speed up scheduling.


We have multiple filters that currently loop over all the compute nodes, 
gathering a bunch of data from the DB and then ignoring most of that 
data while doing some simple logic in python.


There is really no need for the bulk of the resource information to be 
stored in the DB.  The compute nodes could broadcast their current state 
to all scheduler nodes, and the scheduler nodes could reserve resources 
directly from the compute nodes (triggering an update of all the other 
scheduler nodes).
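A minimal sketch of the broadcast-and-reserve idea, assuming each compute node is the single authority for its own free RAM (RAM stands in for all resources). `ComputeNode`, `SchedulerView` and their methods are hypothetical names, not Nova code. Broadcasting happens under the node's lock only to keep update ordering simple in the example; a real system would more likely publish asynchronously with sequence numbers.

```python
import threading


class ComputeNode:
    """Hypothetical compute node that owns the truth about its free RAM."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self._free_ram_mb = free_ram_mb
        self._lock = threading.Lock()
        self._subscribers = []  # update callbacks of every scheduler's view

    def subscribe(self, callback):
        with self._lock:
            self._subscribers.append(callback)
            callback(self.name, self._free_ram_mb)  # send initial state

    def reserve(self, ram_mb):
        """Atomically claim RAM; False means another scheduler got it first."""
        with self._lock:
            if ram_mb > self._free_ram_mb:
                return False
            self._free_ram_mb -= ram_mb
            for notify in self._subscribers:
                notify(self.name, self._free_ram_mb)  # broadcast new state
            return True


class SchedulerView:
    """In-memory state a scheduler keeps, fed only by node broadcasts."""

    def __init__(self):
        self.free_ram_mb = {}

    def update(self, host, free_ram_mb):
        self.free_ram_mb[host] = free_ram_mb


node = ComputeNode("compute-1", 4096)
view_a, view_b = SchedulerView(), SchedulerView()
node.subscribe(view_a.update)
node.subscribe(view_b.update)
first = node.reserve(3072)   # scheduler A wins the race
second = node.reserve(2048)  # scheduler B is refused and must pick again
```

After the two calls both views show 1024 MB free on `compute-1`, without the database being involved.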


Failing that, it should be possible to push at least some of the 
filtering down into the DB itself. Stuff like ramfilter or cpufilter 
would be trivial (and fast) as an SQL query.
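Pushing RAM and CPU checks into the database could look roughly like the query below. This is a sketch against a hypothetical, simplified `compute_nodes` table using Python's built-in `sqlite3`; Nova's real schema and filter semantics differ. The two ratio parameters model overcommit, in the spirit of Nova's allocation-ratio settings.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE compute_nodes (
    host TEXT PRIMARY KEY,
    memory_mb INTEGER, memory_mb_used INTEGER,
    vcpus INTEGER, vcpus_used INTEGER
)"""

# RAM and CPU checks evaluated inside the database; one row per passing host.
FILTER_SQL = """
SELECT host FROM compute_nodes
WHERE memory_mb * :ram_ratio - memory_mb_used >= :ram_mb
  AND vcpus * :cpu_ratio - vcpus_used >= :vcpus
ORDER BY host
"""


def filter_hosts(conn, ram_mb, vcpus, ram_ratio=1.0, cpu_ratio=1.0):
    params = {"ram_mb": ram_mb, "vcpus": vcpus,
              "ram_ratio": ram_ratio, "cpu_ratio": cpu_ratio}
    return [host for (host,) in conn.execute(FILTER_SQL, params)]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO compute_nodes VALUES (?, ?, ?, ?, ?)",
    [("compute-1", 8192, 7168, 8, 2),
     ("compute-2", 16384, 4096, 4, 4),
     ("compute-3", 16384, 2048, 16, 4)])
print(filter_hosts(conn, ram_mb=2048, vcpus=2))                  # ['compute-3']
print(filter_hosts(conn, ram_mb=2048, vcpus=2, cpu_ratio=16.0))  # ['compute-2', 'compute-3']
```

Only the host names cross into Python, instead of every row being copied out and filtered there.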


Chris


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Chris Friesen

On 11/19/2013 12:35 PM, Clint Byrum wrote:


Each scheduler process can own a different set of resources. If they
each grab instance requests in a round-robin fashion, then they will
fill their resources up in a relatively well balanced way until one
scheduler's resources are exhausted. At that time it should bow out of
taking new instances. If it can't fit a request in, it should kick the
request out for retry on another scheduler.

In this way, they only need to be in sync in that they need a way to
agree on who owns which resources. A distributed hash table that gets
refreshed whenever schedulers come and go would be fine for that.


That has some potential, but at high occupancy you could end up refusing 
to schedule something because no one scheduler has sufficient resources 
even if the cluster as a whole does.
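To make the fragmentation concern concrete, here is a toy check that counts capacity in abstract "instance slots" (an assumption; real requests mix RAM, CPU and disk). The scheduler names and numbers are hypothetical, chosen to echo the 40-versus-50 example Chris gives elsewhere in this thread.

```python
def fits_in_cluster(free_slots, request):
    """True if the cluster as a whole has room for the request."""
    return sum(free_slots.values()) >= request


def fits_in_one_partition(free_slots, request):
    """True if some single scheduler's partition has room for all of it."""
    return any(free >= request for free in free_slots.values())


# Hypothetical numbers: three schedulers, each owning room for 40 instances.
free_slots = {"sched-a": 40, "sched-b": 40, "sched-c": 40}
print(fits_in_cluster(free_slots, 50))        # True  (120 free in total)
print(fits_in_one_partition(free_slots, 50))  # False (no partition has 50)
```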


This gets worse once you start factoring in things like heat and 
instance groups that will want to schedule whole sets of resources 
(instances, IP addresses, network links, cinder volumes, etc.) at once 
with constraints on where they can be placed relative to each other.


Chris



Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Joshua Harlow
Sorry, that should have read "I prefer #3" (not #2) at the end there. Keyboard failure ;)

On 11/19/13 10:27 AM, "Joshua Harlow"  wrote:

>Personally I would prefer #3 from the below. #2 I think will still have to
>deal with consistency issues, just switching away from a DB doesn't make
>magical ponies and unicorns appear (in-fact it can potentially make the
>problem worse if its done incorrectly - and its pretty easy to get it
>wrong IMHO). #1 could also work, but then u hit a vertical scaling limit
>(works if u paid oracle for there DB or IBM for DB2 I suppose). I prefer
>#2 since I think it is honestly needed under all solutions.
>
>On 11/19/13 9:29 AM, "Chris Friesen"  wrote:
>
>>On 11/18/2013 06:47 PM, Joshua Harlow wrote:
>>> An idea related to this, what would need to be done to make the DB have
>>> the exact state that a compute node is going through (and therefore the
>>> scheduler would not make unreliable/racey decisions, even when there
>>>are
>>> multiple schedulers). It's not like we are dealing with a system which
>>> can not know the exact state (as long as the compute nodes are
>>>connected
>>> to the network, and a network partition does not occur).
>>
>>How would you synchronize the various schedulers with each other?
>>Suppose you have multiple scheduler nodes all trying to boot multiple
>>instances each.
>>
>>Even if each at the start of the process each scheduler has a perfect
>>view of the system, each scheduler would need to have a view of what
>>every other scheduler is doing in order to not make racy decisions.
>>
>>I see a few options:
>>
>>1) Push scheduling down into the database itself.  Implement scheduler
>>filters as SQL queries or stored procedures.
>>
>>2) Get rid of the DB for scheduling.  It looks like people are working
>>on this: https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
>>
>>3) Do multi-stage scheduling.  Do a "tentative" schedule, then try and
>>update the DB to reserve all the necessary resources.  If that fails,
>>someone got there ahead of you so try again with the new data.
>>
>>Chris


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Clint Byrum
Excerpts from Chris Friesen's message of 2013-11-19 09:29:00 -0800:
> On 11/18/2013 06:47 PM, Joshua Harlow wrote:
> > An idea related to this, what would need to be done to make the DB have
> > the exact state that a compute node is going through (and therefore the
> > scheduler would not make unreliable/racey decisions, even when there are
> > multiple schedulers). It's not like we are dealing with a system which
> > can not know the exact state (as long as the compute nodes are connected
> > to the network, and a network partition does not occur).
> 
> How would you synchronize the various schedulers with each other? 
> Suppose you have multiple scheduler nodes all trying to boot multiple 
> instances each.
> 
> Even if each at the start of the process each scheduler has a perfect 
> view of the system, each scheduler would need to have a view of what 
> every other scheduler is doing in order to not make racy decisions.
> 

Your question assumes they need to be "in sync" at a granular level.

Each scheduler process can own a different set of resources. If they
each grab instance requests in a round-robin fashion, then they will
fill their resources up in a relatively well balanced way until one
scheduler's resources are exhausted. At that time it should bow out of
taking new instances. If it can't fit a request in, it should kick the
request out for retry on another scheduler.

In this way, they only need to be in sync in that they need a way to
agree on who owns which resources. A distributed hash table that gets
refreshed whenever schedulers come and go would be fine for that.
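One way to realize the "distributed hash table" of host ownership described above is a consistent-hash ring. This is an illustrative sketch only: `HashRing`, the scheduler and host names, and the replica count are hypothetical, not Nova code. The property it demonstrates is that when a scheduler leaves, only the hosts it owned change owner.

```python
import bisect
import hashlib


def _point(key):
    # Stable hash; Python's built-in hash() is randomized per process.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ownership map: each compute host belongs to one scheduler."""

    def __init__(self, schedulers, replicas=64):
        self.replicas = replicas  # virtual points per scheduler, smooths balance
        self._ring = []           # sorted list of (point, scheduler)
        for name in schedulers:
            self.add(name)

    def add(self, scheduler):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_point(f"{scheduler}#{i}"), scheduler))

    def remove(self, scheduler):
        self._ring = [(p, s) for (p, s) in self._ring if s != scheduler]

    def owner(self, host):
        # First ring point at or after the host's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (_point(host), ""))
        return self._ring[idx % len(self._ring)][1]


hosts = [f"compute-{n}" for n in range(100)]
ring = HashRing(["sched-a", "sched-b", "sched-c"])
before = {h: ring.owner(h) for h in hosts}
ring.remove("sched-b")  # sched-b goes away and the ring is "refreshed"
after = {h: ring.owner(h) for h in hosts}
moved = [h for h in hosts if before[h] != after[h]]
```

Every host in `moved` was previously owned by `sched-b`; hosts owned by the surviving schedulers keep their owner, so those schedulers' in-memory state stays valid.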


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Joshua Harlow
Personally I would prefer #3 from the below. #2 I think will still have to
deal with consistency issues; just switching away from a DB doesn't make
magical ponies and unicorns appear (in fact it can potentially make the
problem worse if it's done incorrectly, and it's pretty easy to get it
wrong IMHO). #1 could also work, but then u hit a vertical scaling limit
(works if u paid Oracle for their DB or IBM for DB2, I suppose). I prefer
#2 since I think it is honestly needed under all solutions.

On 11/19/13 9:29 AM, "Chris Friesen"  wrote:

>On 11/18/2013 06:47 PM, Joshua Harlow wrote:
>> An idea related to this, what would need to be done to make the DB have
>> the exact state that a compute node is going through (and therefore the
>> scheduler would not make unreliable/racey decisions, even when there are
>> multiple schedulers). It's not like we are dealing with a system which
>> can not know the exact state (as long as the compute nodes are connected
>> to the network, and a network partition does not occur).
>
>How would you synchronize the various schedulers with each other?
>Suppose you have multiple scheduler nodes all trying to boot multiple
>instances each.
>
>Even if each at the start of the process each scheduler has a perfect
>view of the system, each scheduler would need to have a view of what
>every other scheduler is doing in order to not make racy decisions.
>
>I see a few options:
>
>1) Push scheduling down into the database itself.  Implement scheduler
>filters as SQL queries or stored procedures.
>
>2) Get rid of the DB for scheduling.  It looks like people are working
>on this: https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
>
>3) Do multi-stage scheduling.  Do a "tentative" schedule, then try and
>update the DB to reserve all the necessary resources.  If that fails,
>someone got there ahead of you so try again with the new data.
>
>Chris


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Caitlin Bestler

On 11/18/2013 11:35 AM, Mike Spreitzer wrote:

There were some concerns expressed at the summit about scheduler
scalability in Nova, and a little recollection of Boris' proposal to
keep the needed state in memory.  I also heard one guy say that he
thinks Nova does not really need a general SQL database, that a NOSQL
database with a bit of denormalization and/or client-maintained
secondary indices could suffice.  Has that sort of thing been considered
before?  What is the community's level of interest in exploring that?

Thanks,
Mike



How the data is stored is not the central question. The real issue is 
how the data is normalized and distributed.


Data that is designed to be distributed deals with temporary 
inconsistencies and only worries about eventual consistency.

Once you have that you can store the data in Objects, or in
a distributed database.

If you define your data so that you need global synchronization then
you will always be fighting scaling issues.
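A small sketch of data "designed to be distributed": if each compute node is the only writer of its own record and stamps every update with an increasing sequence number, any replica can merge updates without global synchronization. `HostRecord` and `merge` are hypothetical; the merge keeps the higher sequence number per host, which makes it independent of delivery order and safe against duplicates, as long as a host never reuses a sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostRecord:
    seq: int          # incremented only by the owning compute node
    free_ram_mb: int


def merge(local, incoming):
    """Keep the newer record per host; order- and duplicate-insensitive."""
    merged = dict(local)
    for host, rec in incoming.items():
        if host not in merged or rec.seq > merged[host].seq:
            merged[host] = rec
    return merged


u1 = {"compute-1": HostRecord(seq=1, free_ram_mb=4096)}
u2 = {"compute-1": HostRecord(seq=2, free_ram_mb=2048),
      "compute-2": HostRecord(seq=1, free_ram_mb=8192)}
replica_a = merge(merge({}, u1), u2)  # saw u1 then u2
replica_b = merge(merge({}, u2), u1)  # saw them in the opposite order
```

Both replicas converge to the same state even though they received the updates in different orders.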




Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-19 Thread Chris Friesen

On 11/18/2013 06:47 PM, Joshua Harlow wrote:

An idea related to this, what would need to be done to make the DB have
the exact state that a compute node is going through (and therefore the
scheduler would not make unreliable/racey decisions, even when there are
multiple schedulers). It's not like we are dealing with a system which
can not know the exact state (as long as the compute nodes are connected
to the network, and a network partition does not occur).


How would you synchronize the various schedulers with each other? 
Suppose you have multiple scheduler nodes all trying to boot multiple 
instances each.


Even if at the start of the process each scheduler has a perfect 
view of the system, each scheduler would need to have a view of what 
every other scheduler is doing in order to not make racy decisions.


I see a few options:

1) Push scheduling down into the database itself.  Implement scheduler 
filters as SQL queries or stored procedures.


2) Get rid of the DB for scheduling.  It looks like people are working 
on this: https://blueprints.launchpad.net/nova/+spec/no-db-scheduler


3) Do multi-stage scheduling.  Do a "tentative" schedule, then try and 
update the DB to reserve all the necessary resources.  If that fails, 
someone got there ahead of you so try again with the new data.
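Option 3 can be sketched as a read followed by a conditional UPDATE whose WHERE clause re-checks capacity, so the database itself rejects a claim that lost a race. The table, column and function names are hypothetical, the example uses Python's built-in `sqlite3` with RAM as the only resource, and the two competing schedulers are simulated sequentially on one connection.

```python
import sqlite3


def try_claim(conn, host, ram_mb):
    """Second stage: reserve only if the host still has room (compare-and-set)."""
    cur = conn.execute(
        "UPDATE compute_nodes SET free_ram_mb = free_ram_mb - ? "
        "WHERE host = ? AND free_ram_mb >= ?",
        (ram_mb, host, ram_mb))
    conn.commit()
    return cur.rowcount == 1  # 0 rows means someone got there first


def schedule(conn, ram_mb, max_attempts=3):
    """First stage picks a host tentatively; retry with fresh data on conflict."""
    for _ in range(max_attempts):
        row = conn.execute(
            "SELECT host FROM compute_nodes WHERE free_ram_mb >= ? "
            "ORDER BY free_ram_mb DESC, host LIMIT 1", (ram_mb,)).fetchone()
        if row is None:
            return None  # no host has room at all
        if try_claim(conn, row[0], ram_mb):
            return row[0]
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compute_nodes (host TEXT PRIMARY KEY, free_ram_mb INTEGER)")
conn.executemany("INSERT INTO compute_nodes VALUES (?, ?)",
                 [("compute-1", 4096), ("compute-2", 2048)])
# Two schedulers both tentatively chose compute-1 from the same read:
won = try_claim(conn, "compute-1", 3072)
lost = try_claim(conn, "compute-1", 3072)
# The loser re-reads and lands elsewhere:
retry_host = schedule(conn, 2048)
```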


Chris


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread Joshua Harlow
I agree that we should embrace eventual consistency (in certain cases), but it 
raises the question of what u are eventually consistent on (maybe u shouldn't 
be eventually consistent on resource knowledge). U don't have to be eventually 
consistent on all the things.

So let's assume we are always consistent about static-like resources; then you 
could offer a 'consistent' scheduler that has no races, and u could offer a 
less-consistent scheduler if there was a pluggable way to do this. But then the 
question becomes: how does the 'consistent' scheduler reserve resources on a 
compute-node before actually asking that compute-node to do the work required 
to fulfill the resource request? This is where I think the reservation process 
would be useful (of course it then also brings along the question of what do u 
do about reservation timeouts and cleaning up inactive/unfulfilled 
reservations). Think of this as planning how to carve up a cake before u carve 
it up. Nova has enough knowledge (or should) to know what the cake currently 
looks like (within reason, aka minus the dynamic eventually consistent 
resources) and therefore it should be able to know how to plan the cake 
carving, before actually doing the cake carving.
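The reservation step, including the timeout and clean-up questions raised above, might look roughly like this per-host ledger. `ReservationTable`, its TTL and the injectable clock are all hypothetical. Expired reservations are reaped lazily whenever capacity is checked, and committing an expired reservation fails, so the caller has to re-plan.

```python
import itertools
import time


class ReservationTable:
    """Hypothetical per-host reservation ledger with expiry."""

    def __init__(self, capacity_mb, ttl_s=60.0, clock=time.monotonic):
        self.capacity_mb = capacity_mb
        self.ttl_s = ttl_s
        self._clock = clock
        self._ids = itertools.count(1)
        self._held = {}          # reservation id -> (ram_mb, expires_at)
        self.committed_mb = 0    # usage of instances that were actually built

    def _reap(self):
        now = self._clock()
        expired = [rid for rid, (_, exp) in self._held.items() if exp <= now]
        for rid in expired:
            del self._held[rid]  # clean up an unfulfilled reservation

    def available_mb(self):
        self._reap()
        held = sum(mb for mb, _ in self._held.values())
        return self.capacity_mb - self.committed_mb - held

    def reserve(self, ram_mb):
        """Hold capacity before asking the compute node to do any work."""
        if ram_mb > self.available_mb():
            return None
        rid = next(self._ids)
        self._held[rid] = (ram_mb, self._clock() + self.ttl_s)
        return rid

    def commit(self, rid):
        """Convert a live reservation into real usage; False if it expired."""
        self._reap()
        if rid not in self._held:
            return False
        ram_mb, _ = self._held.pop(rid)
        self.committed_mb += ram_mb
        return True


now = [0.0]  # fake clock so the timeout can be demonstrated
table = ReservationTable(4096, ttl_s=30.0, clock=lambda: now[0])
r1 = table.reserve(3072)       # held for 30 s
blocked = table.reserve(2048)  # None: only 1024 MB left unreserved
now[0] = 31.0                  # r1 was never fulfilled and times out
```

After the clock moves past the TTL, all 4096 MB are available again and `table.commit(r1)` returns False.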

This is similar/the same issue (?) that cinder is dealing with with its work on 
having a defined state-machine (see: 
https://etherpad.openstack.org/p/CinderTaskFlowFSM) and integrating with 
taskflow to gain reliable workflows. Personally I prefer a slower (optimize it 
later) and reliable consistent scheduler & workflow that keeps my operations 
people sane over an eventually consistent one that has a higher chance of making 
them insane ;)

Anyway, that’s my current brain dump (and cake analogy, ha).

-Josh

From: Joe Gordon <joe.gord...@gmail.com>
Date: Monday, November 18, 2013 5:33 PM
To: Joshua Harlow <harlo...@yahoo-inc.com>
Cc: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [Nova] Does Nova really need an SQL database?




On Mon, Nov 18, 2013 at 4:47 PM, Joshua Harlow <harlo...@yahoo-inc.com> wrote:
An idea related to this, what would need to be done to make the DB have the 
exact state that a compute node is going through (and therefore the scheduler 
would not make unreliable/racey decisions, even when there are multiple 
schedulers). It's not like we are dealing with a system which can not know the 
exact state (as long as the compute nodes are connected to the network, and a 
network partition does not occur).


Good question, I don't have a clear idea of the amount of work required to do 
this.

So maybe if we think about ways to correctly reserve resources, and keep up to 
date information about reserved resources we could then eliminate the race and 
eliminate the retries entirely?

What is the trade off here? What benefits do we get at what cost? I have a 
vague idea but just want to be explicit here.  Also for 'cloudy' things we 
embrace the eventually consistent model, and I don't think we should drop that.


From: Joe Gordon <joe.gord...@gmail.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
Date: Monday, November 18, 2013 3:32 PM
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [Nova] Does Nova really need an SQL database?


On Mon, Nov 18, 2013 at 4:08 PM, yunhong jiang <yunhong.ji...@linux.intel.com> wrote:
On Mon, 2013-11-18 at 14:09 -0800, Joe Gordon wrote:
>
> Phil Day discussed this at the summit and I have finally gotten around
> to posting a POC of this.
>
> https://review.openstack.org/#/c/57053/

Hi, Joe, why you think the DB is not exact state in your followed commit
message? I think the DB is updated to date by resource tracker, am I
right (the resource tracker get the underlying resource information
periodically but I think that information is mostly static). And I think
the scheduler retry mainly comes from the race condition of multiple
scheduler instance.


You answered the question yourself, the compute nodes (indirectly) update the 
DB periodically, so the further you are from the last periodic update the less 
up to date the DB is.

Its there for both reasons.  But yes it was originally put there because of the 
multi scheduler race condition.


"We already have the concept that the DB isn't the exact state of the
world, right now it's updated every 10 seconds. And we use the scheduler
retry mechanism to handle cases where the scheduler was wrong. "


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread Joe Gordon
On Mon, Nov 18, 2013 at 4:47 PM, Joshua Harlow wrote:

>  An idea related to this, what would need to be done to make the DB have
> the exact state that a compute node is going through (and therefore the
> scheduler would not make unreliable/racey decisions, even when there are
> multiple schedulers). It's not like we are dealing with a system which can
> not know the exact state (as long as the compute nodes are connected to the
> network, and a network partition does not occur).
>
>
Good question, I don't have a clear idea of the amount of work required to
do this.


>  So maybe if we think about ways to correctly reserve resources, and keep
> up to date information about reserved resources we could then eliminate the
> race and eliminate the retries entirely?
>

What is the trade off here? What benefits do we get at what cost? I have a
vague idea but just want to be explicit here.  Also for 'cloudy' things we
embrace the eventually consistent model, and I don't think we should drop
that.


>   From: Joe Gordon 
> Reply-To: "OpenStack Development Mailing List (not for usage questions)" <
> openstack-dev@lists.openstack.org>
> Date: Monday, November 18, 2013 3:32 PM
>
> To: "OpenStack Development Mailing List (not for usage questions)" <
> openstack-dev@lists.openstack.org>
> Subject: Re: [openstack-dev] [Nova] Does Nova really need an SQL database?
>
>
>
>
> On Mon, Nov 18, 2013 at 4:08 PM, yunhong jiang <
> yunhong.ji...@linux.intel.com> wrote:
>
>> On Mon, 2013-11-18 at 14:09 -0800, Joe Gordon wrote:
>> >
>> > Phil Day discussed this at the summit and I have finally gotten around
>> > to posting a POC of this.
>> >
>> > https://review.openstack.org/#/c/57053/
>>
>>  Hi, Joe, why you think the DB is not exact state in your followed commit
>> message? I think the DB is updated to date by resource tracker, am I
>> right (the resource tracker get the underlying resource information
>> periodically but I think that information is mostly static). And I think
>> the scheduler retry mainly comes from the race condition of multiple
>> scheduler instance.
>>
>
>
>  You answered the question yourself, the compute nodes (indirectly)
> update the DB periodically, so the further you are from the last periodic
> update the less up to date the DB is.
>
>  Its there for both reasons.  But yes it was originally put there because
> of the multi scheduler race condition.
>
>
>>
>> "We already have the concept that the DB isn't the exact state of the
>> world, right now it's updated every 10 seconds. And we use the scheduler
>> retry mechanism to handle cases where the scheduler was wrong. "


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread Joe Gordon
On Mon, Nov 18, 2013 at 5:18 PM, yunhong jiang <
yunhong.ji...@linux.intel.com> wrote:

> On Mon, 2013-11-18 at 15:32 -0800, Joe Gordon wrote:
> >
> >
> >
> > On Mon, Nov 18, 2013 at 4:08 PM, yunhong jiang
> >  wrote:
> > On Mon, 2013-11-18 at 14:09 -0800, Joe Gordon wrote:
> > >
> > > Phil Day discussed this at the summit and I have finally gotten around
> > > to posting a POC of this.
> > >
> > > https://review.openstack.org/#/c/57053/
> >
> > Hi, Joe, why you think the DB is not exact state in your followed commit
> > message? I think the DB is updated to date by resource tracker, am I
> > right (the resource tracker get the underlying resource information
> > periodically but I think that information is mostly static). And I think
> > the scheduler retry mainly comes from the race condition of multiple
> > scheduler instance.
> >
> >
> >
> >
> > You answered the question yourself, the compute nodes (indirectly)
> > update the DB periodically, so the further you are from the last
> > periodic update the less up to date the DB is.
> >
> But the compute node will also update the DB if any claim changes
> between the period, and also considering currently the resource tracker
> calculate the instance usage (like RAM, core etc) itself instead of
> depends on hyper-visor report, I think the DB information should be
> considered mostly up to date.
>
>
Yes, *mostly* up to date I agree, we can just make that word 'mostly'
configurable.  Thanks for helping clarify this point.


> Of course, I'm not against the information cache.
>
> --jyh
> >
> > Its there for both reasons.  But yes it was originally put there
> > because of the multi scheduler race condition.
> >
> >
> > "We already have the concept that the DB isn't the exact state of the
> > world, right now it's updated every 10 seconds. And we use the scheduler
> > retry mechanism to handle cases where the scheduler was wrong. "


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread Joshua Harlow
An idea related to this, what would need to be done to make the DB have the 
exact state that a compute node is going through (and therefore the scheduler 
would not make unreliable/racey decisions, even when there are multiple 
schedulers). It's not like we are dealing with a system which can not know the 
exact state (as long as the compute nodes are connected to the network, and a 
network partition does not occur).

So maybe if we think about ways to correctly reserve resources, and keep up to 
date information about reserved resources we could then eliminate the race and 
eliminate the retries entirely?

From: Joe Gordon <joe.gord...@gmail.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
Date: Monday, November 18, 2013 3:32 PM
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [Nova] Does Nova really need an SQL database?


On Mon, Nov 18, 2013 at 4:08 PM, yunhong jiang <yunhong.ji...@linux.intel.com> wrote:
On Mon, 2013-11-18 at 14:09 -0800, Joe Gordon wrote:
>
> Phil Day discussed this at the summit and I have finally gotten around
> to posting a POC of this.
>
> https://review.openstack.org/#/c/57053/

Hi, Joe, why you think the DB is not exact state in your followed commit
message? I think the DB is updated to date by resource tracker, am I
right (the resource tracker get the underlying resource information
periodically but I think that information is mostly static). And I think
the scheduler retry mainly comes from the race condition of multiple
scheduler instance.


You answered the question yourself, the compute nodes (indirectly) update the 
DB periodically, so the further you are from the last periodic update the less 
up to date the DB is.

Its there for both reasons.  But yes it was originally put there because of the 
multi scheduler race condition.


"We already have the concept that the DB isn't the exact state of the
world, right now it's updated every 10 seconds. And we use the scheduler
retry mechanism to handle cases where the scheduler was wrong. "



Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread yunhong jiang
On Mon, 2013-11-18 at 15:32 -0800, Joe Gordon wrote:
> 
> 
> 
> On Mon, Nov 18, 2013 at 4:08 PM, yunhong jiang
>  wrote:
> On Mon, 2013-11-18 at 14:09 -0800, Joe Gordon wrote:
> >
> > Phil Day discussed this at the summit and I have finally gotten around
> > to posting a POC of this.
> >
> > https://review.openstack.org/#/c/57053/
>
>
> Hi, Joe, why you think the DB is not exact state in your followed commit
> message? I think the DB is updated to date by resource tracker, am I
> right (the resource tracker get the underlying resource information
> periodically but I think that information is mostly static). And I think
> the scheduler retry mainly comes from the race condition of multiple
> scheduler instance.
> 
> 
> 
> 
> You answered the question yourself, the compute nodes (indirectly)
> update the DB periodically, so the further you are from the last
> periodic update the less up to date the DB is.
> 
But the compute node will also update the DB whenever a claim changes
between periodic updates. Also, the resource tracker currently calculates
instance usage (RAM, cores, etc.) itself instead of depending on the
hypervisor's report, so I think the DB information should be considered
mostly up to date.

Of course, I'm not against the information cache.

--jyh
> 
> Its there for both reasons.  But yes it was originally put there
> because of the multi scheduler race condition.
>  
> 
> "We already have the concept that the DB isn't the exact state of the
> world, right now it's updated every 10 seconds. And we use the scheduler
> retry mechanism to handle cases where the scheduler was wrong. "


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread Joe Gordon
On Mon, Nov 18, 2013 at 4:08 PM, yunhong jiang <
yunhong.ji...@linux.intel.com> wrote:

> On Mon, 2013-11-18 at 14:09 -0800, Joe Gordon wrote:
> >
> > Phil Day discussed this at the summit and I have finally gotten around
> > to posting a POC of this.
> >
> > https://review.openstack.org/#/c/57053/
>
> Hi, Joe, why you think the DB is not exact state in your followed commit
> message? I think the DB is updated to date by resource tracker, am I
> right (the resource tracker get the underlying resource information
> periodically but I think that information is mostly static). And I think
> the scheduler retry mainly comes from the race condition of multiple
> scheduler instance.
>


You answered the question yourself, the compute nodes (indirectly) update
the DB periodically, so the further you are from the last periodic update
the less up to date the DB is.

It's there for both reasons.  But yes it was originally put there because of
the multi scheduler race condition.


>
> "We already have the concept that the DB isn't the exact state of the
> world, right now it's updated every 10 seconds. And we use the scheduler
> retry mechanism to handle cases where the scheduler was wrong. "


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread yunhong jiang
On Mon, 2013-11-18 at 14:09 -0800, Joe Gordon wrote:
> 
> Phil Day discussed this at the summit and I have finally gotten around
> to posting a POC of this. 
> 
> https://review.openstack.org/#/c/57053/

Hi, Joe, why do you think the DB is not the exact state, as your commit
message (quoted below) says? I think the DB is kept up to date by the
resource tracker, am I right? (The resource tracker gets the underlying
resource information periodically, but I think that information is mostly
static.) And I think the scheduler retry mainly comes from the race
condition between multiple scheduler instances.

"We already have the concept that the DB isn't the exact state of the
world, right now it's updated every 10 seconds. And we use the scheduler
retry mechanism to handle cases where the scheduler was wrong. "




Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread Joe Gordon
On Mon, Nov 18, 2013 at 12:14 PM, Jay Pipes  wrote:

> On 11/18/2013 02:35 PM, Mike Spreitzer wrote:
>
>> There were some concerns expressed at the summit about scheduler
>> scalability in Nova, and a little recollection of Boris' proposal to
>> keep the needed state in memory.
>>
>
> While it could be possible to do all of the scheduler state in memory, I
> think a better (or at least, less cumbersome initially) approach would
> be to add some layers of in-memory caching to any existing parts where
> the scheduler currently makes a database query. The problem with this is
>

Phil Day discussed this at the summit and I have finally gotten around to
posting a POC of this.

https://review.openstack.org/#/c/57053/

It is very very rough, but gives the general idea. Small scale testing in
devstack showed promising initial results.
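The general idea Jay describes, caching the result of a scheduler DB query in memory for a short time, can be sketched as below. This is not the code in the linked review; `TTLCache` and the `fetch` callable are hypothetical, and the caller picks the TTL (for example, one matching the 10-second update interval mentioned elsewhere in the thread).

```python
import time


class TTLCache:
    """Keep the result of an expensive query in memory for `ttl` seconds.

    Hypothetical helper for illustration; `fetch` stands in for whatever
    function runs the real database query.
    """

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry logic is easy to test
        self._value = None
        self._expires_at = float("-inf")  # forces a fetch on first use

    def get(self):
        now = self._clock()
        if now >= self._expires_at:
            # Nothing cached yet, or the entry expired: hit the database again.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Force the next get() to re-query, e.g. after a known update."""
        self._expires_at = float("-inf")
```

Each scheduler process holds its own copy of the cached value, which is exactly the scale-out limitation Jay raises: nothing here is shared across nodes.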



> that you won't be able to scale out the design -- since the scheduler's
> cached pieces cannot be shared easily across distributed nodes. This is
> where the concept of using cells and a hierarchical "sieve scheduling"
> pattern is used, where higher-level cell schedulers can quickly send a
> scheduling request to another cell's scheduler based on a small amount
> of information that can generally be compared against in-memory things
> (like region, availability zone, type of hypervisor, etc...)
>
>
>  I also heard one guy say that he thinks Nova does not really need a
>> general SQL database, that a NOSQL database with a bit of
>> denormalization and/or client-maintained secondary indices could
>> suffice.  Has that sort of thing been considered before?  What is the
>> community's level of interest in exploring that?
>>
>
> Good luck. :)  I don't think that whoever suggested that a NoSQL
> database with a "bit of denormalization" would suffice for Nova realized
> the extent to which the sets of data within Nova's database are highly
> relational. You will just end up implementing JOIN algorithms in Python
> code and make some of the more advanced search queries much slower, IMO.
>
> Oh, and BTW, Nova's "database" was originally Redis [1] :)
>
> Best,
> -jay
>
> [1]
> https://github.com/openstack/nova/blob/bf6e6e718cdc7488e2da87b21e258ccc065fe499/nova/datastore.py
>
>
>


Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread Jay Pipes

On 11/18/2013 02:35 PM, Mike Spreitzer wrote:

There were some concerns expressed at the summit about scheduler
scalability in Nova, and a little recollection of Boris' proposal to
keep the needed state in memory.


While it could be possible to do all of the scheduler state in memory, I
think a better (or at least, less cumbersome initially) approach would
be to add some layers of in-memory caching to any existing parts where
the scheduler currently makes a database query. The problem with this is
that you won't be able to scale out the design -- since the scheduler's
cached pieces cannot be shared easily across distributed nodes. This is
where the concept of using cells and a hierarchical "sieve scheduling"
pattern is used, where higher-level cell schedulers can quickly send a
scheduling request to another cell's scheduler based on a small amount
of information that can generally be compared against in-memory things
(like region, availability zone, type of hypervisor, etc...)
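A minimal sketch of the two-level "sieve" idea: the top level compares only a few cheap, mostly static attributes (region, availability zone, hypervisor type) to choose a cell, and only that cell's scheduler looks at per-host data. `Cell`, the request keys, and the first-match and most-free-RAM policies are assumptions for illustration, not Nova's cells scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    region: str
    availability_zones: set
    hypervisor_types: set
    # Per-host data lives only with the cell's own scheduler: (host, free_ram_mb).
    hosts: list = field(default_factory=list)


def pick_cell(cells, request):
    """Top level: compare only cheap, mostly static attributes held in memory."""
    for cell in cells:
        if (cell.region == request["region"]
                and request["az"] in cell.availability_zones
                and request["hypervisor"] in cell.hypervisor_types):
            return cell  # first match wins in this sketch
    return None


def pick_host(cell, request):
    """Cell level: detailed filtering over this cell's hosts only."""
    fits = [(host, free) for host, free in cell.hosts if free >= request["ram_mb"]]
    if not fits:
        return None
    return max(fits, key=lambda pair: pair[1])[0]


def sieve_schedule(cells, request):
    """Route the request to a cell, then let that cell choose a host."""
    cell = pick_cell(cells, request)
    if cell is None:
        return None
    host = pick_host(cell, request)
    return None if host is None else (cell.name, host)
```

The top-level step never touches per-host state, so it can stay in memory and be replicated cheaply; the expensive filtering is bounded by the size of one cell.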


I also heard one guy say that he thinks Nova does not really need a
general SQL database, that a NOSQL database with a bit of
denormalization and/or client-maintained secondary indices could
suffice.  Has that sort of thing been considered before?  What is the
community's level of interest in exploring that?


Good luck. :)  I don't think that whoever suggested that a NoSQL
database with a "bit of denormalization" would suffice for Nova realized
the extent to which the sets of data within Nova's database are highly
relational. You will just end up implementing JOIN algorithms in Python
code and make some of the more advanced search queries much slower, IMO.
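To make Jay's point concrete: without a JOIN in the datastore, something like the hash join below ends up living in application code. This is a minimal sketch; the `instances` and compute-node records used with it are hypothetical and do not reflect Nova's schema.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner equi-join of two lists of dicts, i.e. what SQL's JOIN would do."""
    index = defaultdict(list)
    for row in right:  # build phase: index the right side by its join key
        index[row[right_key]].append(row)
    joined = []
    for row in left:  # probe phase: look up matches for each left row
        for match in index.get(row[left_key], []):
            joined.append((row, match))
    return joined
```

Every outer join, multi-way join, or filtered search would need similar hand-written code, and all of the data involved has to be pulled into Python first.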

Oh, and BTW, Nova's "database" was originally Redis [1] :)

Best,
-jay

[1]
https://github.com/openstack/nova/blob/bf6e6e718cdc7488e2da87b21e258ccc065fe499/nova/datastore.py



Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread Shawn Hartsock
+1 on changing the relationship between data persistence and scheduler rules. 
That's a directly addressable problem.

There was a thread on the ML a while ago dealing with the workload on the
scheduler. The problem was that when the scheduler rules fired 55 times a second,
subsequent rules interacted with the database at a multiple of the number of
nodes in the cloud... such that 55 rule fires a second on a 10k-node cloud came
to something on the order of 550k database interactions a second.
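Shawn's figure is a straight multiplication. The sketch below assumes one database interaction per node per rule firing (`per_node=1`); the "multiple" he mentions could be larger, which scales the result linearly.

```python
def db_interactions_per_second(rule_fires_per_second, nodes, per_node=1):
    # Assumes each rule firing touches the DB `per_node` times for every node.
    return rule_fires_per_second * nodes * per_node


print(db_interactions_per_second(55, 10_000))  # 550000
```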

While I'm interested in working with NoSQL databases, in-memory fact-bases, and
other interesting ways to deal with larger amounts of data that may not need a 
relational database... I've got a feeling we need to address this problem more 
directly rather than trying to optimize a central scheduler. Decentralizing the 
scheduler could solve the problem permanently, while optimizing a monolithic 
scheduler just prolongs how long you have before you are forced to address the 
core problem.

That's just my 2 cents on the topic.

# Shawn Hartsock


- Original Message -
> From: "Mike Spreitzer" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Monday, November 18, 2013 2:35:05 PM
> Subject: [openstack-dev] [Nova] Does Nova really need an SQL database?
> 
> There were some concerns expressed at the summit about scheduler
> scalability in Nova, and a little recollection of Boris' proposal to keep
> the needed state in memory.  I also heard one guy say that he thinks Nova
> does not really need a general SQL database, that a NOSQL database with a
> bit of denormalization and/or client-maintained secondary indices could
> suffice.  Has that sort of thing been considered before?  What is the
> community's level of interest in exploring that?
> 
> Thanks,
> Mike
> 



Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

2013-11-18 Thread Mike Wilson
I'm not sure the problem is that we use a general SQL database. The
problems as I see it are:

- Multi-master in MySQL sucks. Complicated, problematic and not performant.
  Also, no great way to do multi-master over higher-latency networks.
- MySQL and Postgres require tuning to scale.
- We tend to write queries badly when using SQLA, i.e. lots of code-level
  joins and filtering.
- SQLA mapping is pretty slow. See Boris and Alexei's patch to
  compute_node_get_all for an example of how this can be worked around [1].
  Also comstud's work on the MySQL backend [2].
- Thread serialization problem in eventlet, also somewhat addressed by the
  MySQL backend.
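The "code-level filtering" point is about where the work happens. The sketch below uses the standard-library `sqlite3` module as a stand-in for MySQL/Postgres and plain SQL in place of SQLA; the `compute_nodes` table and both functions are hypothetical. Both return the same answer, but the first fetches every row and filters in Python, while the second lets the database filter.

```python
import sqlite3


def make_db():
    # Hypothetical table, for illustration only.
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE compute_nodes (host TEXT, free_ram_mb INTEGER)")
        conn.executemany(
            "INSERT INTO compute_nodes VALUES (?, ?)",
            [("h1", 512), ("h2", 4096), ("h3", 8192)],
        )
    return conn


def hosts_with_ram_in_python(conn, ram_mb):
    # Code-level filtering: every row is fetched, then filtered in Python.
    rows = conn.execute("SELECT host, free_ram_mb FROM compute_nodes").fetchall()
    return sorted(host for host, free in rows if free >= ram_mb)


def hosts_with_ram_in_sql(conn, ram_mb):
    # Database-side filtering: only matching rows come back.
    rows = conn.execute(
        "SELECT host FROM compute_nodes WHERE free_ram_mb >= ? ORDER BY host",
        (ram_mb,),
    ).fetchall()
    return [host for (host,) in rows]
```

With three rows the difference is invisible; with thousof compute nodes and a scheduler running many times a second, the first form moves and maps far more data per call.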

Some of these problems are addressed very well by some NOSQL DBs;
specifically, the multi-master problems just go away for the most part.
However, our general SQL databases provide some nice things, like
transactions, that would require some more work on our end to do properly.
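As an illustration of what transactions give for free, the sketch below (again `sqlite3` as a stand-in, with a hypothetical schema and a hypothetical `claim_and_record` helper) reserves RAM on a host and records the instance as one unit: if the insert fails, the RAM reservation is rolled back too. Without transactions, the application would have to implement that undo logic itself.

```python
import sqlite3


def make_db():
    # Hypothetical schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE compute_nodes (host TEXT PRIMARY KEY, free_ram_mb INTEGER)")
        conn.execute("CREATE TABLE instances (uuid TEXT PRIMARY KEY, host TEXT, ram_mb INTEGER)")
        conn.execute("INSERT INTO compute_nodes VALUES ('h1', 4096)")
    return conn


def claim_and_record(conn, host, ram_mb, instance_uuid):
    """Reserve RAM and record the instance together: both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if anything inside raises
            cur = conn.execute(
                "UPDATE compute_nodes SET free_ram_mb = free_ram_mb - ? "
                "WHERE host = ? AND free_ram_mb >= ?",
                (ram_mb, host, ram_mb),
            )
            if cur.rowcount != 1:
                raise LookupError("host missing or not enough free RAM")
            conn.execute(
                "INSERT INTO instances (uuid, host, ram_mb) VALUES (?, ?, ?)",
                (instance_uuid, host, ram_mb),
            )
        return True
    except (LookupError, sqlite3.IntegrityError):
        return False


def free_ram(conn, host):
    row = conn.execute("SELECT free_ram_mb FROM compute_nodes WHERE host = ?", (host,)).fetchone()
    return row[0]
```

Claiming 1024 MB for a new instance leaves 3072 MB free; repeating the claim with the same instance UUID fails on the primary key, and the rollback restores the free RAM to 3072 MB rather than leaving it at 2048.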

All that being said, I am very interested in what NOSQL DBs can do for us.

-Mike Wilson

[1] https://review.openstack.org/#/c/43151/
[2] https://blueprints.launchpad.net/nova/+spec/db-mysqldb-impl


On Mon, Nov 18, 2013 at 12:35 PM, Mike Spreitzer wrote:

> There were some concerns expressed at the summit about scheduler
> scalability in Nova, and a little recollection of Boris' proposal to keep
> the needed state in memory.  I also heard one guy say that he thinks Nova
> does not really need a general SQL database, that a NOSQL database with a
> bit of denormalization and/or client-maintained secondary indices could
> suffice.  Has that sort of thing been considered before?  What is the
> community's level of interest in exploring that?
>
> Thanks,
> Mike
>
>