I agree that we should embrace eventual consistency (in certain cases), but 
that raises the question of what you are eventually consistent on (maybe you 
shouldn't be eventually consistent on resource knowledge). You don't have to 
be eventually consistent on everything.

So let's assume we are always consistent about static-like resources; then 
you could offer a 'consistent' scheduler that has no races, and you could 
offer a less-consistent scheduler if there was a pluggable way to do this. But 
then the question becomes: how does the 'consistent' scheduler reserve 
resources on a compute node before actually asking that compute node to do the 
work required to fulfill the resource request? This is where I think the 
reservation process would be useful (of course, it also brings along the 
question of what you do about reservation timeouts and cleaning up 
inactive/unfulfilled reservations). Think of this as planning how to carve up 
a cake before you carve it up. Nova has enough knowledge (or should) to know 
what the cake currently looks like (within reason, i.e. minus the dynamic, 
eventually consistent resources), and therefore it should be able to plan the 
cake carving before actually doing it.
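
To make the reserve-then-fulfill idea a bit more concrete, here is a minimal 
sketch of a reservation ledger with timeouts. Everything in it 
(ReservationLedger, ttl_seconds, the RAM-only accounting) is made up for 
illustration and is not existing Nova code; it only shows how unfulfilled 
reservations could lapse and give their slice of the cake back.

```python
import time
import uuid


class ReservationLedger:
    """Hypothetical in-memory ledger of one node's RAM (not Nova code)."""

    def __init__(self, total_ram_mb, ttl_seconds=60.0, clock=time.monotonic):
        self.total_ram_mb = total_ram_mb
        self.ttl_seconds = ttl_seconds
        self.clock = clock            # injectable so timeouts can be tested
        self.claimed_ram_mb = 0       # capacity already given to real work
        self.pending = {}             # reservation id -> (ram_mb, expires_at)

    def _expire(self):
        # Clean up reservations that were never fulfilled in time.
        now = self.clock()
        expired = [rid for rid, (_, exp) in self.pending.items() if exp <= now]
        for rid in expired:
            del self.pending[rid]

    def free_ram_mb(self):
        self._expire()
        reserved = sum(ram for ram, _ in self.pending.values())
        return self.total_ram_mb - self.claimed_ram_mb - reserved

    def reserve(self, ram_mb):
        """Plan the cut: return a reservation id, or None if it won't fit."""
        if ram_mb > self.free_ram_mb():
            return None
        rid = str(uuid.uuid4())
        self.pending[rid] = (ram_mb, self.clock() + self.ttl_seconds)
        return rid

    def fulfill(self, rid):
        """Make the cut: turn a still-live reservation into a claim."""
        self._expire()
        if rid not in self.pending:
            return False              # expired or unknown; reserve again
        ram_mb, _ = self.pending.pop(rid)
        self.claimed_ram_mb += ram_mb
        return True
```

A scheduler would call reserve() while planning and the compute node (or 
whoever does the work) would call fulfill() once the work actually starts; a 
reservation that is never fulfilled simply expires after ttl_seconds.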

This is similar to (or the same as?) the issue that Cinder is dealing with in 
its work on having a defined state machine (see: 
https://etherpad.openstack.org/p/CinderTaskFlowFSM) and integrating with 
taskflow to gain reliable workflows. Personally, I prefer a slower (optimize 
it later) but reliable, consistent scheduler & workflow that keeps my 
operations people sane over an eventually consistent one that has a higher 
chance of making them insane ;)
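
As a rough illustration of what a "defined state machine" buys you, here is a 
tiny sketch in plain Python. The states and transitions are invented for this 
example (they are not Cinder's or Nova's real states, and this does not use 
the taskflow API); the point is only that illegal jumps get rejected instead 
of silently happening.

```python
# Hypothetical states and transitions, for illustration only.
ALLOWED = {
    "requested": {"reserved", "failed"},
    "reserved": {"building", "expired"},
    "building": {"active", "failed"},
    "active": set(),
    "expired": set(),
    "failed": set(),
}


class InvalidTransition(Exception):
    """Raised when a transition is not listed in ALLOWED."""


class RequestState:
    """Tracks one resource request through the explicit state table."""

    def __init__(self):
        self.state = "requested"
        self.history = ["requested"]

    def advance(self, new_state):
        # Only move along edges that the table explicitly permits.
        if new_state not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)
```

With the table written down, "what can happen next" becomes something you can 
read and test rather than something scattered across the code.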

Anyway, that’s my current brain dump (and cake analogy, ha).

-Josh

From: Joe Gordon <joe.gord...@gmail.com>
Date: Monday, November 18, 2013 5:33 PM
To: Joshua Harlow <harlo...@yahoo-inc.com>
Cc: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [Nova] Does Nova really need an SQL database?




On Mon, Nov 18, 2013 at 4:47 PM, Joshua Harlow 
<harlo...@yahoo-inc.com> wrote:
An idea related to this: what would need to be done to make the DB have the 
exact state that a compute node is going through (so that the scheduler 
would not make unreliable/racy decisions, even when there are multiple 
schedulers)? It's not like we are dealing with a system which cannot know the 
exact state (as long as the compute nodes are connected to the network and a 
network partition does not occur).


Good question, I don't have a clear idea of the amount of work required to do 
this.

So maybe if we think about ways to correctly reserve resources, and keep 
up-to-date information about reserved resources, we could then eliminate the 
race and eliminate the retries entirely?
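
One way such a race-free reservation could look, as a sketch only: let the 
database do the check and the decrement in a single conditional UPDATE, so two 
schedulers can never both win the same capacity. The table layout below and 
the try_reserve/make_demo_db helpers are assumptions made up for this example 
(it uses SQLite from the Python standard library, not Nova's DB layer).

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory DB with one hypothetical compute node."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compute_nodes "
        "(id INTEGER PRIMARY KEY, free_ram_mb INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO compute_nodes (id, free_ram_mb) VALUES (1, 1024)")
    conn.commit()
    return conn


def try_reserve(conn, node_id, ram_mb):
    """Atomically take ram_mb from a node, but only if that much is free."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE compute_nodes SET free_ram_mb = free_ram_mb - ? "
            "WHERE id = ? AND free_ram_mb >= ?",
            (ram_mb, node_id, ram_mb),
        )
    # Exactly one row changed means the reservation succeeded.
    return cur.rowcount == 1
```

The loser of a race sees zero rows updated and knows immediately to pick 
another node, instead of finding out later from the compute node.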

What is the trade-off here? What benefits do we get, and at what cost? I have 
a vague idea but just want to be explicit here. Also, for 'cloudy' things we 
embrace the eventually consistent model, and I don't think we should drop that.


From: Joe Gordon <joe.gord...@gmail.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Date: Monday, November 18, 2013 3:32 PM
To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [Nova] Does Nova really need an SQL database?




On Mon, Nov 18, 2013 at 4:08 PM, yunhong jiang 
<yunhong.ji...@linux.intel.com> wrote:
On Mon, 2013-11-18 at 14:09 -0800, Joe Gordon wrote:
>
> Phil Day discussed this at the summit and I have finally gotten around
> to posting a POC of this.
>
> https://review.openstack.org/#/c/57053/

Hi Joe, why do you think the DB is not the exact state, as you say in the
commit message quoted below? I think the DB is kept up to date by the
resource tracker, am I right? (The resource tracker gets the underlying
resource information periodically, but I think that information is mostly
static.) And I think the scheduler retry mainly comes from the race condition
between multiple scheduler instances.


You answered the question yourself: the compute nodes (indirectly) update the 
DB periodically, so the further you are from the last periodic update, the 
less up to date the DB is.

It's there for both reasons. But yes, it was originally put there because of 
the multi-scheduler race condition.


"We already have the concept that the DB isn't the exact state of the
world, right now it's updated every 10 seconds. And we use the scheduler
retry mechanism to handle cases where the scheduler was wrong. "
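
A minimal sketch of the retry idea described in that commit message: the 
scheduler picks from a possibly stale view of the DB, the compute node has the 
final say, and a rejection just moves on to the next candidate. ComputeNode, 
schedule_with_retry and max_attempts are names invented for this example, not 
Nova's actual scheduler code.

```python
class NoValidHost(Exception):
    """Raised when every attempted candidate rejects the request."""


class ComputeNode:
    """Hypothetical stand-in for a compute node that knows its true state."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb

    def claim(self, ram_mb):
        # The node checks against reality, not against the stale DB view.
        if ram_mb > self.free_ram_mb:
            return False
        self.free_ram_mb -= ram_mb
        return True


def schedule_with_retry(stale_view, nodes, ram_mb, max_attempts=3):
    """stale_view maps node name -> free RAM as last written to the DB."""
    candidates = sorted(
        (name for name, free in stale_view.items() if free >= ram_mb),
        key=lambda name: stale_view[name],
        reverse=True,
    )
    for name in candidates[:max_attempts]:
        if nodes[name].claim(ram_mb):
            return name               # the scheduler's guess was right
    raise NoValidHost(f"no host could fit {ram_mb} MB")
```

The staler the view, the more often the first guess is wrong and a retry is 
needed, which is the cost being weighed against a fully consistent scheduler.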


_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

