Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-20 Thread Joshua Harlow
Looking at the conductor code, it still (to me) provides a low-level database API 
that succumbs to the same races as the old db access did. A get call, followed 
by some response, followed by some python code, followed by an rpc update, 
followed by more code, is still susceptible to consistency & fragility issues.

The API provided is data oriented rather than action oriented. I would argue 
that a data-oriented API leads to lots of consistency issues with multiple 
conductors. An action/task-oriented API, if that is ever accomplished, allows 
the conductor to lock the resources being "manipulated" so that another 
conductor cannot alter the same resource at the same time.
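
To make the distinction concrete, here is a rough, purely illustrative sketch 
(the conductor_api, instance_get and instance_update names below are stand-ins, 
not nova's exact interfaces): the data-oriented style forces the caller into a 
racy read-modify-write across RPC, while a task-oriented operation lets whoever 
owns the task serialize access to the resource it is manipulating.

    import threading

    # Data-oriented style: the caller does a read-modify-write over RPC.
    def resize_data_oriented(conductor_api, ctxt, uuid, new_flavor):
        inst = conductor_api.instance_get(ctxt, uuid)    # RPC round trip
        # ...arbitrary python runs here; another conductor may delete or
        # update the same instance in the meantime...
        inst['flavor'] = new_flavor
        conductor_api.instance_update(ctxt, uuid, inst)  # may clobber or resurrect

    # Task-oriented style: the conductor owns the whole action and the lock.
    _locks = {}  # illustrative only; real multi-conductor locking must be distributed

    def resize_task_oriented(db, ctxt, uuid, new_flavor):
        lock = _locks.setdefault(uuid, threading.Lock())
        with lock:  # nothing else touches this instance mid-action
            inst = db.instance_get(ctxt, uuid)
            inst['flavor'] = new_flavor
            db.instance_update(ctxt, uuid, inst)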

Nova currently has a lot of dedicated and hard-to-follow logic for when resources 
are simultaneously manipulated (deleted while building, for example). Just look 
for *not found* exceptions being thrown in the conductor from *get/update* 
function calls and check where each exception is handled (are all of them 
handled? are all resources cleaned up?). These look like symptoms of an API that 
is too low level and that wouldn't be exposed by an action/task-oriented API. It 
appears that nova is trying to handle all of these exists / doesn't-yet-exist (or 
similar consistency-violation) cases correctly, which is good, but having that 
logic scattered around sure doesn't inspire confidence in me that it does the 
right thing under all scenarios. Does that not worry anyone else??

IMHO adding task logic in the conductor on top of the already hard-to-follow 
logic for these scenarios worries me. That's why I previously thought (and 
others seem to think) that task logic, correct locking and such ... should be 
located in a service that can devote its code to just doing said tasks 
reliably. Honestly, said code will be much, much more complex than a 
database-over-rpc access layer (especially when the races and simultaneous 
manipulation problems are not hidden/scattered but are dealt with in an upfront 
and easily auditable manner).

But maybe this is nothing new to folks and all of this is already being thought 
about (solutions do seem to be appearing and more discussion about said ideas 
is always beneficial).

Just my thoughts...

Sent from my really tiny device...

On Jul 19, 2013, at 5:30 PM, "Peter Feiner"  wrote:

> On Fri, Jul 19, 2013 at 4:36 PM, Joshua Harlow  wrote:
>> This seems to me to be a good example where a library "problem" is leaking 
>> into the openstack architecture, right? That is IMHO a bad path to go down.
>> 
>> I like to think of a world where this isn't a problem, design the correct 
>> solution for that world, and fix the eventlet problem instead. Other large 
>> applications don't fall back to rpc calls to get around database/eventlet 
>> scaling issues, afaik.
>> 
>> Honestly I would almost just want to finally fix the eventlet problem (chris 
>> b. I think has been working on it) and design a system that doesn't try to 
>> work around a library's shortcomings. But maybe that's too much idealism, idk...
> 
> Well, there are two problems that multiple nova-conductor processes
> fix. One is the bad interaction between eventlet and native code. The
> other is allowing multiprocessing.  That is, once nova-conductor
> starts to handle enough requests, enough time will be spent holding
> the GIL to make it a bottleneck; in fact I've had to scale keystone
> using multiple processes because of GIL contention (i.e., keystone was
> steadily at 100% CPU utilization when I was hitting OpenStack with
> enough requests). So multiple processes aren't avoidable. Indeed, other
> software that strives for high concurrency, such as apache, uses
> multiple processes to avoid contention for per-process kernel
> resources like the mmap semaphore.
> 
>> This doesn't even touch on the synchronization issues that can happen when you 
>> start pumping db traffic over a mq. For example, an update is now queued behind 
>> another update, and the second one conflicts with the first: where does 
>> resolution happen when an async mq call is used? What about when you have X 
>> conductors doing Y reads and Z updates? I don't even want to think about the 
>> sync/races there (and so on...). Did you hit / check for any consistency 
>> issues in your tests? Consistency issues under high load using multiple 
>> conductors scare the bejezzus out of me.
> 
> If a sequence of updates needs to be atomic, then they should be made
> in the same database transaction. Hence nova-conductor's interface
> isn't do_some_sql(query), it's a bunch of high-level nova operations
> that are implemented using transactions.


Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Joshua Harlow
I remember trying to make this argument myself about a month or two ago. I agree 
with the idea and the "split them up" principle, I'm just unsure of the timing.

Taskflow (the library) I am hoping can become useful for making these 
complications less complex. WIP of course :)
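
A minimal sketch of the shape such a library encourages (illustrative only, not 
taskflow's actual API): each step declares how to undo itself, so whatever runs 
the flow can unwind cleanly when a later step fails.

    class Task(object):
        def execute(self, **kwargs):
            raise NotImplementedError

        def revert(self, **kwargs):
            pass  # undo whatever execute() did

    class AllocateNetwork(Task):
        def execute(self, instance_id):
            print('allocating network for %s' % instance_id)

        def revert(self, instance_id):
            print('releasing network for %s' % instance_id)

    class SpawnInstance(Task):
        def execute(self, instance_id):
            raise RuntimeError('hypervisor error')  # forces a rollback

    def run_flow(tasks, **kwargs):
        done = []
        try:
            for t in tasks:
                t.execute(**kwargs)
                done.append(t)
        except Exception:
            for t in reversed(done):  # unwind the completed steps
                t.revert(**kwargs)
            raise

    try:
        run_flow([AllocateNetwork(), SpawnInstance()], instance_id='abc-123')
    except RuntimeError:
        pass  # the network allocation was reverted before the error propagated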

Honestly I think it's not just nova that sees this issue with flows and how to 
scale them outwards reliably. But this is one of the big challenges (changing 
the tires on the car while it's moving)...

Sent from my really tiny device...

On Jul 19, 2013, at 7:01 AM, "Day, Phil"  wrote:

> Hi Josh,
> 
> My idea's really pretty simple - make "DB proxy" and "Task workflow" separate 
> services, and allow people to co-locate them if they want to.
> 
> Cheers.
> Phil
> 
>> -Original Message-
>> From: Joshua Harlow [mailto:harlo...@yahoo-inc.com]
>> Sent: 17 July 2013 14:57
>> To: OpenStack Development Mailing List
>> Cc: OpenStack Development Mailing List
>> Subject: Re: [openstack-dev] Moving task flow to conductor - concern about
>> scale
>> 
>> Hi Phil,
>> 
>> I understand and appreciate your concern and I think everyone is trying to keep
>> that in mind. It still appears to me to be too early in this refactoring and task
>> restructuring effort to tell where it may "end up". I think that's also good news,
>> since we can get these kinds of ideas (componentized conductors, if you will) in to
>> handle your (and my) scaling concerns. It would be pretty neat if said
>> conductors could be scaled at different rates depending on their component,
>> although as you said we need to get much, much better at handling said
>> patterns (as you said, just 2 schedulers is a pita right now). I believe we can do it,
>> given the right kind of design and scaling "principles" we build in from the start
>> (right now).
>> 
>> Would like to hear more of your ideas so they get incorporated earlier rather
>> than later.
>> 
>> Sent from my really tiny device..
>> 
>> On Jul 16, 2013, at 9:55 AM, "Dan Smith"  wrote:
>> 
>>>> In the original context of using Conductor as a database proxy then
>>>> the number of conductor instances is directly related to the number
>>>> of compute hosts I need them to serve.
>>> 
>>> Just a point of note, as far as I know, the plan has always been to
>>> establish conductor as a thing that sits between the api and compute
>>> nodes. However, we started with the immediate need, which was the
>>> offloading of database traffic.
>>> 
>>>> What I'm not sure about is whether I would also want to have the same
>>>> number of conductor instances for task control flow - historically even
>>>> running 2 schedulers has been a problem, so the thought of having tens of
>>>> them makes me very concerned at the moment. However I can't see any
>>>> way to specialise a conductor to only handle one type of request.
>>> 
>>> Yeah, I don't think the way it's currently being done allows for
>>> specialization.
>>> 
>>> Since you were reviewing actual task code, can you offer any specifics
>>> about the thing(s) that concern you? I think that scaling conductor
>>> (and its tasks) horizontally is an important point we need to achieve,
>>> so if you see something that needs tweaking, please point it out.
>>> 
>>> Based on what is there now and proposed soon, I think it's mostly
>>> fairly safe, straightforward, and really no different than what two
>>> computes do when working together for something like resize or migrate.
>>> 
>>>> So I guess my question is, given that it may have to address two
>>>> independent scale drivers, is putting task work flow and DB proxy
>>>> functionality into the same service really the right thing to do - or
>>>> should there be some separation between them.
>>> 
>>> I think that we're going to need more than one "task" node, and so it
>>> seems appropriate to locate one scales-with-computes function with
>>> another.
>>> 
>>> Thanks!
>>> 
>>> --Dan
>>> 


Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Joshua Harlow
This seems to me to be a good example where a library "problem" is leaking into 
the openstack architecture, right? That is IMHO a bad path to go down.

I like to think of a world where this isn't a problem, design the correct 
solution for that world, and fix the eventlet problem instead. Other large 
applications don't fall back to rpc calls to get around database/eventlet 
scaling issues, afaik.

Honestly I would almost just want to finally fix the eventlet problem (chris b. 
I think has been working on it) and design a system that doesn't try to work 
around a library's shortcomings. But maybe that's too much idealism, idk...

This doesn't even touch on the synchronization issues that can happen when you 
start pumping db traffic over a mq. For example, an update is now queued behind 
another update, and the second one conflicts with the first: where does 
resolution happen when an async mq call is used? What about when you have X 
conductors doing Y reads and Z updates? I don't even want to think about the 
sync/races there (and so on...). Did you hit / check for any consistency issues 
in your tests? Consistency issues under high load using multiple conductors 
scare the bejezzus out of me.

Sent from my really tiny device...

On Jul 19, 2013, at 10:58 AM, "Peter Feiner"  wrote:

> On Fri, Jul 19, 2013 at 10:15 AM, Dan Smith  wrote:
>> 
>>> So rather than asking "what doesn't work / might not work in the
>>> future" I think the question should be "aside from them both being
>>> things that could be described as a conductor - what's the
>>> architectural reason for wanting to have these two separate groups of
>>> functionality in the same service ?"
>> 
>> IMHO, the architectural reason is "lack of proliferation of services and
>> the added complexity that comes with it." If one expects the
>> proxy workload to always overshadow the task workload, then making
>> these two things a single service makes things a lot simpler.
> 
> I'd like to point out a low-level detail that makes scaling nova-conductor
> at the process level extremely compelling: the database driver
> blocking the eventlet thread serializes nova's database access.
> 
> Since the database connection driver is typically implemented in a
> library beyond the purview of eventlet's monkeypatching (i.e., a
> native python extension like _mysql.so), blocking database calls will
> block all eventlet coroutines. Since most of what nova-conductor does
> is access the database, a nova-conductor process's handling of
> requests is effectively serial.
> 
> Nova-conductor is the gateway to the database for nova-compute
> processes.  So permitting a single nova-conductor process would
> effectively serialize all database queries during instance creation,
> deletion, periodic instance refreshes, etc. Since these queries are
> made frequently (i.e., easily 100 times during instance creation) and
> while other global locks are held (e.g., in the case of nova-compute's
> ResourceTracker), most of what nova-compute does becomes serialized.
> 
> In parallel performance experiments I've done, I have found that
> running multiple nova-conductor processes is the best way to mitigate
> the serialization of blocking database calls. Say I am booting N
> instances in parallel (usually up to N=40). If I have a single
> nova-conductor process, the duration of each nova-conductor RPC
> increases linearly with N, which can add _minutes_ to instance
> creation time (i.e., dozens of RPCs, some taking several seconds).
> However, if I run N nova-conductor processes in parallel, then the
> duration of the nova-conductor RPCs does not increase with N; since each
> RPC is most likely handled by a different nova-conductor, serial
> execution of each process is moot.
> 
> Note that there are alternative methods for preventing the eventlet
> thread from blocking during database calls. However, none of these
> alternatives performed as well as multiple nova-conductor processes:
> 
> Instead of using the native database driver like _mysql.so, you can
> use a pure-python driver, like pymysql by setting
> sql_connection=mysql+pymysql://... in the [DEFAULT] section of
> /etc/nova/nova.conf, which eventlet will monkeypatch to avoid
> blocking. The problem with this approach is the vastly greater CPU
> demand of the pure-python driver compared to the native driver. Since
> the pure-python driver is so much more CPU intensive, the eventlet
> thread spends most of its time talking to the database, which
> is effectively the problem we had before!
> 
> Instead of making database calls from eventlet's thread, you can
> submit them to eventlet's pool of worker threads and wait for the
> results. Try this by setting dbapi_use_tpool=True in the [DEFAULT]
> section of /etc/nova/nova.conf. The problem I found with this approach
> was the overhead of synchronizing with the worker threads. In
> particular, the time elapsed between the worker thread finishing and
> the waiting coroutine being resumed was typically several times
> greater than the duration of the database call itself.

Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Peter Feiner
On Fri, Jul 19, 2013 at 4:36 PM, Joshua Harlow  wrote:
> This seems to me to be a good example where a library "problem" is leaking 
> into the openstack architecture, right? That is IMHO a bad path to go down.
>
> I like to think of a world where this isn't a problem, design the correct 
> solution for that world, and fix the eventlet problem instead. Other large 
> applications don't fall back to rpc calls to get around database/eventlet 
> scaling issues, afaik.
>
> Honestly I would almost just want to finally fix the eventlet problem (chris 
> b. I think has been working on it) and design a system that doesn't try to 
> work around a library's shortcomings. But maybe that's too much idealism, idk...

Well, there are two problems that multiple nova-conductor processes
fix. One is the bad interaction between eventlet and native code. The
other is allowing multiprocessing.  That is, once nova-conductor
starts to handle enough requests, enough time will be spent holding
the GIL to make it a bottleneck; in fact I've had to scale keystone
using multiple processes because of GIL contention (i.e., keystone was
steadily at 100% CPU utilization when I was hitting OpenStack with
enough requests). So multiple processes aren't avoidable. Indeed, other
software that strives for high concurrency, such as apache, uses
multiple processes to avoid contention for per-process kernel
resources like the mmap semaphore.
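
To illustrate the point (a rough sketch only; nova's real service launcher is 
more involved than this), the multi-process pattern looks roughly like:

    import multiprocessing
    import os

    def run_worker():
        # Each process gets its own interpreter and its own GIL, so CPU-bound
        # work (marshalling, ORM object construction, ...) in one worker never
        # contends with another worker's.
        print('worker %d handling requests' % os.getpid())

    if __name__ == '__main__':
        workers = [multiprocessing.Process(target=run_worker) for _ in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()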

> This doesn't even touch on the synchronization issues that can happen when you 
> start pumping db traffic over a mq. For example, an update is now queued behind 
> another update, and the second one conflicts with the first: where does 
> resolution happen when an async mq call is used? What about when you have X 
> conductors doing Y reads and Z updates? I don't even want to think about the 
> sync/races there (and so on...). Did you hit / check for any consistency issues 
> in your tests? Consistency issues under high load using multiple conductors 
> scare the bejezzus out of me.

If a sequence of updates needs to be atomic, then they should be made
in the same database transaction. Hence nova-conductor's interface
isn't do_some_sql(query), it's a bunch of high-level nova operations
that are implemented using transactions.
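
A hedged sketch of what "the same database transaction" buys you, written 
directly against SQLAlchemy with a made-up table (nova's real conductor methods 
go through its db API layer instead):

    from sqlalchemy import create_engine, text

    engine = create_engine('sqlite:///:memory:')
    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE instances (uuid TEXT, flavor_id INT, vm_state TEXT)"))
        conn.execute(text(
            "INSERT INTO instances VALUES ('abc-123', 1, 'resizing')"))

    def finish_resize(instance_uuid, new_flavor_id):
        # Both updates commit together or not at all; a failure in between rolls
        # the whole thing back, so a concurrent reader never sees a half-applied
        # change regardless of how many conductors are running.
        with engine.begin() as conn:
            conn.execute(
                text("UPDATE instances SET flavor_id = :f WHERE uuid = :u"),
                {'f': new_flavor_id, 'u': instance_uuid})
            conn.execute(
                text("UPDATE instances SET vm_state = 'resized' WHERE uuid = :u"),
                {'u': instance_uuid})

    finish_resize('abc-123', 2)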



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Robert Collins
On 19 July 2013 22:55, Day, Phil  wrote:
> Hi Josh,
>
> My idea's really pretty simple - make "DB proxy" and "Task workflow" separate 
> services, and allow people to co-locate them if they want to.

+1, for all the reasons discussed in this thread. I was weirded out
when I saw non-DB-proxy work being put into the same service. One
additional reason that hasn't been discussed is security: the more
complex the code in the service that actually connects to the DB, the
greater the risk of a code bug giving direct access to someone who
shouldn't have it.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Cloud Services



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Dan Smith
> Nova-conductor is the gateway to the database for nova-compute
> processes.  So permitting a single nova-conductor process would
> effectively serialize all database queries during instance creation,
> deletion, periodic instance refreshes, etc.

FWIW, I don't think anyone is suggesting a single conductor, and
especially not a single database proxy.

> Since these queries are made frequently (i.e., easily 100 times
> during instance creation) and while other global locks are held
> (e.g., in the case of nova-compute's ResourceTracker), most of what
> nova-compute does becomes serialized.

I think your numbers are a bit off. When I measured it just before
grizzly, an instance create was something like 20-30 database calls.
Unless that's changed (a lot) lately ... :)

--Dan



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Peter Feiner
On Fri, Jul 19, 2013 at 10:15 AM, Dan Smith  wrote:
>
> > So rather than asking "what doesn't work / might not work in the
> > future" I think the question should be "aside from them both being
> > things that could be described as a conductor - what's the
> > architectural reason for wanting to have these two separate groups of
> > functionality in the same service ?"
>
> IMHO, the architectural reason is "lack of proliferation of services and
> the added complexity that comes with it." If one expects the
> proxy workload to always overshadow the task workload, then making
> these two things a single service makes things a lot simpler.

I'd like to point out a low-level detail that makes scaling nova-conductor
at the process level extremely compelling: the database driver
blocking the eventlet thread serializes nova's database access.

Since the database connection driver is typically implemented in a
library beyond the purview of eventlet's monkeypatching (i.e., a
native python extension like _mysql.so), blocking database calls will
block all eventlet coroutines. Since most of what nova-conductor does
is access the database, a nova-conductor process's handling of
requests is effectively serial.
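
A small sketch of that failure mode, with a busy loop standing in for a 
blocking call inside a native extension that monkeypatching can't reach:

    import time
    import eventlet
    eventlet.monkey_patch()   # patches pure-python I/O, but not C extensions

    def native_db_call(seconds):
        # Stand-in for a call into something like _mysql.so: the OS thread is
        # occupied and never yields back to eventlet's hub.
        end = time.time() + seconds
        while time.time() < end:
            pass

    def other_request():
        start = time.time()
        eventlet.sleep(0)   # a cooperative yield that should return immediately
        print('other request resumed after %.2fs' % (time.time() - start))

    pool = eventlet.GreenPool()
    pool.spawn(other_request)        # asks for a 0-second yield...
    pool.spawn(native_db_call, 1.0)  # ...but this blocks the whole process
    pool.waitall()                   # prints roughly "resumed after 1.00s"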

Nova-conductor is the gateway to the database for nova-compute
processes.  So permitting a single nova-conductor process would
effectively serialize all database queries during instance creation,
deletion, periodic instance refreshes, etc. Since these queries are
made frequently (i.e., easily 100 times during instance creation) and
while other global locks are held (e.g., in the case of nova-compute's
ResourceTracker), most of what nova-compute does becomes serialized.

In parallel performance experiments I've done, I have found that
running multiple nova-conductor processes is the best way to mitigate
the serialization of blocking database calls. Say I am booting N
instances in parallel (usually up to N=40). If I have a single
nova-conductor process, the duration of each nova-conductor RPC
increases linearly with N, which can add _minutes_ to instance
creation time (i.e., dozens of RPCs, some taking several seconds).
However, if I run N nova-conductor processes in parallel, then the
duration of the nova-conductor RPCs does not increase with N; since each
RPC is most likely handled by a different nova-conductor, serial
execution of each process is moot.

Note that there are alternative methods for preventing the eventlet
thread from blocking during database calls. However, none of these
alternatives performed as well as multiple nova-conductor processes:

Instead of using the native database driver like _mysql.so, you can
use a pure-python driver, like pymysql by setting
sql_connection=mysql+pymysql://... in the [DEFAULT] section of
/etc/nova/nova.conf, which eventlet will monkeypatch to avoid
blocking. The problem with this approach is the vastly greater CPU
demand of the pure-python driver compared to the native driver. Since
the pure-python driver is so much more CPU intensive, the eventlet
thread spends most of its time talking to the database, which
is effectively the problem we had before!

Instead of making database calls from eventlet's thread, you can
submit them to eventlet's pool of worker threads and wait for the
results. Try this by setting dbapi_use_tpool=True in the [DEFAULT]
section of /etc/nova/nova.conf. The problem I found with this approach
was the overhead of synchronizing with the worker threads. In
particular, the time elapsed between the worker thread finishing and
the waiting coroutine being resumed was typically several times
greater than the duration of the database call itself.
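
For reference, a sketch of that thread-pool pattern using eventlet's tpool 
module directly (the slow_query function is a stand-in for a native driver call):

    import time
    import eventlet
    from eventlet import tpool

    def slow_query(seconds):
        # Stand-in for a native DB driver call: it blocks a real OS thread
        # (and, like a native driver, releases the GIL while it waits).
        time.sleep(seconds)
        return 'rows'

    def handler():
        # tpool.execute() runs the call in a worker thread and suspends only
        # this green thread; the extra hand-off is where the overhead shows up.
        print('query returned %s' % tpool.execute(slow_query, 0.5))

    def heartbeat():
        for _ in range(5):
            eventlet.sleep(0.1)
            print('hub still responsive')

    pool = eventlet.GreenPool()
    pool.spawn(handler)
    pool.spawn(heartbeat)
    pool.waitall()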



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Day, Phil
> -Original Message-
> From: Dan Smith [mailto:d...@danplanet.com]
> Sent: 19 July 2013 15:15
> To: OpenStack Development Mailing List
> Cc: Day, Phil
> Subject: Re: [openstack-dev] Moving task flow to conductor - concern about
> scale
> 
> > There's nothing I've seen so far that causes me alarm,  but then again
> > we're in the very early stages and haven't moved anything really
> > complex.
> 
> The migrations (live, cold, and resize) are moving there now. These are some
> of the more complex stateful operations I would expect conductor to manage in
> the near term, and maybe ever.
> 
> > I just don't buy into this line of thinking - I need more than one API
> > node for HA as well - but that doesn't mean that therefore I want to
> > put anything else that needs more than one node in there.
> >
> > I don't even think these do scale-with-compute in the same way;  DB
> > proxy scales with the number of compute hosts because each new host
> > introduces an amount of DB load through its periodic tasks. Task work flow
> > scales with the number of requests coming into the system to create / modify
> > servers - and that's not directly related to the number of hosts.
> 
> Unlike API, the only incoming requests that generate load for the conductor 
> are
> things like migrations, which also generate database traffic.
> 
> > So rather than asking "what doesn't work / might not work in the
> > future" I think the question should be "aside from them both being
> > things that could be described as a conductor - what's the
> > architectural reason for wanting to have these two separate groups of
> > functionality in the same service ?"
> 
> IMHO, the architectural reason is "lack of proliferation of services and the
> added complexity that comes with it."
> 

IMO I don't think reducing the number of services is a good enough reason to 
group unrelated services (db-proxy, task_workflow).  Otherwise why aren't we 
arguing to just add all of these to the existing scheduler service?

> If one expects the proxy workload to
> always overshadow the task workload, then making these two things a single
> service makes things a lot simpler.

Not if you have to run 40 services to cope with the proxy load, but don't want 
the risk/complexity of having 40 task workflow engines working in parallel.

> > If they were separate services and it turns out that I can/want/need
> > to run the same number of both then I can pretty easily do that  - but
> > the current approach is removing what seems to be a very important
> > degree of freedom around deployment on a large scale system.
> 
> I guess the question, then, is whether other folks agree that the scaling-
> separately problem is concerning enough to justify at least an RPC topic split
> now which would enable the services to be separated later if need be.
> 

Yep - that's the key question. And in the interest of keeping the system 
stable at scale while we roll through this, I think we should be erring on the 
side of caution/keeping deployment options open rather than waiting to see if 
there's a problem.

> I would like to point out, however, that the functions are being split into
> different interfaces currently. While that doesn't reach low enough on the 
> stack
> to allow hosting them in two different places, it does provide organization 
> such
> that if we later needed to split them, it would be a relatively simple (hah)
> matter of coordinating an RPC upgrade like anything else.
> 
> --Dan



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Joe Gordon
On Jul 19, 2013 9:57 AM, "Day, Phil"  wrote:
>
> > -Original Message-
> > From: Dan Smith [mailto:d...@danplanet.com]
> > Sent: 19 July 2013 15:15
> > To: OpenStack Development Mailing List
> > Cc: Day, Phil
> > Subject: Re: [openstack-dev] Moving task flow to conductor - concern about
> > scale
> >
> > > There's nothing I've seen so far that causes me alarm,  but then again
> > > we're in the very early stages and haven't moved anything really
> > > complex.
> >
> > The migrations (live, cold, and resize) are moving there now. These are some
> > of the more complex stateful operations I would expect conductor to manage in
> > the near term, and maybe ever.
> >
> > > I just don't buy into this line of thinking - I need more than one API
> > > node for HA as well - but that doesn't mean that therefore I want to
> > > put anything else that needs more than one node in there.
> > >
> > > I don't even think these do scale-with-compute in the same way;  DB
> > > proxy scales with the number of compute hosts because each new host
> > > introduces an amount of DB load through its periodic tasks. Task work flow
> > > scales with the number of requests coming into the system to create /
> > > modify servers - and that's not directly related to the number of hosts.
> >
> > Unlike API, the only incoming requests that generate load for the conductor are
> > things like migrations, which also generate database traffic.
> >
> > > So rather than asking "what doesn't work / might not work in the
> > > future" I think the question should be "aside from them both being
> > > things that could be described as a conductor - what's the
> > > architectural reason for wanting to have these two separate groups of
> > > functionality in the same service ?"
> >
> > IMHO, the architectural reason is "lack of proliferation of services and the
> > added complexity that comes with it."
> >
>
> IMO I don't think reducing the number of services is a good enough reason
> to group unrelated services (db-proxy, task_workflow).  Otherwise why
> aren't we arguing to just add all of these to the existing scheduler
> service?
>
> > If one expects the proxy workload to
> > always overshadow the task workload, then making these two things a single
> > service makes things a lot simpler.
>
> Not if you have to run 40 services to cope with the proxy load, but don't
> want the risk/complexity of having 40 task workflow engines working in
> parallel.
>
> > > If they were separate services and it turns out that I can/want/need
> > > to run the same number of both then I can pretty easily do that  - but
> > > the current approach is removing what seems to be a very important
> > > degree of freedom around deployment on a large scale system.
> >
> > I guess the question, then, is whether other folks agree that the scaling-
> > separately problem is concerning enough to justify at least an RPC topic split
> > now which would enable the services to be separated later if need be.
> >
>
> Yep - that's the key question. And in the interest of keeping the system
> stable at scale while we roll through this, I think we should be erring on
> the side of caution/keeping deployment options open rather than waiting to
> see if there's a problem.

++, unless there is some downside to an RPC topic split, this seems like a
reasonable precaution.

>
> > I would like to point out, however, that the functions are being split into
> > different interfaces currently. While that doesn't reach low enough on the stack
> > to allow hosting them in two different places, it does provide organization such
> > that if we later needed to split them, it would be a relatively simple (hah)
> > matter of coordinating an RPC upgrade like anything else.
> >
> > --Dan


Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Dan Smith
> I had assumed that some of the task management state would exist
> in memory. Is it all going to exist in the database?

Well, our state is tracked in the database now, so.. yeah. There's a
desire, of course, to make the state transitions as
idempotent/restartable as possible, which may mean driving some
finer-grained status details into the database. That's really
independent of the move to conductor (although doing that does take
less effort if those don't have to make an RPC trip to get there).
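
As an illustration of the idempotent/restartable point, a compare-and-swap 
style update is one way to drive such transitions through the database (a 
hedged sketch with made-up table and state names):

    from sqlalchemy import create_engine, text

    engine = create_engine('sqlite:///:memory:')
    with engine.begin() as conn:
        conn.execute(text("CREATE TABLE instances (uuid TEXT, task_state TEXT)"))
        conn.execute(text("INSERT INTO instances VALUES ('abc-123', 'migrating')"))

    def advance(instance_uuid, expected, new):
        # Only transition if the row is still in the state we think it is in,
        # so re-running after a crash (or racing with another conductor) is a no-op.
        with engine.begin() as conn:
            result = conn.execute(
                text("UPDATE instances SET task_state = :new "
                     "WHERE uuid = :u AND task_state = :old"),
                {'new': new, 'old': expected, 'u': instance_uuid})
            return result.rowcount == 1

    print(advance('abc-123', 'migrating', 'finishing'))  # True: we did the work
    print(advance('abc-123', 'migrating', 'finishing'))  # False: already done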

--Dan



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Peter Feiner
On Fri, Jul 19, 2013 at 11:06 AM, Dan Smith  wrote:
> FWIW, I don't think anyone is suggesting a single conductor, and
> especially not a single database proxy.

This is a critical detail that I missed. Re-reading Phil's original email,
I see you're debating the ratio of nova-conductor DB proxies to
nova-conductor task flow managers.

I had assumed that some of the task management state would exist
in memory. Is it all going to exist in the database?

>> Since these queries are made frequently (i.e., easily 100 times
>> during instance creation) and while other global locks are held
>> (e.g., in the case of nova-compute's ResourceTracker), most of what
>> nova-compute does becomes serialized.
>
> I think your numbers are a bit off. When I measured it just before
> grizzly, an instance create was something like 20-30 database calls.
> Unless that's changed (a lot) lately ... :)

Ah, perhaps... at least I had the order of magnitude right :-) Even with
20-30 calls, when a bunch of instances are being booted in parallel and all
of the database calls are serialized, minutes are added to instance creation
time.



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Dan Smith
> There's nothing I've seen so far that causes me alarm,  but then
> again we're in the very early stages and haven't moved anything
> really complex.

The migrations (live, cold, and resize) are moving there now. These are
some of the more complex stateful operations I would expect conductor
to manage in the near term, and maybe ever.

> I just don't buy into this line of thinking - I need more than one
> API node for HA as well - but that doesn't mean that therefore I want
> to put anything else that needs more than one node in there.
> 
> I don't even think these do scale-with-compute in the same way;  DB
> proxy scales with the number of compute hosts because each new host
> introduces an amount of DB load through its periodic tasks. Task work flow
> scales with the number of requests coming into the system to create / modify
> servers - and that's not directly related to the number of hosts.

Unlike API, the only incoming requests that generate load for the
conductor are things like migrations, which also generate database
traffic.

> So rather than asking "what doesn't work / might not work in the
> future" I think the question should be "aside from them both being
> things that could be described as a conductor - what's the
> architectural reason for wanting to have these two separate groups of
> functionality in the same service ?"

IMHO, the architectural reason is "lack of proliferation of services and
the added complexity that comes with it." If one expects the
proxy workload to always overshadow the task workload, then making
these two things a single service makes things a lot simpler.

> If they were separate services and it turns out that I can/want/need
> to run the same number of both then I can pretty easily do that  -
> but the current approach is removing what seems to be a very
> important degree of freedom around deployment on a large scale system.

I guess the question, then, is whether other folks agree that the
scaling-separately problem is concerning enough to justify at least an
RPC topic split now which would enable the services to be separated
later if need be.

I would like to point out, however, that the functions are being split
into different interfaces currently. While that doesn't reach low
enough on the stack to allow hosting them in two different places, it
does provide organization such that if we later needed to split them, it
would be a relatively simple (hah) matter of coordinating an RPC
upgrade like anything else.
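
To make "RPC topic split" concrete, here is a purely illustrative sketch (the 
topic names and dispatch mechanics are invented, not nova's): callers address a 
topic rather than a service, so a deployment can keep one process consuming 
both topics today and point each topic at a dedicated service later without 
touching the callers.

    # Purely illustrative: two RPC topics, one process serving both for now.
    DB_TOPIC = 'conductor'
    TASK_TOPIC = 'conductor-tasks'

    class DbProxyEndpoints(object):
        def instance_update(self, ctxt, uuid, values):
            return 'updated %s' % uuid

    class TaskEndpoints(object):
        def migrate_instance(self, ctxt, uuid, host):
            return 'migrating %s to %s' % (uuid, host)

    # Deployment choice: today a single conductor consumes both topics; later,
    # each topic could be bound to its own service without changing any caller.
    consumers = {
        DB_TOPIC: DbProxyEndpoints(),
        TASK_TOPIC: TaskEndpoints(),
    }

    def dispatch(topic, method, ctxt, **kwargs):
        # Stand-in for the message bus: callers only know the topic name.
        return getattr(consumers[topic], method)(ctxt, **kwargs)

    print(dispatch(DB_TOPIC, 'instance_update', {}, uuid='abc-123', values={}))
    print(dispatch(TASK_TOPIC, 'migrate_instance', {}, uuid='abc-123', host='node-2'))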

--Dan



Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Day, Phil
Hi Josh,

My idea's really pretty simple - make "DB proxy" and "Task workflow" separate 
services, and allow people to co-locate them if they want to.

Cheers.
Phil

> -Original Message-
> From: Joshua Harlow [mailto:harlo...@yahoo-inc.com]
> Sent: 17 July 2013 14:57
> To: OpenStack Development Mailing List
> Cc: OpenStack Development Mailing List
> Subject: Re: [openstack-dev] Moving task flow to conductor - concern about
> scale
> 
> Hi Phil,
> 
> I understand and appreciate your concern and I think everyone is trying to keep
> that in mind. It still appears to me to be too early in this refactoring and task
> restructuring effort to tell where it may "end up". I think that's also good news,
> since we can get these kinds of ideas (componentized conductors, if you will) in to
> handle your (and my) scaling concerns. It would be pretty neat if said
> conductors could be scaled at different rates depending on their component,
> although as you said we need to get much, much better at handling said
> patterns (as you said, just 2 schedulers is a pita right now). I believe we can do it,
> given the right kind of design and scaling "principles" we build in from the start
> (right now).
> 
> Would like to hear more of your ideas so they get incorporated earlier rather
> than later.
> 
> Sent from my really tiny device..
> 
> On Jul 16, 2013, at 9:55 AM, "Dan Smith"  wrote:
> 
> >> In the original context of using Conductor as a database proxy then
> >> the number of conductor instances is directly related to the number
> >> of compute hosts I need them to serve.
> >
> > Just a point of note, as far as I know, the plan has always been to
> > establish conductor as a thing that sits between the api and compute
> > nodes. However, we started with the immediate need, which was the
> > offloading of database traffic.
> >
> >> What I'm not sure about is whether I would also want to have the same
> >> number of conductor instances for task control flow - historically even
> >> running 2 schedulers has been a problem, so the thought of having tens of
> >> them makes me very concerned at the moment. However I can't see any
> >> way to specialise a conductor to only handle one type of request.
> >
> > Yeah, I don't think the way it's currently being done allows for
> > specialization.
> >
> > Since you were reviewing actual task code, can you offer any specifics
> > about the thing(s) that concern you? I think that scaling conductor
> > (and its tasks) horizontally is an important point we need to achieve,
> > so if you see something that needs tweaking, please point it out.
> >
> > Based on what is there now and proposed soon, I think it's mostly
> > fairly safe, straightforward, and really no different than what two
> > computes do when working together for something like resize or migrate.
> >
> >> So I guess my question is, given that it may have to address two
> >> independent scale drivers, is putting task work flow and DB proxy
> >> functionality into the same service really the right thing to do - or
> >> should there be some separation between them.
> >
> > I think that we're going to need more than one "task" node, and so it
> > seems appropriate to locate one scales-with-computes function with
> > another.
> >
> > Thanks!
> >
> > --Dan
> >


Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-19 Thread Day, Phil
> -Original Message-
> From: Dan Smith [mailto:d...@danplanet.com]
> Sent: 16 July 2013 14:51
> To: OpenStack Development Mailing List
> Cc: Day, Phil
> Subject: Re: [openstack-dev] Moving task flow to conductor - concern about
> scale
> 
> > In the original context of using Conductor as a database proxy then
> > the number of conductor instances is directly related to the number of
> > compute hosts I need them to serve.
> 
> Just a point of note, as far as I know, the plan has always been to establish
> conductor as a thing that sits between the api and compute nodes. However,
> we started with the immediate need, which was the offloading of database
> traffic.
>

Like I said, I see the need for both a layer between the API and compute and a 
layer between compute and the DB - I just don't see them as having to be part 
of the same thing.

 
> > What I'm not sure about is whether I would also want to have the same
> > number of conductor instances for task control flow - historically even
> > running 2 schedulers has been a problem, so the thought of having tens of
> > them makes me very concerned at the moment. However I can't see any
> > way to specialise a conductor to only handle one type of request.
> 
> Yeah, I don't think the way it's currently being done allows for 
> specialization.
> 
> Since you were reviewing actual task code, can you offer any specifics about
> the thing(s) that concern you? I think that scaling conductor (and its tasks)
> horizontally is an important point we need to achieve, so if you see something
> that needs tweaking, please point it out.
> 
> Based on what is there now and proposed soon, I think it's mostly fairly safe,
> straightforward, and really no different than what two computes do when
> working together for something like resize or migrate.
>

There's nothing I've seen so far that causes me alarm, but then again we're in 
the very early stages and haven't moved anything really complex.
However I think there's an inherent big difference between scaling something 
stateless like a DB proxy and scaling a stateful entity like a task workflow 
component. I'd also suggest that so far there is no real experience with the 
latter within the current code base; compute nodes (which are the main 
scaled-out component so far) work on well defined subsets of the data.


> > So I guess my question is, given that it may have to address two
> > independent scale drivers, is putting task work flow and DB proxy
> > functionality into the same service really the right thing to do - or
> > should there be some separation between them.
> 
> I think that we're going to need more than one "task" node, and so it seems
> appropriate to locate one scales-with-computes function with another.
> 

I just don't buy into this line of thinking - I need more than one API node for 
HA as well - but that doesn't mean that therefore I want to put anything else 
that needs more than one node in there.

I don't even think these do scale-with-compute in the same way; DB proxy 
scales with the number of compute hosts because each new host introduces an 
amount of DB load through its periodic tasks. Task work flow scales with the 
number of requests coming into the system to create / modify servers - and 
that's not directly related to the number of hosts.

So rather than asking "what doesn't work / might not work in the future" I 
think the question should be "aside from them both being things that could be 
described as a conductor - what's the architectural reason for wanting to have 
these two separate groups of functionality in the same service ?"

If it's really just because the concept of "conductor" got used for a DB proxy 
layer before the task workflow, then we should either think of a new name for 
the latter or rename the former.

If they were separate services and it turns out that I can/want/need to run the 
same number of both then I can pretty easily do that  - but the current 
approach is removing what seems to be a very important degree of freedom around 
deployment on a large scale system.

Cheers,
Phil




Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-17 Thread Joshua Harlow
Hi Phil, 

I understand and appreciate your concern and I think everyone is trying to keep 
that in mind. It still appears to me to be too early in this refactoring and 
task restructuring effort to tell where it may "end up". I think that's also 
good news, since we can get these kinds of ideas (componentized conductors, if 
you will) in to handle your (and my) scaling concerns. It would be pretty neat 
if said conductors could be scaled at different rates depending on their 
component, although as you said we need to get much, much better at handling 
said patterns (as you said, just 2 schedulers is a pita right now). I believe 
we can do it, given the right kind of design and scaling "principles" we build 
in from the start (right now).

Would like to hear more of your ideas so they get incorporated earlier rather 
than later.

Sent from my really tiny device..

On Jul 16, 2013, at 9:55 AM, "Dan Smith"  wrote:

>> In the original context of using Conductor as a database proxy then
>> the number of conductor instances is directly related to the number
>> of compute hosts I need them to serve.
> 
> Just a point of note, as far as I know, the plan has always been to
> establish conductor as a thing that sits between the api and compute
> nodes. However, we started with the immediate need, which was the
> offloading of database traffic.
> 
>> What I'm not sure about is whether I would also want to have the same
>> number of conductor instances for task control flow - historically even
>> running 2 schedulers has been a problem, so the thought of having tens of
>> them makes me very concerned at the moment. However I can't see any
>> way to specialise a conductor to only handle one type of request.
> 
> Yeah, I don't think the way it's currently being done allows for
> specialization.
> 
> Since you were reviewing actual task code, can you offer any specifics
> about the thing(s) that concern you? I think that scaling conductor (and
> its tasks) horizontally is an important point we need to achieve, so if
> you see something that needs tweaking, please point it out.
> 
> Based on what is there now and proposed soon, I think it's mostly fairly
> safe, straightforward, and really no different than what two computes do
> when working together for something like resize or migrate.
> 
>> So I guess my question is, given that it may have to address two
>> independent scale drivers, is putting task work flow and DB proxy
>> functionality into the same service really the right thing to do - or
>> should there be some separation between them.
> 
> I think that we're going to need more than one "task" node, and so it
> seems appropriate to locate one scales-with-computes function with
> another.
> 
> Thanks!
> 
> --Dan
> 


Re: [openstack-dev] Moving task flow to conductor - concern about scale

2013-07-16 Thread Dan Smith
> In the original context of using Conductor as a database proxy then
> the number of conductor instances is directly related to the number
> of compute hosts I need them to serve. 

Just a point of note, as far as I know, the plan has always been to
establish conductor as a thing that sits between the api and compute
nodes. However, we started with the immediate need, which was the
offloading of database traffic.

> What I'm not sure about is whether I would also want to have the same
> number of conductor instances for task control flow - historically even
> running 2 schedulers has been a problem, so the thought of having tens of
> them makes me very concerned at the moment. However I can't see any
> way to specialise a conductor to only handle one type of request.

Yeah, I don't think the way it's currently being done allows for
specialization.

Since you were reviewing actual task code, can you offer any specifics
about the thing(s) that concern you? I think that scaling conductor (and
its tasks) horizontally is an important point we need to achieve, so if
you see something that needs tweaking, please point it out.

Based on what is there now and proposed soon, I think it's mostly fairly
safe, straightforward, and really no different than what two computes do
when working together for something like resize or migrate.

> So I guess my question is, given that it may have to address two
> independent scale drivers, is putting task work flow and DB proxy
> functionality into the same service really the right thing to do - or
> should there be some separation between them.

I think that we're going to need more than one "task" node, and so it
seems appropriate to locate one scales-with-computes function with
another.

Thanks!

--Dan



[openstack-dev] Moving task flow to conductor - concern about scale

2013-07-16 Thread Day, Phil
Hi Folks,

Reviewing some of the changes to move control flows into conductor made me wonder 
about an issue that I haven't seen discussed so far (apologies if it was and 
I've missed it):

In the original context of using Conductor as a database proxy then the number 
of conductor instances is directly related to the number of compute hosts I 
need them to serve. I don't have a feel for what this ratio is (as we haven't 
switched yet), but based on the discussions in Portland I have the expectation 
that even with the eventlet performance fix in place there could still need to 
be tens of them for a large deployment.

What I'm not sure about is whether I would also want to have the same number of 
conductor instances for task control flow - historically even running 2 
schedulers has been a problem, so the thought of having tens of them makes me 
very concerned at the moment. However I can't see any way to specialise a 
conductor to only handle one type of request.

So I guess my question is, given that it may have to address two independent 
scale drivers, is putting task work flow and DB proxy functionality into the 
same service really the right thing to do - or should there be some separation 
between them.

Don't get me wrong - I'm not against the concept of having the task work flow 
in a well defined place - just wondering if conductor is really the logical 
place to do it rather than, for example, making this part of an extended set 
of functionality for the scheduler (which is already a separate service with 
its own scaling properties).

Thoughts ?

Phil