Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-05 Thread Everett Toews
On Nov 3, 2015, at 11:46 PM, John Griffith wrote:

On Tue, Nov 3, 2015 at 4:57 PM, michael mccune wrote:
On 11/03/2015 05:20 PM, Boris Pavlovic wrote:
What if we add a new API method that just returns the resource status by
UUID? Or even just extend the GET request with a new argument that returns
only the status?

Thoughts?

not sure i understand the resource status by UUID, could you explain that a
little more?

as for changing the get request to return only the status, can't you have a 
filter on the get url that instructs it to return only the status?

Yes, we already have that capability and it's used in a number of places.



Relevant API guideline

http://specs.openstack.org/openstack/api-wg/guidelines/pagination_filter_sort.html#filtering
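For illustration only, a minimal sketch of what a "status only" filtered GET could look like from the client side. The `fields` query parameter and the example.com endpoint are assumptions made up for this sketch; check the guideline and the service's API reference for what a given service actually accepts.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def status_only_url(resource_url, fields=("status",)):
    """Return resource_url with a hypothetical fields=... query parameter added."""
    parts = urlsplit(resource_url)
    # Keep whatever query parameters the caller already had.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("fields", ",".join(fields)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical endpoint, used only to show the resulting URL.
print(status_only_url("http://cloud.example.com/v2/servers/1234"))
```

The point is that the client keeps using the ordinary GET on the resource and only narrows what it asks for, so no new API method is needed.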

Everett
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Boris Pavlovic
John,

> > Our resources are not. We've also had specific requests to prevent
> > header bloat because it impacts the HTTP caching systems. Also, it's
> > pretty clear that headers are really not where you want to put volatile
> > information, which this is.
> Hmm, you do make a good point about caching.



Caching is useful only when you want to return the same data many times.
In our case we are interested in the latest state of the resource, and
that kind of data can't be cached.


> > I think we should step back here and figure out what the actual problem
> > is, and what ways we might go about solving it. This has jumped directly
> > to a point in time optimized fast poll loop. It will shave a few cycles
> > off right now on our current implementation, but will still be orders of
> > magnitude more costly than consuming the Nova notifications if the only
> > thing that is cared about is task state transitions. And it's an API
> > change we have to live with largely *forever* so short term optimization
> > is not what we want to go for.
> I do agree with that.


The thing here is that we have to have an async API, because we have
long-running operations.
Basically there are 3 approaches to finding out that an operation is done:
1) pub/sub
2) polling resource status
3) long polling requests

All approaches have pros and cons; however, the "actual" problem will stay
the same and you can't fix that.
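To make approach 2 concrete, here is a rough sketch of a polling loop with a growing delay between requests. The names (`wait_for_status`, `get_status`) and the default delays are invented for this example and are not taken from Rally or any OpenStack client.

```python
import time


def wait_for_status(get_status, expected, timeout=60.0, initial_delay=0.5,
                    max_delay=8.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns `expected`; return the call count.

    The delay doubles after every miss (capped at max_delay), so a slow
    operation costs fewer requests than a fixed short sleep would.
    Raises TimeoutError if the next sleep would pass the deadline.
    """
    deadline = clock() + timeout
    delay = initial_delay
    calls = 0
    while True:
        calls += 1
        if get_status() == expected:
            return calls
        if clock() + delay > deadline:
            raise TimeoutError("status never became %r" % (expected,))
        sleep(delay)
        delay = min(delay * 2, max_delay)


# Usage with a fake status source instead of a real API call.
fake_statuses = iter(["BUILD", "BUILD", "ACTIVE"])
print(wait_for_status(lambda: next(fake_statuses), "ACTIVE", sleep=lambda s: None))
```

Backing off only reduces the number of requests; each request still costs whatever the GET costs on the server side, which is the part the proposal is about.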


Best regards,
Boris Pavlovic

On Thu, Nov 5, 2015 at 12:18 AM, John Garbutt  wrote:

> On 4 November 2015 at 15:00, Sean Dague  wrote:
> > On 11/04/2015 09:49 AM, Jay Pipes wrote:
> >> On 11/04/2015 09:32 AM, Sean Dague wrote:
> >>> On 11/04/2015 09:00 AM, Jay Pipes wrote:
>  On 11/03/2015 05:20 PM, Boris Pavlovic wrote:
> > Hi stackers,
> >
> > Usually such projects like Heat, Tempest, Rally, Scalar, and other
> tool
> > that works with OpenStack are working with resources (e.g. VM,
> Volumes,
> > Images, ..) in the next way:
> >
> >   >>> resource = api.resouce_do_some_stuff()
> >   >>> while api.resource_get(resource["uuid"]) != expected_status
> >   >>>sleep(a_bit)
> >
> > For each async operation they are polling and call many times
> > resource_get() which creates significant load on API and DB layers
> due
> > the nature of this request. (Usually getting full information about
> > resources produces SQL requests that contains multiple JOINs, e,g for
> > nova vm it's 6 joins).
> >
> > What if we add new API method that will just resturn resource status
> by
> > UUID? Or even just extend get request with the new argument that
> > returns
> > only status?
> 
>  +1
> 
>  All APIs should have an HTTP HEAD call on important resources for
>  retrieving quick status information for the resource.
> 
>  In fact, I proposed exactly this in my Compute "vNext" API proposal:
> 
>  http://docs.oscomputevnext.apiary.io/#reference/server/serversid/head
> 
>  Swift's API supports HEAD for accounts:
> 
> 
> http://developer.openstack.org/api-ref-objectstorage-v1.html#showAccountMeta
> 
> 
> 
>  containers:
> 
> 
> http://developer.openstack.org/api-ref-objectstorage-v1.html#showContainerMeta
> 
> 
> 
>  and objects:
> 
> 
> http://developer.openstack.org/api-ref-objectstorage-v1.html#showObjectMeta
> 
> 
>  So, yeah, I agree.
>  -jay
> >>>
> >>> How would you expect this to work on "servers"? HEAD specifically
> >>> forbids returning a body, and, unlike swift, we don't return very much
> >>> information in our headers.
> >>
> >> I didn't propose doing it on a collection resource like "servers". Only
> >> on an entity resource like a single "server".
> >>
> >> HEAD /v2/{tenant}/servers/{uuid}
> >> HTTP/1.1 200 OK
> >> Content-Length: 1022
> >> Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
> >> Content-Type: application/json
> >> Date: Thu, 16 Jan 2014 21:13:19 GMT
> >> OpenStack-Compute-API-Server-VM-State: ACTIVE
> >> OpenStack-Compute-API-Server-Power-State: RUNNING
> >> OpenStack-Compute-API-Server-Task-State: NONE
> >
> > Right, but these headers aren't in the normal resource. They are
> > returned in the body only.
> >
> > The point of HEAD is give me the same thing as GET without the body,
> > because I only care about the headers. Swift resources are structured in
> > a way where this information is useful.
>
> I guess we would have to add this to GET requests, for consistency,
> which feels like duplication.
>
> > Our resources are not. We've also had specific requests to prevent
> > header bloat because it impacts the HTTP caching systems. Also, it's
> > pretty clear that headers are really not where you want to put volatile
> > information, which this is.
>
> Hmm, you do make a good point about caching.
>
> > I think we should step back here and figure out what the actual 

Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Boris Pavlovic
Sean,

> This seems like a fundamental abuse of HTTP honestly. If you find
> yourself creating a ton of new headers, you are probably doing it wrong.


I totally agree with this. We shouldn't add a lot of HTTP headers. IMHO, why
not just return a string with the status in the body (in my case)?


> I think longer term we probably need a dedicated event service in
> OpenStack.


Unfortunately, this will be slower than the current solution with JOINs,
require more resources, and be very hard to use (you'll need to add one
more service to OpenStack and use one more client).


Best regards,
Boris Pavlovic


On Thu, Nov 5, 2015 at 12:42 AM, Sean Dague  wrote:

> On 11/04/2015 10:13 AM, John Garbutt wrote:
> > On 4 November 2015 at 14:49, Jay Pipes  wrote:
> >> On 11/04/2015 09:32 AM, Sean Dague wrote:
> >>>
> >>> On 11/04/2015 09:00 AM, Jay Pipes wrote:
> 
>  On 11/03/2015 05:20 PM, Boris Pavlovic wrote:
> >
> > Hi stackers,
> >
> > Usually such projects like Heat, Tempest, Rally, Scalar, and other
> tool
> > that works with OpenStack are working with resources (e.g. VM,
> Volumes,
> > Images, ..) in the next way:
> >
> >   >>> resource = api.resouce_do_some_stuff()
> >   >>> while api.resource_get(resource["uuid"]) != expected_status
> >   >>>sleep(a_bit)
> >
> > For each async operation they are polling and call many times
> > resource_get() which creates significant load on API and DB layers
> due
> > the nature of this request. (Usually getting full information about
> > resources produces SQL requests that contains multiple JOINs, e,g for
> > nova vm it's 6 joins).
> >
> > What if we add new API method that will just resturn resource status
> by
> > UUID? Or even just extend get request with the new argument that
> returns
> > only status?
> 
> 
>  +1
> 
>  All APIs should have an HTTP HEAD call on important resources for
>  retrieving quick status information for the resource.
> 
>  In fact, I proposed exactly this in my Compute "vNext" API proposal:
> 
>  http://docs.oscomputevnext.apiary.io/#reference/server/serversid/head
> 
>  Swift's API supports HEAD for accounts:
> 
> 
> 
> http://developer.openstack.org/api-ref-objectstorage-v1.html#showAccountMeta
> 
> 
>  containers:
> 
> 
> 
> http://developer.openstack.org/api-ref-objectstorage-v1.html#showContainerMeta
> 
> 
>  and objects:
> 
> 
> 
> http://developer.openstack.org/api-ref-objectstorage-v1.html#showObjectMeta
> 
>  So, yeah, I agree.
>  -jay
> >>>
> >>>
> >>> How would you expect this to work on "servers"? HEAD specifically
> >>> forbids returning a body, and, unlike swift, we don't return very much
> >>> information in our headers.
> >>
> >>
> >> I didn't propose doing it on a collection resource like "servers". Only
> on
> >> an entity resource like a single "server".
> >>
> >> HEAD /v2/{tenant}/servers/{uuid}
> >> HTTP/1.1 200 OK
> >> Content-Length: 1022
> >> Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
> >> Content-Type: application/json
> >> Date: Thu, 16 Jan 2014 21:13:19 GMT
> >> OpenStack-Compute-API-Server-VM-State: ACTIVE
> >> OpenStack-Compute-API-Server-Power-State: RUNNING
> >> OpenStack-Compute-API-Server-Task-State: NONE
> >
> > For polling, that sounds quite efficient and handy.
> >
> > For "servers" we could do this (I think there was a spec up that wanted
> this):
> >
> > HEAD /v2/{tenant}/servers
> > HTTP/1.1 200 OK
> > Content-Length: 1022
> > Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
> > Content-Type: application/json
> > Date: Thu, 16 Jan 2014 21:13:19 GMT
> > OpenStack-Compute-API-Server-Count: 13
>
> This seems like a fundamental abuse of HTTP honestly. If you find
> yourself creating a ton of new headers, you are probably doing it wrong.
>
> I do think the near term work around is to actually use Searchlight.
> They're monitoring the notifications bus for nova, and refreshing
> resources when they see a notification which might have changed it. It
> still means that Searchlight is hitting our API more than ideal, but at
> least only one service is doing so, and if the rest hit that instead
> they'll get the resource without any db hits (it's all through an
> elastic search cluster).
>
> I think longer term we probably need a dedicated event service in
> OpenStack. A few of us actually had an informal conversation about this
> during the Nova notifications session to figure out if there was a way
> to optimize the Searchlight path. Nearly everyone wants websockets,
> which is good. The problem is, that means you've got to anticipate
> 10,000+ open websockets as soon as we expose this. Which means the stack
> to deliver that sanely isn't just a bit of python code, it's also the
> highly optimized server underneath.
>
> So, I feel like with Searchlight we've got a work 

Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Boris Pavlovic
Robert,

I don't have the exact numbers, but while testing real deployments I saw
the impact of polling resources. This is one of the reasons why we had to
add quite a big sleep() during polling in Rally, to reduce the number of
GET requests and avoid DDoSing OpenStack.

In any case, collecting the numbers doesn't seem like a hard task.

Best regards,
Boris Pavlovic

On Thu, Nov 5, 2015 at 3:56 AM, Robert Collins wrote:

> On 5 November 2015 at 04:42, Sean Dague  wrote:
> > On 11/04/2015 10:13 AM, John Garbutt wrote:
>
> > I think longer term we probably need a dedicated event service in
> > OpenStack. A few of us actually had an informal conversation about this
> > during the Nova notifications session to figure out if there was a way
> > to optimize the Searchlight path. Nearly everyone wants websockets,
> > which is good. The problem is, that means you've got to anticipate
> > 10,000+ open websockets as soon as we expose this. Which means the stack
> > to deliver that sanely isn't just a bit of python code, it's also the
> > highly optimized server underneath.
>
> So any decent epoll implementation should let us hit that without a
> super optimised server - eventlet being in that category. I totally
> get that we're going to expect thundering herds, but websockets isn't
> new and the stacks we have - apache, eventlet - have been around long
> enough to adjust to the rather different scaling pattern.
>
> So - lets not panic, get a proof of concept up somewhere and then run
> an actual baseline test. If thats shockingly bad *then* lets panic.
>
> -Rob
>
>
> --
> Robert Collins 
> Distinguished Technologist
> HP Converged Cloud
>


Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Robert Collins
On 5 November 2015 at 13:06, Boris Pavlovic  wrote:
> Robert,
>
> I don't have the exactly numbers, but during the real testing of real
> deployments I saw the impact of polling resource, this is one of the reason
> why we have to add quite big sleep() during polling in Rally to reduce
> amount of GET requests and avoid DDoS of OpenStack..
>
> In any case it doesn't seem like hard task to collect the numbers.

Please do!

But for clarity - in case the sub-thread wasn't clear - I was talking
about the numbers for a websocket based push thing, not polling.

-Rob



-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Robert Collins
On 5 November 2015 at 04:42, Sean Dague  wrote:
> On 11/04/2015 10:13 AM, John Garbutt wrote:

> I think longer term we probably need a dedicated event service in
> OpenStack. A few of us actually had an informal conversation about this
> during the Nova notifications session to figure out if there was a way
> to optimize the Searchlight path. Nearly everyone wants websockets,
> which is good. The problem is, that means you've got to anticipate
> 10,000+ open websockets as soon as we expose this. Which means the stack
> to deliver that sanely isn't just a bit of python code, it's also the
> highly optimized server underneath.

So any decent epoll implementation should let us hit that without a
super optimised server - eventlet being in that category. I totally
get that we're going to expect thundering herds, but websockets isn't
new and the stacks we have - apache, eventlet - have been around long
enough to adjust to the rather different scaling pattern.

So - let's not panic, get a proof of concept up somewhere and then run
an actual baseline test. If that's shockingly bad *then* let's panic.
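As a very rough illustration of the "many mostly idle waiters" shape, here is a small sketch using Python's asyncio (swapped in for eventlet purely to keep the example in the standard library). It opens no sockets and no websockets, so it says nothing about real connection overhead; it only shows 10,000 waiters parked on one event loop and woken together. It is not a substitute for the proof of concept and baseline test mentioned above.

```python
import asyncio


async def client(ready, seen):
    # Each "client" just parks until the server side has something to say.
    await ready.wait()
    seen.append(1)


async def main(n_clients=10_000):
    ready = asyncio.Event()
    seen = []
    tasks = [asyncio.create_task(client(ready, seen)) for _ in range(n_clients)]
    await asyncio.sleep(0)  # yield once so the clients can start waiting
    ready.set()             # the "thundering herd": wake everyone at once
    await asyncio.gather(*tasks)
    return len(seen)


print(asyncio.run(main()))
```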

-Rob


-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Sean Dague
On 11/04/2015 09:00 AM, Jay Pipes wrote:
> On 11/03/2015 05:20 PM, Boris Pavlovic wrote:
>> Hi stackers,
>>
>> Usually such projects like Heat, Tempest, Rally, Scalar, and other tool
>> that works with OpenStack are working with resources (e.g. VM, Volumes,
>> Images, ..) in the next way:
>>
>>  >>> resource = api.resouce_do_some_stuff()
>>  >>> while api.resource_get(resource["uuid"]) != expected_status
>>  >>>sleep(a_bit)
>>
>> For each async operation they are polling and call many times
>> resource_get() which creates significant load on API and DB layers due
>> the nature of this request. (Usually getting full information about
>> resources produces SQL requests that contains multiple JOINs, e,g for
>> nova vm it's 6 joins).
>>
>> What if we add new API method that will just resturn resource status by
>> UUID? Or even just extend get request with the new argument that returns
>> only status?
> 
> +1
> 
> All APIs should have an HTTP HEAD call on important resources for
> retrieving quick status information for the resource.
> 
> In fact, I proposed exactly this in my Compute "vNext" API proposal:
> 
> http://docs.oscomputevnext.apiary.io/#reference/server/serversid/head
> 
> Swift's API supports HEAD for accounts:
> 
> http://developer.openstack.org/api-ref-objectstorage-v1.html#showAccountMeta
> 
> 
> containers:
> 
> http://developer.openstack.org/api-ref-objectstorage-v1.html#showContainerMeta
> 
> 
> and objects:
> 
> http://developer.openstack.org/api-ref-objectstorage-v1.html#showObjectMeta
> 
> So, yeah, I agree.
> -jay

How would you expect this to work on "servers"? HEAD specifically
forbids returning a body, and, unlike swift, we don't return very much
information in our headers.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread John Garbutt
On 4 November 2015 at 14:49, Jay Pipes  wrote:
> On 11/04/2015 09:32 AM, Sean Dague wrote:
>>
>> On 11/04/2015 09:00 AM, Jay Pipes wrote:
>>>
>>> On 11/03/2015 05:20 PM, Boris Pavlovic wrote:

 Hi stackers,

 Usually such projects like Heat, Tempest, Rally, Scalar, and other tool
 that works with OpenStack are working with resources (e.g. VM, Volumes,
 Images, ..) in the next way:

   >>> resource = api.resouce_do_some_stuff()
   >>> while api.resource_get(resource["uuid"]) != expected_status
   >>>sleep(a_bit)

 For each async operation they are polling and call many times
 resource_get() which creates significant load on API and DB layers due
 the nature of this request. (Usually getting full information about
 resources produces SQL requests that contains multiple JOINs, e,g for
 nova vm it's 6 joins).

 What if we add new API method that will just resturn resource status by
 UUID? Or even just extend get request with the new argument that returns
 only status?
>>>
>>>
>>> +1
>>>
>>> All APIs should have an HTTP HEAD call on important resources for
>>> retrieving quick status information for the resource.
>>>
>>> In fact, I proposed exactly this in my Compute "vNext" API proposal:
>>>
>>> http://docs.oscomputevnext.apiary.io/#reference/server/serversid/head
>>>
>>> Swift's API supports HEAD for accounts:
>>>
>>>
>>> http://developer.openstack.org/api-ref-objectstorage-v1.html#showAccountMeta
>>>
>>>
>>> containers:
>>>
>>>
>>> http://developer.openstack.org/api-ref-objectstorage-v1.html#showContainerMeta
>>>
>>>
>>> and objects:
>>>
>>>
>>> http://developer.openstack.org/api-ref-objectstorage-v1.html#showObjectMeta
>>>
>>> So, yeah, I agree.
>>> -jay
>>
>>
>> How would you expect this to work on "servers"? HEAD specifically
>> forbids returning a body, and, unlike swift, we don't return very much
>> information in our headers.
>
>
> I didn't propose doing it on a collection resource like "servers". Only on
> an entity resource like a single "server".
>
> HEAD /v2/{tenant}/servers/{uuid}
> HTTP/1.1 200 OK
> Content-Length: 1022
> Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
> Content-Type: application/json
> Date: Thu, 16 Jan 2014 21:13:19 GMT
> OpenStack-Compute-API-Server-VM-State: ACTIVE
> OpenStack-Compute-API-Server-Power-State: RUNNING
> OpenStack-Compute-API-Server-Task-State: NONE

For polling, that sounds quite efficient and handy.

For "servers" we could do this (I think there was a spec up that wanted this):

HEAD /v2/{tenant}/servers
HTTP/1.1 200 OK
Content-Length: 1022
Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
Content-Type: application/json
Date: Thu, 16 Jan 2014 21:13:19 GMT
OpenStack-Compute-API-Server-Count: 13

Thanks,
johnthetubaguy



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Jay Pipes

On 11/04/2015 09:32 AM, Sean Dague wrote:

On 11/04/2015 09:00 AM, Jay Pipes wrote:

On 11/03/2015 05:20 PM, Boris Pavlovic wrote:

Hi stackers,

Usually projects like Heat, Tempest, Rally, Scalar, and other tools
that work with OpenStack handle resources (e.g. VMs, Volumes,
Images, ..) in the following way:

  >>> resource = api.resource_do_some_stuff()
  >>> while api.resource_get(resource["uuid"]) != expected_status:
  ...     sleep(a_bit)

For each async operation they poll, calling resource_get() many times,
which creates significant load on the API and DB layers due to the
nature of this request. (Getting full information about a resource
usually produces SQL requests that contain multiple JOINs, e.g. for a
Nova VM it's 6 joins.)

What if we add a new API method that just returns the resource status by
UUID? Or even just extend the GET request with a new argument that returns
only the status?


+1

All APIs should have an HTTP HEAD call on important resources for
retrieving quick status information for the resource.

In fact, I proposed exactly this in my Compute "vNext" API proposal:

http://docs.oscomputevnext.apiary.io/#reference/server/serversid/head

Swift's API supports HEAD for accounts:

http://developer.openstack.org/api-ref-objectstorage-v1.html#showAccountMeta


containers:

http://developer.openstack.org/api-ref-objectstorage-v1.html#showContainerMeta


and objects:

http://developer.openstack.org/api-ref-objectstorage-v1.html#showObjectMeta

So, yeah, I agree.
-jay


How would you expect this to work on "servers"? HEAD specifically
forbids returning a body, and, unlike swift, we don't return very much
information in our headers.


I didn't propose doing it on a collection resource like "servers". Only 
on an entity resource like a single "server".


HEAD /v2/{tenant}/servers/{uuid}
HTTP/1.1 200 OK
Content-Length: 1022
Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
Content-Type: application/json
Date: Thu, 16 Jan 2014 21:13:19 GMT
OpenStack-Compute-API-Server-VM-State: ACTIVE
OpenStack-Compute-API-Server-Power-State: RUNNING
OpenStack-Compute-API-Server-Task-State: NONE
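A small sketch of how a client might read those states out of such a HEAD response. The header names are the ones from the proposal above and are not something Nova returns today; the function name and the resulting key names are invented for the example.

```python
# Proposed (not existing) header naming scheme, compared case-insensitively
# because HTTP header names are case-insensitive.
PREFIX = "openstack-compute-api-server-"
SUFFIX = "-state"


def states_from_headers(headers):
    """Collect the proposed *-State headers of a HEAD response into a dict."""
    states = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith(PREFIX) and lowered.endswith(SUFFIX):
            # "OpenStack-Compute-API-Server-Power-State" -> "power"
            key = lowered[len(PREFIX):-len(SUFFIX)].replace("-", "_")
            states[key] = value
    return states


example_headers = {
    "Content-Type": "application/json",
    "OpenStack-Compute-API-Server-VM-State": "ACTIVE",
    "OpenStack-Compute-API-Server-Power-State": "RUNNING",
    "OpenStack-Compute-API-Server-Task-State": "NONE",
}
print(states_from_headers(example_headers))
```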

Best,
-jay



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Sean Dague
On 11/04/2015 09:49 AM, Jay Pipes wrote:
> On 11/04/2015 09:32 AM, Sean Dague wrote:
>> On 11/04/2015 09:00 AM, Jay Pipes wrote:
>>> On 11/03/2015 05:20 PM, Boris Pavlovic wrote:
 Hi stackers,

 Usually such projects like Heat, Tempest, Rally, Scalar, and other tool
 that works with OpenStack are working with resources (e.g. VM, Volumes,
 Images, ..) in the next way:

   >>> resource = api.resouce_do_some_stuff()
   >>> while api.resource_get(resource["uuid"]) != expected_status
   >>>sleep(a_bit)

 For each async operation they are polling and call many times
 resource_get() which creates significant load on API and DB layers due
 the nature of this request. (Usually getting full information about
 resources produces SQL requests that contains multiple JOINs, e,g for
 nova vm it's 6 joins).

 What if we add new API method that will just resturn resource status by
 UUID? Or even just extend get request with the new argument that
 returns
 only status?
>>>
>>> +1
>>>
>>> All APIs should have an HTTP HEAD call on important resources for
>>> retrieving quick status information for the resource.
>>>
>>> In fact, I proposed exactly this in my Compute "vNext" API proposal:
>>>
>>> http://docs.oscomputevnext.apiary.io/#reference/server/serversid/head
>>>
>>> Swift's API supports HEAD for accounts:
>>>
>>> http://developer.openstack.org/api-ref-objectstorage-v1.html#showAccountMeta
>>>
>>>
>>>
>>> containers:
>>>
>>> http://developer.openstack.org/api-ref-objectstorage-v1.html#showContainerMeta
>>>
>>>
>>>
>>> and objects:
>>>
>>> http://developer.openstack.org/api-ref-objectstorage-v1.html#showObjectMeta
>>>
>>>
>>> So, yeah, I agree.
>>> -jay
>>
>> How would you expect this to work on "servers"? HEAD specifically
>> forbids returning a body, and, unlike swift, we don't return very much
>> information in our headers.
> 
> I didn't propose doing it on a collection resource like "servers". Only
> on an entity resource like a single "server".
> 
> HEAD /v2/{tenant}/servers/{uuid}
> HTTP/1.1 200 OK
> Content-Length: 1022
> Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
> Content-Type: application/json
> Date: Thu, 16 Jan 2014 21:13:19 GMT
> OpenStack-Compute-API-Server-VM-State: ACTIVE
> OpenStack-Compute-API-Server-Power-State: RUNNING
> OpenStack-Compute-API-Server-Task-State: NONE

Right, but these headers aren't in the normal resource. They are
returned in the body only.

The point of HEAD is give me the same thing as GET without the body,
because I only care about the headers. Swift resources are structured in
a way where this information is useful.

Our resources are not. We've also had specific requests to prevent
header bloat because it impacts the HTTP caching systems. Also, it's
pretty clear that headers are really not where you want to put volatile
information, which this is.

I think we should step back here and figure out what the actual problem
is, and what ways we might go about solving it. This has jumped directly
to a point-in-time optimized fast poll loop. It will shave a few cycles
off right now on our current implementation, but will still be orders of
magnitude more costly than consuming the Nova notifications if the only
thing that is cared about is task state transitions. And it's an API
change we have to live with largely *forever* so short term optimization
is not what we want to go for. We should focus on the long term game here.
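To sketch the difference in shape: a consumer of notifications blocks until a state transition arrives instead of asking repeatedly. In the sketch below a `queue.Queue` stands in for the notification bus, and the event fields (`server_id`, `task_state`) are invented for the example; they are not the real Nova notification payload.

```python
import queue


def wait_for_task_state(events, server_id, expected, timeout=5.0):
    """Block until an event for server_id reports the expected task state.

    Raises queue.Empty if no event arrives within `timeout` seconds.
    """
    while True:
        event = events.get(timeout=timeout)
        if event["server_id"] == server_id and event["task_state"] == expected:
            return event


# Stand-in bus with a few made-up events; None means "no task in progress".
bus = queue.Queue()
bus.put({"server_id": "abc", "task_state": "spawning"})
bus.put({"server_id": "xyz", "task_state": None})
bus.put({"server_id": "abc", "task_state": None})

print(wait_for_task_state(bus, "abc", None))
```

No request is made while nothing changes, which is where the cost difference against a poll loop comes from.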

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread John Garbutt
On 4 November 2015 at 15:00, Sean Dague  wrote:
> On 11/04/2015 09:49 AM, Jay Pipes wrote:
>> On 11/04/2015 09:32 AM, Sean Dague wrote:
>>> On 11/04/2015 09:00 AM, Jay Pipes wrote:
 On 11/03/2015 05:20 PM, Boris Pavlovic wrote:
> Hi stackers,
>
> Usually such projects like Heat, Tempest, Rally, Scalar, and other tool
> that works with OpenStack are working with resources (e.g. VM, Volumes,
> Images, ..) in the next way:
>
>   >>> resource = api.resouce_do_some_stuff()
>   >>> while api.resource_get(resource["uuid"]) != expected_status
>   >>>sleep(a_bit)
>
> For each async operation they are polling and call many times
> resource_get() which creates significant load on API and DB layers due
> the nature of this request. (Usually getting full information about
> resources produces SQL requests that contains multiple JOINs, e,g for
> nova vm it's 6 joins).
>
> What if we add new API method that will just resturn resource status by
> UUID? Or even just extend get request with the new argument that
> returns
> only status?

 +1

 All APIs should have an HTTP HEAD call on important resources for
 retrieving quick status information for the resource.

 In fact, I proposed exactly this in my Compute "vNext" API proposal:

 http://docs.oscomputevnext.apiary.io/#reference/server/serversid/head

 Swift's API supports HEAD for accounts:

 http://developer.openstack.org/api-ref-objectstorage-v1.html#showAccountMeta



 containers:

 http://developer.openstack.org/api-ref-objectstorage-v1.html#showContainerMeta



 and objects:

 http://developer.openstack.org/api-ref-objectstorage-v1.html#showObjectMeta


 So, yeah, I agree.
 -jay
>>>
>>> How would you expect this to work on "servers"? HEAD specifically
>>> forbids returning a body, and, unlike swift, we don't return very much
>>> information in our headers.
>>
>> I didn't propose doing it on a collection resource like "servers". Only
>> on an entity resource like a single "server".
>>
>> HEAD /v2/{tenant}/servers/{uuid}
>> HTTP/1.1 200 OK
>> Content-Length: 1022
>> Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
>> Content-Type: application/json
>> Date: Thu, 16 Jan 2014 21:13:19 GMT
>> OpenStack-Compute-API-Server-VM-State: ACTIVE
>> OpenStack-Compute-API-Server-Power-State: RUNNING
>> OpenStack-Compute-API-Server-Task-State: NONE
>
> Right, but these headers aren't in the normal resource. They are
> returned in the body only.
>
> The point of HEAD is give me the same thing as GET without the body,
> because I only care about the headers. Swift resources are structured in
> a way where this information is useful.

I guess we would have to add this to GET requests, for consistency,
which feels like duplication.

> Our resources are not. We've also had specific requests to prevent
> header bloat because it impacts the HTTP caching systems. Also, it's
> pretty clear that headers are really not where you want to put volatile
> information, which this is.

Hmm, you do make a good point about caching.

> I think we should step back here and figure out what the actual problem
> is, and what ways we might go about solving it. This has jumped directly
> to a point in time optimized fast poll loop. It will shave a few cycles
> off right now on our current implementation, but will still be orders of
> magnitude more costly that consuming the Nova notifications if the only
> thing that is cared about is task state transitions. And it's an API
> change we have to live with largely *forever* so short term optimization
> is not what we want to go for.

I do agree with that.

> We should focus on the long term game here.

The long term plan being the end user async API? Maybe using
websockets, or similar?
https://etherpad.openstack.org/p/liberty-cross-project-user-notifications

Thanks,
johnthetubaguy



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Jay Pipes

On 11/03/2015 05:20 PM, Boris Pavlovic wrote:

Hi stackers,

Usually projects like Heat, Tempest, Rally, Scalar, and other tools
that work with OpenStack handle resources (e.g. VMs, Volumes,
Images, ..) in the following way:

 >>> resource = api.resource_do_some_stuff()
 >>> while api.resource_get(resource["uuid"]) != expected_status:
 ...     sleep(a_bit)

For each async operation they poll, calling resource_get() many times,
which creates significant load on the API and DB layers due to the
nature of this request. (Getting full information about a resource
usually produces SQL requests that contain multiple JOINs, e.g. for a
Nova VM it's 6 joins.)

What if we add a new API method that just returns the resource status by
UUID? Or even just extend the GET request with a new argument that returns
only the status?


+1

All APIs should have an HTTP HEAD call on important resources for 
retrieving quick status information for the resource.


In fact, I proposed exactly this in my Compute "vNext" API proposal:

http://docs.oscomputevnext.apiary.io/#reference/server/serversid/head
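A minimal sketch of how a client might read status from such a HEAD call, assuming the status is exposed in a response header. The header name OpenStack-Compute-API-Server-VM-State is the one floated elsewhere in this thread, not an existing Nova API; the in-process toy server, the URL path, and the helper names are purely illustrative.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical header name, taken from the proposal in this thread.
STATE_HEADER = "OpenStack-Compute-API-Server-VM-State"


class FakeServerResource(BaseHTTPRequestHandler):
    """Toy stand-in for an API endpoint; answers HEAD with headers only."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header(STATE_HEADER, "ACTIVE")
        self.end_headers()  # HEAD responses carry no body

    def log_message(self, *args):
        pass  # keep the example quiet


def head_status(host, port, path):
    """Issue a HEAD request and return the state header, or None if absent."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # nothing to read for HEAD, but finishes the exchange
        return resp.getheader(STATE_HEADER)
    finally:
        conn.close()


httpd = HTTPServer(("127.0.0.1", 0), FakeServerResource)  # port 0: any free port
threading.Thread(target=httpd.serve_forever, daemon=True).start()
state = head_status("127.0.0.1", httpd.server_port, "/v2/demo/servers/some-uuid")
httpd.shutdown()
httpd.server_close()
print(state)
```

The client never parses a body here; whether the server side is actually cheaper depends on how the header value is looked up, which this sketch does not model.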

Swift's API supports HEAD for accounts:

http://developer.openstack.org/api-ref-objectstorage-v1.html#showAccountMeta

containers:

http://developer.openstack.org/api-ref-objectstorage-v1.html#showContainerMeta

and objects:

http://developer.openstack.org/api-ref-objectstorage-v1.html#showObjectMeta

So, yeah, I agree.
-jay



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Sean Dague
On 11/04/2015 10:13 AM, John Garbutt wrote:
> On 4 November 2015 at 14:49, Jay Pipes  wrote:
>> On 11/04/2015 09:32 AM, Sean Dague wrote:
>>>
>>> On 11/04/2015 09:00 AM, Jay Pipes wrote:

>>>> On 11/03/2015 05:20 PM, Boris Pavlovic wrote:
>>>>>
>>>>> Hi stackers,
>>>>>
>>>>> Usually projects like Heat, Tempest, Rally, Scalar, and other tools
>>>>> that work with OpenStack handle resources (e.g. VMs, Volumes,
>>>>> Images, ...) in the following way:
>>>>>
>>>>> >>> resource = api.resource_do_some_stuff()
>>>>> >>> while api.resource_get(resource["uuid"])["status"] != expected_status:
>>>>> ...     sleep(a_bit)
>>>>>
>>>>> For each async operation they poll, calling resource_get() many times,
>>>>> which creates significant load on the API and DB layers due to the
>>>>> nature of this request. (Getting full information about a resource
>>>>> usually produces SQL requests that contain multiple JOINs, e.g. for a
>>>>> nova VM it's 6 joins.)
>>>>>
>>>>> What if we add a new API method that just returns resource status by
>>>>> UUID? Or even just extend the get request with a new argument that
>>>>> returns only status?
>>>>
>>>> +1
>>>>
>>>> All APIs should have an HTTP HEAD call on important resources for
>>>> retrieving quick status information for the resource.
>>>>
>>>> In fact, I proposed exactly this in my Compute "vNext" API proposal:
>>>>
>>>> http://docs.oscomputevnext.apiary.io/#reference/server/serversid/head
>>>>
>>>> Swift's API supports HEAD for accounts:
>>>>
>>>> http://developer.openstack.org/api-ref-objectstorage-v1.html#showAccountMeta
>>>>
>>>> containers:
>>>>
>>>> http://developer.openstack.org/api-ref-objectstorage-v1.html#showContainerMeta
>>>>
>>>> and objects:
>>>>
>>>> http://developer.openstack.org/api-ref-objectstorage-v1.html#showObjectMeta
>>>>
>>>> So, yeah, I agree.
>>>> -jay
>>>
>>>
>>> How would you expect this to work on "servers"? HEAD specifically
>>> forbids returning a body, and, unlike swift, we don't return very much
>>> information in our headers.
>>
>>
>> I didn't propose doing it on a collection resource like "servers". Only on
>> an entity resource like a single "server".
>>
>> HEAD /v2/{tenant}/servers/{uuid}
>> HTTP/1.1 200 OK
>> Content-Length: 1022
>> Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
>> Content-Type: application/json
>> Date: Thu, 16 Jan 2014 21:13:19 GMT
>> OpenStack-Compute-API-Server-VM-State: ACTIVE
>> OpenStack-Compute-API-Server-Power-State: RUNNING
>> OpenStack-Compute-API-Server-Task-State: NONE
> 
> For polling, that sounds quite efficient and handy.
> 
> For "servers" we could do this (I think there was a spec up that wanted this):
> 
> HEAD /v2/{tenant}/servers
> HTTP/1.1 200 OK
> Content-Length: 1022
> Last-Modified: Thu, 16 Jan 2014 21:12:31 GMT
> Content-Type: application/json
> Date: Thu, 16 Jan 2014 21:13:19 GMT
> OpenStack-Compute-API-Server-Count: 13

This seems like a fundamental abuse of HTTP honestly. If you find
yourself creating a ton of new headers, you are probably doing it wrong.

I do think the near-term workaround is to actually use Searchlight.
They're monitoring the notifications bus for nova, and refreshing
resources when they see a notification which might have changed them. It
still means that Searchlight is hitting our API more than ideal, but at
least only one service is doing so, and if the rest hit that instead
they'll get the resource without any db hits (it's all through an
Elasticsearch cluster).

I think longer term we probably need a dedicated event service in
OpenStack. A few of us actually had an informal conversation about this
during the Nova notifications session to figure out if there was a way
to optimize the Searchlight path. Nearly everyone wants websockets,
which is good. The problem is, that means you've got to anticipate
10,000+ open websockets as soon as we expose this. Which means the stack
to deliver that sanely isn't just a bit of Python code, it's also the
highly optimized server underneath.

So, I feel like with Searchlight we've got a workaround that's more
efficient than what we're going to make with an API that we really don't
want to support down the road. Because I definitely don't want to make
general purpose search a thing inside every service, as in order to make
it efficient we're going to have to reimplement most of Searchlight in
the services.

Instead of spending the energy on this path, it would be much better to
push forward on the end user events path, which is really the long term
model we want.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread gord chung

apologies, if the below was mentioned at some point in this thread.

On 04/11/2015 10:42 AM, Sean Dague wrote:

> This seems like a fundamental abuse of HTTP honestly. If you find
> yourself creating a ton of new headers, you are probably doing it wrong.

if we want to explore the HTTP path, did we consider using ETags[1] to
check whether resources have changed? it's something used by Gnocchi's
API to handle resource changes.
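a rough sketch of the ETag idea, for illustration only: the client remembers the ETag from its first GET and sends it back as If-None-Match, and the server answers 304 with no body while the resource is unchanged. the handler below is a toy function, not Gnocchi's implementation, and deriving the ETag by hashing the JSON representation is an assumption made for the example.

```python
import hashlib
import json


def make_etag(resource):
    """Derive a strong ETag from the resource's JSON representation."""
    body = json.dumps(resource, sort_keys=True).encode()
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def handle_get(resource, if_none_match=None):
    """Toy GET handler: 304 with no body if the client's ETag still matches."""
    etag = make_etag(resource)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, json.dumps(resource, sort_keys=True).encode()


server = {"id": "some-uuid", "status": "BUILD"}

# First poll: full response, the client remembers the ETag.
code1, headers1, _ = handle_get(server)
# Second poll, nothing changed: cheap 304 without a body.
code2, _, body2 = handle_get(server, if_none_match=headers1["ETag"])
# The resource changes state, so the remembered ETag no longer matches.
server["status"] = "ACTIVE"
code3, _, body3 = handle_get(server, if_none_match=headers1["ETag"])
print(code1, code2, code3)
```

note that this toy version still builds the full representation to compute the ETag, so it only saves wire traffic; a real service would presumably need a cheaper source for the tag (a version counter or update timestamp, say) to also save DB work.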


> I do think the near term work around is to actually use Searchlight.
> They're monitoring the notifications bus for nova, and refreshing
> resources when they see a notification which might have changed it. It
> still means that Searchlight is hitting our API more than ideal, but at
> least only one service is doing so, and if the rest hit that instead
> they'll get the resource without any db hits (it's all through an
> elastic search cluster).
>
> I think longer term we probably need a dedicated event service in
> OpenStack. A few of us actually had an informal conversation about this
> during the Nova notifications session to figure out if there was a way
> to optimize the Searchlight path. Nearly everyone wants websockets,
> which is good. The problem is, that means you've got to anticipate
> 10,000+ open websockets as soon as we expose this. Which means the stack
> to deliver that sanely isn't just a bit of python code, it's also the
> highly optimized server underneath.

as part of the StackTach integration efforts, Ceilometer (as of Juno)
listens to all notifications in the OpenStack ecosystem and builds a 
normalised event model[2] from it. the normalised event data is stored 
in a backend (elasticsearch, sql, mongodb, hbase) and from this you can 
query based on required attributes. in addition to storing events, in 
Liberty, Aodh (alarming service) added support to take events and create 
alarms based on change of state[3] with expanded functionality to be 
added. this was added to handle the NFV use case but may also be 
relevant here as it seems like we want to have an action based on status 
changes.


i should mention that we discussed splitting out the event logic in 
Ceilometer to create a generic listener[4] service which could convert 
notification data to meters, events, and anything else. this isn't a 
high priority item but might be an integration point for those looking 
to leverage notifications in OpenStack.


[1] https://en.wikipedia.org/wiki/HTTP_ETag
[2] http://docs.openstack.org/admin-guide-cloud/telemetry-events.html
[3] http://specs.openstack.org/openstack/ceilometer-specs/specs/liberty/event-alarm-evaluator.html
[4] https://etherpad.openstack.org/p/mitaka-telemetry-split

cheers,

--
gord




Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-04 Thread Chris Friesen

On 11/03/2015 11:45 PM, John Griffith wrote:



> On Tue, Nov 3, 2015 at 3:20 PM, Boris Pavlovic wrote:
>
>> Hi stackers,
>>
>> Usually projects like Heat, Tempest, Rally, Scalar, and other tools
>> that work with OpenStack handle resources (e.g. VMs, Volumes,
>> Images, ...) in the following way:
>>
>> >>> resource = api.resource_do_some_stuff()
>> >>> while api.resource_get(resource["uuid"])["status"] != expected_status:
>> ...     sleep(a_bit)
>>
>> For each async operation they poll, calling resource_get() many times,
>> which creates significant load on the API and DB layers due to the
>> nature of this request. (Getting full information about a resource
>> usually produces SQL requests that contain multiple JOINs, e.g. for a
>> nova VM it's 6 joins.)
>>
>> What if we add a new API method that just returns resource status by
>> UUID? Or even just extend the get request with a new argument that
>> returns only status?
>
> Hey Boris,
>
> As I asked in IRC, I'm kinda curious what the difference is here in terms
> of API and DB calls.  I very well might be missing an idea here, but
> currently we do a get by ID in that loop that you mention; the only
> difference I see in what you're suggesting is a reduced payload maybe?  A
> response that only includes the status?
>
> I may be missing an important idea here, but it seems to me that you would
> still have the same number of API calls and DB requests, just possibly a
> slightly smaller payload.  Let me know if I'm missing the idea here.



I think the idea is that we would only retrieve resource status rather than the 
full information about the resource.  In doing so we would:


1) Reduce the load on the DB due to doing fewer JOINs and retrieving less data.
2) Reduce the message payload.

I suspect that the first one is more important.

Chris



[openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread Boris Pavlovic
Hi stackers,

Usually projects like Heat, Tempest, Rally, Scalar, and other tools
that work with OpenStack handle resources (e.g. VMs, Volumes,
Images, ...) in the following way:

>>> resource = api.resource_do_some_stuff()
>>> while api.resource_get(resource["uuid"])["status"] != expected_status:
...     sleep(a_bit)

For each async operation they poll, calling resource_get() many times,
which creates significant load on the API and DB layers due to the
nature of this request. (Getting full information about a resource
usually produces SQL requests that contain multiple JOINs, e.g. for a
nova VM it's 6 joins.)

What if we add a new API method that just returns resource status by
UUID? Or even just extend the get request with a new argument that
returns only status?

Thoughts?
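
To make the pattern concrete, here is a runnable sketch of the polling loop, assuming a callable that returns the current status. The names wait_for_status and fake_get_status, and the interval and timeout defaults, are made up for the example; each loop iteration stands for one API call (and the DB work behind it).

```python
import time


def wait_for_status(get_status, expected, interval=1.0, timeout=60.0):
    """Poll get_status() until it returns `expected` or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()  # one API request per iteration
        if status == expected:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        time.sleep(interval)


# Fake API call for illustration: the resource becomes ACTIVE on the third poll.
states = iter(["BUILD", "BUILD", "ACTIVE"])
calls = []


def fake_get_status():
    status = next(states)
    calls.append(status)
    return status


result = wait_for_status(fake_get_status, "ACTIVE", interval=0, timeout=5)
print(result, len(calls))
```

With N clients polling M resources this way, the request count grows with N * M / interval regardless of how rarely the status actually changes.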


Best regards,
Boris Pavlovic


Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread Morgan Fainberg
On Nov 3, 2015 4:29 PM, "Clint Byrum"  wrote:
>
> Excerpts from Boris Pavlovic's message of 2015-11-03 14:20:10 -0800:
> > Hi stackers,
> >
> > Usually projects like Heat, Tempest, Rally, Scalar, and other tools
> > that work with OpenStack handle resources (e.g. VMs, Volumes,
> > Images, ...) in the following way:
> >
> > >>> resource = api.resource_do_some_stuff()
> > >>> while api.resource_get(resource["uuid"])["status"] != expected_status:
> > ...     sleep(a_bit)
> >
> > For each async operation they poll, calling resource_get() many times,
> > which creates significant load on the API and DB layers due to the
> > nature of this request. (Getting full information about a resource
> > usually produces SQL requests that contain multiple JOINs, e.g. for a
> > nova VM it's 6 joins.)
> >
> > What if we add a new API method that just returns resource status by
> > UUID? Or even just extend the get request with a new argument that
> > returns only status?
>
> I like the idea of being able to pass in the set of fields you want to
> see with each get. In SQL, often times only passing in indexed fields
> will allow a query to be entirely serviced by a brief range scan in
> the B-tree. For instance, if you have an index on '(UUID, status)',
> then this lookup will be a single read from an index in MySQL/MariaDB:
>
> SELECT status FROM instances WHERE UUID='foo';
>
> The explain on this will say 'Using index' and basically you'll just do
> a range scan on the UUID portion, and only find one entry, which will
> be lightning fast, and return only status since it already has it there
> in the index. Maintaining the index is not free, but probably worth it
> if your users really do poll this way a lot.
>
> That said, this is optimizing for polling, and I'm not a huge fan. I'd
> much rather see a pub/sub model added to the API, so that users can
> simply subscribe to changes in resources, and poll only when a very long
> timeout has passed. This will reduce load on API services, databases,

++ this is a much better long term solution if we are investing engineering
resources along these lines.

> caches, etc. There was a thread some time ago about using Nova's built
> in notifications to produce an Atom feed per-project. That seems like
> a much more scalable model, as even polling just that super fast query
> will still incur quite a bit more cost than a GET with If-Modified-Since
> on a single xml file.
>


Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread Boris Pavlovic
Clint, Morgan,

I totally agree that the pub/sub model is a better approach.

However, there are 2 great things about polling:
1) it's simpler to use than pub/sub (especially in shell)
2) it has a really simple implementation & we can get this in OpenStack in
a few days/weeks

What about just supporting both approaches?


Best regards,
Boris Pavlovic

On Wed, Nov 4, 2015 at 9:33 AM, Morgan Fainberg 
wrote:

>
> On Nov 3, 2015 4:29 PM, "Clint Byrum"  wrote:
> >
> > Excerpts from Boris Pavlovic's message of 2015-11-03 14:20:10 -0800:
> > > Hi stackers,
> > >
> > > Usually projects like Heat, Tempest, Rally, Scalar, and other tools
> > > that work with OpenStack handle resources (e.g. VMs, Volumes,
> > > Images, ...) in the following way:
> > >
> > > >>> resource = api.resource_do_some_stuff()
> > > >>> while api.resource_get(resource["uuid"])["status"] != expected_status:
> > > ...     sleep(a_bit)
> > >
> > > For each async operation they poll, calling resource_get() many times,
> > > which creates significant load on the API and DB layers due to the
> > > nature of this request. (Getting full information about a resource
> > > usually produces SQL requests that contain multiple JOINs, e.g. for a
> > > nova VM it's 6 joins.)
> > >
> > > What if we add a new API method that just returns resource status by
> > > UUID? Or even just extend the get request with a new argument that
> > > returns only status?
> >
> > I like the idea of being able to pass in the set of fields you want to
> > see with each get. In SQL, often times only passing in indexed fields
> > will allow a query to be entirely serviced by a brief range scan in
> > the B-tree. For instance, if you have an index on '(UUID, status)',
> > then this lookup will be a single read from an index in MySQL/MariaDB:
> >
> > SELECT status FROM instances WHERE UUID='foo';
> >
> > The explain on this will say 'Using index' and basically you'll just do
> > a range scan on the UUID portion, and only find one entry, which will
> > be lightning fast, and return only status since it already has it there
> > in the index. Maintaining the index is not free, but probably worth it
> > if your users really do poll this way a lot.
> >
> > That said, this is optimizing for polling, and I'm not a huge fan. I'd
> > much rather see a pub/sub model added to the API, so that users can
> > simply subscribe to changes in resources, and poll only when a very long
> > timeout has passed. This will reduce load on API services, databases,
>
> ++ this is a much better long term solution if we are investing
> engineering resources along these lines.
>
> > caches, etc. There was a thread some time ago about using Nova's built
> > in notifications to produce an Atom feed per-project. That seems like
> > a much more scalable model, as even polling just that super fast query
> > will still incur quite a bit more cost than a GET with If-Modified-Since
> > on a single xml file.
> >
> >


Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread michael mccune

On 11/03/2015 05:20 PM, Boris Pavlovic wrote:

> What if we add a new API method that just returns resource status by
> UUID? Or even just extend the get request with a new argument that
> returns only status?
>
> Thoughts?


not sure i understand the resource status by UUID, could you explain 
that a little more.


as for changing the get request to return only the status, can't you 
have a filter on the get url that instructs it to return only the status?


mike



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread Clint Byrum
Excerpts from Boris Pavlovic's message of 2015-11-03 14:20:10 -0800:
> Hi stackers,
>
> Usually projects like Heat, Tempest, Rally, Scalar, and other tools
> that work with OpenStack handle resources (e.g. VMs, Volumes,
> Images, ...) in the following way:
>
> >>> resource = api.resource_do_some_stuff()
> >>> while api.resource_get(resource["uuid"])["status"] != expected_status:
> ...     sleep(a_bit)
>
> For each async operation they poll, calling resource_get() many times,
> which creates significant load on the API and DB layers due to the
> nature of this request. (Getting full information about a resource
> usually produces SQL requests that contain multiple JOINs, e.g. for a
> nova VM it's 6 joins.)
>
> What if we add a new API method that just returns resource status by
> UUID? Or even just extend the get request with a new argument that
> returns only status?

I like the idea of being able to pass in the set of fields you want to
see with each get. In SQL, oftentimes passing in only indexed fields
will allow a query to be entirely serviced by a brief range scan in
the B-tree. For instance, if you have an index on '(UUID, status)',
then this lookup will be a single read from an index in MySQL/MariaDB:

SELECT status FROM instances WHERE UUID='foo';

The explain on this will say 'Using index' and basically you'll just do
a range scan on the UUID portion, and only find one entry, which will
be lightning fast, and return only status since it already has it there
in the index. Maintaining the index is not free, but probably worth it
if your users really do poll this way a lot.

That said, this is optimizing for polling, and I'm not a huge fan. I'd
much rather see a pub/sub model added to the API, so that users can
simply subscribe to changes in resources, and poll only when a very long
timeout has passed. This will reduce load on API services, databases,
caches, etc. There was a thread some time ago about using Nova's built-in
notifications to produce an Atom feed per-project. That seems like
a much more scalable model, as even polling just that super fast query
will still incur quite a bit more cost than a GET with If-Modified-Since
on a single XML file.
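
The covering-index point can be tried out locally. The sketch below uses SQLite from Python's standard library rather than MySQL/MariaDB, so the wording of the plan differs (SQLite reports a covering index where MySQL's EXPLAIN says 'Using index'); the table layout and the index name idx_uuid_status are invented for the example and are not Nova's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE instances (
        id INTEGER PRIMARY KEY,
        uuid TEXT NOT NULL,
        status TEXT NOT NULL,
        big_blob TEXT            -- stands in for the many other columns
    );
    CREATE INDEX idx_uuid_status ON instances (uuid, status);
    """
)
conn.execute(
    "INSERT INTO instances (uuid, status, big_blob) VALUES (?, ?, ?)",
    ("foo", "ACTIVE", "x" * 1000),
)

query = "SELECT status FROM instances WHERE uuid = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("foo",)).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column is the plan detail
status = conn.execute(query, ("foo",)).fetchone()[0]
print(plan_text)
print(status)
```

The plan should mention a covering index on idx_uuid_status, meaning the status value is read straight from the index without touching the table row.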



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread John Griffith
On Tue, Nov 3, 2015 at 3:20 PM, Boris Pavlovic  wrote:

> Hi stackers,
>
> Usually projects like Heat, Tempest, Rally, Scalar, and other tools
> that work with OpenStack handle resources (e.g. VMs, Volumes,
> Images, ...) in the following way:
>
> >>> resource = api.resource_do_some_stuff()
> >>> while api.resource_get(resource["uuid"])["status"] != expected_status:
> ...     sleep(a_bit)
>
> For each async operation they poll, calling resource_get() many times,
> which creates significant load on the API and DB layers due to the
> nature of this request. (Getting full information about a resource
> usually produces SQL requests that contain multiple JOINs, e.g. for a
> nova VM it's 6 joins.)
>
> What if we add a new API method that just returns resource status by
> UUID? Or even just extend the get request with a new argument that
> returns only status?
>
> Thoughts?
>
>
> Best regards,
> Boris Pavlovic
>

Hey Boris,

As I asked in IRC, I'm kinda curious what the difference is here in terms
of API and DB calls.  I very well might be missing an idea here, but
currently we do a get by ID in that loop that you mention; the only
difference I see in what you're suggesting is a reduced payload maybe?  A
response that only includes the status?

I may be missing an important idea here, but it seems to me that you would
still have the same number of API calls and DB requests, just possibly a
slightly smaller payload.  Let me know if I'm missing the idea here.

Thanks,
John


Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread John Griffith
On Tue, Nov 3, 2015 at 4:57 PM, michael mccune  wrote:

> On 11/03/2015 05:20 PM, Boris Pavlovic wrote:
>
>> What if we add new API method that will just resturn resource status by
>> UUID? Or even just extend get request with the new argument that returns
>> only status?
>>
>> Thoughts?
>>
>
> not sure i understand the resource status by UUID, could you explain that
> a little more.
>
> as for changing the get request to return only the status, can't you have
> a filter on the get url that instructs it to return only the status?
>

Yes, we already have that capability and it's used in a number of places.


>
> mike


Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread Boris Pavlovic
John,


The main point here is to reduce the amount of data that we request from
the DB, that is processed by the API services, and that is sent over the
network, and to make the SQL requests simpler (remove JOINs from the
SELECT).

So if you fetch 10 bytes instead of 1000 bytes, you process 100 times
less data, and it should scale roughly 100 times better and work about
100 times faster overall.

On the other hand, polling may easily cause 100 API requests/second and
create significant load on the cloud.

Clint,

Please do not forget about the fact that we are removing JOINs from the
SQL requests.

Here is what the SQL request that gets VM info looks like:
http://paste.openstack.org/show/477934/ (it has 6 joins)

This is how it looks for a glance image:
http://paste.openstack.org/show/477933/ (it has 2 joins)

So the performance/scale impact will be higher.

Best regards,
Boris Pavlovic


On Wed, Nov 4, 2015 at 4:18 PM, Clint Byrum  wrote:

> Excerpts from Boris Pavlovic's message of 2015-11-03 17:32:43 -0800:
> > Clint, Morgan,
> >
> > I totally agree that the pub/sub model is better approach.
> >
> > However, there are 2 great things about polling:
> > 1) it's simpler to use than pub/sub (especially in shell)
>
> I envision something like this:
>
>
> while changes=$(openstack compute server-events --run react-to-status \
>     --fields status id1 id2 id3 id4) ; do
>   # each entry is assumed to look like "id:status"
>   for id_and_status in $changes ; do
>     id=${id_and_status%%:*}
>     status=${id_and_status##*:}
>   done
> done
>
> Not exactly "hard"
>
> > 2) it has really simple implementation & we can get this in OpenStack in
> > few days/weeks
> >
>
> It doesn't actually solve a ton of things though. Even if we optimize
> it down to the fewest operations, it is still ultimately a DB query and
> extra churn in the API service.
>


Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread Clint Byrum
Excerpts from Boris Pavlovic's message of 2015-11-03 17:32:43 -0800:
> Clint, Morgan,
> 
> I totally agree that the pub/sub model is better approach.
> 
> However, there are 2 great things about polling:
> 1) it's simpler to use than pub/sub (especially in shell)

I envision something like this:


while changes=$(openstack compute server-events --run react-to-status \
    --fields status id1 id2 id3 id4) ; do
  # each entry is assumed to look like "id:status"
  for id_and_status in $changes ; do
    id=${id_and_status%%:*}
    status=${id_and_status##*:}
  done
done

Not exactly "hard"

> 2) it has really simple implementation & we can get this in OpenStack in
> few days/weeks
> 

It doesn't actually solve a ton of things though. Even if we optimize
it down to the fewest operations, it is still ultimately a DB query and
extra churn in the API service.
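
For comparison with a polling loop, here is a toy in-process sketch of the subscribe-and-wait shape I have in mind. Nothing here is an existing OpenStack API: StatusFeed, wait_for, and the event tuple format are invented for illustration, and a real service would deliver events over the network rather than through a local queue.

```python
import queue
import threading


class StatusFeed:
    """Toy pub/sub: each subscriber gets a queue of (resource_id, status)."""

    def __init__(self):
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self):
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, resource_id, status):
        with self._lock:
            subscribers = list(self._subscribers)
        for q in subscribers:
            q.put((resource_id, status))


def wait_for(events, resource_id, expected, timeout=5.0):
    """Block on the event queue until the resource reaches `expected`."""
    while True:
        rid, status = events.get(timeout=timeout)  # raises queue.Empty on timeout
        if rid == resource_id and status == expected:
            return status


feed = StatusFeed()
events = feed.subscribe()
# Stand-in for the service emitting notifications as the resource changes state.
for s in ("BUILD", "ACTIVE"):
    feed.publish("id1", s)
result = wait_for(events, "id1", "ACTIVE")
print(result)
```

The consumer does no work between state changes, which is the property that makes this cheaper than even a well-optimized poll.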



Re: [openstack-dev] [all][api][tc][perfromance] API for getting only status of resources

2015-11-03 Thread Clint Byrum
Excerpts from John Griffith's message of 2015-11-03 21:45:12 -0800:
> On Tue, Nov 3, 2015 at 3:20 PM, Boris Pavlovic  wrote:
> 
> > Hi stackers,
> >
> > Usually projects like Heat, Tempest, Rally, Scalar, and other tools
> > that work with OpenStack handle resources (e.g. VMs, Volumes,
> > Images, ...) in the following way:
> >
> > >>> resource = api.resource_do_some_stuff()
> > >>> while api.resource_get(resource["uuid"])["status"] != expected_status:
> > ...     sleep(a_bit)
> >
> > For each async operation they poll, calling resource_get() many times,
> > which creates significant load on the API and DB layers due to the
> > nature of this request. (Getting full information about a resource
> > usually produces SQL requests that contain multiple JOINs, e.g. for a
> > nova VM it's 6 joins.)
> >
> > What if we add a new API method that just returns resource status by
> > UUID? Or even just extend the get request with a new argument that
> > returns only status?
> >
> > Thoughts?
> >
> >
> > Best regards,
> > Boris Pavlovic
> >
> Hey Boris,
>
> As I asked in IRC, I'm kinda curious what the difference is here in terms
> of API and DB calls.  I very well might be missing an idea here, but
> currently we do a get by ID in that loop that you mention; the only
> difference I see in what you're suggesting is a reduced payload maybe?  A
> response that only includes the status?
>
> I may be missing an important idea here, but it seems to me that you would
> still have the same number of API calls and DB requests, just possibly a
> slightly smaller payload.  Let me know if I'm missing the idea here.

This is a scaling optimization. Reading fewer columns from the DB will
result in a leaner query. Even if the time difference is indiscernible
by humans, doing 1000 "SELECT status" vs. "SELECT *" concurrently will
show just how much faster this can be. There's also the issue of ORM
objects. If you can avoid building a whole object, and just grab one field
with one direct query, you'll save overall on RAM, CPU, wire traffic,
cache space, etc. etc. It only makes sense to optimize at this level if
you expect many tight loops polling many resources.
