[Openstack] openstack on RHEL 6.1

2011-05-25 Thread David Robinson
Hi All,

Several people have reported on IRC that they can't get openstack
working on RHEL 6.1. Starting the openstack-nova-network service (and
various others) causes this traceback:

(nova): TRACE:   File "/usr/lib64/python2.6/subprocess.py", line 725,
in communicate
(nova): TRACE: stdout, stderr = self._communicate(input, endtime)
(nova): TRACE:   File "/usr/lib64/python2.6/subprocess.py", line 1322,
in _communicate
(nova): TRACE: self.wait(timeout=self._remaining_time(endtime))
(nova): TRACE: TypeError: wait() got an unexpected keyword argument 'timeout'

Red Hat have backported the 'timeout' feature for subprocess.Popen to
Python 2.6.6 in RHEL 6.1. At first glance it looks like they've broken
subprocess, but in fact they haven't - it's Eventlet that's causing the
problem, because its 'wait' function doesn't expect the timeout
argument.

I've reported the problem upstream here:

https://bitbucket.org/which_linden/eventlet/issue/89/add-a-timeout-argument-to-subprocesspopen

The patch below fixed the problem for me:

--- /usr/lib/python2.6/site-packages/eventlet/green/subprocess.py.orig  2011-05-25 23:31:34.597271402 +
+++ /usr/lib/python2.6/site-packages/eventlet/green/subprocess.py   2011-05-25 23:33:24.055602468 +
@@ -32,7 +32,7 @@
 setattr(self, attr, wrapped_pipe)
 __init__.__doc__ = subprocess_orig.Popen.__init__.__doc__

-def wait(self, check_interval=0.01):
+def wait(self, check_interval=0.01, timeout=None):
 # Instead of a blocking OS call, this version of wait() uses logic
 # borrowed from the eventlet 0.2 processes.Process.wait() method.
 try:

--
Dave

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Getting pagination right

2011-05-25 Thread Eric Day
Hi Jay,

On Wed, May 25, 2011 at 03:40:27PM -0400, Jay Pipes wrote:
> On Wed, May 25, 2011 at 3:32 PM, Greg Holt  wrote:
> > select w from x where y > marker order by y limit z
> >
> > This gives you consistent pagination, means the database doesn't have to 
> > count matching rows to find an offset, and means you could shard by y later 
> > in life if you need to scale that much.

FWIW, this is the same approach I'm using in burrow, except where
order is roughly insert time (no guaranteed strict ordering when it's
sharded). This is simple and scales well.

> Sorry, that doesn't make any sense to me. An offset is part of the
> LIMIT X OFFSET Y clause. Your query above would return ALL the rows
> that match WHERE y > marker. That's not what we want. We want a
> segment of those rows.

Assuming you have an index on marker for efficiency, I think the query
makes perfect sense. Can you explain a bit more why you need the offset
if we are ok with the result set for a given page changing? From the
total set, we're returning everything after the last seen marker,
ordering those rows larger (which should be a no-op with the index),
and then limit by an optional limit parameter. Seems ok to me.

> In addition, your SQL expression does not return consistent result
> sets if the Y column does not have an increasing value over time. New
> records inserted into the table will pop into random pages of results
> if the Y column does not have a strictly increasing ordering.

I think not being consistent is OK. If you are viewing a list of images
and then you see a new one pop up on refresh after another insert
(using the same marker), I think this is perfectly fine. It certainly
helps keep things simpler on the backends. Doesn't seem like a big
issue for swift so far, and I don't anticipate issues with burrow.

-Eric



Re: [Openstack] Getting pagination right

2011-05-25 Thread Kevin L. Mitchell
On Wed, 2011-05-25 at 16:27 -0400, William Wolf wrote:
> I personally like Jay's proposal (except maybe keeping 'pages' out for
> now in favor of just having 1 way to do things, rather than many ways
> to do the same thing), but feel that the term 'marker' should maybe be
> renamed.  Maybe 'timestamp' would even be better? I'm open to other
> suggestions.

Well, it's a consistency mechanism that selects a subset of rows--those
that are visible at a specific time...so perhaps "subset" is a good
self-documenting parameter name?
-- 
Kevin L. Mitchell 



Re: [Openstack] Getting pagination right

2011-05-25 Thread William Wolf

I think there is a lot of confusion in the two uses of the word 'marker', and 
maybe under Jay's proposal we need another word for 'marker'.

Suppose we have the following images:

PK  Created_at           Updated_at           Deleted_at
1   2011-05-25 12:00:00  2011-05-25 12:00:04
2   2011-05-25 12:00:01  2011-05-25 12:00:03
3   2011-05-25 12:00:02  2011-05-25 12:00:05
4   2011-05-25 12:00:03                       2011-05-25 12:00:09

Under the current 1.1 spec, 'marker' means the id of the last element you saw, 
such that:

/images?marker=3&limit=2

will give you the 2 images *updated* before server with id '3':

[1, 2]

(assuming order by updated_at desc)

This is *not* what the current code does because we do not ORDER BY updated_at 
yet, as Jay pointed out.  I'm just showing what the spec wants as it is 
currently written.

If I understand Jay's ideas correctly, he wants us to pass marker (a different
marker), offset, AND limit.  So a query would go something like this:

/images?marker=<timestamp>&offset=3&limit=10

I believe that marker can be left empty here, and it will default to now(), but
whatever <timestamp> gets set to, it will return images that were created
*before* <timestamp> and that were deleted *after* <timestamp> (if any).  The
main advantage here is that it gives you a persistent 'collection snapshot' of
your query results, based on the time that you made the initial query.  If it
takes you a minute to page through results, and images were deleted or added
during that time, it won't throw off your pagination if you keep your marker
constant.

If we passed in marker = '2011-05-25 12:00:03', offset = '0', and limit = '4', 
we would get:

[4, 3, 2, 1]

(assuming order by created_at desc)

using Jay's method.  If we kept everything the same, but passed in '2011-05-25
12:00:10' as the marker, image 4 would not be on the list because at that time
image 4 was deleted.

Please correct me if something above is incorrect.


As for thoughts, I talked with Mark Washenberger and Brian Waldon, and we came 
up with 2 possible ways to move forward.  

Things that we agree on in both cases:

* The current way nova handles paging is inefficient, and needs to be improved. 
* We need to use ORDER BY in all of our queries, and not assume that id's will 
be ordered by time.  
* We order our queries by created_at, *not* updated_at as specified in the 
current spec (you can see the confusion this may cause in my first example 
above).

I personally like Jay's proposal (except maybe keeping 'pages' out for now in 
favor of just having 1 way to do things, rather than many ways to do the same 
thing), but feel that the term 'marker' should maybe be renamed.  Maybe 
'timestamp' would even be better? I'm open to other suggestions.

Another idea that we had was to still use marker/limit with marker being an id, 
but to move the existing inefficient python logic into the db layer.  This will 
give us the sharding/scaling advantages that Greg mentioned, and also get rid 
of a lot of the problems Jay outlined with our current implementation.

With this method, however, I believe we will need glance to support 
marker/limit as well in order to make things efficient.  

So, to summarize, our two suggestions currently are:

1) Follow Jay's proposal, but find a better name for 'marker'.
2) Keep with marker/limit but still move inefficient logic out of python to db 
layer, and have glance support marker/limit as well.

Thoughts on these two paths of moving forward?  Anyone have ideas for other 
routes we could take?



-Original Message-
From: "Jay Pipes" 
Sent: Wednesday, May 25, 2011 15:57
To: "Greg Holt" 
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Getting pagination right

On Wed, May 25, 2011 at 3:43 PM, Greg Holt  wrote:
> Okay, I give up then. Not sure what's different with what you have vs. Swift 
> dbs. Just trying to offer up what we do and have been doing for a while now.

The pagination in Swift is not consistent. Inserts into the Swift
databases in between the time of the initial query and the requesting
the "next page" can result in rows from the original first page
getting on the new second page.

Code in swift/common/db.py lines 958 through 974 shows an ORDER BY
name. Newly inserted objects (or records that are deleted) with a name
value > marker and < end_marker can result in a page changing its
contents on refresh. This is why I was saying it's not a consistent
view of the data.

-jay


Re: [Openstack] Getting pagination right

2011-05-25 Thread Jay Pipes
On Wed, May 25, 2011 at 3:43 PM, Greg Holt  wrote:
> Okay, I give up then. Not sure what's different with what you have vs. Swift 
> dbs. Just trying to offer up what we do and have been doing for a while now.

The pagination in Swift is not consistent. Inserts into the Swift
databases in between the time of the initial query and the requesting
the "next page" can result in rows from the original first page
getting on the new second page.

Code in swift/common/db.py lines 958 through 974 shows an ORDER BY
name. Newly inserted objects (or records that are deleted) with a name
value > marker and < end_marker can result in a page changing its
contents on refresh. This is why I was saying it's not a consistent
view of the data.

-jay



Re: [Openstack] Getting pagination right

2011-05-25 Thread Greg Holt
Okay, I give up then. Not sure what's different with what you have vs. Swift 
dbs. Just trying to offer up what we do and have been doing for a while now.

On May 25, 2011, at 2:40 PM, Jay Pipes wrote:

> On Wed, May 25, 2011 at 3:32 PM, Greg Holt  wrote:
>> select w from x where y > marker order by y limit z
>> 
>> This gives you consistent pagination, means the database doesn't have to 
>> count matching rows to find an offset, and means you could shard by y later 
>> in life if you need to scale that much.
> 
> Sorry, that doesn't make any sense to me. An offset is part of the
> LIMIT X OFFSET Y clause. Your query above would return ALL the rows
> that match WHERE y > marker. That's not what we want. We want a
> segment of those rows.
> 
> In addition, your SQL expression does not return consistent result
> sets if the Y column does not have an increasing value over time. New
> records inserted into the table will pop into random pages of results
> if the Y column does not have a strictly increasing ordering.
> 
> -jay




Re: [Openstack] Getting pagination right

2011-05-25 Thread Jay Pipes
On Wed, May 25, 2011 at 3:32 PM, Greg Holt  wrote:
> select w from x where y > marker order by y limit z
>
> This gives you consistent pagination, means the database doesn't have to 
> count matching rows to find an offset, and means you could shard by y later 
> in life if you need to scale that much.

Sorry, that doesn't make any sense to me. An offset is part of the
LIMIT X OFFSET Y clause. Your query above would return ALL the rows
that match WHERE y > marker. That's not what we want. We want a
segment of those rows.

In addition, your SQL expression does not return consistent result
sets if the Y column does not have an increasing value over time. New
records inserted into the table will pop into random pages of results
if the Y column does not have a strictly increasing ordering.

-jay



Re: [Openstack] Getting pagination right

2011-05-25 Thread Greg Holt
select w from x where y > marker order by y limit z

This gives you consistent pagination, means the database doesn't have to count 
matching rows to find an offset, and means you could shard by y later in life 
if you need to scale that much.
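
A quick illustration of this marker-style (keyset) pagination using an
in-memory SQLite database; the table and column names mirror the query
above and are otherwise made up for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (y INTEGER PRIMARY KEY, w TEXT)")
conn.executemany("INSERT INTO x (y, w) VALUES (?, ?)",
                 [(i, "item-%d" % i) for i in range(1, 8)])

def page(marker, limit):
    # The index on y satisfies both the WHERE and the ORDER BY, so the
    # database never has to count rows to find an offset.
    rows = conn.execute(
        "SELECT w FROM x WHERE y > ? ORDER BY y LIMIT ?",
        (marker, limit)).fetchall()
    return [r[0] for r in rows]

print(page(0, 3))  # first page: ['item-1', 'item-2', 'item-3']
print(page(3, 3))  # next page, marker = last y seen on the previous page
```

Each request only needs the last y value the client saw, which is also
what makes sharding by y straightforward later.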


On May 25, 2011, at 2:28 PM, Jay Pipes wrote:

> On Wed, May 25, 2011 at 3:10 PM, Greg Holt  wrote:
>> Okay cool. Just stop using the term offset anywhere in your examples then 
>> and drop the whole page= thing as well. :)
> 
> Sorry, I'm not understanding what you're getting at here. The offset
> is required to pass to the database query as the OFFSET clause. The
> page= parameter allows you to calculate that OFFSET value.
> 
> -jay




Re: [Openstack] Getting pagination right

2011-05-25 Thread Jay Pipes
On Wed, May 25, 2011 at 3:10 PM, Greg Holt  wrote:
> Okay cool. Just stop using the term offset anywhere in your examples then and 
> drop the whole page= thing as well. :)

Sorry, I'm not understanding what you're getting at here. The offset
is required to pass to the database query as the OFFSET clause. The
page= parameter allows you to calculate that OFFSET value.

-jay



Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...

2011-05-25 Thread Sandy Walsh
Actually, Soren and I were talking about this over IRC earlier today.

Soren has a theory that we may not even need to respond back to the caller for a 
"you win" decision to be made. The receiving compute node could just act on it 
straight away. I think he's correct. If you can't handle the request, don't 
listen for it. No fanout required.

Remember that this proposal only really works with Flavor-based scheduling and 
not ad-hoc capability-oriented scheduling.

Which means ... the whole Scheduler concept potentially goes away.
All the Zone stuff and the Cost computation stuff would stay the same, just 
move location.

The use case that causes problems is the one I mentioned earlier: "prefer 
hosts that don't already hold instances for this customer" ... which requires 
compute nodes to know about each other's state.

But I think we can handle this in the following way:
- Modulo the number of Compute nodes by some number X. This is our "stripe" 
number.
- Keep a per-customer reference to the last "stripe" used (S, an int)
- When requesting a new instance, ask for a Compute node of Stripe (S + 1) % X
- Compute nodes listen for Flavors on their Stripe only

This implies we need # Flavors * # Stripes queues, but queues are lightweight 
anyway.
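
A rough sketch of the stripe bookkeeping described above (the function
names and in-memory dict are invented for illustration; in practice the
per-customer state would live in a shared store):

```python
# Number of stripes X; a compute host with numeric id H serves stripe H % X.
NUM_STRIPES = 4

# Last stripe used per customer.
last_stripe = {}

def next_stripe(customer):
    """Rotate each customer across stripes so consecutive requests land
    on different groups of compute hosts."""
    s = (last_stripe.get(customer, -1) + 1) % NUM_STRIPES
    last_stripe[customer] = s
    return s

def queue_name(flavor, stripe):
    # One queue per (flavor, stripe) pair, i.e. #flavors * #stripes queues.
    return "compute.%s.stripe-%d" % (flavor, stripe)

print(queue_name("m1.small", next_stripe("acme")))  # compute.m1.small.stripe-0
print(queue_name("m1.small", next_stripe("acme")))  # compute.m1.small.stripe-1
```

Compute nodes would subscribe only to the queues for their own stripe
and the flavors they can satisfy.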

Still stewing on the ramifications of all this.

-S




From: Vishvananda Ishaya [vishvana...@gmail.com]
Sent: Wednesday, May 25, 2011 3:42 PM
To: Sandy Walsh
Cc: Soren Hansen; openstack@lists.launchpad.net
Subject: Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...


On May 25, 2011, at 4:41 AM, Sandy Walsh wrote:


- We'd need to create a per-Flavor FanOut queue. Some Compute nodes would 
express their interest in handling the request and we'd, somehow, need to 
decide which one gets the work. Who would decide that? How could they do it 
without creating a single point of failure?


Mesos has an interesting way of doing this, where there is an external scheduler 
that receives offers for resources and can choose to take the offers or not.

http://www.cs.brown.edu/courses/csci2950-u/f10/papers/mesos_tech_report.pdf

This would definitely be a radical difference from what we are doing, but 
perhaps we could provide some sort of webhook where we can pass sanitized 
offers back to clients and they can decide whether to take them or not.

Vish







Re: [Openstack] Getting pagination right

2011-05-25 Thread Greg Holt
Okay cool. Just stop using the term offset anywhere in your examples then and 
drop the whole page= thing as well. :)

On May 25, 2011, at 1:28 PM, Jay Pipes wrote:

> I'm not proposing using marker/limit over offset/limit. I'm proposing
> using marker/limit/offset over searching manually through a giant set
> of results looking for a marker and then grabbing the next few
> results. ;)




Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...

2011-05-25 Thread Vishvananda Ishaya

On May 25, 2011, at 4:41 AM, Sandy Walsh wrote:

> 
> - We'd need to create a per-Flavor FanOut queue. Some Compute nodes would 
> express their interest in handling the request and we'd, somehow, need to 
> decide which one gets the work. Who would decide that? How could they do it 
> without creating a single point of failure?
> 

Mesos has an interesting way of doing this, where there is an external scheduler 
that receives offers for resources and can choose to take the offers or not.

http://www.cs.brown.edu/courses/csci2950-u/f10/papers/mesos_tech_report.pdf

This would definitely be a radical difference from what we are doing, but 
perhaps we could provide some sort of webhook where we can pass sanitized 
offers back to clients and they can decide whether to take them or not. 

Vish




Re: [Openstack] Getting pagination right

2011-05-25 Thread Jay Pipes
I'm not proposing using marker/limit over offset/limit. I'm proposing
using marker/limit/offset over searching manually through a giant set
of results looking for a marker and then grabbing the next few
results. ;)

On Wed, May 25, 2011 at 2:26 PM, Greg Holt  wrote:
> Also, using marker/limit rather than offset/limit makes it much easier to
> shard for scale if desired. One of the reasons we chose that route with
> Swift. Regular databases with indexes work better with marker/limit than
> offset/limit too.
> On May 25, 2011, at 12:38 PM, Justin Santa Barbara wrote:
>
> I definitely agree that the paging logic is questionable; we definitely
> should be specifying the sort order if we're relying on it, as I believe a
> database is otherwise free to return results however it sees fit.
> I'm not convinced that the proposed logic around the queries and update
> timestamps is typo-free.  But even if it was correct, the problem with
> timestamps is that I don't think you can get a sensible set of results.  If
> you order by increasing timestamp, then I think you see items repeatedly if
> they are concurrently updated.  If you order by decreasing timestamp, I
> think you can miss an item altogether if it's updated in between page
> fetches.
> I think the 'normal' way to make this work is to order by primary key, and
> to have the marker be the PK of the 'last seen row'.  Because the client
> can't know what the pk is, the marker is normally just an opaque value.
>  It's also (normally) much easier for the DB to sort by primary key than by
> any other value.
> The behaviour of an iteration using the PK is fairly nice.  You never see a
> row twice, so you don't need to de-dup client side.  You are guaranteed to
> see all rows, unless they are concurrently inserted or deleted.  If a row is
> inserted you'll see it if you haven't yet fetched the page it would be on.
>  If a row is deleted you don't see it unless you've already fetched the page
> it was on.
> Zones may complicate any scheme, though I think both proposed methods could
> be made to work.  I think in general the next N rows have to be fetched from
> each child zone and then combined (top N), throwing away the extra records.
>   One way around this would be to define an order on the child zones, and
> then show the records from zone 1, then those from zone 2 etc etc.  You just
> need to map from the marker to a child zone, and if you don't get the full
> complement of rows you need to merge in the first results from the next zone
> as well.
> The SQL query using the PK would look like this (and there's no need for an
> OFFSET clause):
> SELECT *
> FROM <table>
> ORDER BY <primary key>
> WHERE <primary key> > @marker
> AND (<additional filters>)
> LIMIT <page size>
> Without additional filters, that's a _very_ efficient query for a sensible
> database.
> Justin
>
>
> On Wed, May 25, 2011 at 9:28 AM, Jay Pipes  wrote:
>>
>> When a user issues a GET request for a resource collection, it's clear
>> that you don't want to return ALL results in the collection. Returning
>> hundreds of thousands of records per request isn't particularly a good
>> idea.
>>
>> Therefore, we use the concept of "pagination" -- returning a subset of
>> records matching the search criteria.
>>
>> In Glance's API, we've added this ability (currently in code review),
>> but our approach is a bit at odds with the way that the OpenStack
>> Compute API is structured. I'd like to explain why I think our
>> approach is better, both in terms of usability and in terms of
>> efficiency.
>>
>> First, here is how Nova does its pagination in the OpenStack API.
>>
>> 1) Look for a "marker" query parameter in the request
>> 2) Look for a "limit" query parameter in the request, otherwise use a
>> default max limit.
>> 3) If marker parameter is found, the value of the "marker" query
>> parameter is assumed to be an *integer* and assumed to be the
>> *identifier of the last object on the previous "page" of results*
>> 4) Request *all* results
>> 5) In the code, find the "page" of results by looking for the object
>> in the set of results that matches the marker parameter
>> 6) When found, grab the next object in the result set, plus X objects,
>> where X is the limit parameter
>>
>> The relevant code is in /nova/api/openstack/common.py, and looks like so:
>>
>> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>>    """Return a slice of items according to the requested marker and
>> limit."""
>>
>> 
>>
>>    limit = min(max_limit, limit)
>>    start_index = 0
>>    if marker:
>>        start_index = -1
>>        for i, item in enumerate(items):
>>            if item['id'] == marker:
>>                start_index = i + 1
>>                break
>>        if start_index < 0:
>>            raise webob.exc.HTTPBadRequest(_('marker [%s] not found' %
>> marker))
>>    range_end = start_index + limit
>>    return items[start_index:range_end]
>>
>> There are three MAJOR problems with the above method of pagination:
>>
>> 1) Very inefficient
>>
>> The P

Re: [Openstack] Getting pagination right

2011-05-25 Thread Greg Holt
Also, using marker/limit rather than offset/limit makes it much easier to shard 
for scale if desired. One of the reasons we chose that route with Swift. 
Regular databases with indexes work better with marker/limit than offset/limit 
too.

On May 25, 2011, at 12:38 PM, Justin Santa Barbara wrote:

> I definitely agree that the paging logic is questionable; we definitely 
> should be specifying the sort order if we're relying on it, as I believe a 
> database is otherwise free to return results however it sees fit.
> 
> I'm not convinced that the proposed logic around the queries and update 
> timestamps is typo-free.  But even if it was correct, the problem with 
> timestamps is that I don't think you can get a sensible set of results.  If 
> you order by increasing timestamp, then I think you see items repeatedly if 
> they are concurrently updated.  If you order by decreasing timestamp, I think 
> you can miss an item altogether if it's updated in between page fetches.
> 
> I think the 'normal' way to make this work is to order by primary key, and to 
> have the marker be the PK of the 'last seen row'.  Because the client can't 
> know what the pk is, the marker is normally just an opaque value.  It's also 
> (normally) much easier for the DB to sort by primary key than by any other 
> value.
> 
> The behaviour of an iteration using the PK is fairly nice.  You never see a 
> row twice, so you don't need to de-dup client side.  You are guaranteed to 
> see all rows, unless they are concurrently inserted or deleted.  If a row is 
> inserted you'll see it if you haven't yet fetched the page it would be on.  
> If a row is deleted you don't see it unless you've already fetched the page 
> it was on.
> 
> Zones may complicate any scheme, though I think both proposed methods could 
> be made to work.  I think in general the next N rows have to be fetched from 
> each child zone and then combined (top N), throwing away the extra records.   
> One way around this would be to define an order on the child zones, and then 
> show the records from zone 1, then those from zone 2 etc etc.  You just need 
> to map from the marker to a child zone, and if you don't get the full 
> complement of rows you need to merge in the first results from the next zone 
> as well.
> 
> The SQL query using the PK would look like this (and there's no need for an 
> OFFSET clause):
> 
> SELECT *
> FROM <table>
> ORDER BY <primary key>
> WHERE <primary key> > @marker
> AND (<additional filters>)
> LIMIT <page size>
> 
> Without additional filters, that's a _very_ efficient query for a sensible 
> database.
> 
> Justin
> 
> 
> 
> On Wed, May 25, 2011 at 9:28 AM, Jay Pipes  wrote:
> When a user issues a GET request for a resource collection, it's clear
> that you don't want to return ALL results in the collection. Returning
> hundreds of thousands of records per request isn't particularly a good
> idea.
> 
> Therefore, we use the concept of "pagination" -- returning a subset of
> records matching the search criteria.
> 
> In Glance's API, we've added this ability (currently in code review),
> but our approach is a bit at odds with the way that the OpenStack
> Compute API is structured. I'd like to explain why I think our
> approach is better, both in terms of usability and in terms of
> efficiency.
> 
> First, here is how Nova does its pagination in the OpenStack API.
> 
> 1) Look for a "marker" query parameter in the request
> 2) Look for a "limit" query parameter in the request, otherwise use a
> default max limit.
> 3) If marker parameter is found, the value of the "marker" query
> parameter is assumed to be an *integer* and assumed to be the
> *identifier of the last object on the previous "page" of results*
> 4) Request *all* results
> 5) In the code, find the "page" of results by looking for the object
> in the set of results that matches the marker parameter
> 6) When found, grab the next object in the result set, plus X objects,
> where X is the limit parameter
> 
> The relevant code is in /nova/api/openstack/common.py, and looks like so:
> 
> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>     """Return a slice of items according to the requested marker and limit."""
> 
>     
> 
>     limit = min(max_limit, limit)
>     start_index = 0
>     if marker:
>         start_index = -1
>         for i, item in enumerate(items):
>             if item['id'] == marker:
>                 start_index = i + 1
>                 break
>         if start_index < 0:
>             raise webob.exc.HTTPBadRequest(_('marker [%s] not found' % marker))
>     range_end = start_index + limit
>     return items[start_index:range_end]
> 
> There are three MAJOR problems with the above method of pagination:
> 
> 1) Very inefficient
> 
> The Python code does the job of determining the page of results to
> find. This means that the database query returns EVERY row in the
> resultset, instead of only returning a small subset of the results
> using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands
> o

Re: [Openstack] Getting pagination right

2011-05-25 Thread Jay Pipes
This is what happens when I braindump a bunch of stuff into an
email... I get myself confused :)

I'm proposing the following. Please pick the strategy apart:

1) Pass the LIMIT and OFFSET parameters down into the database API queries
2) Add a default ORDER BY for all queries returning result sets
3) Add a "page" query parameter that would be used to calculate the
OFFSET programmatically
4) Change the concept of the "marker" parameter to refer to *the
timestamp of the initial query*

This would result in queries to the database of the following template:

SELECT * FROM <table>
WHERE created_at <= @marker AND deleted_at <= @marker
AND <other search conditions>
ORDER BY <sort column>
LIMIT @pagesize OFFSET (@pagesize * (@pageno - 1));
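
So the offset is derived from the page number rather than supplied
directly by the client. A tiny sketch of that arithmetic (the function
name is invented):

```python
def to_limit_offset(pagesize, pageno):
    """Translate a 1-based page number into LIMIT/OFFSET values."""
    if pageno < 1:
        raise ValueError("pageno is 1-based")
    return pagesize, pagesize * (pageno - 1)

print(to_limit_offset(20, 1))  # (20, 0)  - first page
print(to_limit_offset(20, 3))  # (20, 40) - rows 41-60
```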

Sorry for calling out Justin re: the order of his SQL expression. No
offense intended. I know he's a good SQL brain :)

-jay



Re: [Openstack] Getting pagination right

2011-05-25 Thread Jay Pipes
On Wed, May 25, 2011 at 1:38 PM, Justin Santa Barbara
 wrote:
> The SQL query using the PK would look like this (and there's no need for an
> OFFSET clause):
> SELECT *
> FROM <table>
> ORDER BY <primary key>
> WHERE <primary key> > @marker
> AND (<additional filters>)
> LIMIT <page size>
> Without additional filters, that's a _very_ efficient query for a sensible
> database.

The above isn't actually valid SQL. The WHERE clause must precede the
ORDER BY clause.

In any case, what I meant to say in the original post was that the way
to provide consistent views of a result set is to do:

SET @marker = <timestamp of initial query>

SELECT * FROM <table>
WHERE updated_at <= @marker
AND <other search conditions>
ORDER BY <sort column>
LIMIT <pagesize> OFFSET <pagesize * (pageno - 1)>

Cheers,
jay



Re: [Openstack] Getting pagination right

2011-05-25 Thread Jay Pipes
I misspoke in my original email.

Marker should be the timestamp of the initial query.

-jay



Re: [Openstack] Getting pagination right

2011-05-25 Thread Justin Santa Barbara
I definitely agree that the paging logic is questionable; we should be
specifying the sort order if we're relying on it, as I believe a
database is otherwise free to return results however it sees fit.

I'm not convinced that the proposed logic around the queries and update
timestamps is typo-free.  But even if it was correct, the problem with
timestamps is that I don't think you can get a sensible set of results.  If
you order by increasing timestamp, then I think you see items repeatedly if
they are concurrently updated.  If you order by decreasing timestamp, I
think you can miss an item altogether if it's updated in between page
fetches.

I think the 'normal' way to make this work is to order by primary key, and
to have the marker be the PK of the 'last seen row'.  Because the client
can't know what the pk is, the marker is normally just an opaque value.
 It's also (normally) much easier for the DB to sort by primary key than by
any other value.

The behaviour of an iteration using the PK is fairly nice.  You never see a
row twice, so you don't need to de-dup client side.  You are guaranteed to
see all rows, unless they are concurrently inserted or deleted.  If a row is
inserted you'll see it if you haven't yet fetched the page it would be on.
 If a row is deleted you don't see it unless you've already fetched the page
it was on.

Zones may complicate any scheme, though I think both proposed methods could
be made to work.  I think in general the next N rows have to be fetched from
each child zone and then combined (top N), throwing away the extra records.
  One way around this would be to define an order on the child zones, and
then show the records from zone 1, then those from zone 2 etc etc.  You just
need to map from the marker to a child zone, and if you don't get the full
complement of rows you need to merge in the first results from the next zone
as well.
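A toy sketch of that merge step — the zone contents and tuple shapes are invented for illustration:

```python
import heapq

# Hypothetical per-zone result lists, each already sorted by primary key.
zone1 = [(1, "i-1"), (4, "i-4"), (7, "i-7")]
zone2 = [(2, "i-2"), (5, "i-5")]
zone3 = [(3, "i-3"), (6, "i-6")]

def merged_page(zones, marker, limit):
    # Each child zone is asked for its next `limit` rows after the marker;
    # the parent merges the sorted lists and keeps only the global top
    # `limit`, throwing away the extra records.
    per_zone = [[r for r in z if r[0] > marker][:limit] for z in zones]
    return list(heapq.merge(*per_zone))[:limit]

print(merged_page([zone1, zone2, zone3], marker=0, limit=4))
print(merged_page([zone1, zone2, zone3], marker=4, limit=2))
```

Each page costs up to zones × limit rows of fetching, which is the overhead the zone-ordering alternative tries to avoid.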

The SQL query using the PK would look like this (and there's no need for an
OFFSET clause):

SELECT *
FROM <table>
ORDER BY <primary key>
WHERE <primary key> > @marker
AND (<other filter conditions>)
LIMIT <page size>

Without additional filters, that's a _very_ efficient query for a sensible
database.
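A small self-contained demonstration of this keyset-style pagination — hypothetical table, with sqlite3 standing in for the real database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE servers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO servers VALUES (?, ?)",
                 [(i, "i-%d" % i) for i in range(1, 8)])

def keyset_page(conn, marker, limit):
    # Marker is the PK of the last row on the previous page. No OFFSET is
    # needed, so the database can seek straight to the start of the page
    # via the primary-key index instead of scanning past skipped rows.
    return conn.execute(
        "SELECT id, name FROM servers WHERE id > ? ORDER BY id LIMIT ?",
        (marker, limit)).fetchall()

page1 = keyset_page(conn, 0, 3)
page2 = keyset_page(conn, page1[-1][0], 3)  # marker = last seen PK
print(page1)
print(page2)
```

Because the marker is a "last seen" value rather than a position, rows inserted or deleted between fetches never shift the remaining pages.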

Justin



On Wed, May 25, 2011 at 9:28 AM, Jay Pipes  wrote:

> When a user issues a GET request for a resource collection, it's clear
> that you don't want to return ALL results in the collection. Returning
> hundreds of thousands of records per request isn't particularly a good
> idea.
>
> Therefore, we use the concept of "pagination" -- returning a subset of
> records matching the search criteria.
>
> In Glance's API, we've added this ability (currently in code review),
> but our approach is a bit at odds with the way that the OpenStack
> Compute API is structured. I'd like to explain why I think our
> approach is better, both in terms of usability and in terms of
> efficiency.
>
> First, here is how Nova does its pagination in the OpenStack API.
>
> 1) Look for a "marker" query parameter in the request
> 2) Look for a "limit" query parameter in the request, otherwise use a
> default max limit.
> 3) If marker parameter is found, the value of the "marker" query
> parameter is assumed to be an *integer* and assumed to be the
> *identifier of the last object on the previous "page" of results*
> 4) Request *all* results
> 5) In the code, find the "page" of results by looking for the object
> in the set of results that matches the marker parameter
> 6) When found, grab the next object in the result set, plus X objects,
> where X is the limit parameter
>
> The relevant code is in /nova/api/openstack/common.py, and looks like so:
>
> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>     """Return a slice of items according to the requested marker and
>     limit."""
>
>     limit = min(max_limit, limit)
>     start_index = 0
>     if marker:
>         start_index = -1
>         for i, item in enumerate(items):
>             if item['id'] == marker:
>                 start_index = i + 1
>                 break
>         if start_index < 0:
>             raise webob.exc.HTTPBadRequest(_('marker [%s] not found' %
>                                              marker))
>     range_end = start_index + limit
>     return items[start_index:range_end]
>
> There are three MAJOR problems with the above method of pagination:
>
> 1) Very inefficient
>
> The Python code does the job of determining the page of results to
> find. This means that the database query returns EVERY row in the
> resultset, instead of only returning a small subset of the results
> using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands
> of records in the various databases, this method is simply not
> scalable.
>
> 2) There is no ordering of the results in the queries the Nova
> database API makes
>
> The only query in the Nova database API that orders results is the
> instance_type_get_all() method.
>
> Without an order to the search results, paging is almost meaningless.
> A page of results depends on the order of the list of results. The
> reason the existing 

[Openstack] Getting pagination right

2011-05-25 Thread Jay Pipes
When a user issues a GET request for a resource collection, it's clear
that you don't want to return ALL results in the collection. Returning
hundreds of thousands of records per request isn't particularly a good
idea.

Therefore, we use the concept of "pagination" -- returning a subset of
records matching the search criteria.

In Glance's API, we've added this ability (currently in code review),
but our approach is a bit at odds with the way that the OpenStack
Compute API is structured. I'd like to explain why I think our
approach is better, both in terms of usability and in terms of
efficiency.

First, here is how Nova does its pagination in the OpenStack API.

1) Look for a "marker" query parameter in the request
2) Look for a "limit" query parameter in the request, otherwise use a
default max limit.
3) If marker parameter is found, the value of the "marker" query
parameter is assumed to be an *integer* and assumed to be the
*identifier of the last object on the previous "page" of results*
4) Request *all* results
5) In the code, find the "page" of results by looking for the object
in the set of results that matches the marker parameter
6) When found, grab the next object in the result set, plus X objects,
where X is the limit parameter

The relevant code is in /nova/api/openstack/common.py, and looks like so:

def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
    """Return a slice of items according to the requested marker and limit."""

    limit = min(max_limit, limit)
    start_index = 0
    if marker:
        start_index = -1
        for i, item in enumerate(items):
            if item['id'] == marker:
                start_index = i + 1
                break
        if start_index < 0:
            raise webob.exc.HTTPBadRequest(_('marker [%s] not found' % marker))
    range_end = start_index + limit
    return items[start_index:range_end]

There are three MAJOR problems with the above method of pagination:

1) Very inefficient

The Python code does the job of determining the page of results to
find. This means that the database query returns EVERY row in the
resultset, instead of only returning a small subset of the results
using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands
of records in the various databases, this method is simply not
scalable.

2) There is no ordering of the results in the queries the Nova
database API makes

The only query in the Nova database API that orders results is the
instance_type_get_all() method.

Without an order to the search results, paging is almost meaningless.
A page of results depends on the order of the list of results. The
reason the existing Nova code works at all is because of the fact that
all objects have an autoincrementing integer primary key, and by
nature, this autoincrementing integer primary key increases over time.
If the identifier was switched to, say, UUID, the results ordering
would fall apart completely, and "pages" of results would get skewed
and corrupted as records were inserted over time.

For instance, assume a character primary key (like UUID keys would be
stored in hex string format). Assume the following simplified set of
instances:

PK Created_at
ABC   2011-05-25 12:00:00
DEF   2011-05-25 12:00:01
GHJ   2011-05-25 12:00:02
KLM   2011-05-25 12:00:03

Assume you want to get a list of instances with GET /servers?limit=2.
With the existing code, you'd get this:

ABC   2011-05-25 12:00:00
DEF   2011-05-25 12:00:01

With GET /servers?limit=2&marker=DEF, you'd get this:

GHJ   2011-05-25 12:00:02
KLM   2011-05-25 12:00:03

However, let's say a new instance is added:

BCD   2011-05-25 12:00:04

Because there is no sort order specified in the database queries, the
order of results is indeterminate. Page 1 may now contain the BCD
record, or it may not. It totally depends on the implementation of the
relational database, because the SQL standard does not mandate a
default sort order when one is not specified in the ORDER BY clause.

3) The marker parameter assumes an autoincrementing primary key

Because the marker parameter assumes an autoincrementing primary key,
the paging of results currently DOES solve the problem of new records
messing up paging.

As an example, let's assume I've fixed #2 above and put a default sort
order on the primary key, descending. Also assume a simple resultset
like so:

PK Created_at
1  2011-05-25 12:00:00
2  2011-05-25 12:00:01
3  2011-05-25 12:00:02
4  2011-05-25 12:00:03

Assume I make a request for GET /servers?limit=2. I would get:

4  2011-05-25 12:00:03
3  2011-05-25 12:00:02

The results returned would be the "last" two server instances created.
If I did a request for GET /servers?limit=2&marker=3, I would get the
results:

2  2011-05-25 12:00:01
1  2011-05-25 12:00:00

If I add a new instance, say:

5  2011-05-25 12:00:05

and did the request for GET /servers?limit=2&marker=3, I 

Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...

2011-05-25 Thread Sandy Walsh
Heh, you're right, I was completely mistaken :)

That's a really cool idea Soren!

One of the problems we're faced with is keeping instances from the same customer 
off a single compute node. We want to spread them out to lessen the impact of 
machine failure. In order to solve this we needed to put a weight on each host 
candidate and use that to decide the final ordering. The implication is we 
can't dump all the requests in the Scheduler queues and have them 
round-robin'ed, as is currently done. A single Scheduler has to devise a plan 
and then all the Compute nodes can execute it concurrently. A large part of 
the Distributed Scheduler work has been working around this issue.

That said, your idea has tremendous merit for those customers that don't have 
that requirement. 

Right now, we have two ways of dispersing "Provision Resource" requests:
1. The way it is now: dump all the requests in the Scheduler queues and have 
them picked off concurrently.
2. The way the Distributed Scheduler works: Send the request for N resources 
out as an atomic unit. A single Scheduler devises a plan and sends tiny 
units-of-work to the Compute nodes.

Your idea would be the third approach:
3. Place the request in a Flavor-specific queue and let any capable
Compute node handle the request.

Let me give a little thought how we can do this cleanly. It's going to get 
complicated in the code if each request potentially has a different dispersal 
strategy. The request dispersal strategy seems to go hand-in-hand with the 
underlying worker-node selection process. They both need to agree on the 
approach. 

But ... it's certainly not impossible. We just need to bind the code in 
nova.compute.api.create() to the configured nova.scheduler.driver ... 1:1.
Believe me, based on the stuff we've got going on in the Distributed Scheduler, 
that would be a really nice refactoring. 

As an aside, the implications of #3 that I can see at first blush:
- We'd be effectively bypassing the scheduler, so we'd need to think about how 
that would work with Zones since Queues don't span Zones.
- We'd need to create a per-Flavor FanOut queue. Some Compute nodes would 
express their interest in handling the request and we'd, somehow, need to 
decide which one gets the work. Who would decide that? How could they do it 
without creating a single point of failure?

Regardless ... it's a great idea and definitely one that deserves more 
consideration.

Thanks!
-S


From: Soren Hansen [so...@linux2go.dk]
Sent: Tuesday, May 24, 2011 4:56 PM
To: Sandy Walsh
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...

2011/5/23 Sandy Walsh :
> To Soren's point about "losing the ability to rely on a fixed set of
> topics in the message queue for doing scheduling" this is not the case,
> there are no new topics introduced.

That's not exactly what I meant.

If we stuck with the simple flavours that we have right now, we could
schedule stuff exclusively using the message queue. The scheduler
would not need to know *anything* about the various compute nodes.
Scheduling an instance of flavour X would be achieved simply by
sending a "run this instance" message on the message queue with the
"flavour-X" topic. Any compute node able to accommodate an instance of
that size would be subscribed to that topic, and the message queue
would simply route the message to a "random" one of them. As a compute
node fills up, it'll unsubscribe from the topics representing flavours
that it no longer has room for. This sounds Very Scalable[tm] to me :)
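A toy in-process model of this idea — not real AMQP; the class names and RAM figures are invented:

```python
import random
from collections import defaultdict

FLAVOURS = {"flavour-small": 512, "flavour-large": 2048}  # RAM in MB

class Broker:
    """Stand-in for a message queue that routes each message on a topic
    to a random subscriber of that topic."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, node):
        self.subscribers[topic].append(node)

    def unsubscribe(self, topic, node):
        self.subscribers[topic].remove(node)

    def publish(self, topic, message):
        random.choice(self.subscribers[topic]).handle(topic, message)

class ComputeNode:
    def __init__(self, name, broker, free_ram_mb):
        self.name, self.broker, self.free_ram_mb = name, broker, free_ram_mb
        self.instances = []

    def handle(self, topic, message):
        self.instances.append(message)
        self.free_ram_mb -= FLAVOURS[topic]
        # As the node fills up, drop subscriptions for flavours it can
        # no longer accommodate -- no central scheduler state needed.
        for flavour, ram in FLAVOURS.items():
            if ram > self.free_ram_mb and self in self.broker.subscribers[flavour]:
                self.broker.unsubscribe(flavour, self)

broker = Broker()
node = ComputeNode("node1", broker, 2560)
for flavour in FLAVOURS:
    broker.subscribe(flavour, node)

broker.publish("flavour-large", "run instance A")  # leaves 512 MB free
print(sorted(t for t, subs in broker.subscribers.items() if node in subs))
```

After accepting the large instance, the node silently drops out of the "flavour-large" topic but stays subscribed to "flavour-small", exactly the self-selection behaviour described above.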

Even if all the scheduling attributes we come up with were discrete
and enumerable, the cartesian product of all of them is potentially
*huge*, so having a topic for each of the possible combinations sounds
like very little fun. If any of them are not enumerable, it gets even
less fun. So, adding these capabilities would get in the way of
implementing something like the above. I guess it could be a
configuration option, i.e. if you choose the rich scheduling option
set, you don't get to use the cool scheduler.
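For a rough sense of the scale of that cartesian product (attribute names and value counts invented):

```python
from functools import reduce

# Hypothetical scheduling attributes, each with a handful of discrete values.
attributes = {
    "flavour": 6,
    "availability_zone": 4,
    "cpu_arch": 3,
    "gpu": 2,
    "storage_type": 3,
}

# One topic per combination of attribute values:
topics = reduce(lambda a, b: a * b, attributes.values())
print(topics)  # 432 topics from just five small attributes
```

Every new attribute multiplies the topic count, and any non-enumerable attribute (free RAM, say) makes the scheme break down entirely.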

--
Soren Hansen| http://linux2go.dk/
Ubuntu Developer| http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/


Confidentiality Notice: This e-mail message (including any attached or
embedded documents) is intended for the exclusive and confidential use of the
individual or entity to which this message is addressed, and unless otherwise
expressly indicated, is confidential and privileged information of Rackspace. 
Any dissemination, distribution or copying of the enclosed material is 
prohibited.
If you receive this transmission in error, please notify us immediately by 
e-mail
at ab...@rackspace.com, and delete the original message. 
Your cooperation is appreciated.

