[Openstack] openstack on RHEL 6.1
Hi All,

Several people have reported on IRC that they can't get OpenStack working on RHEL 6.1. Starting the openstack-nova-network service (and various others) causes this traceback:

(nova): TRACE: File "/usr/lib64/python2.6/subprocess.py", line 725, in communicate
(nova): TRACE: stdout, stderr = self._communicate(input, endtime)
(nova): TRACE: File "/usr/lib64/python2.6/subprocess.py", line 1322, in _communicate
(nova): TRACE: self.wait(timeout=self._remaining_time(endtime))
(nova): TRACE: TypeError: wait() got an unexpected keyword argument 'timeout'

Red Hat has backported the 'timeout' feature for subprocess.Popen to Python 2.6.6 in RHEL 6.1. At first glance it looks like they've broken subprocess, but in fact they haven't: it's Eventlet that's causing the problem, because its 'wait' function doesn't accept the timeout argument. I've reported the problem upstream here: https://bitbucket.org/which_linden/eventlet/issue/89/add-a-timeout-argument-to-subprocesspopen

The patch below fixed the problem for me:

--- /usr/lib/python2.6/site-packages/eventlet/green/subprocess.py.orig 2011-05-25 23:31:34.597271402 +
+++ /usr/lib/python2.6/site-packages/eventlet/green/subprocess.py 2011-05-25 23:33:24.055602468 +
@@ -32,7 +32,7 @@
         setattr(self, attr, wrapped_pipe)
     __init__.__doc__ = subprocess_orig.Popen.__init__.__doc__

-def wait(self, check_interval=0.01):
+def wait(self, check_interval=0.01, timeout=None):
     # Instead of a blocking OS call, this version of wait() uses logic
     # borrowed from the eventlet 0.2 processes.Process.wait() method.
     try:

--
Dave

___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
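The failure mode is easy to reproduce without RHEL or eventlet at all. This is a minimal sketch (all names hypothetical, not eventlet's real code) of why the backported subprocess blows up, and why merely adding the keyword to the signature stops the TypeError:

```python
# RHEL 6.1's patched subprocess calls self.wait(timeout=...), but eventlet's
# green wait() did not declare that keyword, so Python raises TypeError
# before wait() ever runs.

def green_wait_old(check_interval=0.01):
    """Eventlet's wait() signature before the patch."""
    return 0  # stand-in for the child's return code

def green_wait_fixed(check_interval=0.01, timeout=None):
    """After the patch: the extra keyword is accepted."""
    return 0

# RHEL's backported _communicate() effectively makes this call:
try:
    green_wait_old(timeout=5)   # raises TypeError, as in the traceback above
    old_signature_ok = True
except TypeError:
    old_signature_ok = False

green_wait_fixed(timeout=5)     # fine once the keyword exists
```

Note that, like the one-line patch above, this only accepts the keyword; it does not implement timeout behaviour, which is enough to stop the crash.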
Re: [Openstack] Getting pagination right
Hi Jay,

On Wed, May 25, 2011 at 03:40:27PM -0400, Jay Pipes wrote:
> On Wed, May 25, 2011 at 3:32 PM, Greg Holt wrote:
> > select w from x where y > marker order by y limit z
> >
> > This gives you consistent pagination, means the database doesn't have to
> > count matching rows to find an offset, and means you could shard by y later
> > in life if you need to scale that much.

FWIW, this is the same approach I'm using in burrow, except that the order is roughly insert time (no guaranteed strict ordering when it's sharded). This is simple and scales well.

> Sorry, that doesn't make any sense to me. An offset is part of the
> LIMIT X OFFSET Y clause. Your query above would return ALL the rows
> that match WHERE y > marker. That's not what we want. We want a
> segment of those rows.

Assuming you have an index on the marker column for efficiency, I think the query makes perfect sense. Can you explain a bit more why you need the offset if we are OK with the result set for a given page changing? From the total set, we're returning everything after the last seen marker, ordering those rows (which should be a no-op with the index), and then limiting by an optional limit parameter. Seems OK to me.

> In addition, your SQL expression does not return consistent result
> sets if the Y column does not have an increasing value over time. New
> records inserted into the table will pop into random pages of results
> if the Y column does not have a strictly increasing ordering.

I think not being consistent is OK. If you are viewing a list of images and then you see a new one pop up on refresh after another insert (using the same marker), I think this is perfectly fine. It certainly helps keep things simpler on the backends. Doesn't seem like a big issue for swift so far, and I don't anticipate issues with burrow.
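Greg's query pattern ("select w from x where y > marker order by y limit z") can be sketched end to end with an in-memory SQLite database. The table and column names are illustrative, not from any OpenStack schema:

```python
# Keyset ("marker") pagination: each page is a range scan starting just past
# the last value seen, so the database never counts skipped rows the way a
# LIMIT/OFFSET query must.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (y INTEGER PRIMARY KEY, w TEXT)")
conn.executemany("INSERT INTO x VALUES (?, ?)",
                 [(i, "row-%d" % i) for i in range(1, 8)])

def page(marker, limit):
    # The index on y (here the primary key) makes this an efficient range scan.
    return conn.execute(
        "SELECT y, w FROM x WHERE y > ? ORDER BY y LIMIT ?",
        (marker, limit)).fetchall()

first = page(0, 3)               # rows with y = 1, 2, 3
second = page(first[-1][0], 3)   # rows with y = 4, 5, 6: marker is last seen y
```

Concurrent inserts with y smaller than the current marker never disturb later pages, which is the "consistent pagination" property Greg describes.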
-Eric
Re: [Openstack] Getting pagination right
On Wed, 2011-05-25 at 16:27 -0400, William Wolf wrote:
> I personally like Jay's proposal (except maybe keeping 'pages' out for
> now in favor of just having 1 way to do things, rather than many ways
> to do the same thing), but feel that the term 'marker' should maybe be
> renamed. Maybe 'timestamp' would even be better? I'm open to other
> suggestions.

Well, it's a consistency mechanism that selects a subset of rows--those that are visible at a specific time...so perhaps "subset" is a good self-documenting parameter name?

--
Kevin L. Mitchell
Re: [Openstack] Getting pagination right
I think there is a lot of confusion between the two uses of the word 'marker', and maybe under Jay's proposal we need another word for 'marker'. Suppose we have the following images:

PK  Created_at           Updated_at           Deleted_at
1   2011-05-25 12:00:00  2011-05-25 12:00:04
2   2011-05-25 12:00:01  2011-05-25 12:00:03
3   2011-05-25 12:00:02  2011-05-25 12:00:05
4   2011-05-25 12:00:03                       2011-05-25 12:00:09

Under the current 1.1 spec, 'marker' means the id of the last element you saw, such that:

/images?marker=3&limit=2

will give you the 2 images *updated* before the server with id '3': [1, 2] (assuming order by updated_at desc). This is *not* what the current code does, because we do not ORDER BY updated_at yet, as Jay pointed out. I'm just showing what the spec wants as it is currently written.

If I understand Jay's ideas correctly, he wants us to pass a marker (a different marker), offset, AND limit. So a query would go something like this:

/images?marker=&offset=3&limit=10

I believe that marker can be left empty here, and it will default to now(), but whatever it gets set to, it will return images that were created *before* the marker and that were deleted *after* it (if any). The main advantage here is that it gives you a persistent 'collection snapshot' of your query results, based on the time that you made the initial query. If it takes you a minute to page through results, and images were deleted or added during that time, it won't throw off your pagination if you keep your marker constant.

If we passed in marker = '2011-05-25 12:00:03', offset = '0', and limit = '4', we would get: [4, 3, 2, 1] (assuming order by created_at desc) using Jay's method. If we kept everything the same, but passed in '2011-05-25 12:00:10' as the marker, image 4 would not be on the list because at that time image 4 was deleted. Please correct me if something above is incorrect.

As for thoughts, I talked with Mark Washenberger and Brian Waldon, and we came up with 2 possible ways to move forward.
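The "snapshot" semantics described above can be sketched against the example table with SQLite. The schema and query shape are illustrative (a row is visible if created at or before the marker and not yet deleted at that time), not the actual Glance/Nova code:

```python
# Marker-as-timestamp pagination: the marker pins a consistent view of the
# collection, so deletes and inserts after the initial query don't shift pages.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE images
                (id INTEGER PRIMARY KEY, created_at TEXT, deleted_at TEXT)""")
conn.executemany("INSERT INTO images VALUES (?, ?, ?)", [
    (1, "2011-05-25 12:00:00", None),
    (2, "2011-05-25 12:00:01", None),
    (3, "2011-05-25 12:00:02", None),
    (4, "2011-05-25 12:00:03", "2011-05-25 12:00:09"),
])

def snapshot_page(marker, offset, limit):
    # ISO-formatted timestamps compare correctly as strings.
    return [r[0] for r in conn.execute(
        """SELECT id FROM images
           WHERE created_at <= ?
             AND (deleted_at IS NULL OR deleted_at > ?)
           ORDER BY created_at DESC
           LIMIT ? OFFSET ?""", (marker, marker, limit, offset))]

page_a = snapshot_page("2011-05-25 12:00:03", 0, 4)  # [4, 3, 2, 1]
page_b = snapshot_page("2011-05-25 12:00:10", 0, 4)  # image 4 already deleted
```

This reproduces both results worked through in the email: at marker 12:00:03 image 4 is still visible; at marker 12:00:10 it has been deleted and drops out.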
Things that we agree on in both cases:

* The current way nova handles paging is inefficient, and needs to be improved.
* We need to use ORDER BY in all of our queries, and not assume that ids will be ordered by time.
* We order our queries by created_at, *not* updated_at as specified in the current spec (you can see the confusion this may cause in my first example above).

I personally like Jay's proposal (except maybe keeping 'pages' out for now in favor of just having 1 way to do things, rather than many ways to do the same thing), but feel that the term 'marker' should maybe be renamed. Maybe 'timestamp' would even be better? I'm open to other suggestions.

Another idea that we had was to still use marker/limit with marker being an id, but to move the existing inefficient Python logic into the db layer. This will give us the sharding/scaling advantages that Greg mentioned, and also get rid of a lot of the problems Jay outlined with our current implementation. With this method, however, I believe we will need glance to support marker/limit as well in order to make things efficient.

So, to summarize, our two suggestions currently are:

1) Follow Jay's proposal, but find a better name for 'marker'.
2) Keep marker/limit but move the inefficient logic out of Python into the db layer, and have glance support marker/limit as well.

Thoughts on these two paths of moving forward? Anyone have ideas for other routes we could take?

-----Original Message-----
From: "Jay Pipes"
Sent: Wednesday, May 25, 2011 15:57
To: "Greg Holt"
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Getting pagination right

On Wed, May 25, 2011 at 3:43 PM, Greg Holt wrote:
> Okay, I give up then. Not sure what's different with what you have vs. Swift
> dbs. Just trying to offer up what we do and have been doing for a while now.

The pagination in Swift is not consistent.
Inserts into the Swift databases in between the time of the initial query and the request for the "next page" can result in rows from the original first page getting onto the new second page. Code in swift/common/db.py lines 958 through 974 shows an ORDER BY name. Newly inserted objects (or records that are deleted) with a name value > marker and < end_marker can result in a page changing its contents on refresh. This is why I was saying it's not a consistent view of the data.

-jay
Re: [Openstack] Getting pagination right
On Wed, May 25, 2011 at 3:43 PM, Greg Holt wrote:
> Okay, I give up then. Not sure what's different with what you have vs. Swift
> dbs. Just trying to offer up what we do and have been doing for a while now.

The pagination in Swift is not consistent. Inserts into the Swift databases in between the time of the initial query and the request for the "next page" can result in rows from the original first page getting onto the new second page. Code in swift/common/db.py lines 958 through 974 shows an ORDER BY name. Newly inserted objects (or records that are deleted) with a name value > marker and < end_marker can result in a page changing its contents on refresh. This is why I was saying it's not a consistent view of the data.

-jay
Re: [Openstack] Getting pagination right
Okay, I give up then. Not sure what's different with what you have vs. Swift dbs. Just trying to offer up what we do and have been doing for a while now.

On May 25, 2011, at 2:40 PM, Jay Pipes wrote:
> On Wed, May 25, 2011 at 3:32 PM, Greg Holt wrote:
>> select w from x where y > marker order by y limit z
>>
>> This gives you consistent pagination, means the database doesn't have to
>> count matching rows to find an offset, and means you could shard by y later
>> in life if you need to scale that much.
>
> Sorry, that doesn't make any sense to me. An offset is part of the
> LIMIT X OFFSET Y clause. Your query above would return ALL the rows
> that match WHERE y > marker. That's not what we want. We want a
> segment of those rows.
>
> In addition, your SQL expression does not return consistent result
> sets if the Y column does not have an increasing value over time. New
> records inserted into the table will pop into random pages of results
> if the Y column does not have a strictly increasing ordering.
>
> -jay
Re: [Openstack] Getting pagination right
On Wed, May 25, 2011 at 3:32 PM, Greg Holt wrote:
> select w from x where y > marker order by y limit z
>
> This gives you consistent pagination, means the database doesn't have to
> count matching rows to find an offset, and means you could shard by y later
> in life if you need to scale that much.

Sorry, that doesn't make any sense to me. An offset is part of the LIMIT X OFFSET Y clause. Your query above would return ALL the rows that match WHERE y > marker. That's not what we want. We want a segment of those rows.

In addition, your SQL expression does not return consistent result sets if the Y column does not have an increasing value over time. New records inserted into the table will pop into random pages of results if the Y column does not have a strictly increasing ordering.

-jay
Re: [Openstack] Getting pagination right
select w from x where y > marker order by y limit z

This gives you consistent pagination, means the database doesn't have to count matching rows to find an offset, and means you could shard by y later in life if you need to scale that much.

On May 25, 2011, at 2:28 PM, Jay Pipes wrote:
> On Wed, May 25, 2011 at 3:10 PM, Greg Holt wrote:
>> Okay cool. Just stop using the term offset anywhere in your examples then
>> and drop the whole page= thing as well. :)
>
> Sorry, I'm not understanding what you're getting at here. The offset
> is required to pass to the database query as the OFFSET clause. The
> page= parameter allows you to calculate that OFFSET value.
>
> -jay
Re: [Openstack] Getting pagination right
On Wed, May 25, 2011 at 3:10 PM, Greg Holt wrote:
> Okay cool. Just stop using the term offset anywhere in your examples then and
> drop the whole page= thing as well. :)

Sorry, I'm not understanding what you're getting at here. The offset is required to pass to the database query as the OFFSET clause. The page= parameter allows you to calculate that OFFSET value.

-jay
Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...
Actually, Soren and I were talking about this over IRC earlier today. Soren has a theory that we may not even need to respond back to the caller for a "you win" decision to be made. The receiving compute node could just act on it straight away. I think he's correct. If you can't handle the request, don't listen for it. No fanout required.

Remember that this proposal only really works with Flavor-based scheduling and not ad-hoc capability-oriented scheduling. Which means ... the whole Scheduler concept potentially goes away. All the Zone stuff and the Cost computation stuff would stay the same, just move location.

The use case that causes problems is the one I mentioned earlier: "prefer hosts that don't already hold instances for this customer" ... which requires compute nodes to know about each other's state. But I think we can handle this in the following way:

- Modulo the number of Compute nodes by some number X. This is our "stripe" number.
- Keep a per-customer reference to the last "stripe" used (S, an int)
- When requesting a new instance, ask for a Compute node of Stripe (S + 1) % X
- Compute nodes listen for Flavors on their Stripe only

This implies we need (# Flavors * # Stripes) queues, but queues are lightweight anyway.

Still stewing on the ramifications of all this.

-S

From: Vishvananda Ishaya [vishvana...@gmail.com]
Sent: Wednesday, May 25, 2011 3:42 PM
To: Sandy Walsh
Cc: Soren Hansen; openstack@lists.launchpad.net
Subject: Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...

On May 25, 2011, at 4:41 AM, Sandy Walsh wrote:

- We'd need to create a per-Flavor FanOut queue. Some Compute nodes would express their interest in handling the request and we'd, somehow, need to decide which one gets the work. Who would decide that? How could they do it without creating a single point of failure?
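The striping scheme in the bullets above can be sketched in a few lines. All names here (stripe count, queue naming) are hypothetical illustrations, not anything from the Nova codebase:

```python
# Compute nodes are partitioned into X stripes; a customer's next request goes
# to the stripe after the one they last used, spreading one customer's
# instances across hosts without compute nodes sharing state.
NUM_STRIPES = 4  # X: assumed stripe count for the sketch

last_stripe = {}  # per-customer reference to the last "stripe" used (S)

def next_stripe(customer):
    s = last_stripe.get(customer, -1)
    nxt = (s + 1) % NUM_STRIPES   # ask for a node of Stripe (S + 1) % X
    last_stripe[customer] = nxt
    return nxt

def queue_name(flavor, stripe):
    # One queue per (flavor, stripe) pair: # Flavors * # Stripes queues total.
    return "compute.%s.stripe-%d" % (flavor, stripe)

# Four consecutive requests from one customer land on four different stripes:
stripes = [next_stripe("cust-a") for _ in range(4)]
```

A compute node in stripe 2 would then consume only from the `queue_name(flavor, 2)` queues for the flavors it can run, which is how "if you can't handle the request, don't listen for it" falls out of the design.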
Mesos has an interesting way of doing this, where there is an external scheduler that receives offers for resources and can choose to take the offers or not. http://www.cs.brown.edu/courses/csci2950-u/f10/papers/mesos_tech_report.pdf

This would definitely be a radical difference from what we are doing, but perhaps we could provide some sort of webhook where we can pass sanitized offers back to clients and they can decide whether to take them or not.

Vish

Confidentiality Notice: This e-mail message (including any attached or embedded documents) is intended for the exclusive and confidential use of the individual or entity to which this message is addressed, and unless otherwise expressly indicated, is confidential and privileged information of Rackspace. Any dissemination, distribution or copying of the enclosed material is prohibited. If you receive this transmission in error, please notify us immediately by e-mail at ab...@rackspace.com, and delete the original message. Your cooperation is appreciated.
Re: [Openstack] Getting pagination right
Okay cool. Just stop using the term offset anywhere in your examples then and drop the whole page= thing as well. :)

On May 25, 2011, at 1:28 PM, Jay Pipes wrote:
> I'm not proposing using marker/limit over offset/limit. I'm proposing
> using marker/limit/offset over searching manually through a giant set
> of results looking for a marker and then grabbing the next few
> results. ;)
Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...
On May 25, 2011, at 4:41 AM, Sandy Walsh wrote:
>
> - We'd need to create a per-Flavor FanOut queue. Some Compute nodes would
> express their interest in handling the request and we'd, somehow, need to
> decide which one gets the work. Who would decide that? How could they do it
> without creating a single point of failure?
>

Mesos has an interesting way of doing this, where there is an external scheduler that receives offers for resources and can choose to take the offers or not. http://www.cs.brown.edu/courses/csci2950-u/f10/papers/mesos_tech_report.pdf

This would definitely be a radical difference from what we are doing, but perhaps we could provide some sort of webhook where we can pass sanitized offers back to clients and they can decide whether to take them or not.

Vish
Re: [Openstack] Getting pagination right
I'm not proposing using marker/limit over offset/limit. I'm proposing using marker/limit/offset over searching manually through a giant set of results looking for a marker and then grabbing the next few results. ;)

On Wed, May 25, 2011 at 2:26 PM, Greg Holt wrote:
> Also, using marker/limit rather than offset/limit makes it much easier to
> shard for scale if desired. One of the reasons we chose that route with
> Swift. Regular databases with indexes work better with marker/limit than
> offset/limit too.
>
> On May 25, 2011, at 12:38 PM, Justin Santa Barbara wrote:
>
> I definitely agree that the paging logic is questionable; we definitely
> should be specifying the sort order if we're relying on it, as I believe a
> database is otherwise free to return results however it sees fit.
>
> I'm not convinced that the proposed logic around the queries and update
> timestamps is typo-free. But even if it was correct, the problem with
> timestamps is that I don't think you can get a sensible set of results. If
> you order by increasing timestamp, then I think you see items repeatedly if
> they are concurrently updated. If you order by decreasing timestamp, I
> think you can miss an item altogether if it's updated in between page
> fetches.
>
> I think the 'normal' way to make this work is to order by primary key, and
> to have the marker be the PK of the 'last seen row'. Because the client
> can't know what the pk is, the marker is normally just an opaque value.
> It's also (normally) much easier for the DB to sort by primary key than by
> any other value.
>
> The behaviour of an iteration using the PK is fairly nice. You never see a
> row twice, so you don't need to de-dup client side. You are guaranteed to
> see all rows, unless they are concurrently inserted or deleted. If a row is
> inserted you'll see it if you haven't yet fetched the page it would be on.
> If a row is deleted you don't see it unless you've already fetched the page
> it was on.
> Zones may complicate any scheme, though I think both proposed methods could
> be made to work. I think in general the next N rows have to be fetched from
> each child zone and then combined (top N), throwing away the extra records.
> One way around this would be to define an order on the child zones, and
> then show the records from zone 1, then those from zone 2 etc etc. You just
> need to map from the marker to a child zone, and if you don't get the full
> complement of rows you need to merge in the first results from the next zone
> as well.
>
> The SQL query using the PK would look like this (and there's no need for an
> OFFSET clause):
>
> SELECT *
> FROM <table>
> ORDER BY <primary key>
> WHERE <primary key> > @marker
> AND (<other filters>)
> LIMIT <page size>
>
> Without additional filters, that's a _very_ efficient query for a sensible
> database.
>
> Justin
>
> On Wed, May 25, 2011 at 9:28 AM, Jay Pipes wrote:
>>
>> When a user issues a GET request for a resource collection, it's clear
>> that you don't want to return ALL results in the collection. Returning
>> hundreds of thousands of records per request isn't particularly a good
>> idea.
>>
>> Therefore, we use the concept of "pagination" -- returning a subset of
>> records matching the search criteria.
>>
>> In Glance's API, we've added this ability (currently in code review),
>> but our approach is a bit at odds with the way that the OpenStack
>> Compute API is structured. I'd like to explain why I think our
>> approach is better, both in terms of usability and in terms of
>> efficiency.
>>
>> First, here is how Nova does its pagination in the OpenStack API.
>>
>> 1) Look for a "marker" query parameter in the request
>> 2) Look for a "limit" query parameter in the request, otherwise use a
>> default max limit.
>> 3) If marker parameter is found, the value of the "marker" query
>> parameter is assumed to be an *integer* and assumed to be the
>> *identifier of the last object on the previous "page" of results*
>> 4) Request *all* results
>> 5) In the code, find the "page" of results by looking for the object
>> in the set of results that matches the marker parameter
>> 6) When found, grab the next object in the result set, plus X objects,
>> where X is the limit parameter
>>
>> The relevant code is in /nova/api/openstack/common.py, and looks like so:
>>
>> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>>     """Return a slice of items according to the requested marker and
>>     limit."""
>>
>>     limit = min(max_limit, limit)
>>     start_index = 0
>>     if marker:
>>         start_index = -1
>>         for i, item in enumerate(items):
>>             if item['id'] == marker:
>>                 start_index = i + 1
>>                 break
>>         if start_index < 0:
>>             raise webob.exc.HTTPBadRequest(_('marker [%s] not found' %
>>                                              marker))
>>     range_end = start_index + limit
>>     return items[start_index:range_end]
>>
>> There are three MAJOR problems with the above method of pagination:
>>
>> 1) Very inefficient
>>
>> The P
Re: [Openstack] Getting pagination right
Also, using marker/limit rather than offset/limit makes it much easier to shard for scale if desired. One of the reasons we chose that route with Swift. Regular databases with indexes work better with marker/limit than offset/limit too.

On May 25, 2011, at 12:38 PM, Justin Santa Barbara wrote:
> I definitely agree that the paging logic is questionable; we definitely
> should be specifying the sort order if we're relying on it, as I believe a
> database is otherwise free to return results however it sees fit.
>
> I'm not convinced that the proposed logic around the queries and update
> timestamps is typo-free. But even if it was correct, the problem with
> timestamps is that I don't think you can get a sensible set of results. If
> you order by increasing timestamp, then I think you see items repeatedly if
> they are concurrently updated. If you order by decreasing timestamp, I think
> you can miss an item altogether if it's updated in between page fetches.
>
> I think the 'normal' way to make this work is to order by primary key, and to
> have the marker be the PK of the 'last seen row'. Because the client can't
> know what the pk is, the marker is normally just an opaque value. It's also
> (normally) much easier for the DB to sort by primary key than by any other
> value.
>
> The behaviour of an iteration using the PK is fairly nice. You never see a
> row twice, so you don't need to de-dup client side. You are guaranteed to
> see all rows, unless they are concurrently inserted or deleted. If a row is
> inserted you'll see it if you haven't yet fetched the page it would be on.
> If a row is deleted you don't see it unless you've already fetched the page
> it was on.
>
> Zones may complicate any scheme, though I think both proposed methods could
> be made to work. I think in general the next N rows have to be fetched from
> each child zone and then combined (top N), throwing away the extra records.
> One way around this would be to define an order on the child zones, and then
> show the records from zone 1, then those from zone 2 etc etc. You just need
> to map from the marker to a child zone, and if you don't get the full
> complement of rows you need to merge in the first results from the next zone
> as well.
>
> The SQL query using the PK would look like this (and there's no need for an
> OFFSET clause):
>
> SELECT *
> FROM <table>
> ORDER BY <primary key>
> WHERE <primary key> > @marker
> AND (<other filters>)
> LIMIT <page size>
>
> Without additional filters, that's a _very_ efficient query for a sensible
> database.
>
> Justin
>
> On Wed, May 25, 2011 at 9:28 AM, Jay Pipes wrote:
> When a user issues a GET request for a resource collection, it's clear
> that you don't want to return ALL results in the collection. Returning
> hundreds of thousands of records per request isn't particularly a good
> idea.
>
> Therefore, we use the concept of "pagination" -- returning a subset of
> records matching the search criteria.
>
> In Glance's API, we've added this ability (currently in code review),
> but our approach is a bit at odds with the way that the OpenStack
> Compute API is structured. I'd like to explain why I think our
> approach is better, both in terms of usability and in terms of
> efficiency.
>
> First, here is how Nova does its pagination in the OpenStack API.
>
> 1) Look for a "marker" query parameter in the request
> 2) Look for a "limit" query parameter in the request, otherwise use a
> default max limit.
> 3) If marker parameter is found, the value of the "marker" query
> parameter is assumed to be an *integer* and assumed to be the
> *identifier of the last object on the previous "page" of results*
> 4) Request *all* results
> 5) In the code, find the "page" of results by looking for the object
> in the set of results that matches the marker parameter
> 6) When found, grab the next object in the result set, plus X objects,
> where X is the limit parameter
>
> The relevant code is in /nova/api/openstack/common.py, and looks like so:
>
> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>     """Return a slice of items according to the requested marker and limit."""
>
>     limit = min(max_limit, limit)
>     start_index = 0
>     if marker:
>         start_index = -1
>         for i, item in enumerate(items):
>             if item['id'] == marker:
>                 start_index = i + 1
>                 break
>         if start_index < 0:
>             raise webob.exc.HTTPBadRequest(_('marker [%s] not found' % marker))
>     range_end = start_index + limit
>     return items[start_index:range_end]
>
> There are three MAJOR problems with the above method of pagination:
>
> 1) Very inefficient
>
> The Python code does the job of determining the page of results to
> find. This means that the database query returns EVERY row in the
> resultset, instead of only returning a small subset of the results
> using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands
> o
Re: [Openstack] Getting pagination right
This is what happens when I braindump a bunch of stuff into an email... I get myself confused :)

I'm proposing the following. Please pick the strategy apart:

1) Pass the LIMIT and OFFSET parameters down into the database API queries
2) Add a default ORDER BY for all queries returning result sets
3) Add a "page" query parameter that would be used to calculate the OFFSET programmatically
4) Change the concept of the "marker" parameter to refer to *the timestamp of the initial query*

This would result in queries to the database of the following template:

SELECT *
FROM <table>
WHERE created_at <= @marker
AND deleted_at <= @marker
AND <other filters>
ORDER BY <sort column>
LIMIT @pagesize
OFFSET (@pagesize * (@pageno - 1));

Sorry for calling out Justin re: the order of his SQL expression. No offense intended. I know he's a good SQL brain :)

-jay
Re: [Openstack] Getting pagination right
On Wed, May 25, 2011 at 1:38 PM, Justin Santa Barbara wrote:
> The SQL query using the PK would look like this (and there's no need for an
> OFFSET clause):
>
> SELECT *
> FROM <table>
> ORDER BY <primary key>
> WHERE <primary key> > @marker
> AND (<other filters>)
> LIMIT <page size>
>
> Without additional filters, that's a _very_ efficient query for a sensible
> database.

The above isn't actually valid SQL. The WHERE clause must precede the ORDER BY clause. In any case, what I meant to say in the original post was that the way to provide consistent views of a result set is to do:

SET @marker = <timestamp of initial query>

SELECT *
FROM <table>
WHERE updated_at <= @marker
AND <other filters>
ORDER BY <sort column>
LIMIT <page size>
OFFSET <offset>

Cheers,
jay
Re: [Openstack] Getting pagination right
I misspoke in my original email. Marker should be the timestamp of the initial query.

-jay
Re: [Openstack] Getting pagination right
I definitely agree that the paging logic is questionable; we definitely should be specifying the sort order if we're relying on it, as I believe a database is otherwise free to return results however it sees fit.

I'm not convinced that the proposed logic around the queries and update timestamps is typo-free. But even if it was correct, the problem with timestamps is that I don't think you can get a sensible set of results. If you order by increasing timestamp, then I think you see items repeatedly if they are concurrently updated. If you order by decreasing timestamp, I think you can miss an item altogether if it's updated in between page fetches.

I think the 'normal' way to make this work is to order by primary key, and to have the marker be the PK of the 'last seen row'. Because the client can't know what the pk is, the marker is normally just an opaque value. It's also (normally) much easier for the DB to sort by primary key than by any other value.

The behaviour of an iteration using the PK is fairly nice. You never see a row twice, so you don't need to de-dup client side. You are guaranteed to see all rows, unless they are concurrently inserted or deleted. If a row is inserted you'll see it if you haven't yet fetched the page it would be on. If a row is deleted you don't see it unless you've already fetched the page it was on.

Zones may complicate any scheme, though I think both proposed methods could be made to work. I think in general the next N rows have to be fetched from each child zone and then combined (top N), throwing away the extra records. One way around this would be to define an order on the child zones, and then show the records from zone 1, then those from zone 2 etc etc. You just need to map from the marker to a child zone, and if you don't get the full complement of rows you need to merge in the first results from the next zone as well.
The SQL query using the PK would look like this (and there's no need for an OFFSET clause):

SELECT * FROM <table> WHERE <pk> > @marker AND (<additional filters>) ORDER BY <pk> LIMIT <page size>

Without additional filters, that's a _very_ efficient query for a sensible database.

Justin

On Wed, May 25, 2011 at 9:28 AM, Jay Pipes wrote:
> When a user issues a GET request for a resource collection, it's clear
> that you don't want to return ALL results in the collection. Returning
> hundreds of thousands of records per request isn't particularly a good
> idea.
>
> Therefore, we use the concept of "pagination" -- returning a subset of
> records matching the search criteria.
>
> In Glance's API, we've added this ability (currently in code review),
> but our approach is a bit at odds with the way that the OpenStack
> Compute API is structured. I'd like to explain why I think our
> approach is better, both in terms of usability and in terms of
> efficiency.
>
> First, here is how Nova does its pagination in the OpenStack API.
>
> 1) Look for a "marker" query parameter in the request
> 2) Look for a "limit" query parameter in the request, otherwise use a
> default max limit.
> 3) If marker parameter is found, the value of the "marker" query
> parameter is assumed to be an *integer* and assumed to be the
> *identifier of the last object on the previous "page" of results*
> 4) Request *all* results
> 5) In the code, find the "page" of results by looking for the object
> in the set of results that matches the marker parameter
> 6) When found, grab the next object in the result set, plus X objects,
> where X is the limit parameter
>
> The relevant code is in /nova/api/openstack/common.py, and looks like so:
>
> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>     """Return a slice of items according to the requested marker and
>     limit."""
>
>     limit = min(max_limit, limit)
>     start_index = 0
>     if marker:
>         start_index = -1
>         for i, item in enumerate(items):
>             if item['id'] == marker:
>                 start_index = i + 1
>                 break
>         if start_index < 0:
>             raise webob.exc.HTTPBadRequest(_('marker [%s] not found' %
>                                              marker))
>     range_end = start_index + limit
>     return items[start_index:range_end]
>
> There are three MAJOR problems with the above method of pagination:
>
> 1) Very inefficient
>
> The Python code does the job of determining the page of results to
> find. This means that the database query returns EVERY row in the
> resultset, instead of only returning a small subset of the results
> using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands
> of records in the various databases, this method is simply not
> scalable.
>
> 2) There is no ordering of the results in the queries the Nova
> database API makes
>
> The only query in the Nova database API that orders results is the
> instance_type_get_all() method.
>
> Without an order to the search results, paging is almost meaningless.
> A page of results depends on the order of the list of results.
> The reason the existing
[Openstack] Getting pagination right
When a user issues a GET request for a resource collection, it's clear that you don't want to return ALL results in the collection. Returning hundreds of thousands of records per request isn't particularly a good idea.

Therefore, we use the concept of "pagination" -- returning a subset of records matching the search criteria.

In Glance's API, we've added this ability (currently in code review), but our approach is a bit at odds with the way that the OpenStack Compute API is structured. I'd like to explain why I think our approach is better, both in terms of usability and in terms of efficiency.

First, here is how Nova does its pagination in the OpenStack API.

1) Look for a "marker" query parameter in the request
2) Look for a "limit" query parameter in the request, otherwise use a default max limit.
3) If marker parameter is found, the value of the "marker" query parameter is assumed to be an *integer* and assumed to be the *identifier of the last object on the previous "page" of results*
4) Request *all* results
5) In the code, find the "page" of results by looking for the object in the set of results that matches the marker parameter
6) When found, grab the next object in the result set, plus X objects, where X is the limit parameter

The relevant code is in /nova/api/openstack/common.py, and looks like so:

def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
    """Return a slice of items according to the requested marker and
    limit."""

    limit = min(max_limit, limit)
    start_index = 0
    if marker:
        start_index = -1
        for i, item in enumerate(items):
            if item['id'] == marker:
                start_index = i + 1
                break
        if start_index < 0:
            raise webob.exc.HTTPBadRequest(_('marker [%s] not found' % marker))
    range_end = start_index + limit
    return items[start_index:range_end]

There are three MAJOR problems with the above method of pagination:

1) Very inefficient

The Python code does the job of determining the page of results to find.
This means that the database query returns EVERY row in the resultset, instead of only returning a small subset of the results using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands of records in the various databases, this method is simply not scalable.

2) There is no ordering of the results in the queries the Nova database API makes

The only query in the Nova database API that orders results is the instance_type_get_all() method.

Without an order to the search results, paging is almost meaningless. A page of results depends on the order of the list of results. The reason the existing Nova code works at all is because of the fact that all objects have an autoincrementing integer primary key, and by nature, this autoincrementing integer primary key increases over time. If the identifier was switched to, say, UUID, the results ordering would fall apart completely, and "pages" of results would get skewed and corrupted as records were inserted over time.

For instance, assume a character primary key (like UUID keys would be, stored in hex string format). Assume the following simplified set of instances:

PK   Created_at
ABC  2011-05-25 12:00:00
DEF  2011-05-25 12:00:01
GHJ  2011-05-25 12:00:02
KLM  2011-05-25 12:00:03

Assume you want to get a list of instances with GET /servers?limit=2. With the existing code, you'd get this:

ABC  2011-05-25 12:00:00
DEF  2011-05-25 12:00:01

With GET /servers?limit=2&marker=DEF, you'd get this:

GHJ  2011-05-25 12:00:02
KLM  2011-05-25 12:00:03

However, let's say a new instance is added:

BCD  2011-05-25 12:00:04

Because there is no sort order specified in the database queries, the order of results is indeterminate. Page 1 may now contain the BCD record, or it may not. It totally depends on the implementation of the relational database, because the SQL standard does not mandate a default sort order when one is not specified in the ORDER BY clause.
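The character-PK scenario above can be simulated in a few lines. This is a toy model (the four hypothetical instances from the text, with an in-memory list standing in for the database): once the sort order is made explicit, paging becomes deterministic, and the newly inserted BCD row lands predictably on page 1 rather than on a random page.

```python
# Simulation of the character-PK example above. page() stands in for
# a query with an explicit ORDER BY pk plus a keyset marker filter.
instances = [
    ("ABC", "2011-05-25 12:00:00"),
    ("DEF", "2011-05-25 12:00:01"),
    ("GHJ", "2011-05-25 12:00:02"),
    ("KLM", "2011-05-25 12:00:03"),
]

def page(rows, marker=None, limit=2):
    ordered = sorted(rows)  # explicit ORDER BY pk ASC
    if marker is not None:
        # keyset filter: only rows strictly after the last seen PK
        ordered = [r for r in ordered if r[0] > marker]
    return ordered[:limit]

first = page(instances)                 # ABC, DEF
instances.append(("BCD", "2011-05-25 12:00:04"))
first_after_insert = page(instances)    # ABC, BCD: predictable shift
second = page(instances, marker="DEF")  # GHJ, KLM, unaffected by BCD
```

With the explicit ORDER BY, the insert changes page 1 in a well-defined way, and pages after the marker are untouched.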
3) The marker parameter assumes an autoincrementing primary key

Because the marker parameter assumes an autoincrementing primary key, the paging of results currently DOES solve the problem of new records messing up paging. As an example, let's assume I've fixed #2 above and put a default sort order on the primary key, descending. Also assume a simple resultset like so:

PK  Created_at
1   2011-05-25 12:00:00
2   2011-05-25 12:00:01
3   2011-05-25 12:00:02
4   2011-05-25 12:00:03

Assume I make a request for GET /servers?limit=2. I would get:

4   2011-05-25 12:00:03
3   2011-05-25 12:00:02

The results returned would be the "last" 2 server instances created. If I did a request for GET /servers?limit=2&marker=3, I would get the results:

2   2011-05-25 12:00:01
1   2011-05-25 12:00:00

If I add a new instance, say:

5   2011-05-25 12:00:05

and did the request for GET /servers?limit=2&marker=3, I
Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...
Heh, you're right, I was completely mistaken :) That's a really cool idea, Soren!

One of the problems we're faced with is keeping instances from the same customer off a single compute node. We want to spread them out to lessen the impact of machine failure. In order to solve this we needed to put a weight on each host candidate and use that weight to decide the final ordering. The implication is that we can't dump all the requests in the Scheduler queues and have them round-robin'ed, as is currently done. A single Scheduler has to devise a plan, and then all the Compute nodes can execute it concurrently. A large part of the Distributed Scheduler work has been working around this issue.

That said, your idea has tremendous merit for those customers that don't have that requirement. Right now, we have two ways of dispersing "Provision Resource" requests:

1. The way it is now: dump all the requests in the Scheduler queues and have them picked off concurrently.
2. The way the Distributed Scheduler works: send the request for N resources out as an atomic unit. A single Scheduler devises a plan and sends tiny units-of-work to the Compute nodes.

Your idea would be the third approach:

3. Place the request in a Flavor-specific queue and let any capable Compute node handle the request.

Let me give a little thought to how we can do this cleanly. It's going to get complicated in the code if each request potentially has a different dispersal strategy. The request dispersal strategy seems to go hand-in-hand with the underlying worker-node selection process; they both need to agree on the approach. But ... it's certainly not impossible. We just need to bundle the code in nova.compute.api.create() to the configured nova.scheduler.driver ... 1:1. Believe me, based on the stuff we've got going on in the Distributed Scheduler, that would be a really nice refactoring.
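The anti-affinity weighting described above might be sketched like this. All names here are hypothetical illustrations, not actual Nova scheduler APIs: hosts already running instances for the customer accumulate a penalty, and candidates are ranked in ascending-weight order.

```python
# Hypothetical sketch of weighting host candidates so that instances
# from the same customer are spread across compute nodes. Not actual
# Nova code; names are invented for illustration.
def weigh_hosts(hosts, customer):
    """Return host names ordered best-first for this customer.

    hosts maps host name -> list of customer ids already placed there;
    each instance the customer already has on a host adds a penalty.
    """
    def weight(host):
        return sum(1 for c in hosts[host] if c == customer)
    return sorted(hosts, key=weight)

hosts = {
    "node1": ["acme", "acme"],  # two acme instances already: worst
    "node2": ["acme", "zeta"],
    "node3": ["zeta"],          # no acme instances: best candidate
}
plan = weigh_hosts(hosts, "acme")
```

Because the weights depend on global placement state, this ranking has to happen in one place, which is exactly why the requests can't simply be round-robin'ed off a queue.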
As an aside, the implications of #3 that I can see at first blush:

- We'd be effectively bypassing the scheduler, so we'd need to think about how that would work with Zones, since Queues don't span Zones.
- We'd need to create a per-Flavor FanOut queue. Some Compute nodes would express their interest in handling the request, and we'd, somehow, need to decide which one gets the work. Who would decide that? How could they do it without creating a single point of failure?

Regardless ... it's a great idea and definitely one that deserves more consideration.

Thanks!
-S

From: Soren Hansen [so...@linux2go.dk]
Sent: Tuesday, May 24, 2011 4:56 PM
To: Sandy Walsh
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] OpenStack API, Reservation ID's and Num Instances ...

2011/5/23 Sandy Walsh :
> To Soren's point about "losing the ability to rely on a fixed set of
> topics in the message queue for doing scheduling" this is not the case,
> there are no new topics introduced.

That's not exactly what I meant. If we stuck with the simple flavours that we have right now, we could schedule stuff exclusively using the message queue. The scheduler would not need to know *anything* about the various compute nodes. Scheduling an instance of flavour X would be achieved simply by sending a "run this instance" message on the message queue with the "flavour-X" topic. Any compute node able to accommodate an instance of that size would be subscribed to that topic, and the message queue would simply route the message to a "random" one of them. As a compute node fills up, it'll unsubscribe from the topics representing flavours that it no longer has room for. This sounds Very Scalable[tm] to me :)

Even if all the scheduling attributes we come up with were discrete and enumerable, the cartesian product of all of them is potentially *huge*, so having a topic for each of the possible combinations sounds like very little fun. If any of them are not enumerable, it gets even less fun.
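The per-flavour topic idea above can be modelled in miniature. This is a toy sketch in plain Python (not actual AMQP or Nova code, and the class and topic names are invented): nodes subscribe to a topic per flavour they can still host, the broker hands each message to exactly one subscriber, and a full node simply drops the topic.

```python
# Toy model of per-flavour topic scheduling: compute nodes subscribe
# to a topic for each flavour they can still host; a "run instance"
# message cast on that topic is delivered to exactly one subscriber.
class TopicQueue:
    def __init__(self):
        self.subscribers = {}  # topic -> list of node names

    def subscribe(self, topic, node):
        self.subscribers.setdefault(topic, []).append(node)

    def unsubscribe(self, topic, node):
        # a node drops a topic once it can no longer fit that flavour
        self.subscribers[topic].remove(node)

    def cast(self, topic):
        # round-robin stands in for the broker picking one consumer
        nodes = self.subscribers[topic]
        node = nodes.pop(0)
        nodes.append(node)
        return node

queue = TopicQueue()
queue.subscribe("flavour-small", "nodeA")
queue.subscribe("flavour-small", "nodeB")
queue.subscribe("flavour-large", "nodeA")

first = queue.cast("flavour-small")   # delivered to one node
second = queue.cast("flavour-small")  # next subscriber gets the next one
queue.unsubscribe("flavour-large", "nodeA")  # nodeA filled up
```

Note the scheduler-less property Soren describes: no component tracks node capacity centrally; capacity is encoded entirely in who is subscribed to which topic.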
So, adding these capabilities would get in the way of implementing something like the above. I guess it could be a configuration option, i.e. if you choose the rich scheduling option set, you don't get to use the cool scheduler.

--
Soren Hansen        | http://linux2go.dk/
Ubuntu Developer    | http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/