I'm not proposing using marker/limit over offset/limit. I'm proposing using marker/limit/offset over searching manually through a giant set of results looking for a marker and then grabbing the next few results. ;)
On Wed, May 25, 2011 at 2:26 PM, Greg Holt <gh...@rackspace.com> wrote: > Also, using marker/limit rather than offset/limit makes it much easier to > shard for scale if desired. One of the reasons we chose that route with > Swift. Regular databases with indexes work better with marker/limit than > offset/limit too. > On May 25, 2011, at 12:38 PM, Justin Santa Barbara wrote: > > I definitely agree that the paging logic is questionable; we definitely > should be specifying the sort order if we're relying on it, as I believe a > database is otherwise free to return results however it sees fit. > I'm not convinced that the proposed logic around the queries and update > timestamps is typo-free. But even if it was correct, the problem with > timestamps is that I don't think you can get a sensible set of results. If > you order by increasing timestamp, then I think you see items repeatedly if > they are concurrently updated. If you order by decreasing timestamp, I > think you can miss an item altogether if it's updated in between page > fetches. > I think the 'normal' way to make this work is to order by primary key, and > to have the marker be the PK of the 'last seen row'. Because the client > can't know what the pk is, the marker is normally just an opaque value. > It's also (normally) much easier for the DB to sort by primary key than by > any other value. > The behaviour of an iteration using the PK is fairly nice. You never see a > row twice, so you don't need to de-dup client side. You are guaranteed to > see all rows, unless they are concurrently inserted or deleted. If a row is > inserted you'll see it if you haven't yet fetched the page it would be on. > If a row is deleted you don't see it unless you've already fetched the page > it was on. > Zones may complicate any scheme, though I think both proposed methods could > be made to work. I think in general the next N rows have to be fetched from > each child zone and then combined (top N), throwing away the extra records. > One way around this would be to define an order on the child zones, and > then show the records from zone 1, then those from zone 2 etc etc. You just > need to map from the marker to a child zone, and if you don't get the full > complement of rows you need to merge in the first results from the next zone > as well. > The SQL query using the PK would look like this (and there's no need for an > OFFSET clause): > SELECT * > FROM <table> > ORDER BY <pk> > WHERE <pk> > @marker > AND (<additional filters>) > LIMIT <pagesize> > Without additional filters, that's a _very_ efficient query for a sensible > database. > Justin > > > On Wed, May 25, 2011 at 9:28 AM, Jay Pipes <jaypi...@gmail.com> wrote: >> >> When a user issues a GET request for a resource collection, it's clear >> that you don't want to return ALL results in the collection. Returning >> hundreds of thousands of records per request isn't particularly a good >> idea. >> >> Therefore, we use the concept of "pagination" -- returning a subset of >> records matching the search criteria. >> >> In Glance's API, we've added this ability (currently in code review), >> but our approach is a bit at odds with the way that the OpenStack >> Compute API is structured. I'd like to explain why I think our >> approach is better, both in terms of usability and in terms of >> efficiency. >> >> First, here is how Nova does its pagination in the OpenStack API. >> >> 1) Look for a "marker" query parameter in the request >> 2) Look for a "limit" query parameter in the request, otherwise use a >> default max limit. >> 3) If marker parameter is found, the value of the "marker" query >> parameter is assumed to be an *integer* and assumed to be the >> *identifier of the last object on the previous "page" of results* >> 4) Request *all* results >> 5) In the code, find the "page" of results by looking for the object >> in the set of results that matches the marker parameter >> 6) When found, grab the next object in the result set, plus X objects, >> where X is the limit parameter >> >> The relevant code is in /nova/api/openstack/common.py, and looks like so: >> >> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit): >> """Return a slice of items according to the requested marker and >> limit.""" >> >> <snipped for brevity... > >> >> limit = min(max_limit, limit) >> start_index = 0 >> if marker: >> start_index = -1 >> for i, item in enumerate(items): >> if item['id'] == marker: >> start_index = i + 1 >> break >> if start_index < 0: >> raise webob.exc.HTTPBadRequest(_('marker [%s] not found' % >> marker)) >> range_end = start_index + limit >> return items[start_index:range_end] >> >> There are three MAJOR problems with the above method of pagination: >> >> 1) Very inefficient >> >> The Python code does the job of determining the page of results to >> find. This means that the database query returns EVERY row in the >> resultset, instead of only returning a small subset of the results >> using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands >> of records in the various databases, this method is simply not >> scalable. >> >> 2) There is no ordering of the results in the queries the Nova >> database API makes >> >> The only queries in the Nova database API that order results is the >> instance_type_get_all() method. >> >> Without an order to the search results, paging is almost meaningless. >> A page of results depends on the order of the list of results. The >> reason the existing Nova code works at all is because of the fact that >> all objects have an autoincrementing integer primary key, and by >> nature, this autoincrementing integer primary key increases over time. >> If the identifier was switched to, say, UUID, the results ordering >> would fall apart completely, and "pages" of results would get skewed >> and corrupted as records were inserted over time. >> >> For instance, assume a character primary key (like UUID keys would be >> stored in hex string format). Assume the following simplified set of >> instances: >> >> PK Created_at >> ABC 2011-05-25 12:00:00 >> DEF 2011-05-25 12:00:01 >> GHJ 2011-05-25 12:00:02 >> KLM 2011-05-25 12:00:03 >> >> Assume you want to get a list of instances with GET /servers?limit=2. >> With the existing code, you'd get this: >> >> ABC 2011-05-25 12:00:00 >> DEF 2011-05-25 12:00:01 >> >> With GET /servers?limit=2?marker=DEF, you'd get this: >> >> GHJ 2011-05-25 12:00:00 >> KLM 2011-05-25 12:00:03 >> >> However, let's say a new instance is added: >> >> BCD 2011-05-25 12:00:04 >> >> Because there is no sort order specified in the database queries, the >> order of results is indeterminate. Page 1 may now contain the BCD >> record, or it may not. It totally depends on the implementation of the >> relational database, because the SQL standard does not mandate a >> default sort order when one is not specified in the ORDER BY clause. >> >> 3) The marker parameter assumes an autoincrementing primary key >> >> Because the marker parameter assumes an autoincrementing primary key, >> the paging of results currently DOES solve the problem of new records >> messing up paging. >> >> As an example, let's assume I've fixed #2 above and put a default sort >> order on the primary key, descending. Also assume a simple resultset >> like so: >> >> PK Created_at >> 1 2011-05-25 12:00:00 >> 2 2011-05-25 12:00:01 >> 3 2011-05-25 12:00:02 >> 4 2011-05-25 12:00:03 >> >> Assume I make a request for GET /servers?limit=2. I would get: >> >> 4 2011-05-25 12:00:03 >> 3 2011-05-25 12:00:02 >> >> The results returned would be the "last" 10 server instances created. >> If I did a request for GET /servers?limit=2?marker=3, I would get the >> results: >> >> 2 2011-05-25 12:00:01 >> 1 2011-05-25 12:00:00 >> >> If I add a new instance, say: >> >> 5 2011-05-25 12:00:05 >> >> and did the request for GET /servers?limit=2?marker=3, I would still >> get the same results: >> >> 2 2011-05-25 12:00:01 >> 1 2011-05-25 12:00:00 >> >> However, the ONLY reason this works is because the primary key is >> auto-incrementing, and new records always come at the "end" of the >> primary key. If I change the primary key to a UUID, the call to GET >> /servers?limit=2?marker=3 would no longer be guaranteed to return the >> same results. The newly-added instance's UUID may come between the >> values of the primary keys in the original "page 2", and therefore >> messed up the consistency of the pagination of my original query. >> >> The way to avoid this inconsistency is to ensure that you have a >> consistent view of the set of results of your original query. To do >> that, you must always use a WHERE condition on a temporal, >> always-incrementing (timestamp) column. >> >> In SQL, this is what is necessary to have a consistent view of pages >> of a result set (assuming 10 elements to a page): >> >> Page 1: >> >> SELECT i.* FROM instances i >> ORDER BY i.id DESC; >> LIMIT 10; >> >> SET @marker = <ID of *first* record> >> SET @updated_at_marker = SELECT updated_at FROM instances WHERE id = >> @marker; >> >> Page 2: >> >> SELECT i.* FROM instances i >> WHERE updated_at <= @updated_at_marker >> ORDER BY i.id DESC >> LIMIT 10 OFFSET 10; >> >> >> SUMMARY: >> >> My proposal is to overhaul the way that Nova does pagination so that we: >> >> 1) Pass the LIMIT and OFFSET parameters down into the database API queries >> 2) Add a default ORDER BY for all queries returning result sets >> 3) Add a "page" query parameter that would be used to calculate the >> OFFSET programmatically >> 4) Change the concept of the "marker" parameter to refer to *the ID of >> the FIRST record returned in the original query* >> >> Thoughts? >> >> -jay >> >> _______________________________________________ >> Mailing list: https://launchpad.net/~openstack >> Post to : openstack@lists.launchpad.net >> Unsubscribe : https://launchpad.net/~openstack >> More help : https://help.launchpad.net/ListHelp > > _______________________________________________ > Mailing list: https://launchpad.net/~openstack > Post to : openstack@lists.launchpad.net > Unsubscribe : https://launchpad.net/~openstack > More help : https://help.launchpad.net/ListHelp > > _______________________________________________ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp