The results are too large to keep in memory, so I would like to page
them using LIMIT and OFFSET. However, this does not work with the
query above: evaluating it requires all of the results to be loaded
into memory. I assume this is because more than one statement is
evaluated in the WHERE clause(?).
That's not why: it's because you're imposing an order with ORDER BY.
There are (broadly speaking) two ways this query could be executed.
If a store has an index on my:hasUserID (and that index happens to be
in SPARQL's defined order!) then results can be generated in ordered
sequence. Successive pages can be generated by re-running the query,
skipping more and more results, or somehow holding on to a cursor.
It's not enough to just skip userIDs: *rows* must be skipped, so the
query does have to be executed in order to skip to the right point.
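To make the mechanics concrete, here is a small sketch of what successive page queries look like. Only my:hasUserID comes from your query; the SELECT shape, the variable names, and the page size are assumptions:

```python
PAGE_SIZE = 1000  # assumed page size, not from the original query


def page_query(offset):
    # Build the paged form of the query. Each page re-runs the whole
    # query with a larger OFFSET; the store must still produce (or
    # skip over) every row before the offset to find the right point.
    return (
        "SELECT ?s ?id WHERE { ?s my:hasUserID ?id } "
        "ORDER BY ?id "
        f"LIMIT {PAGE_SIZE} OFFSET {offset}"
    )


print(page_query(0))     # first page
print(page_query(1000))  # second page: same query, larger OFFSET
```

Note that nothing in the second query lets the store reuse work from the first; with no index in ORDER BY order, each call sorts the full result set again.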
If a store does not have such an index, or your ORDER BY clause is
more complicated, then all the results must be gathered in memory to
be sorted. There's really no way around that.
For a store that doesn't maintain state between queries, generating
successive pages in this manner will essentially involve running the
whole query each time, returning a different chunk of the results. If
you have to sort 100,000 result rows in order to determine the first
1,000, then the second 1,000, you're going to see pretty poor
performance.
Each query execution will reflect any changes in the store since the
last page was generated, which can produce confusing results.
So, how could I page the above query?
Do it in your application. That way you also avoid the data changing
between pages.
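A minimal sketch of that approach, assuming the full result set can be materialized once (fetch_all_rows is a hypothetical stand-in for running the SPARQL query a single time, with ORDER BY but without LIMIT/OFFSET):

```python
def fetch_all_rows():
    # Hypothetical: pretend this runs the query once and returns rows
    # already sorted by userID.
    return [{"userID": i} for i in range(10)]


def pages(rows, page_size):
    """Yield successive fixed-size slices of an already-fetched result set."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


snapshot = fetch_all_rows()  # one query; later changes to the store
                             # cannot shift rows between pages
for page in pages(snapshot, 4):
    print([row["userID"] for row in page])
# prints [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]
```

The store does the sort exactly once, and every page is cut from the same snapshot, so a row can never appear on two pages or fall between them.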
I don't think that LIMIT and OFFSET are useful for supporting paging,
because the spec does not mandate sufficient efficiency constraints on
implementations (such as cursors, as provided by Freebase MQL
queries). It's odd to say "you could do it using the method the spec
recommends, but you'd be crazy to do so with real datasets". I
consider LIMIT's only real use to be for constraining the size of the
result set, not defining a page size.
IMO it would be much more useful to separate SPARQL execution into two
phases: a query that returns a result set, and then operations on that
result set (such as serializing slices of it). Conflating the two
places the burden of doing paging efficiently onto the implementation,
and there's no one good solution for all clients.
-R