On Mon, Apr 30, 2012 at 11:08:40PM +0200, Petr Onderka wrote:
>
> In other words, if I find some results for some page on API page n and
> no results on API page n+1, can I be sure there will be no results on
> pages > n?
Not necessarily. In most cases that assumption should be true, but I see a
few cases offhand where it wouldn't be:

* If you're using prop=revisions&revids=...&rvprop=content with revisions
  big enough that the API response size limit comes into play, you could
  wind up in a situation where the initial query returns revision 1 from
  page A, the second returns revision 2 from page B, and the third returns
  revision 3 from page A again.

* Some modules, such as prop=extlinks, cannot use anything sane for the
  continue parameter (or else MySQL blows up), so they just use "offset
  into the arbitrarily-ordered set of results". It's possible that edits
  made to the wiki between your calls could change the result set so that
  values are repeated, skipped, or both.

* If you are using multiple modules, it might be the case that one goes
  through the pages in order by page_id while the other goes by title, or
  something along those lines. In practice it seems that all modules that
  commonly continue will order by the page_id, so the only way you might
  run into this is if the API response size limit causes modules like
  categoryinfo or imageinfo that usually don't continue to do so.

I haven't checked any of the prop modules provided by extensions, BTW.
Chances are most of those are well-behaved and order by page_id, but it's
possible some of them may do things differently.

> I am writing a library to access the API and every collection in my
> library is lazy.
>
> For example, a user requests to know categories of pages in
> Category:Query languages.
>
> When he starts iterating over the result, I execute the query:
> http://en.wikipedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Query%20languages&prop=categories
>
> When he then requests to know the categories of the third page in the
> result (Access query language),
> I will return to him the categories from the first query.
> If he requests more, I execute the query:
> http://en.wikipedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Query%20languages&prop=categories&clcontinue=494528|All%20pages%20needing%20cleanup

How do you determine that you should look at "Access query language" first
rather than one of the other pages?

In my bot code, I have something that behaves similarly: you give it a
query, and it gives back a series of result pages. But my version will
process clcontinue all the way to the end right away; the laziness is only
in handling gcmcontinue. That way I can be sure that the page nodes
returned by successive calls will have all the necessary data without
worrying about the ordering of the prop module results.

_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
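The strategy in the reply above (drain clcontinue eagerly, stay lazy only on gcmcontinue) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the bot code referred to in the message: `fetch` is a hypothetical callable that performs one api.php request and returns the decoded JSON, the response is assumed to use the old-style "query-continue" block keyed by module name ("categories" for clcontinue, "categorymembers" for gcmcontinue), and the function name is made up for this example. Prop data is merged by page id, so the order in which the prop module returns results does not matter.

```python
from typing import Callable, Dict, Iterator

# Hypothetical: fetch(params) performs one api.php request and returns the
# decoded JSON response, assumed to use the old-style "query-continue" block.
Fetch = Callable[[Dict[str, str]], dict]


def iter_pages_with_categories(fetch: Fetch, base_params: Dict[str, str]) -> Iterator[dict]:
    """Lazily yield complete page records, one generator batch at a time.

    Prop continuation (clcontinue) is drained eagerly for each batch;
    only generator continuation (gcmcontinue) is followed lazily.
    """
    gen_continue: Dict[str, str] = {}
    while True:
        pages: Dict[str, dict] = {}  # merged by page id, so prop order is irrelevant
        prop_continue: Dict[str, str] = {}
        next_gen_continue: Dict[str, str] = {}
        while True:
            # Keep the generator at the same position while continuing the prop.
            params = {**base_params, **gen_continue, **prop_continue}
            response = fetch(params)
            for page_id, page in response.get("query", {}).get("pages", {}).items():
                merged = pages.setdefault(
                    page_id, {"title": page.get("title"), "categories": []}
                )
                merged["categories"].extend(page.get("categories", []))
            cont = response.get("query-continue", {})
            if "categorymembers" in cont:
                next_gen_continue = cont["categorymembers"]
            prop_continue = cont.get("categories", {})
            if not prop_continue:
                break
        # Every page in this batch now carries all of its categories.
        yield from pages.values()
        if not next_gen_continue:
            return
        gen_continue = next_gen_continue
```

Because the inner loop finishes before anything is yielded, a caller who stops after the first few pages never triggers requests for later generator batches, while every page it does receive already has its complete category list.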