Bertel,

On Mon, Jan 2, 2017 at 7:40 AM, Bertel Teilfeldt Hansen <geilfe...@gmail.com> wrote:
> Hi Gabriel,
>
> The REST API looks promising - thank you!
>
> Having played around with it a bit, I seem to only be able to get one
> revision per request. Is that correct, or am I doing something wrong?

This is correct. The requests themselves are quite cheap, and can be
parallelized up to the rate limit set out in the API documentation.

> My project requires every revision and its references from a large number
> of articles, so that would make a lot of requests. The regular API allows
> for multiple revisions per request (only with action=query, though).

There is a caveat here in that we currently don't store all revisions for
all articles. This means that requests for really old revisions will
trigger a more expensive on-demand parse, just as with the action API.

Can you say more about the number of articles you are targeting, and how
this list is selected? Regarding the selection, I am mainly wondering if
you are targeting especially frequently edited articles.

Thanks,

Gabriel

> Thanks!
>
> Bertel
>
> 2016-12-21 17:01 GMT+01:00 Gabriel Wicke <gwi...@wikimedia.org>:
>
>> Bertel, another option is to use the REST API:
>>
>> - HTML for a specific revision:
>>   <https://en.wikipedia.org/api/rest_v1/#!/Page_content/getFormatRevision>
>> - Within this HTML, references are marked up like this:
>>   <https://www.mediawiki.org/wiki/Specs/HTML/1.3.0/Extensions/Cite>.
>>   Any HTML or XML DOM parser can be used to extract this information.
>>
>> Hope this helps,
>>
>> Gabriel
>>
>> On Wed, Dec 21, 2016 at 3:20 AM, Bertel Teilfeldt Hansen <
>> geilfe...@gmail.com> wrote:
>>
>>> Hi Brad and Gergo,
>>>
>>> Thanks for your responses!
>>>
>>> @Brad: Yeah, that was also my impression, but I wasn't sure. Seemed
>>> strange that the example in the official docs would point to a place where
>>> the feature was disabled. Thank you for clearing that up!
>>>
>>> @Gergo: I've been looking at action=parse, but as far as I understand
>>> it, it is limited to one revision per API request, which makes it quite
>>> slow to get a bunch of older revisions from a large number of articles.
>>> action=query&prop=revisions&rvprop=content omits the references from
>>> the output (just gives the string "{{reflist}}" after "References").
>>> "mwrefs" sounds very promising, though! I will definitely check that out -
>>> thank you!
>>>
>>> Best,
>>>
>>> Bertel
>>>
>>> 2016-12-20 19:51 GMT+01:00 Gergo Tisza <gti...@wikimedia.org>:
>>>
>>>> On Tue, Dec 20, 2016 at 10:18 AM, Bertel Teilfeldt Hansen <
>>>> geilfe...@gmail.com> wrote:
>>>>
>>>>> And is there no way of getting references through the API?
>>>>
>>>> There is no nice way, but you can always get the HTML (or the parse
>>>> tree, depending on whether you want parsed or raw refs) and process it;
>>>> references are not hard to extract. For the wikitext version, there is a
>>>> python tool: https://github.com/mediawiki-utilities/python-mwrefs
>>
>> --
>> Gabriel Wicke
>> Principal Engineer, Wikimedia Foundation

--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
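
As a rough illustration of the approach discussed in this thread (fetch the HTML of one revision from the REST API, then pull the references out with a parser), here is a minimal Python sketch using only the standard library. Several things in it are assumptions rather than anything stated above: the URL layout /page/html/{title}/{revision}, the class name "mw-reference-text" on the element holding each reference body (check the Cite spec linked above for the version you get back), the placeholder User-Agent string, and the worker count, which should be kept within the documented rate limit. The function and class names are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import Request, urlopen

REST_BASE = "https://en.wikipedia.org/api/rest_v1"
# Placeholder value; replace with a string identifying your own project.
USER_AGENT = "reference-history-example/0.1 (contact: you@example.org)"
# Elements that never have an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def revision_html_url(title, revision, base=REST_BASE):
    """Build the URL for the HTML of one revision (assumed URL layout)."""
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"{base}/page/html/{encoded}/{int(revision)}"


class ReferenceExtractor(HTMLParser):
    """Collect the text of every element carrying the reference-text class."""

    def __init__(self, ref_class="mw-reference-text"):
        super().__init__(convert_charrefs=True)
        self.ref_class = ref_class  # assumed class name, see note above
        self.references = []
        self._depth = 0   # nesting depth inside the current reference element
        self._parts = []  # text fragments of the current reference

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self._depth:
            self._depth += 1
        elif self.ref_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag in VOID or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.references.append(" ".join("".join(self._parts).split()))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_references(html, ref_class="mw-reference-text"):
    """Return the plain text of each reference found in the given HTML."""
    parser = ReferenceExtractor(ref_class)
    parser.feed(html)
    parser.close()
    return parser.references


def fetch_references(pairs, max_workers=4):
    """Fetch (title, revision) pairs in parallel; one request per revision."""
    pairs = list(pairs)

    def one(pair):
        req = Request(revision_html_url(*pair),
                      headers={"User-Agent": USER_AGENT})
        with urlopen(req, timeout=30) as resp:
            return extract_references(resp.read().decode("utf-8"))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(pairs, pool.map(one, pairs)))
```

Calling fetch_references([("Some title", 12345)]) with a real title and revision ID would return a dict keyed by (title, revision). The depth counting relies on the HTML being well-formed, and it only sees the rendered reference text; the python-mwrefs tool mentioned above is the route for the wikitext version.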
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api