Hi,
I am making these queries to get the latest HTML content for each title. I am using the API documented here - https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_html__title_ - and I may be missing something, but I do not see an option to send a list of titles.
As for the project, I am working on semi-structured and unstructured data extraction from Wikipedia HTML.
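For reference, a minimal sketch of the one-request-per-title pattern described above, since the page HTML endpoint takes a single title in its path. The helper names `page_html_url` and `fetch_html`, the `/page/html/{title}` path layout, and the User-Agent string are illustrative assumptions, not something confirmed in this thread.

```python
import urllib.parse
import urllib.request

# Hypothetical identifier; replace with real contact details per the User-Agent policy.
USER_AGENT = "ExampleHtmlMirror/0.1 (contact: someone@example.org)"


def page_html_url(lang: str, title: str) -> str:
    """Build the REST URL for one title on one language subdomain.

    Assumes the path layout /api/rest_v1/page/html/{title}. Spaces become
    underscores and every reserved character, including '/', is
    percent-encoded so the title stays a single path segment.
    """
    encoded = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return f"https://{lang}.wikipedia.org/api/rest_v1/page/html/{encoded}"


def fetch_html(lang: str, title: str, timeout: float = 30.0) -> bytes:
    """Fetch the latest HTML for a single title (one HTTP request per title)."""
    request = urllib.request.Request(
        page_html_url(lang, title),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def urls_for_titles(lang: str, titles: list[str]) -> list[str]:
    """With no batch option, a list of titles maps to a list of separate URLs."""
    return [page_html_url(lang, title) for title in titles]
```

Calling `urls_for_titles("fr", ["Paris", "Tour Eiffel"])` yields two separate URLs, one per title, which is why the request count grows linearly with the number of changed pages.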
On Wed, May 8, 2019 at 1:23 PM Betacommand <betacomm...@gmail.com> wrote:
>
> Why are you making so many queries? Have you tried batching pages together?
> What kind of project needs a real-time copy of a large data set?
>
> On Wed, May 8, 2019 at 2:49 PM Aadithya C Udupa <udupa.adit...@gmail.com>
> wrote:
>
>> Thank you for the quick response, Michael.
>> I was making close to 10 requests per second previously, but would hit
>> HTTP 429 errors frequently. The etiquette document here
>> <https://www.mediawiki.org/wiki/API:Etiquette> suggests making
>> requests serially rather than in parallel. Hence I started making
>> requests serially, at one request per second, as I did not want to
>> abuse the API. But as you can imagine, it takes a lot of time, especially
>> when trying to expand to multiple languages.
>> Also, I send a valid User-Agent header as described here
>> <https://meta.wikimedia.org/wiki/User-Agent_policy>.
>> What do you think could be other reasons why I hit the HTTP 429 error? Is
>> there a cap on the total number of requests per day/week etc.?
>>
>>
>> On Wed, May 8, 2019 at 10:43 AM Michael Holloway <mhollo...@wikimedia.org>
>> wrote:
>>
>>> Hi Aadithya,
>>>
>>> According to the information at the top of the REST API docs page
>>> <https://wikimedia.org/api/rest_v1/>, you should in general be able to
>>> make up to 200 read requests per second to the REST API without any
>>> trouble. As far as I know, that information is accurate. Are you hitting
>>> 429s at a lower request rate than that?
>>>
>>> To answer your question, sending requests in parallel to multiple
>>> language subdomains should not be a problem so long as your overall request
>>> rate remains lower than ~200/s.
>>>
>>> On Tue, May 7, 2019 at 8:27 PM Aadithya C Udupa <udupa.adit...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> For one of my projects, I need to keep the most up-to-date
>>>> version of Wikipedia HTML pages for a few languages such as en, zh, de,
>>>> es, fr etc. This is currently done in two steps:
>>>> 1. Listen to changes on the stream API documented here
>>>> <https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams> and
>>>> then extract the page titles.
>>>> 2. For each of the titles, get the latest HTML using the Wikipedia
>>>> REST API
>>>> <https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_title__title_>
>>>> and persist the HTML.
>>>>
>>>> I understand that in order to avoid the 429 (Too Many Requests) error,
>>>> we need to limit API requests to 1 per second. I just wanted
>>>> to check whether we can make requests to different languages like
>>>> en.wikipedia.org, fr.wikipedia.org etc. in parallel, or whether those
>>>> requests also need to be made serially (1 per second) in order to
>>>> not hit the HTTP 429 error.
>>>>
>>>> Please let me know if you need more information.
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Aadithya
>>>> --
>>>> Sent from my iPad3
>>>> _______________________________________________
>>>> Mediawiki-api mailing list
>>>> Mediawiki-api@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>>>>
>>>
>>>
>>> --
>>> Michael Holloway
>>> Software Engineer, Reading Infrastructure
>>
>>
>> --
>> Regards,
>> Aadithya
>
--
Regards,
Aadithya
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
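The rate-limit discussion in this thread (one overall request budget shared across language subdomains, plus occasional HTTP 429 responses) could be handled along the following lines. This is only a sketch under stated assumptions: `Throttle` and `get_with_retry` are hypothetical names, the caller is assumed to supply a `request` callable that returns `(status, retry_after_seconds, body)`, and whether the server actually sends a `Retry-After` value on a 429 is not confirmed here, so exponential backoff is the fallback.

```python
import time
from typing import Callable, Optional, Tuple

# (HTTP status, Retry-After in seconds if the server sent one, response body)
Response = Tuple[int, Optional[float], bytes]


class Throttle:
    """Spaces calls at least 1/rate_per_sec seconds apart.

    One shared instance caps the overall rate across all language
    subdomains. The clock and sleep functions are injectable for testing.
    """

    def __init__(self, rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_ok = 0.0  # earliest time the next call may start

    def wait(self) -> None:
        now = self.clock()
        if now < self.next_ok:
            self.sleep(self.next_ok - now)
            now = self.next_ok
        self.next_ok = now + self.interval


def get_with_retry(request: Callable[[], Response], throttle: Throttle,
                   max_attempts: int = 5,
                   base_delay: float = 1.0) -> Tuple[int, bytes]:
    """Call `request` under the throttle and retry while it returns 429.

    Waits for the server's Retry-After value when present, otherwise
    backs off exponentially (base_delay, 2x, 4x, ...).
    """
    for attempt in range(max_attempts):
        throttle.wait()
        status, retry_after, body = request()
        if status != 429:
            return status, body
        delay = retry_after if retry_after is not None else base_delay * 2 ** attempt
        throttle.sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

A single `Throttle(100.0)` shared by all workers would keep the combined rate well under the ~200/s figure quoted above; the value 100 is an arbitrary choice for illustration.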