Why are you making so many queries? Have you tried batching pages together? What kind of project needs a real-time copy of a large data set?
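To illustrate what I mean by batching, here is a minimal sketch (assumptions: the standard Action API query module with its 50-title limit for non-bot clients, a placeholder User-Agent, and a helper name, latest_revisions, invented for this example). It fetches the latest revision IDs for up to 50 titles in a single request, so you only re-download HTML for pages whose revision actually changed:

import requests

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "MyProject/0.1 (contact@example.org)"}  # placeholder

def latest_revisions(titles):
    """Return {title: latest revid} for up to 50 titles in one request."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "rvprop": "ids",
        "titles": "|".join(titles[:50]),  # non-bot batch limit is 50 titles
    }
    resp = requests.get(API, params=params, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    pages = resp.json()["query"]["pages"]
    return {p["title"]: p["revisions"][0]["revid"]
            for p in pages.values() if "revisions" in p}

One metadata query per 50 titles, instead of 50 HTML fetches, makes a big difference when most pages have not changed since your last pass.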
On Wed, May 8, 2019 at 2:49 PM Aadithya C Udupa <udupa.adit...@gmail.com> wrote:

> Thank you for the quick response, Michael.
> I was making close to 10 requests per second previously, but I would hit
> HTTP 429 errors frequently. The etiquette document
> <https://www.mediawiki.org/wiki/API:Etiquette> suggests making requests
> serially rather than in parallel, so I switched to one request per second,
> made serially, as I did not want to abuse the API. But as you can imagine,
> that takes a lot of time, especially when expanding to multiple languages.
> I also send a valid User-Agent header as described here
> <https://meta.wikimedia.org/wiki/User-Agent_policy>.
> What else could be causing the HTTP 429 errors? Is there a cap on the
> total number of requests per day or week?
>
> On Wed, May 8, 2019 at 10:43 AM Michael Holloway <mhollo...@wikimedia.org> wrote:
>
>> Hi Aadithya,
>>
>> According to the information at the top of the REST API docs page
>> <https://wikimedia.org/api/rest_v1/>, you should in general be able to
>> make up to 200 read requests per second to the REST API without any
>> trouble. As far as I know, that information is accurate. Are you hitting
>> 429s at a lower request rate than that?
>>
>> To answer your question, sending requests in parallel to multiple
>> language subdomains should not be a problem, so long as your overall
>> request rate remains below ~200/s.
>>
>> On Tue, May 7, 2019 at 8:27 PM Aadithya C Udupa <udupa.adit...@gmail.com> wrote:
>>
>>> Hi,
>>> For one of my projects, I need to keep the most up-to-date versions of
>>> Wikipedia HTML pages for a few languages such as en, zh, de, es, and fr.
>>> This is currently done in two steps:
>>> 1. Listen for changes on the stream API documented here
>>> <https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams> and
>>> extract the page titles.
>>> 2. For each title, get the latest HTML using the Wikipedia REST API
>>> <https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_title__title_>
>>> and persist it.
>>>
>>> I understand that to avoid HTTP 429 (Too Many Requests) errors, we need
>>> to limit API requests to one per second. I just wanted to check whether
>>> requests to different languages, like en.wikipedia.org and
>>> fr.wikipedia.org, can be made in parallel, or whether those also need to
>>> be made serially (one per second) to avoid HTTP 429 errors.
>>>
>>> Please let me know if you need more information.
>>>
>>> --
>>> Regards,
>>> Aadithya
>>
>>
>> --
>> Michael Holloway
>> Software Engineer, Reading Infrastructure
>
>
> --
> Regards,
> Aadithya
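For anyone wiring this up, here is a rough sketch of the two-step pipeline Aadithya describes above (assumptions: the third-party sseclient and requests packages, a placeholder User-Agent, and an illustrative global cap of ~10 requests/s, well below the documented ~200/s):

import json
import time
from urllib.parse import quote

import requests
from sseclient import SSEClient  # pip install sseclient

STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"
HEADERS = {"User-Agent": "MyProject/0.1 (contact@example.org)"}  # placeholder
WIKIS = {"en.wikipedia.org", "zh.wikipedia.org", "de.wikipedia.org",
         "es.wikipedia.org", "fr.wikipedia.org"}
MIN_INTERVAL = 0.1  # ~10 requests/s overall; tune to your own allowance

def fetch_html(server, title):
    """Fetch the latest Parsoid HTML for one page from the REST API."""
    path = quote(title.replace(" ", "_"), safe="")
    resp = requests.get(f"https://{server}/api/rest_v1/page/html/{path}",
                        headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.text

last_request = 0.0
for event in SSEClient(STREAM):
    if not event.data:
        continue  # comment/keepalive lines carry no payload
    change = json.loads(event.data)
    if (change.get("server_name") not in WIKIS
            or change.get("namespace") != 0          # articles only
            or change.get("type") not in ("edit", "new")):
        continue
    # One global rate cap shared across all language subdomains.
    wait = MIN_INTERVAL - (time.time() - last_request)
    if wait > 0:
        time.sleep(wait)
    last_request = time.time()
    html = fetch_html(change["server_name"], change["title"])
    # ... persist html here ...

Per Michael's answer above, the cap that matters is the total rate across subdomains, so a single shared limiter like this is the simplest safe design.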
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api