Thank you, Michael. That was helpful.
I will reach out to the ops team.

On Thu, May 9, 2019 at 6:24 AM Michael Holloway <mhollo...@wikimedia.org>
wrote:

> Aadithya,
>
> About title batching: you're not missing anything. Unlike the action
> API (/w/api.php), the REST API (/api/rest_v1) page content endpoints take
> only a single title at a time.
>
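[A quick sketch of the contrast, for anyone following along. The helper names and the `action=query` parameters shown are illustrative, not anything either API mandates: the action API accepts a pipe-separated `titles` list, while the REST page content endpoints encode exactly one title into the URL path.]

```python
from urllib.parse import quote

def action_api_batch_url(lang, titles):
    # Action API: several titles per request, pipe-separated in `titles=`.
    return (f"https://{lang}.wikipedia.org/w/api.php"
            f"?action=query&format=json&titles={quote('|'.join(titles))}")

def rest_html_url(lang, title):
    # REST API: exactly one title, URL-encoded into the path.
    return (f"https://{lang}.wikipedia.org/api/rest_v1/page/html/"
            f"{quote(title, safe='')}")
```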
> It sounds like you may indeed be running into some periodic rate limit.
> The best sources of info on current rate limits are the Traffic engineers on
> the Site Reliability Engineering
> <https://www.mediawiki.org/wiki/Wikimedia_Site_Reliability_Engineering>
> team; I'm not sure if any of them are subscribed to this list.  You may
> have better luck asking on the Operations mailing list (
> o...@lists.wikimedia.org) or the #wikimedia-operations channel on IRC
> (irc://irc.freenode.net/wikimedia-operations).
>
> On Wed, May 8, 2019 at 5:20 PM Aadithya C Udupa <udupa.adit...@gmail.com>
> wrote:
>
>> Hi,
>> I am making queries to get the latest HTML content for a title, using
>> the API documented here -
>> https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_html__title_
>> I may be missing something, but I do not see an option to send a list of
>> titles.
>> I am also working on a project to do some semi-structured and unstructured
>> data extraction from Wikipedia HTML.
>>
>>
>> On Wed, May 8, 2019 at 1:23 PM Betacommand <betacomm...@gmail.com> wrote:
>>
>>>
>>> Why are you making so many queries? Have you tried batching pages together?
>>> What kind of project needs a real-time copy of a large data set?
>>>
>>> On Wed, May 8, 2019 at 2:49 PM Aadithya C Udupa <udupa.adit...@gmail.com>
>>> wrote:
>>>
>>>> Thank you for the quick response, Michael.
>>>> I was making close to 10 requests per second previously, but I would hit
>>>> HTTP 429 errors frequently. The etiquette document here
>>>> <https://www.mediawiki.org/wiki/API:Etiquette> suggests making requests
>>>> serially rather than in parallel, so I switched to serial requests at one
>>>> request per second, as I did not want to abuse the API. But as you can
>>>> imagine, that takes a lot of time, especially when trying to expand to
>>>> multiple languages.
>>>> Also, I send a valid User-Agent header as described here
>>>> <https://meta.wikimedia.org/wiki/User-Agent_policy>.
>>>> What do you think could be other reasons why I hit HTTP 429 errors?
>>>> Is there a cap on the total number of requests per day, week, etc.?
>>>>
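[A common pattern for staying within etiquette here is to send the User-Agent on every request and, on a 429, honor the Retry-After header if the server sends one, otherwise back off exponentially. A minimal sketch; the User-Agent value and function names below are placeholders, not anything the API prescribes beyond the policy:]

```python
import time
import urllib.error
import urllib.request
from urllib.parse import quote

# Placeholder value; the User-Agent policy asks that it identify your
# client and include contact information.
USER_AGENT = "ExampleHtmlMirror/0.1 (contact: you@example.com)"

def backoff_delay(attempt, base=1.0, cap=60.0):
    # Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds.
    return min(cap, base * (2 ** attempt))

def fetch_html(lang, title, max_attempts=5):
    # Fetch one page's HTML from the REST API, sleeping on HTTP 429.
    url = (f"https://{lang}.wikipedia.org/api/rest_v1/page/html/"
           f"{quote(title, safe='')}")
    for attempt in range(max_attempts):
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read().decode("utf-8")
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            # Honor Retry-After when the server sends it; otherwise back off.
            retry_after = err.headers.get("Retry-After")
            time.sleep(float(retry_after) if retry_after else backoff_delay(attempt))
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```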
>>>>
>>>> On Wed, May 8, 2019 at 10:43 AM Michael Holloway <
>>>> mhollo...@wikimedia.org> wrote:
>>>>
>>>>> Hi Aadithya,
>>>>>
>>>>> According to the information at the top of the REST API docs page
>>>>> <https://wikimedia.org/api/rest_v1/>, you should in general be able
>>>>> to make up to 200 read requests per second to the REST API without any
>>>>> trouble.  As far as I know, that information is accurate.  Are you hitting
>>>>> 429s at a lower request rate than that?
>>>>>
>>>>> To answer your question, sending requests in parallel to multiple
>>>>> language subdomains should not be a problem so long as your overall 
>>>>> request
>>>>> rate remains lower than ~200/s.
>>>>>
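[Since the constraint is the combined rate across subdomains rather than a per-subdomain one, a single limiter shared by all worker threads is enough. One way to sketch that, with the class name and cap value purely illustrative; e.g. `RateLimiter(100)` would leave comfortable headroom under ~200/s:]

```python
import threading
import time

class RateLimiter:
    # Caps the overall request rate across all worker threads by handing
    # out evenly spaced time slots under a shared lock.
    def __init__(self, max_per_second):
        self.interval = 1.0 / max_per_second
        self.lock = threading.Lock()
        self.next_slot = 0.0

    def wait(self):
        # Reserve the next available slot, then sleep until it arrives.
        with self.lock:
            slot = max(time.monotonic(), self.next_slot)
            self.next_slot = slot + self.interval
        time.sleep(max(0.0, slot - time.monotonic()))
```

Workers fetching from en.wikipedia.org, fr.wikipedia.org, etc. would each call `limiter.wait()` before every request against the same limiter instance, so the combined rate stays under the cap no matter how the load is split across languages.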
>>>>> On Tue, May 7, 2019 at 8:27 PM Aadithya C Udupa <
>>>>> udupa.adit...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> For one of my projects, I need to keep the most up-to-date version of
>>>>>> Wikipedia HTML pages for a few languages such as en, zh, de, es, and fr.
>>>>>> This is currently done in two steps:
>>>>>> 1. Listen to changes on the stream API documented here
>>>>>> <https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams> and
>>>>>> extract the page titles.
>>>>>> 2. For each title, get the latest HTML using the Wikipedia REST API
>>>>>> <https://en.wikipedia.org/api/rest_v1/#/Page%20content/get_page_title__title_>
>>>>>> and persist the HTML.
>>>>>>
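[For reference, step 1 amounts to tailing a server-sent-events feed: each `data:` line on the recentchange stream is one JSON event carrying `wiki`, `type`, and `title` fields. A sketch of the filtering, assuming the standard stream URL; the wiki codes, User-Agent value, and helper names are my own:]

```python
import json
import urllib.request

STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"
WIKIS = {"enwiki", "zhwiki", "dewiki", "eswiki", "frwiki"}

def title_from_event(raw_json, wikis=WIKIS):
    # Step 1: pull (wiki, title) out of one recentchange event, or None
    # if the event is for another wiki or isn't a page edit/creation.
    event = json.loads(raw_json)
    if event.get("wiki") in wikis and event.get("type") in ("edit", "new"):
        return (event["wiki"], event["title"])
    return None

def follow_stream(handle_title):
    # Tail the SSE stream; each `data: ` line holds one JSON event.
    req = urllib.request.Request(
        STREAM_URL,
        headers={"User-Agent": "ExampleHtmlMirror/0.1 (you@example.com)"})
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line.startswith("data: "):
                hit = title_from_event(line[len("data: "):])
                if hit:
                    handle_title(*hit)  # step 2: fetch and persist the HTML
```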
>>>>>> I understand that to avoid HTTP 429 (Too Many Requests) errors, we need
>>>>>> to limit API requests to one per second. I just wanted to check whether
>>>>>> we can make requests to different languages, like en.wikipedia.org and
>>>>>> fr.wikipedia.org, in parallel, or whether those requests also need to be
>>>>>> made serially (one per second) to avoid HTTP 429 errors.
>>>>>>
>>>>>> Please let me know if you need more information.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Aadithya
>>>>>> --
>>>>>> Sent from my iPad3
>>>>>> _______________________________________________
>>>>>> Mediawiki-api mailing list
>>>>>> Mediawiki-api@lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Michael Holloway
>>>>> Software Engineer, Reading Infrastructure
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Aadithya
>>>>
>>>
>>
>>
>> --
>> Regards,
>> Aadithya
>>
>
>
> --
> Michael Holloway
> Software Engineer, Reading Infrastructure
>


-- 
Regards,
Aadithya
