Hello,

I am currently gathering image data for my master's thesis. I am using the
QLabels from Wikidata to crawl specific image classes (such as axe, car, etc.).

I am using the Action API for my requests, and here is my problem:

The QLabel Q870 (train) has around 21k images. I am using the sroffset
parameter and the "continue" parameter from the response to fetch 500
images at a time. The script works until I reach the 10k limit (the
message is something like: "your request exceeded the limit of 10000
items ..."). Is there any way to crawl more than 10k items/images from one
search query?

My search query looks like this:
params = {
    'action': 'query',
    'format': 'json',
    'list': 'search',
    'srsearch': search_query,
    # Namespace filter based on the provided URL
    'srnamespace': '0|6|12|14|100|106',
    'srlimit': batch_size,  # Number of images per batch
    'sroffset': start,  # Offset for pagination
    # Request additional information about the pages (images)
    'prop': 'info|imageinfo',
    'inprop': 'url'  # Include the URL information
}
The 'sroffset' parameter is updated on every request with the value from the
"continue" parameter of the previous response.
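
For reference, the pagination loop described above can be sketched roughly as
follows. This is a simplified sketch, not my exact script: the names
build_params, iter_search_results and fetch are made up for illustration, and
fetch is assumed to be any callable that sends the params to the Action API
(for example via an HTTP GET) and returns the parsed JSON response as a dict.

```python
def build_params(search_query, batch_size, start):
    """Build the request parameters shown above for one batch."""
    return {
        'action': 'query',
        'format': 'json',
        'list': 'search',
        'srsearch': search_query,
        'srnamespace': '0|6|12|14|100|106',
        'srlimit': batch_size,
        'sroffset': start,
        'prop': 'info|imageinfo',
        'inprop': 'url',
    }


def iter_search_results(fetch, search_query, batch_size=500):
    """Yield search hits one by one, following the "continue" offset.

    fetch is a hypothetical callable: it takes the params dict and
    returns the decoded JSON response of the API as a dict.
    """
    start = 0
    while True:
        data = fetch(build_params(search_query, batch_size, start))

        # The API reports failures (such as the limit message) under "error".
        if 'error' in data:
            raise RuntimeError(data['error'].get('info', 'API error'))

        for hit in data.get('query', {}).get('search', []):
            yield hit

        # No "continue" block means there are no further batches.
        cont = data.get('continue')
        if not cont or 'sroffset' not in cont:
            break
        start = cont['sroffset']
```

With this structure, the loop ends normally when the response no longer
contains a "continue" block, and it raises an exception as soon as the API
returns an error, which is what happens for me once the offset reaches 10k.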

It would be great if somebody could help me!

Thank you! 
Kind regards
Ruben
_______________________________________________
Mediawiki-api mailing list -- mediawiki-api@lists.wikimedia.org
To unsubscribe send an email to mediawiki-api-le...@lists.wikimedia.org
