jayvdb created this task.
jayvdb added subscribers: jayvdb, Mpaa.
jayvdb added projects: pywikibot-core, Wikisource.
Herald added subscribers: pywikibot-bugs-list, StudiesWorld, Aklapper.

TASK DESCRIPTION
https://gerrit.wikimedia.org/r/#/c/250221/ provides one way to filter by quality level.

This is already possible, though not efficiently, using:

```
python pwb.py listpages -usercontribs:Mpaa -intersect -cat:Validated
```

That is terribly inefficient because those categories are so big. https://en.wikisource.org/wiki/Category:Validated and https://en.wikisource.org/wiki/Category:Proofread have approximately 250,000 and 350,000 members at the moment, respectively. However, I was able to do an intersection using these categories in less than a minute, so for people on a good internet connection it is a feasible solution.

What would be very cool is if pagegenerators had a proper filter version of `-cat`, so that only pages emitted by another generator are checked for the category, instead of fetching the entire category.

Another approach would be to have a generic command-line predicate pagegenerator, like:

```
python pwb.py listpages -usercontribs:Mpaa -objectfilter:"x._quality in [1, 2]"
```

Finally, another approach that might sometimes be useful is to interface with http://tools.wmflabs.org/catscan2/catscan2.php to get the initial title list. However, this is only more efficient if the desired query includes some limiter other than the proofread category, such as size or age. It also does not support all pagegenerator options, such as the `-usercontribs` generator or the `-grep` filter.

TASK DETAIL
https://phabricator.wikimedia.org/T122047

To: jayvdb
Cc: Aklapper, StudiesWorld, Mpaa, jayvdb, pywikibot-bugs-list, Billinghurst, Krenair
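
As a rough illustration of the two filter ideas in the task description (a filter version of `-cat`, and a generic `-objectfilter` predicate), here is a minimal sketch. It is not pywikibot code: the function names `category_filter` and `object_filter` are hypothetical, and the only assumption about the page objects is that they expose a `categories()` method yielding objects with a `title()` method, as pywikibot pages are understood to do. Whether a `_quality` attribute is available on a page is taken from the example expression above and is not verified here.

```python
from typing import Any, Iterable, Iterator


def category_filter(pages: Iterable[Any], category_title: str) -> Iterator[Any]:
    """Yield only the pages that are members of ``category_title``.

    Hypothetical helper: each page emitted by the upstream generator is
    asked for its own categories, so the full member list of a very large
    category is never fetched.
    """
    for page in pages:
        # Assumption: page.categories() yields objects whose title()
        # returns the full title, e.g. "Category:Validated".
        if any(cat.title() == category_title for cat in page.categories()):
            yield page


def object_filter(pages: Iterable[Any], expression: str) -> Iterator[Any]:
    """Yield pages for which ``expression`` is true, with the page bound to ``x``.

    Hypothetical helper mirroring the proposed -objectfilter option.
    eval() is used for brevity only; it is not safe for untrusted input.
    """
    code = compile(expression, "<objectfilter>", "eval")
    for page in pages:
        # Builtins are emptied so the expression can only inspect ``x``.
        if eval(code, {"__builtins__": {}}, {"x": page}):
            yield page
```

Both helpers are plain generator wrappers, so they could be chained after whatever generator `-usercontribs:Mpaa` produces. The trade-off of the category filter is presumably one category lookup per page instead of one large category listing, which should pay off when the upstream generator emits far fewer pages than the category contains.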