Thanks for sharing. This is a nice analysis that goes from data to insights. How do we drive action from this report? Do we plan to use this data to build better tools?

For example, we could document common pitfalls and how to avoid them, such as searching for Library of Congress links with a regex search instead of an external links query ( https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bextlinks ), and similarly using iwlinks for interwiki links. This could even be pushed actively to tools, either by using the User-Agent to contact the tool developers or by adding warnings to the API result.
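The following is a minimal sketch of the suggested alternative. It assumes only Python's standard library and the English Wikipedia endpoint linked above. Instead of running a regex over page text, it asks the API which external links (`prop=extlinks`, filtered with `elquery`) or interwiki links (`prop=iwlinks`, filtered with `iwprefix`) a given set of pages contains. The helper names, the `loc.gov` domain, the `wikt` prefix, the example titles, and the User-Agent string are all illustrative assumptions, not an existing tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def extlinks_query_url(titles, domain, protocol="https"):
    """Hypothetical helper: list external links on `titles` matching `domain`.

    elquery takes the URL without its protocol. elprotocol is set explicitly
    because its default has varied between MediaWiki versions; plain http
    links would need a second request with protocol="http".
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "extlinks",
        "titles": "|".join(titles),
        "elquery": domain,
        "elprotocol": protocol,
        "ellimit": "max",
    }
    return API + "?" + urlencode(params)


def iwlinks_query_url(titles, prefix):
    """Hypothetical helper: list interwiki links on `titles` with `prefix`."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "iwlinks",
        "titles": "|".join(titles),
        "iwprefix": prefix,
        "iwlimit": "max",
    }
    return API + "?" + urlencode(params)


def fetch_json(url, user_agent):
    """Fetch a query URL, sending a descriptive User-Agent so that the API
    operators can identify and contact the tool's maintainers."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Illustrative values only; nothing is fetched unless fetch_json is called.
    print(extlinks_query_url(["Albert Einstein"], "loc.gov"))
    print(iwlinks_query_url(["Albert Einstein"], "wikt"))
    # Example call (hypothetical User-Agent):
    # fetch_json(url, "ExampleLinkChecker/0.1 (https://example.org/contact)")
```

These modules check pages you already know about. To scan a whole wiki for every page linking to a domain, `list=exturlusage` (with `euquery`) is likely the closer fit. In either case, large result sets come back in batches and need to be followed with the `continue` parameters.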
On Wed, May 30, 2018 at 11:51 PM, Trey Jones <[email protected]> wrote:

> Hey everyone,
>
> As part of T195491 <https://phabricator.wikimedia.org/T195491>, Erik has
> been looking into the details of our regex processing and ways to handle
> ridiculously long-running regex queries. He pulled all the regex queries
> over the last 90 days to get a sense of what features people are using and
> what impact certain changes he was considering would have on users. Turns
> out there are a lot more users than I would have thought, which is good
> news! And a lot of them look like bots.
>
> He also made the mistake of pointing me to the data and highlighting a
> common pattern: searches for interwiki links. I couldn't help myself. I
> started digging around and found that the majority of the searches are
> looking for those interwiki links, and the vast majority of regex searches
> fall into three types: interwiki links, URLs, and Library of Congress
> collection IDs.
>
> Overall, there are 5,613,506 regexes total across all projects and all
> languages, over a 90-day period. That comes out to ~62K/day, which is a lot
> more than I'd expected, though I hadn't thought about bots using regexes.
>
> Read more on MediaWiki:
> <https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Regular_Expression_Searches>
>
> Trey
>
> Trey Jones
> Sr. Software Engineer, Search Platform
> Wikimedia Foundation
>
> _______________________________________________
> Discovery mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/discovery
