Thanks for sharing.
This is a nice analysis, from data to insights. How do we drive actions
from this report?
Do we plan to use this data to make better tools?
For example, we could publish common pitfalls and how to avoid them, such as
searching for Library of Congress links with a regex search instead of the
external links query
( https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bextlinks )
(and similarly iwlinks for interwiki links).
This could even be actively pushed to tools (either by using the User-Agent to
contact the tool devs, or by adding warnings to the API result).
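
To make the pitfall concrete, here is a rough sketch of what a bot could do
instead of a regex search: ask the extlinks module whether a page links to a
given domain. The function names, the example page title, the "loc.gov" search
string, and the User-Agent value are all made up for illustration; only the
API parameters (action=query, prop=extlinks, titles, elquery, ellimit) come
from the module documentation linked above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extlinks_url(title, domain, limit=50):
    """Build a prop=extlinks query URL for one page (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "extlinks",
        "titles": title,
        "elquery": domain,  # search string without protocol, e.g. "loc.gov"
        "ellimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_extlinks(title, domain, user_agent):
    """Fetch and decode the JSON response (hypothetical helper).

    A descriptive User-Agent with contact details is what would let
    someone reach the tool's developers.
    """
    request = Request(
        build_extlinks_url(title, domain),
        headers={"User-Agent": user_agent},
    )
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative values only; replace the contact address with a real one.
    result = fetch_extlinks(
        "Abraham Lincoln",
        "loc.gov",
        "ExampleLinkBot/0.1 (contact: someone@example.org)",
    )
    print(json.dumps(result, indent=2))
```

This checks one page at a time and does not handle continuation, so it is a
starting point rather than a drop-in replacement for a wiki-wide regex search.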


On Wed, May 30, 2018 at 11:51 PM, Trey Jones <[email protected]> wrote:

> Hey everyone,
>
> As part of T195491 <https://phabricator.wikimedia.org/T195491>, Erik has
> been looking into the details of our regex processing and ways to handle
> ridiculously long-running regex queries. He pulled all the regex queries
> over the last 90 days to get a sense of what features people are using and
> what impact certain changes he was considering would have on users. Turns
> out there are a lot more users than I would have thought—which is good
> news! And a lot of them look like bots.
>
> He also made the mistake of pointing me to the data and highlighting a
> common pattern—searches for interwiki links. I couldn't help myself—I
> started digging around and found that the majority of the searches are looking
> for those interwiki links, and the vast majority of regex searches fall
> into three types—interwiki links, URLs, and Library of Congress collection
> IDs.
>
> Overall, there are 5,613,506 regexes total across all projects and all
> languages, over a 90-day period. That comes out to ~62K/day—which is a lot
> more than I'd expected, though I hadn't thought about bots using regexes.
>
> Read more on MediaWiki
> <https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Regular_Expression_Searches>
> .
>
> —Trey
>
> Trey Jones
> Sr. Software Engineer, Search Platform
> Wikimedia Foundation
>
> _______________________________________________
> Discovery mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/discovery
>
>
