Re: [Cloud] Browser extension for unsourced Wikipedia articles

Guilherme Gonçalves Sun, 24 Dec 2017 04:40:51 -0800

Hi everyone,

Apologies for resurrecting this old thread, but I finally got around to
making this (mostly) work, so I thought I'd come back with an update. You
can install the extension for either Chrome or Firefox below:


https://chrome.google.com/webstore/detail/wikipedia-needs-reference/michcligfeahibdmakjapmaigojkddmk
https://addons.mozilla.org/en-GB/firefox/addon/wikipedia-needs-references/

The full code for the extension, the server, and the script that populates
ElasticSearch is on GitHub (http://github.com/eggpi/similarity/), and the
backend is hosted on Toolforge.

It's definitely experimental and lacking in various ways (there's not even
a proper icon yet!), but I've used it for a few weeks and managed to make
some edits through it. If this sounds interesting, please give it a try and
feel free to file issues.

Thanks!


2017-10-08 14:34 GMT+02:00 Guilherme Gonçalves <guilherme.p.g...@gmail.com>:

> This is great, thank you all for your input!
>
> It does seem like ElasticSearch (and likely MoreLikeThis) is the way to
> go, and I'm very happy to hear that this could be integrated with other use
> cases relatively easily. I'll definitely keep those in mind and I hope to
> come back to this in a few weeks.
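For reference, a MoreLikeThis search restricted to a single category might look roughly like the ElasticSearch query body below. This is only a sketch under stated assumptions: the field names `extract` and `categories` are hypothetical (one document per article, holding its TextExtract and a keyword list of its categories), the function name is illustrative, and the parameter values are not tuned.

```python
import json


def mlt_in_category_query(page_text, category, size=5):
    """Build an ElasticSearch query body for articles similar to page_text.

    Assumes a hypothetical index where each article document has a
    full-text "extract" field and a keyword "categories" field.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # more_like_this selects representative terms from the raw
                # input text and scores indexed documents against them.
                "must": {
                    "more_like_this": {
                        "fields": ["extract"],
                        "like": page_text,
                        "min_term_freq": 1,  # keep terms seen only once in the input
                        "max_query_terms": 25,
                    }
                },
                # Non-scoring filter: only articles tagged with this category.
                "filter": {"term": {"categories": category}},
            }
        },
    }


if __name__ == "__main__":
    body = mlt_in_category_query(
        "Portugal decriminalized the possession of all drugs in 2001.",
        "All_articles_needing_additional_references",
    )
    print(json.dumps(body, indent=2))
```

Passing the page text directly as `like` avoids indexing the news article first; the category filter keeps results within the articles needing references.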
>
> Thanks again!
>
> 2017-10-06 19:17 GMT+01:00 Morten Wang <nett...@gmail.com>:
>
>> In my experience, the problem you're trying to solve boils down to
>> finding articles similar to a given search query that are in the given
>> category. Trying to outsmart Lucene on that kind of problem is going to
>> be challenging, given that it is, for example, used as a benchmark in
>> research[1], so switching over to ElasticSearch is arguably the way to go.
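The "similar articles within a category" framing maps naturally onto an ElasticSearch `bool` query: a scored full-text `match` plus a non-scoring category `filter`. A minimal sketch, assuming hypothetical `extract` and `categories` fields; the relevance scoring itself is left entirely to Lucene (BM25 by default in recent ElasticSearch versions).

```python
import json


def category_search_query(query_text, category, size=5):
    """Full-text search restricted to one category (hypothetical field names)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored by Lucene's relevance function against the article text.
                "must": {"match": {"extract": query_text}},
                # Filter clause: restricts hits without affecting their scores.
                "filter": {"term": {"categories": category}},
            }
        },
    }


print(json.dumps(
    category_search_query("portugal drug decriminalization",
                          "All_articles_needing_additional_references"),
    indent=2,
))
```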
>>
>> There's a specific feature in Lucene called "MoreLikeThis", and it's also
>> exposed in WP's search API to find articles similar to other articles. The
>> documentation[2] of that feature provides a fairly good explanation of how
>> it works, making it a possible starting point on how to filter a given
>> document to improve the search results.
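On Wikipedia, CirrusSearch exposes MoreLikeThis through the `morelike:` search keyword, which can be used with the standard `list=search` module of the MediaWiki Action API. The sketch below only builds the request URL (the article title is just an example); fetching it, e.g. with `urllib.request.urlopen`, should return JSON with the matching pages under `query.search`.

```python
from urllib.parse import urlencode

# English Wikipedia's MediaWiki Action API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def morelike_search_url(title, limit=5):
    """Build a list=search request using CirrusSearch's morelike: keyword."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": "morelike:" + title,  # pages similar to this existing page
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


print(morelike_search_url("Drug policy of Portugal"))
# https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=morelike%3ADrug+policy+of+Portugal&srlimit=5&format=json
```

Note that `morelike:` takes the titles of existing pages rather than arbitrary text, so scoring an external news page this way would still require a separate index of the article texts.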
>>
>> If I remember correctly, there are a couple of research papers that study
>> how to recommend sources for articles (or articles for a given source), but
>> I'd have to go look them up. You might want to consider
>> searching the Research Newsletter archives and Google Scholar as that might
>> give you a couple of existing approaches.
>>
>>
>> Footnotes:
>> 1: A paper I reviewed for the Research Newsletter used it:
>> https://meta.wikimedia.org/wiki/Research:Newsletter/2016/May#Evaluating_link-based_recommendations_for_Wikipedia
>> 2: https://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html
>>
>>
>> Cheers,
>> Morten
>>
>>
>> On 1 October 2017 at 18:36, Mukunda Modell <mmod...@wikimedia.org> wrote:
>>
>>> I think this is a really cool idea. I don't know of other similar tools
>>> but it does sound like something that should be a good fit for
>>> elasticsearch.
>>>
>>> On Fri, Sep 29, 2017 at 9:34 AM Guilherme Gonçalves <guilherme.p.g...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I've been hacking on a new tool and I thought I'd share what (little) I
>>>> have so far to get some comments and learn of related approaches from the
>>>> community.
>>>>
>>>> The basic idea would be to have a browser extension that tells the user
>>>> if the current page they're viewing looks like a good reference for a
>>>> Wikipedia article, for some whitelisted domains like news websites. This
>>>> would hopefully prompt casual/opportunistic edits, especially for articles
>>>> that may be overlooked normally.
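The whitelist check described above could be as simple as comparing the page's hostname against a set of allowed domains. A minimal sketch, written in Python for consistency with the other examples even though a real browser extension would do this in JavaScript; the domains in `WHITELIST` and the function name are placeholders, not the extension's actual list or code.

```python
from urllib.parse import urlparse

# Placeholder whitelist; the extension's real list of domains is not given here.
WHITELIST = {"nytimes.com", "bbc.co.uk", "theguardian.com"}


def is_whitelisted(url, whitelist=WHITELIST):
    """True if the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in whitelist)


print(is_whitelisted("https://www.nytimes.com/2017/09/22/opinion/sunday/portugal-drug-decriminalization.html"))  # True
print(is_whitelisted("https://notnytimes.com/"))  # False
```

Matching on `"." + domain` rather than a bare suffix keeps lookalike hosts such as `notnytimes.com` from passing the check.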
>>>>
>>>> As a proof of concept for a backend, I built a simple bag-of-words
>>>> model of the TextExtracts of enwiki's
>>>> Category:All_articles_needing_additional_references.
>>>> I then set up a tool [1] to receive HTML input and retrieve the 5
>>>> articles most similar to that input. You can try it out in your browser [2], or
>>>> on the command line [3]. The results could definitely be better, but having
>>>> tried it on a few different articles over the past few days, I think
>>>> there's some potential there.
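A bag-of-words similarity search of this kind can be sketched in a few lines. This is an assumption, not the tool's actual code: it uses raw term counts with cosine similarity (no TF-IDF weighting, stemming or stop-word removal), it expects plain text rather than HTML, and the `articles` dictionary is made-up text standing in for the category's TextExtracts.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; digits and punctuation are ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_text, articles, k=5):
    """Return the titles of the k articles whose text is closest to query_text."""
    query = bag_of_words(query_text)
    scored = [(cosine(query, bag_of_words(text)), title) for title, text in articles.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Made-up stand-ins for TextExtracts of articles needing references.
articles = {
    "Drug policy of Portugal": "Portugal decriminalized drug possession in 2001 as part of a new drug policy.",
    "Football in Portugal": "Football is the most popular sport in Portugal.",
    "Photosynthesis": "Plants convert light energy into chemical energy.",
}

print(most_similar("Portugal drug decriminalization policy", articles, k=2))
# ['Drug policy of Portugal', 'Football in Portugal']
```

Because "decriminalization" and "decriminalized" count as different words here, a real model would likely benefit from stemming, which is one of the things a search engine like ElasticSearch provides out of the box.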
>>>>
>>>> I'd be interested in hearing your thoughts on this. Specifically:
>>>>
>>>> * If such a backend/API were available, would you be interested in
>>>> using it for other tools? If so, what functionality would you expect from
>>>> it?
>>>> * I'm thinking of just throwing away the above proof of concept and
>>>> using ElasticSearch, though I don't know a lot about it. Is anyone aware of
>>>> a similar dataset that already exists there, by any chance? Or any reasons
>>>> not to go that way?
>>>> * Any other comments on the overall idea or implementation?
>>>>
>>>> Thanks!
>>>>
>>>> 1- https://github.com/eggpi/similarity
>>>> 2- https://tools.wmflabs.org/similarity/
>>>> 3- Example: curl https://www.nytimes.com/2017/09/22/opinion/sunday/portugal-drug-decriminalization.html
>>>> | curl -X POST http://tools.wmflabs.org/similarity/search --form "text=<-"
>>>> --
>>>> Guilherme P. Gonçalves
>>>> _______________________________________________
>>>> Cloud mailing list
>>>> Cloud@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/cloud
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Guilherme P. Gonçalves
>



-- 
Guilherme P. Gonçalves

_______________________________________________
Wikimedia Cloud Services mailing list
Cloud@lists.wikimedia.org (formerly lab...@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
