[Wikidata] Fwd: ORES extension soon to be deployed, help us test it

2016-02-19 Thread Amir Ladsgroup
We are also in the process of deploying this extension for Wikidata in the
near future, so your help would be appreciated.

-- Forwarded message -
From: Amir Ladsgroup 
Date: Sat, Feb 20, 2016 at 2:05 AM
Subject: ORES extension soon to be deployed, help us test it
To: wikitech-l , 


Hey all,
TL;DR: The ORES extension [1], which integrates the ORES service [2] with
Wikipedia to make fighting vandalism easier and more efficient, is in the
process of being deployed. You can test it at
https://mw-revscoring.wmflabs.org (enable it in your preferences first).

You probably know ORES. It's an API service that gives the probability of an
edit being vandalism; it also does other AI-related work, like guessing the
quality of articles on Wikipedia. We have a nice post on the Wikimedia Blog
[3], and the media have paid some attention to it [4]. Thanks to Aaron
Halfaker and others [5] for their work in building this service. There are
several tools that use ORES to highlight possible vandalism - Huggle, gadgets
like ScoredRevisions, etc. - but an extension can do this job much more
efficiently.
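
If you haven't queried the service before, here is a minimal sketch of what a
request could look like. The URL layout and response shape are assumptions
based on the public instance, so check [2] for the authoritative API:

```python
import requests

# Minimal sketch (not the extension's own code) of asking ORES for the
# "damaging" probability of a revision. The URL layout and the response
# shape are assumptions based on the public ores.wmflabs.org instance;
# see the documentation linked from [2] for the authoritative API.
def ores_score(wiki="enwiki", rev_id=123456, model="damaging"):
    url = "https://ores.wmflabs.org/scores/{}/".format(wiki)
    resp = requests.get(url, params={"models": model, "revids": rev_id})
    resp.raise_for_status()
    # Returns per-revision prediction and class probabilities as JSON.
    return resp.json()

print(ores_score())
```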

The extension, which is being developed by Adam Wight, Kunal Mehta, and me,
highlights unpatrolled edits in recent changes, watchlists, related changes
and, in the future, user contributions, if the ORES score of those edits
passes a certain threshold. The GUI design is by May Galloway. The ORES API
(ores.wmflabs.org) only gives you a score between 0 and 1: zero means the
edit is not vandalism at all, and one means it is vandalism for sure. You can
try its simple GUI at https://ores.wmflabs.org/ui/. You can change the
threshold in your preferences in the recent changes tab (you get options
instead of numbers because we thought numbers are not very intuitive).
We have also enabled it on a test wiki so you can try it:
https://mw-revscoring.wmflabs.org. You need to make an account (use a dummy
password) and then enable it in the beta features tab. Note that building an
AI tool to detect vandalism on a test wiki sounds a little bit silly ;) so we
set up a dummy model in which the probability of an edit being vandalism is
the last two digits of the diff ID reversed (e.g. diff id:12345 = score:54%).
On a more technical note, we store these scores in the ores_classification
table so we can do a lot more analysis with them once the extension is
deployed - fun use cases such as the average score of a certain page, of a
user's contributions, or of the edits of members of a category, etc.
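
For illustration, here is a tiny sketch of that dummy model as reconstructed
from the example above (so the exact rule is an assumption on my part):

```python
def dummy_vandalism_score(diff_id: int) -> float:
    """Dummy test-wiki score: the last two digits of the diff id, reversed,
    read as a percentage (reconstructed from the "12345 -> 54%" example)."""
    last_two = "{:02d}".format(diff_id % 100)   # 12345 -> "45"
    return int(last_two[::-1]) / 100            # reversed -> "54" -> 0.54

assert dummy_vandalism_score(12345) == 0.54
```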

We passed security review and we have consensus to enable it on Persian
Wikipedia. We are only blocked on ORES moving from Labs to production
(T106867 [6]). The next wiki is Wikidata; we are good to go once the
community finishes labeling edits so we can build the "damaging" model. We
can enable it on Portuguese and Turkish Wikipedia after March, because s2
and s3 have database storage issues right now. For other wikis, you need to
check whether ORES supports the wiki and whether the community has finished
labeling edits for ORES (check out the table at [2]).
If you want to report bugs or request features, you can do so here [7].

[1]: https://www.mediawiki.org/wiki/Extension:ORES
[2]: https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service
[3]:
https://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/
[4]:
https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Media
[5]:
https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service#Team
[6]: https://phabricator.wikimedia.org/T106867
[7]: https://phabricator.wikimedia.org/tag/mediawiki-extensions-ores/

Best
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] SPARQL endpoint caching

2016-02-19 Thread Stas Malyshev
Hi!

> 
> I'll do a presentation next week, in which I intend to demonstrate
> that I can add a Wikidata value online, which then is available
> immediately for my application - as well as for the whole rest of the
> world. (In Library Land, that's a real blast, because business
> processes related to authority data often take weeks or months ...)

I think we'll always have some way to run an un-cached query. The question
is only how easy it would be - i.e. whether you would need to add a
parameter, click a checkbox, etc.
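
To make the question concrete, here is a hypothetical sketch of what a
cache-bypass knob could look like for an API client. Neither the header nor
any dedicated parameter is a committed interface; that is exactly what is
still being decided:

```python
import requests

# Hypothetical sketch only: WDQS does not necessarily honour this header,
# and the eventual mechanism (header, URL parameter, UI checkbox) is open.
WDQS = "https://query.wikidata.org/sparql"
query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 5"

resp = requests.get(
    WDQS,
    params={"query": query, "format": "json"},
    headers={"Cache-Control": "no-cache"},  # ask intermediate caches to skip
)
print(resp.json()["results"]["bindings"])
```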

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Analytics] [Wiki-Medicine] Zika

2016-02-19 Thread Dan Andreescu
Thanks, Reid.  When you say there's insufficient data history, do you mean
in other sources?  Zika was discovered in 1947 and the wiki page for it was
created in 2009.  We have had high-quality geolocated data since May 2015.

I'm still doing research (I admit the distractions at the foundation have
gotten in the way, I apologize for that).  I hope to get back to it with
renewed force this weekend.

On Fri, Feb 19, 2016 at 11:30 AM, Priedhorsky, Reid  wrote:

> We do have more work in progress to extend the 2014 paper, in particular
> to mosquito-borne diseases in a Spanish-speaking country, though not Zika
> because there is insufficient data history.
>
> I appreciate the pointer. Are there any specific questions folks would
> like me to address in this thread?
>
> Thanks,
> Reid
> ___
> Analytics mailing list
> analyt...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Make federated queries possible / was: SPARQL CONSTRUCT results truncated

2016-02-19 Thread Neubert, Joachim
Hi Stas,

Thanks for your explanation! I'll perhaps have to do some tests on my own systems ...

Cheers, Joachim

-----Original Message-----
From: Wikidata [mailto:wikidata-boun...@lists.wikimedia.org] On Behalf Of Stas 
Malyshev
Sent: Thursday, 18 February 2016 19:12
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] Make federated queries possible / was: SPARQL CONSTRUCT 
results truncated

Hi!

> Now, obviously endpoints referenced in a federated query via a SERVICE 
> clause have to be open - so any attacker could send his queries 
> directly instead of squeezing them through some other endpoint. The 
> only scenario I can think of is that an attacker's IP is already 
> blocked by the attacked site. If (instead of much more common ways to 
> fake an IP) the attacker chose to do it via federated queries 
> through WDQS, this _could_ result in WDQS being blocked by this 
> endpoint.

This is not what we are concerned with. What we are concerned with is that 
federation essentially requires you to run an open proxy - i.e. to allow 
anybody to send requests to any URL. This is not acceptable to us, because it 
means somebody could abuse it both to try to access our internal 
infrastructure and to launch attacks against other sites using our site as a 
platform.

We could, if there is enough demand, allow access to specific whitelisted 
endpoints, but so far we haven't found any way to allow access to arbitrary 
SPARQL endpoints without essentially allowing anybody to launch arbitrary 
network connections from our servers.
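
To make this concrete, here is a minimal sketch of the kind of federated
query at issue: the SERVICE clause asks our endpoint to contact an arbitrary
remote URL, which is why supporting it amounts to running an open proxy. The
remote endpoint below is purely illustrative, and WDQS currently rejects such
queries:

```python
import requests

WDQS = "https://query.wikidata.org/sparql"

federated_query = """
SELECT ?item ?sameAs WHERE {
  ?item wdt:P31 wd:Q5 .
  # The endpoint would have to open a connection to whatever URL is named here:
  SERVICE <https://example.org/sparql> {
    ?item owl:sameAs ?sameAs .
  }
} LIMIT 10
"""

resp = requests.get(WDQS, params={"query": federated_query, "format": "json"})
print(resp.status_code)  # expect an error while federation is disabled
```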

> provide for the linked data cloud. This must not involve the 
> highly-protected production environment, but could be solved by an 
> additional unstable/experimental endpoint under another address.

The problem is that we cannot run a production-quality endpoint in a 
non-production environment. We could set up an endpoint on Labs, but it would 
be underpowered and we wouldn't be able to guarantee any quality of service 
there. To serve the volume of Wikidata data and updates, the machines need 
certain hardware capabilities, which Labs machines currently do not have.

Additionally, I'm not sure running an open proxy even there would be a good 
idea. Unfortunately, in today's internet environment there is no lack of 
players who would want to abuse such a thing for nefarious purposes.

We will keep looking for a solution, but so far we haven't found one.

Thanks,
--
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] from Freebase to Wikidata: the great migration

2016-02-19 Thread Marco Fossati

I couldn't wait for a detailed description of the primary sources tool.
Thanks a lot to the authors for mentioning the StrepHit soccer dataset!

Cheers,

Marco

On 2/19/16 13:00, wikidata-requ...@lists.wikimedia.org wrote:

Date: Thu, 18 Feb 2016 11:07:41 -0600
From: Maximilian Klein
To: "Discussion list for the Wikidata project."

Subject: Re: [Wikidata] from Freebase to Wikidata: the great migration
Message-ID:

Content-Type: text/plain; charset="utf-8"

Congratulations on a fantastic project and your acceptance at WWW 2016.

Make a great day,
Max Klein ‽http://notconfusing.com/

On Thu, Feb 18, 2016 at 10:54 AM, Federico Leva (Nemo)
wrote:


>Lydia Pintscher, 18/02/2016 15:59:
>

>>Thomas, Denny, Sebastian, Thomas, and I have published a paper which was
>>accepted for the industry track at WWW 2016. It covers the migration
>>from Freebase to Wikidata. You can now read it here:
>>http://research.google.com/pubs/archive/44818.pdf
>>
>>

>Nice!
>

> >Concluding, in a fairly short amount of time, we have been
> >able to provide the Wikidata community with more than
> >14 million new Wikidata statements using a customizable

>
>I must admit that, despite knowing the context, I wasn't able to
>understand whether this is the number of "mapped"/"translated" statements
>or the number of statements actually added via the primary sources tool. I
>assume the latter given paragraph 5.3:
>

> >after removing duplicates and facts already contained in Wikidata, we
> >obtain 14 million new statements. If all these statements were added to
> >Wikidata, we would see a 21% increase of the number of statements in
> >Wikidata.

>

I was confused about that too. "the [Primary Sources] tool has been
used by more than a hundred users who performed about
90,000 approval or rejection actions. More than 14 million
statements have been uploaded in total."  I think that means that ≤ 90,000
items or statements were added, out of the 14 million available to be added
through the Primary Sources tool.


>
>Nemo
>
>___
>Wikidata mailing list
>Wikidata@lists.wikimedia.org
>https://lists.wikimedia.org/mailman/listinfo/wikidata
>


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata