[Wikidata] Fwd: ORES extension soon to be deployed, help us test it
We are also in the process of deploying this extension for Wikidata in the near future, so your help would be appreciated.

-- Forwarded message --
From: Amir Ladsgroup
Date: Sat, Feb 20, 2016 at 2:05 AM
Subject: ORES extension soon to be deployed, help us test it
To: wikitech-l

Hey all,

TL;DR: The ORES extension [1], which integrates the ORES service [2] with Wikipedia to make fighting vandalism easier and more efficient, is in the process of being deployed. You can test it at https://mw-revscoring.wmflabs.org (enable it in your preferences first).

You probably know ORES. It's an API service that gives the probability of an edit being vandalism; it also does other AI-related work, such as estimating the quality of Wikipedia articles. There is a nice post on the Wikimedia Blog [3], and the media paid some attention to it [4]. Thanks to Aaron Halfaker and others [5] for their work in building this service.

Several tools already use ORES to highlight probable vandalism, such as Huggle and gadgets like ScoredRevisions, but an extension does this job much more efficiently. The extension, developed by Adam Wight, Kunal Mehta, and me, highlights unpatrolled edits in recent changes, watchlists, related changes and, in the future, user contributions whenever the ORES score of an edit passes a certain threshold. The GUI design was done by May Galloway.

The ORES API (ores.wmflabs.org) only gives you a score between 0 and 1: zero means the edit is not vandalism at all, and one means it is vandalism for sure. You can try its simple GUI at https://ores.wmflabs.org/ui/ (a minimal query sketch also follows below). You can change the threshold in your preferences, in the recent changes tab (you get named options instead of numbers because we thought numbers are not very intuitive).

We have also enabled it on a test wiki so you can try it: https://mw-revscoring.wmflabs.org. You need to create an account (use a dummy password) and then enable it in the beta features tab. Note that building an AI tool to detect vandalism on a test wiki sounds a little silly ;) so we set up a dummy model in which the probability of an edit being vandalism is the last two digits of the diff id, reversed (e.g. diff id 12345 = score 54%).

On a more technical note, we store these scores in the ores_classification table, so we can do a lot more analysis with them once the extension is deployed: fun use cases such as the average score of a certain page, of a user's contributions, or of the members of a category.

We have passed security review and have consensus to enable the extension on Persian Wikipedia; we are only blocked on ORES moving from Labs to production (T106867 [6]). The next wiki is Wikidata; we are good to go once the community finishes labeling edits so we can build the "damaging" model. We can enable it on Portuguese and Turkish Wikipedia after March, because s2 and s3 have database storage issues right now. For other wikis, you need to check whether ORES supports the wiki and whether the community has finished labeling edits for ORES (check the table at [2]).

If you want to report bugs or request features, you can do so here [7].
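For anyone who wants to experiment with the scores outside the extension, here is a minimal sketch of fetching an ORES score and applying a threshold, roughly the way the extension filters recent changes. The endpoint path, the "damaging" model name, and the JSON response shape are assumptions about the public service at ores.wmflabs.org rather than details stated in this message, so check the live API before relying on them.

import requests

ORES_BASE = "https://ores.wmflabs.org/scores/{wiki}/"

def damaging_probability(wiki, rev_id, model="damaging"):
    """Fetch the probability that a revision is damaging.

    The URL layout and JSON shape are assumptions about the public ORES
    service; see https://ores.wmflabs.org/ for the actual format.
    """
    resp = requests.get(ORES_BASE.format(wiki=wiki),
                        params={"models": model, "revids": rev_id})
    resp.raise_for_status()
    return resp.json()[str(rev_id)][model]["probability"]["true"]

def dummy_test_wiki_score(rev_id):
    """The stand-in model on the test wiki: the last two digits of the
    diff id, reversed (e.g. 12345 -> '45' -> '54' -> 0.54)."""
    last_two = str(rev_id)[-2:].rjust(2, "0")
    return int(last_two[::-1]) / 100.0

if __name__ == "__main__":
    threshold = 0.8  # in the extension this comes from user preferences
    score = dummy_test_wiki_score(12345)
    verdict = "would be highlighted" if score >= threshold else "is below the threshold"
    print(f"Revision 12345 {verdict} (score {score:.2f})")

On a production wiki you would call damaging_probability() instead of the dummy model and let the user-configured threshold decide what gets highlighted.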
[1]: https://www.mediawiki.org/wiki/Extension:ORES
[2]: https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service
[3]: https://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/
[4]: https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Media
[5]: https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service#Team
[6]: https://phabricator.wikimedia.org/T106867
[7]: https://phabricator.wikimedia.org/tag/mediawiki-extensions-ores/

Best
Re: [Wikidata] SPARQL endpoint caching
Hi!

> I'll do a presentation next week, in which I intend to demonstrate
> that I can add a Wikidata value online, which is then available
> immediately for my application - as well as for the whole rest of the
> world. (In Library Land, that's a real blast, because business
> processes related to authority data often take weeks or months ...)

I think we'll always have some way to run an un-cached query. The question is only how easy it would be, i.e. whether you would need to add a parameter, click a checkbox, etc.

--
Stas Malyshev
smalys...@wikimedia.org
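To make the "add a parameter" idea concrete, here is a minimal sketch of asking the public WDQS endpoint whether a just-added statement is already visible. The nocache parameter is purely hypothetical (nothing like it is confirmed in this thread); treat it as a placeholder for whatever cache-bypass mechanism, be it a parameter, header, or UI checkbox, is eventually offered. The Q42 / P31 / Q5 identifiers are just demo values.

import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Ask whether a freshly added statement is visible to the query service.
ask_query = """
ASK {
  wd:Q42 wdt:P31 wd:Q5 .
}
"""

params = {
    "query": ask_query,
    "format": "json",
    # Hypothetical cache-bypass switch; the actual mechanism (a parameter,
    # a header, or a checkbox in the UI) is exactly what this thread is about.
    "nocache": "1",
}

resp = requests.get(WDQS_ENDPOINT, params=params)
resp.raise_for_status()
print("Statement already visible to WDQS:", resp.json()["boolean"])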
Re: [Wikidata] [Analytics] [Wiki-Medicine] Zika
Thanks, Reid. When you say there's insufficient data history, do you mean in other sources? Zika was discovered in 1947 and the wiki page for it was created in 2009. We have high-quality geolocated data since May 2015. I'm still doing research (I admit the distractions at the foundation have gotten in the way; I apologize for that). I hope to get back to it with renewed force this weekend.

On Fri, Feb 19, 2016 at 11:30 AM, Priedhorsky, Reid wrote:

> We do have more work in progress to extend the 2014 paper, in particular
> to mosquito-borne diseases in a Spanish-speaking country, though not Zika,
> because there is insufficient data history.
>
> I appreciate the pointer. Are there any specific questions folks would
> like me to address in this thread?
>
> Thanks,
> Reid
Re: [Wikidata] Make federated queries possible / was: SPARQL CONSTRUCT results truncated
Hi Stas,

Thanks for your explanation! I'll perhaps have to do some tests on my own systems ...

Cheers, Joachim

-----Original Message-----
From: Wikidata [mailto:wikidata-boun...@lists.wikimedia.org] On behalf of Stas Malyshev
Sent: Thursday, February 18, 2016, 19:12
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] Make federated queries possible / was: SPARQL CONSTRUCT results truncated

Hi!

> Now, obviously endpoints referenced in a federated query via a SERVICE
> clause have to be open - so any attacker could send his queries directly
> instead of squeezing them through some other endpoint. The only scenario
> I can think of is that an attacker's IP is already blocked by the attacked
> site. If (instead of much more common ways to fake an IP) the attacker
> chose to do this via federated queries through WDQS, this _could_ result
> in WDQS being blocked by that endpoint.

This is not what we are concerned with. What we are concerned with is that federation essentially requires you to run an open proxy, i.e. to allow anybody to send requests to any URL. This is not acceptable to us, because it means somebody could abuse it both to try to access our internal infrastructure and to launch attacks on other sites using our site as a platform. We could allow, if there is enough demand, access to specific whitelisted endpoints, but so far we haven't found any way to allow access to arbitrary SPARQL endpoints without essentially allowing anybody to launch arbitrary network connections from our servers.

> provide for the linked data cloud. This must not involve the
> highly-protected production environment, but could be solved by an
> additional unstable/experimental endpoint under another address.

The problem is that we cannot run a production-quality endpoint in a non-production environment. We could set up an endpoint on Labs, but it would be underpowered and we would not be able to guarantee any quality of service there. To serve the volume of Wikidata data and updates, the machines need certain hardware capabilities, which Labs machines currently do not have. Additionally, I'm not sure running an open proxy even there would be a good idea; unfortunately, in today's internet environment there is no lack of players who would want to abuse such a thing for nefarious purposes.

We will keep looking for a solution to this, but so far we haven't found one.

Thanks,
--
Stas Malyshev
smalys...@wikimedia.org
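For readers who have not used SPARQL federation: the risk Stas describes comes from the SERVICE clause, which lets the query author name an arbitrary URL that the receiving endpoint must then open an outbound connection to. Below is a minimal sketch of what such a federated query would look like if sent to WDQS from Python. The DBpedia endpoint is only an example target, and, as this thread makes clear, WDQS does not accept SERVICE calls to arbitrary endpoints, so expect an error rather than results.

import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The SERVICE clause instructs the endpoint receiving this query to contact
# whatever URL the query author wrote (here DBpedia, but it could be any
# address), which is why unrestricted federation amounts to an open proxy.
federated_query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?label ?abstract WHERE {
  wd:Q42 rdfs:label ?label .
  FILTER(LANG(?label) = "en")
  SERVICE <https://dbpedia.org/sparql> {
    dbr:Douglas_Adams dbo:abstract ?abstract .
    FILTER(LANG(?abstract) = "en")
  }
}
"""

resp = requests.get(WDQS_ENDPOINT,
                    params={"query": federated_query, "format": "json"})
print(resp.status_code)   # expect a failure: federation is not enabled
print(resp.text[:300])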
Re: [Wikidata] from Freebase to Wikidata: the great migration
I couldn't wait for a detailed description of the primary sources tool. Thanks a lot to the authors for mentioning the StrepHit soccer dataset!

Cheers, Marco

On 2/19/16 13:00, wikidata-requ...@lists.wikimedia.org wrote:

Date: Thu, 18 Feb 2016 11:07:41 -0600
From: Maximilian Klein
To: "Discussion list for the Wikidata project."
Subject: Re: [Wikidata] from Freebase to Wikidata: the great migration

Congratulations on a fantastic project and on your acceptance at WWW 2016.

Make a great day,
Max Klein ‽ http://notconfusing.com/

On Thu, Feb 18, 2016 at 10:54 AM, Federico Leva (Nemo) wrote:

> Lydia Pintscher, 18/02/2016 15:59:
>> Thomas, Denny, Sebastian, Thomas, and I have published a paper which was
>> accepted for the industry track at WWW 2016. It covers the migration from
>> Freebase to Wikidata. You can now read it here:
>> http://research.google.com/pubs/archive/44818.pdf
>
> Nice!
>
>> Concluding, in a fairly short amount of time, we have been able to
>> provide the Wikidata community with more than 14 million new Wikidata
>> statements using a customizable
>
> I must admit that, despite knowing the context, I wasn't able to
> understand whether this is the number of "mapped"/"translated" statements
> or the number of statements actually added via the primary sources tool.
> I assume the latter, given paragraph 5.3:
>
>> after removing duplicates and facts already contained in Wikidata, we
>> obtain 14 million new statements. If all these statements were added to
>> Wikidata, we would see a 21% increase of the number of statements in
>> Wikidata.

I was confused about that too. "the [Primary Sources] tool has been used by more than a hundred users who performed about 90,000 approval or rejection actions. More than 14 million statements have been uploaded in total." I think that means that ≤ 90,000 items or statements were added out of the 14 million available to be added through the Primary Sources tool.

> Nemo