Re: [Wiki-research-l] feedback appreciated

2017-08-28 Thread Caroline Sinders
Hi all!

Sorry for the delay; I had a super jam-packed weekend and now have a busy week coming up.

A few points- thank you for the feedback! In general, I love feedback and
criticism and I definitely got it :) Two, didn't realize this was a *wiki
only* related research channel, so I'll try to bear that in mind in the
future when sharing things I am writing or have written.

But thirdly and lastly, this is not an academic article. This is an article
published in a design magazine about research related to ethics within
product design, specifically products utilizing machine learning and
artificial intelligence. Though, I would love to write an academic paper on
ethics of design utilizing machine learning *in* product design. If that
sounds interesting to any of you, please get at me. I love to collaborate.

So- the tone of voice is *quite* snarky but I stand by it, again because
this was written for Fast Company. I have much more academic writing, if
you are interested in reading that, but it is on online harassment and
automation. This article is designed to be a primer of information for
product designers who may have heard Elon focusing on the dangers of AI.
There are plenty of things to worry about in the future of AI, like the
integration of artificial intelligence into the military or drones, for
example. But publicly, there are no cases of that. There is, publicly, a
variety of investigations done by ProPublica, which I link to in my
article, about predictive policing and its racial bias. The article itself
is designed to be *approachable* for all readers, *especially non-technical
readers*.  And this piece, in its tone, which I stand by, was designed to
jokingly respond to Musk's hyperbolic freak-out.

This is, instead, an article designed for lay people, and everyday
designers, to think about what are the current issues with AI, examples of
current issues with implicit bias in machine learning products right now,
and other articles and videos to watch. What this is is a class syllabus
wrapped in a layer of a very genial tone so everyday designers have
something to chew on and some real information to grasp.

There aren't a lot of resources for everyday designers out there. There are
not a lot of resources for start ups, product managers, designers, front
end developers, etc on what is out there in this new and emerging field of
artificial intelligence and how it exists currently within products already
out in the world. Truth be told, this is an article I wrote for my old
coworkers at IBM Watson Design, on why you should have a real conversation
about how to design ethically, how to build products using machine learning
ethically, and what questions you should ask about what you are
building and why. I saw and had *very few* of those conversations. I am
writing for *those plumbers* who are out there making things right now, and
have bad leadership and bad guidance, but are generally excited about
product design and the future of AI, and they also have to ship their
products now. Because, I am, also, a plumber. What I am doing *right now*
at the Wikimedia Foundation is the fantastically weird but unsexy job of
designing tools and UI to mitigate online harassment while studying
on-wiki harassment. It's not just research but a design schedule of rolling
out tools quickly for the community to mitigate the onslaught of a lot of
very real problems that are happening as we speak. I love it, I love the
research that I'm doing because it's about the present and the future.
Plumbing is important, it's how we all avoid cholera. Future city planning
is important, it's how larger society functions together. Both are
important.

I think we're really lucky to work where we all work and to be a part of
this community. We get to question, openly and transparently, we get to
solicit feedback, and we get to work on very meaningful software. Not every
technologist or researcher is as lucky as we are. And those are the
technologists I am most keen to talk to- what does it mean to fold in a
technology that you don't understand very well, how do you design and
utilize design thinking to make *something right now* and how do you do
that without recreating a surveillance tool? It's really hard if you don't
understand how to think about the threat model of your product, of what you
intend to make and how it can be used to harm. There are so few primers for
designers that exist on thinking about products from an ethical standpoint,
and a standpoint of implicit bias. All of which are such important things
to talk about when you are building products that use algorithms, and data,
and the algorithm + the data really will determine what your product does
more so than the design intends.

But you all know this already, it's lots of other people who don't :)

Best,
Caroline

Ps. the briefest, tiniest of FYIs, in online harassment and security,
plumbers have a *hyper specific* connotation to them

Re: [Wiki-research-l] feedback appreciated

2017-08-28 Thread Aaron Halfaker
OK ok.  There's some hyperbole in this article and we are the type of
people bent on citations and support. This isn't a research publication and
Caroline admits in the beginning that she's going to get into a bit of a
lecturing tone.

But honestly I liked the article.  It makes a good point and pushes a
sentiment that I share.  Hearing about killer robots turning on humanity is
sort of like hearing someone tell you that they are worried about global
warming on Mars for future civilizations there when we ought to be more
alarmed and focused on the coastal cities on Earth right now.  We have so
many pressing issues with AIs that are affecting people right now that the
future focused alarm is, well, a bit alarmist!  Honestly, I think that's
the side of AI that lay people understand while the nuanced issues present
in the AIs alive today are poorly understood and desperately in need of
regulation.

I don't think that the people who ought to worry about AIs' current problems
are "plumbers".  They are you.  They are me.  They are Elon Musk.
Identifying and dealing with the structural inequalities that AIs create
today is state-of-the-art work.  If we knew how to do it, we'd be done
already.  If you disagree, please show me where I can go get a trade school
degree that will tell me what to do and negate the need for my research
agenda.

-Aaron

On Mon, Aug 28, 2017 at 1:58 AM, Robert West  wrote:

> Hi Caroline,
>
> The premise of this article seems to be that everyone needs to solve either
> the immediate or the distant problems. No one (and certainly not Elon Musk)
> would argue that there are no immediate problems with AI, but why should
> that keep us from thinking ahead?
>
> In a company, too, you have plumbers who fix the bathrooms today and
> strategists who plan business 20 years ahead. We need both. If the plumbers
> didn't worry about the immediate problems, the strategists couldn't do
> their jobs. If the strategists didn't worry about the distant problems, the
> plumbers might not have jobs down the road.
>
> Also, your argument stands on sandy ground from paragraph one, where you
> claim that AI will never threaten humanity, without giving an inkling of
> an argument.
>
> Bob
>
> On Fri, Aug 25, 2017 at 6:50 PM, Caroline Sinders 
> wrote:
>
> > hi all,
> > i just started a column with fast co and wrote an article about elon musk's
> > AI panic.
> >
> > https://www.fastcodesign.com/90137818/dear-elon-forget-killer-robots-heres-what-you-should-really-worry-about
> >
> > would love some feedback :)
> >
> > best,
> > caroline
> > ___
> > Wiki-research-l mailing list
> > Wiki-research-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> >


Re: [Wiki-research-l] stat1002 and stat1003 deprecated. Please use new stat boxes

2017-08-28 Thread Andrew Otto
Hi all!  Just an update:  We plan to decommission stat1003 next week.

I’ll be sure to run a final home directory rsync from stat1003 -> stat1006
before we do.


On Tue, Jul 18, 2017 at 1:31 PM, Andrew Otto  wrote:

> Hi all!
>
> tl;dr: Stop using stat100[23] by September 1st.
>
> We’re finally replacing stat1002 and stat1003.  These boxes are out of
> warranty, and are running Ubuntu Trusty, while most of the production fleet
> is already on Debian Jessie or even Debian Stretch.
>
> stat1005 is the new stat1002 replacement.  If you have access to stat1002,
> you also have access to stat1005.  I’ve copied over home directories from
> stat1002.
>
> stat1006 is the new stat1003 replacement.  If you have access to stat1003,
> you also have access to stat1006.  I’ve copied over home directories from
> stat1003.
>
> I have not migrated any personal cron jobs running on stat1002 or
> stat1003.  I need your help for this!
>
> Both of these boxes are running Debian Stretch.  As such, packages that
> your work depends on may have upgraded.  Please log into the new boxes and
> try stuff out!  If you find anything that doesn’t work, please let me know
> by commenting on https://phabricator.wikimedia.org/T152712.
>
> Please be fully migrated to the new nodes by September 1st.  This will
> give us enough time to fully decommission stat1002 and stat1003 by the end
> of this quarter.
>
> I’ve only done a single rsync of home directories.  If there is new data
> on stat1002 or stat1003 that you want rsynced over, let me know on the
> ticket.
>
> A few notes:
> - stat1002 used to have /a.  This has been removed in favor of /srv.  /a
> no longer exists.
> - Home directories are now much larger.  You no longer need to create
> personal directories in /srv.
> - /tmp is still small, so please be careful.  If you are running long jobs
> that generate temporary data, please have those jobs write into your home
> directory, rather than /tmp.
> - We might implement user home directory quotas in the future.
>
> Thanks all!  I’ll send another email in about a month's time to remind you
> of the impending deadline of Sept 1.
>
> -Andrew Otto
>
>
>
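
The note above about /tmp being small can be illustrated with a minimal Python sketch. It assumes a job that needs scratch space and places that space under the home directory instead of /tmp. The helper name `make_scratch_dir` and the `~/scratch` location are hypothetical choices for illustration, not something the stat boxes provide.

```python
import tempfile
from pathlib import Path


def make_scratch_dir(base=None, prefix="job-"):
    """Create a self-cleaning scratch directory outside /tmp.

    `base` defaults to ~/scratch (a hypothetical location); the directory
    is created on first use.
    """
    base = Path(base) if base is not None else Path.home() / "scratch"
    base.mkdir(parents=True, exist_ok=True)
    # TemporaryDirectory removes the directory and its contents when the
    # `with` block exits, so intermediate data does not pile up.
    return tempfile.TemporaryDirectory(dir=base, prefix=prefix)


# Example use inside a long-running job:
#
#     with make_scratch_dir() as scratch:
#         out = Path(scratch) / "intermediate.tsv"
#         out.write_text("page_id\tviews\n")
```

Because the directory is removed when the `with` block exits, final results should be copied elsewhere in the home directory before the block ends.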


[Wiki-research-l] English proofreading of my AICCSA paper

2017-08-28 Thread abdelwaheb turki


Dear Mr. or Ms.,

I thank you for your support of my AICCSA paper. It is an honour for me. I
have made almost all of the required revisions. However, its English has not
yet been proofread. Could you please check and improve the language quality of
my AICCSA paper? It is currently available at
https://1drv.ms/w/s!AiC69hcGxSVPl1UI1SV81mkr2uVu. As for the grant to attend
AICCSA 2017, you can still endorse it at
https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Csisc/Presenting_Wikidata_for_Arab_Computational_Linguists.

Yours Sincerely,

Houcemeddine Turki

From: ANLP2017
Sent: Wednesday, 16 August 2017 00:53
To: Houcemeddine Turki
Subject: ANLP2017 notification for paper 3

Dear Dr. Houcemeddine Turki,

Congratulations! On behalf of the ANLP 2017 workshop and Conference Committees
of the 14th ACS/IEEE International Conference on Computer Systems and
Applications AICCSA 2017, October 30th to November 3rd, 2017,
we are happy to inform you that your paper entitled:

Using WikiData to create a multi-lingual multi-dialectal dictionary for Arabic 
dialects

has been accepted for presentation and inclusion in the Proceedings of AICCSA - 
ANLP 2017, published by IEEE.


Please see the reviewers’ comments below on your paper. These comments are 
intended to help you to improve your paper for final publication. The listed 
comments should be addressed, as final acceptance is conditional upon 
appropriate response to the requirements and comments. The conference committee 
retains a list of certain critical comments to be addressed by authors, and 
will control that these have been addressed in the camera-ready version.


What is next:
The AICCSA website is updated now with required information. Please find below 
the details for the camera ready submission and the registration.

Camera ready submission:
Due date: 31/8/2017
Submission information can be found at the following link: 
http://www.aiccsa.net/AICCSA2017/submission

Paper Registration:
Due date: 8/9/2017
The registration information can be found at the following link: 
http://www.aiccsa.net/AICCSA2017/registration

We look forward to meeting you at AICCSA 2017.

Best Regards,

AICCSA - ANLP 2017 Organization Team.

==


--- REVIEW 1 ---
PAPER: 3
TITLE: Using WikiData to create a multi-lingual multi-dialectal dictionary for 
Arabic dialects
AUTHORS: Houcemeddine Turki, Denny Vrandečić, Helmi Hamdi and Imed Adel

Overall evaluation: 1 (weak accept)

--- Overall evaluation ---
It is an interesting work with valid assumptions, and its proposed ideas are in
line with the expectations of the event. Overall, I think this paper is
interesting and makes a good contribution to this topic. However, the authors
are advised to address the following points in their revised version:
- Please elaborate in detail on the proposed approach, focusing more on
the relations between its components, as they are the core of the solution and
need more justification of why they are used.
- The overall technical exposition must be strengthened with more concrete examples.
- The authors are urged to summarize and list the key observations from the
paper.
- The paper is readable, but language improvement by a native speaker is
recommended.
- There are some minor editorial issues, such as improving the quality of the
plots, the equations, etc.
- Many references have *incomplete* bibliographic information (for instance,
a missing publication venue). This must be corrected.

In summary, it is a well-prepared paper.


--- REVIEW 2 ---
PAPER: 3
TITLE: Using WikiData to create a multi-lingual multi-dialectal dictionary for 
Arabic dialects
AUTHORS: Houcemeddine Turki, Denny Vrandečić, Helmi Hamdi and Imed Adel

Overall evaluation: 0 

Re: [Wiki-research-l] ground truth for section alignment across languages

2017-08-28 Thread Gerard Meijssen
Hoi,
Sorry to state the obvious (for me): we datamine Wikipedias for
statements in Wikidata. Consequently, much information that could be /
should be in an article (in any and all languages) is reflected by
Wikidata. There is much that is not found in every language, and information
on some subjects can easily be provided from Wikidata as a list (think
awards, books published, etc.). The good news is that Wikidata will provide
lists for this purpose. For all other topics, like date of death / birth,
place of death / birth, where people studied, etc., you have the benefit of
existing articles in a Wikipedia and the work done at Wikidata.

Hope this helps.
Thanks,
  GerardM

On 24 August 2017 at 19:56, Leila Zia  wrote:

> Hi all,
>
> ==Question==
> Do you know of a dataset we can use as ground truth for aligning
> sections of one article in two languages? I'm thinking a tool such as
> Content Translation may capture this data somewhere, or there may be
> some other community initiative that has matched a subset of the
> sections between two versions of one article in two languages. Any
> insights/directions are appreciated. :) I'm not going to worry about
> what language pairs we do have this dataset in right now, the first
> question is: do we have anything? :)
>
> ==Context==
> As part of the research we are doing to build recommendation systems
> that can recommend sections (or templates) for already existing
> Wikipedia articles, we are looking at the problem of section alignment
> between languages, i.e., given two languages x and y and two versions
> of article a in these two languages, can an algorithm (with relatively
> high accuracy) tell us which section in the article in language x
> corresponds to which other section in the article in language y?
>
> Thanks,
> Leila
>
> --
> Leila Zia
> Senior Research Scientist
> Wikimedia Foundation
>


Re: [Wiki-research-l] ground truth for section alignment across languages

2017-08-28 Thread Scott Hale
Dear Leila,

==Question==
> Do you know of a dataset we can use as ground truth for aligning
> sections of one article in two languages?
>

This question is super interesting to me. I am not aware of any ground
truth data, but could imagine trying to build some from
[[Template:Translated_page]]. At least on enwiki it has a "section"
parameter that is to be set:

> If the inserted translation is contained in one section of the target
> page, insert its name here. (A direct link to that section will be created.)
>
It also has a "version" parameter, and it might be possible to identify
cases where a section was added to the source after the translation was
made. This could then become a corpus to "learn the missing section". I
guess something similar could be done with articles created with the
Content Translation tool where a section was later added to the source.
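
As a rough illustration of the first step in this idea, the sketch below pulls the "section" and "version" parameters out of {{Translated page}} templates in raw wikitext. It is a simplification under stated assumptions: the first two positional parameters are taken to be the source language code and source title, a regular expression stands in for a real wikitext parser, and templates nested inside the parameters would not be handled. The function name `parse_translated_page` is hypothetical.

```python
import re

# Matches {{Translated page|...}} (or Translated_page), up to the first "}}".
TEMPLATE_RE = re.compile(
    r"\{\{\s*Translated[ _]page\s*\|(.*?)\}\}",
    re.IGNORECASE | re.DOTALL,
)


def parse_translated_page(wikitext):
    """Return one dict per {{Translated page}} template found in `wikitext`."""
    records = []
    for match in TEMPLATE_RE.finditer(wikitext):
        positional, named = [], {}
        for part in match.group(1).split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                named[key.strip()] = value.strip()
            else:
                positional.append(part.strip())
        records.append({
            # Assumed positional layout: language code, then source title.
            "source_lang": positional[0] if len(positional) > 0 else None,
            "source_title": positional[1] if len(positional) > 1 else None,
            "version": named.get("version"),
            "section": named.get("section"),
        })
    return records
```

Records with both a "section" and a "version" would be the candidates for checking whether the source article gained a section after the recorded revision.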


>
> ==Context==
> As part of the research we are doing to build recommendation systems
> that can recommend sections (or templates) for already existing
> Wikipedia articles, we are looking at the problem of section alignment
> between languages, i.e., given two languages x and y and two versions
> of article a in these two languages, can an algorithm (with relatively
> high accuracy) tell us which section in the article in language x
> corresponds to which other section in the article in language y?
>


While I am not aware of research on Wikipedia section alignment per se,
there is a good amount of work on sentence alignment and building
parallel/bilingual corpora that seems relevant to this [1-4]. I can
imagine an approach that would look for near matches across two Wikipedia
articles in different languages and then examine the distribution of these
sentences within sections to see if one or more sections looked to be
omitted. One challenge is the sub-article problem [5], with which you are of
course already familiar. I wonder whether computing the overlap in article
links a la Omnipedia [6] and then examining the distribution of these
between sections would work and be much less computationally intensive. I
fear, however, that this could over-identify sections further down an
article as missing given (I believe) that article links are often
concentrated towards the beginning of an article.
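
The link-overlap idea can be sketched in a few lines of Python. This is only an illustration under assumptions: each section is represented by the set of language-independent identifiers of the articles it links to (hypothetical Wikidata-style IDs here), overlap is measured with Jaccard similarity, and the 0.2 threshold is an arbitrary placeholder. It is not the method used by Omnipedia or by any of the cited papers.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of link identifiers."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def align_sections(sections_x, sections_y, threshold=0.2):
    """Map each section title in language x to its best match in language y.

    Sections whose best overlap falls below `threshold` map to None, which
    flags them as possibly missing from the language-y article.
    """
    alignment = {}
    for title_x, links_x in sections_x.items():
        best_title, best_score = None, 0.0
        for title_y, links_y in sections_y.items():
            score = jaccard(links_x, links_y)
            if score > best_score:
                best_title, best_score = title_y, score
        alignment[title_x] = best_title if best_score >= threshold else None
    return alignment


# Hypothetical example data: section title -> IDs of linked articles.
en = {
    "Early life": {"Q1", "Q2", "Q3"},
    "Career": {"Q4", "Q5"},
    "Awards": {"Q6", "Q7"},
}
de = {
    "Leben": {"Q1", "Q2"},
    "Karriere": {"Q4", "Q5", "Q8"},
}

print(align_sections(en, de))
# {'Early life': 'Leben', 'Career': 'Karriere', 'Awards': None}
```

The concern about links clustering near the top of an article shows up directly here: a late section with few or no links has nothing to overlap on and is reported as unmatched whether or not it exists in the other language.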

[1] Learning Joint Multilingual Sentence Representations with Neural
Machine Translation. 2017.
https://arxiv.org/abs/1704.04154

[2] Fast and Accurate Sentence Alignment of Bilingual Corpora. 2002.
https://www.microsoft.com/en-us/research/publication/fast-and-accurate-sentence-alignment-of-bilingual-corpora/

[3] Large scale parallel document mining for machine translation. 2010.
http://www.aclweb.org/anthology/C/C10/C10-1124.pdf

[4] Building Bilingual Parallel Corpora Based on Wikipedia. 2010.
http://www.academia.edu/download/39073036/building_bilingual_parallel_corpora.pdf

[5] Problematizing and Addressing the Article-as-Concept Assumption in
Wikipedia. 2017.
http://www.brenthecht.com/publications/cscw17_subarticles.pdf

[6] Omnipedia: Bridging the Wikipedia Language Gap. 2012.
http://www.brenthecht.com/papers/bhecht_CHI2012_omnipedia.pdf

Best wishes,
Scott

-- 
Dr Scott Hale
Senior Data Scientist
Oxford Internet Institute, University of Oxford
Turing Fellow, Alan Turing Institute
http://www.scotthale.net/
scott.h...@oii.ox.ac.uk


Re: [Wiki-research-l] feedback appreciated

2017-08-28 Thread Robert West
Hi Caroline,

The premise of this article seems to be that everyone needs to solve either
the immediate or the distant problems. No one (and certainly not Elon Musk)
would argue that there are no immediate problems with AI, but why should
that keep us from thinking ahead?

In a company, too, you have plumbers who fix the bathrooms today and
strategists who plan business 20 years ahead. We need both. If the plumbers
didn't worry about the immediate problems, the strategists couldn't do
their jobs. If the strategists didn't worry about the distant problems, the
plumbers might not have jobs down the road.

Also, your argument stands on sandy ground from paragraph one, where you
claim that AI will never threaten humanity, without giving an inkling of
an argument.

Bob

On Fri, Aug 25, 2017 at 6:50 PM, Caroline Sinders 
wrote:

> hi all,
> i just started a column with fast co and wrote an article about elon musk's
> AI panic.
>
> https://www.fastcodesign.com/90137818/dear-elon-forget-killer-robots-heres-what-you-should-really-worry-about
>
> would love some feedback :)
>
> best,
> caroline