Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data

2015-08-19 Thread S Page
My hero Magnus Manske noted
 The situation, for most languages, is this: No manual descriptions, on
basically any item. And that will remain so for the (near) future.
Automatic descriptions can change that, literally overnight, with a little
programming and linguistic effort. ... This is a force multiplier of
volunteer effort with a factor of 250. And we ignore that ... why, exactly?

AutoDesc has such enormous potential to help attain a world in which every
single person on the planet is given free access to the sum of all human
knowledge that it should be the entire movement's top project. I nearly
wrote a career-limiting e-mail rant to WMF-all on that subject last night.

In this e-mail thread we're talking about it in the limited scope of Wikidata
descriptions in search on mobile web beta, where the mobile client
presents a useful signpost for *existing* articles, in an emblem on lead
images and in search results. That's important but we're missing the forest
for a single tree when discussing such a transformative technology. If only
WMF had a CTO for such things [1].

Anyway, returning to this specific use case:
* Nobody is saying store the AutoDesc in the Wikidata per-language
description field.
* Nobody is saying show the AutoDesc if there is an existing Wikidata
description.
* Is anybody against showing AutoDesc, after some refinement and
productization [2], in these mobile use cases when there is no Wikidata
description?
* I propose the AutoDesc as a quality bar that any edit to a Wikidata
description needs to improve on (but again that's a topic beyond this mail
thread).
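
The display rule in the bullets above can be sketched in a few lines. This is only an illustration under assumptions: description_to_show and fake_autodesc are hypothetical names, and the generator stands in for whatever a productized AutoDesc service would expose. Nothing is written back to Wikidata; the generated text is recomputed when needed.

```python
from typing import Callable, Optional


def description_to_show(
    manual: Optional[str],
    generate: Callable[[], Optional[str]],
) -> Optional[str]:
    """Prefer the manual Wikidata description; otherwise generate one on demand."""
    if manual is not None and manual.strip():
        return manual.strip()
    # Nothing is stored: the generated text is recomputed on each call,
    # so an improved generator changes the output immediately.
    return generate()


# Hypothetical generator standing in for a call to an AutoDesc-like service.
def fake_autodesc() -> Optional[str]:
    return "villa in Roquebrune-Cap-Martin, France"


print(description_to_show("Belgian politician", fake_autodesc))
print(description_to_show(None, fake_autodesc))
```

Because the generator is only consulted when the manual field is empty, an existing manual description is never overridden.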

Yours, excitedly,
=S Page

[1] http://grnh.se/30f54b , apply today!
[2] https://bitbucket.org/magnusmanske/autodesc/src/HEAD/www/js/?at=master
and https://github.com/dbrant/wikidata-autodesc . It's already a nodejs
service; can we append "oid" and declare victory? :-)

Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data

2015-08-19 Thread Magnus Manske
On Wed, Aug 19, 2015 at 11:19 PM Monte Hurd mh...@wikimedia.org wrote:

 No manual descriptions, on basically any item. And that will remain so for
 the (near) future. Automatic descriptions can change that, literally
 overnight, with a little programming and linguistic effort. ... This is a
 force multiplier of volunteer effort with a factor of 250. And we ignore
 that ... why, exactly?


 Not ignoring. In fact, if the auto-generated descriptions near the quality
 of human curated descriptions, I'm totally and wholeheartedly onboard that
 their use should be strongly considered.

 I just disagree that closing the quality gap will involve "a little
 programming and linguistic effort". I lean more toward the "massive
 programming and linguistic effort" end of the spectrum.

 Specifically, I think it will take massive effort to make the
 auto-generated descriptions so good that an average person would say, "hey,
 these auto-generated descriptions are better than the human-curated
 descriptions in the examples I posted."

You are confusing (in the literal meaning of the word, fusing together)
several issues into one here, which you then call "better". I see at least
five distinct types of "better":

1. A description exists, vs. it does not. In that aspect, automatic
descriptions will always be better than manual ones.

2. One description is more complete than the other. From what I see in
random examples, this is already the case for many biographical items that
have a lot of statements. I have actually considered cutting them back a
little, because even these short descriptions can get quite extensive.

3. Context-aware, specifically, the context where the description is shown.
This one goes to the automatic descriptions. AutoDesc already can generate
plain text, links to Wikidata, links to a specific Wikipedia where there
are articles, and use plain text/redlinks/Wikidata links otherwise. It can
generate Wikitext, with some infoboxes. It could easily generate HTML
blurbs with a thumbnail if there is an image, and so on. This contrasts
with plain text for manual descriptions.

4. Linguistic/style. Manual descriptions CAN be better phrased than
automatic ones, but can also be worse. Automatic descriptions are
unimaginative, but consistent. Here is where I probably beg to differ from
most other people on this thread: I firmly believe that a description, even
if it is slightly wrong grammatically, is preferable to no description, as
long as humans still can understand what is meant. If the German
description gets the gender of "moon" wrong, so what? (I don't think it
does, but just for the sake of argument.) Eventually, someone will implement
a fix for that. Maybe we'll have gender for things per language as
statements at some point, which would be useful beyond autodesc.

5. To the point. That is where manual descriptions have their only
advantage in the long run. Even from a lot of statements, it is hard for an
algorithm to figure out why exactly that person, that thing, or that event is
important. Sometimes it is something obscure, something that does not fit
well into statements, or is hidden among them. And there, and only there,
do manual descriptions make sense, as I have always maintained.
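
To make point 3 concrete, here is a minimal sketch of rendering one structured description for different contexts. It is not AutoDesc's actual code or data model: the Part class, the render function, and the item IDs are hypothetical placeholders. The only real-world conventions assumed are the `:d:` interwiki prefix for Wikidata links in wikitext and the www.wikidata.org/wiki/ URL pattern.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class Part:
    label: str                     # human-readable text for this fragment
    item_id: str                   # Wikidata item ID (placeholders below)
    article: Optional[str] = None  # local article title, if one exists


def render(parts: List[Part], mode: str) -> str:
    """Render the same fragments as plain text, wikitext, or HTML."""
    rendered = []
    for p in parts:
        if mode == "text":
            rendered.append(p.label)
        elif mode == "wikitext":
            # Link to the local article when there is one, else to Wikidata.
            target = p.article if p.article else f":d:{p.item_id}"
            rendered.append(f"[[{target}|{p.label}]]")
        elif mode == "html":
            url = f"https://www.wikidata.org/wiki/{p.item_id}"
            rendered.append(f'<a href="{escape(url)}">{escape(p.label)}</a>')
        else:
            raise ValueError(f"unknown mode: {mode}")
    return ", ".join(rendered)


# Placeholder IDs, not real Wikidata items.
parts = [
    Part("Belgian politician", "Q_PLACEHOLDER_1", article="Politician"),
    Part("lawyer", "Q_PLACEHOLDER_2"),
]
print(render(parts, "text"))
print(render(parts, "wikitext"))
```

A manual description is a single stored string, so it can only ever play the role of the "text" mode here.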

I am well aware of the limitations of automatic descriptions. I can also
see that perfection will never be reached, that the algorithms will never
be finished.

Like Wikipedia.
___
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l


Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data

2015-08-19 Thread Monte Hurd
Those were literally the first 10 random articles I encountered which
didn't have descriptions.


The tool that generates the descriptions deserves a lot more development.
 Magnus' tool is very much a prototype, and represents a tiny glimpse of
 what's possible. Looking at its current output is a straw man.


It's not a straw man at all - it's a baseline to move the discussion away
from the abstract. We need to start looking at real examples.

One of my main concerns is that "a lot more development" is actually an
understatement, as many of the optimizations will be language-dependent.


Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data

2015-08-19 Thread Monte Hurd

 No manual descriptions, on basically any item. And that will remain so for
 the (near) future. Automatic descriptions can change that, literally
 overnight, with a little programming and linguistic effort. ... This is a
 force multiplier of volunteer effort with a factor of 250. And we ignore
 that ... why, exactly?


Not ignoring. In fact, if the auto-generated descriptions near the quality
of human curated descriptions, I'm totally and wholeheartedly onboard that
their use should be strongly considered.

I just disagree that closing the quality gap will involve "a little
programming and linguistic effort". I lean more toward the "massive
programming and linguistic effort" end of the spectrum.

Specifically, I think it will take massive effort to make the
auto-generated descriptions so good that an average person would say, "hey,
these auto-generated descriptions are better than the human-curated
descriptions in the examples I posted."

But I may, of course, be wrong!

Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data

2015-08-19 Thread Monte Hurd
True about algorithms never being finished, but aren't we essentially
stuck with the first run output, unless I misunderstand how you envision
this working?

(assuming you don't want to over-write non-blank descriptions the next time
you improve and re-run the process)


Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data

2015-08-19 Thread Magnus Manske
Oh, and as for examples, random-paging just got me this:

https://en.wikipedia.org/wiki/Jules_Malou

Manual description: Belgian politician

Automatic description:  Belgian politician and lawyer, Prime Minister of
Belgium, and member of the Chamber of Representatives of Belgium
(1810–1886) ♂

I know which one I'd prefer...


On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske magnusman...@googlemail.com
wrote:

 Thank you Dmitry! Well phrased and to the point!

 As for templating, that might be the worst of both worlds; without the
 flexibility and over-time improvement of automatic descriptions, but making
 it harder for people to enter (compared to free-style text). We have a
 Visual Editor on Wikipedia for a reason :-)



 On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant dbr...@wikimedia.org wrote:

 My thoughts, as ever(!), are as follows:

 - The tool that generates the descriptions deserves a lot more
 development. Magnus' tool is very much a prototype, and represents a tiny
 glimpse of what's possible. Looking at its current output is a straw man.
 - Auto-generated descriptions work for current articles, and *all future
 articles*. They automatically adapt to updated data. They automatically
 become more accurate as new data is added.
 - When you edit the descriptions yourself, you're not really making a
 meaningful contribution to the *data* that underpins the given Wikidata
 entry; i.e. you're not contributing any new information. You're simply
 paraphrasing the first sentence or two of the Wikipedia article. That can't
 possibly be a productive use of contributors' time.

 As for Brian's suggestion:
 It would be a step forward; we can even invent a whole template-type
 syntax for transcluding bits of actual data into the description. But IMO,
 that kind of effort would still be better spent on fully-automatic
 descriptions, because that's the ideal that semi-automatic descriptions can
 only approach.


 On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle bgers...@wikimedia.org
 wrote:

 Could there be a way to have our nicely curated description cake and
 eat it too? For example, interpolating data into the description and/or
 marking data points which are referenced in the description (so as to mark
 it as outdated when they change)?
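
One way to picture the interpolation idea is a template whose placeholders name data fields, plus a snapshot of the values it used, so a later data change can flag the description as outdated. This is a rough sketch under assumptions: the {field} syntax, the function names, and the field names are all hypothetical and not an existing Wikidata feature.

```python
import re
from typing import Any, Dict, Set

FIELD = re.compile(r"\{(\w+)\}")


def fields_used(template: str) -> Set[str]:
    """Names of the data fields a template refers to."""
    return set(FIELD.findall(template))


def render(template: str, data: Dict[str, Any]) -> str:
    """Interpolate the referenced fields into the template."""
    return template.format(**{k: data[k] for k in fields_used(template)})


def snapshot(template: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Record the values the description depended on when it was written."""
    return {k: data[k] for k in fields_used(template)}


def is_outdated(snap: Dict[str, Any], current: Dict[str, Any]) -> bool:
    """True if any referenced value has changed since the snapshot."""
    return any(current.get(k) != v for k, v in snap.items())


# Hypothetical field names; the values echo an example from this thread.
data = {"year": 1996, "director": "Edward Zwick",
        "country": "United States of America"}
template = "{year} film by {director}"
snap = snapshot(template, data)
print(render(template, data))
print(is_outdated(snap, {**data, "country": "Canada"}))
print(is_outdated(snap, {**data, "director": "Someone Else"}))
```

Only changes to fields the template actually references mark it as outdated; edits to unrelated statements leave it alone.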

 I appreciate the potential benefits of generated descriptions (and other
 things), but Monte's examples might have swayed me towards human
 curated—when available.

 On Tuesday, August 18, 2015, Monte Hurd mh...@wikimedia.org wrote:

 Ok, so I just did what I proposed. I went to random enwiki articles and
 described the first ten I found which didn't already have descriptions:


 - Courage Under Fire, *1996 film about a Gulf War friendly-fire
 incident*

 - Pebasiconcha immanis, *largest known species of land snail,
 extinct*

 - List of Kenyan writers, *notable Kenyan authors*

 - Solar eclipse of December 14, 1917, *annular eclipse which lasted
 77 seconds*

 - Natchaug Forest Lumber Shed, *historic Civilian Conservation Corps
 post-and-beam building*

 - Sun of Jamaica (album), *debut 1980 studio album by Goombay Dance
 Band*

 - E-1027, *modernist villa in France by architect Eileen Gray*

 - Daingerfield State Park, *park in Morris County, Texas, USA,
 bordering Lake Daingerfield*

 - Todo Lo Que Soy-En Vivo, *2014 Live album by Mexican pop singer
 Fey*

 - 2009 UEFA Regions' Cup, *6th UEFA Regions' Cup, won by Castile and
 Leon*



 And here are the respective descriptions from Magnus' (quite excellent)
 autodesc.js:



 - Courage Under Fire, *1996 film by Edward Zwick, produced by John
 Davis and David T. Friendly from United States of America*

 - Pebasiconcha immanis, *species of Mollusca*

 - List of Kenyan writers, *Wikimedia list article*

 - Solar eclipse of December 14, 1917, *solar eclipse*

 - Natchaug Forest Lumber Shed, *Construction in Connecticut, United
 States of America*

 - Sun of Jamaica (album), *album*

 - E-1027, *villa in Roquebrune-Cap-Martin, France*

 - Daingerfield State Park, *state park and state park of a state of
 the United States in Texas, United States of America*

 - Todo Lo Que Soy-En Vivo, *live album by Fey*

 - 2009 UEFA Regions' Cup, *none*



 Thoughts?

 Just trying to make my own bold assertions falsifiable :)



 On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mh...@wikimedia.org
 wrote:

 The whole human-vs-extracted descriptions quality question could be
 fairly easy to test I think:

 - Pick some number of articles at random.
 - Run them through a description extraction script.
 - Have a human describe the same articles with, say, the app interface
 I demo'ed.

 If nothing else this exercise could perhaps make what's thus far been
 a wildly abstract discussion more concrete.
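
The test procedure above could be scripted roughly as follows. It is a sketch under assumptions: compare is a hypothetical helper, the extract callable stands in for a real extraction script, and the sample data reuses three of the human-written and generated descriptions quoted earlier in this thread.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[str, Optional[str], Optional[str]]


def compare(
    titles: List[str],
    human: Dict[str, str],
    extract: Callable[[str], Optional[str]],
    sample_size: int,
    seed: Optional[int] = None,
) -> List[Row]:
    """Pick articles at random and pair human and generated descriptions."""
    rng = random.Random(seed)
    sample = rng.sample(titles, min(sample_size, len(titles)))
    return [(t, human.get(t), extract(t)) for t in sample]


human = {
    "Pebasiconcha immanis": "largest known species of land snail, extinct",
    "E-1027": "modernist villa in France by architect Eileen Gray",
    "Sun of Jamaica (album)": "debut 1980 studio album by Goombay Dance Band",
}
# Stand-in for the extraction script: a lookup of previously generated output.
auto = {
    "Pebasiconcha immanis": "species of Mollusca",
    "E-1027": "villa in Roquebrune-Cap-Martin, France",
    "Sun of Jamaica (album)": "album",
}

rows = compare(list(human), human, auto.get, sample_size=3, seed=0)
for title, manual, generated in rows:
    print(f"{title}\n  human: {manual}\n  auto:  {generated}")
```

Printing the pairs side by side is only the first step; judging which description is "better" is still left to human raters.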




 On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mh...@wikimedia.org
 wrote:

 If having the most elegant description extraction mechanism was the
 goal I would totally agree ;)

 On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant dbr...@wikimedia.org
 wrote: