Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data
My hero Magnus Manske noted:

The situation, for most languages, is this: No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally over night, with a little programming and linguistic effort. ... This is a force multiplier of volunteer effort with a factor of 250. And we ignore that ... why, exactly?

The potential of AutoDesc to attain a world in which every single person on the planet is given free access to the sum of all human knowledge is so enormous that it should be the entire movement's top project. I nearly wrote a career-limiting e-mail rant to WMF-all on that subject last night.

In this e-mail thread we're talking about it in the limited scope of Wikidata descriptions in search on mobile web beta, where the mobile client presents a useful signpost for *existing* articles, in an emblem on lead images and in search results. That's important but we're missing the forest for a single tree when discussing such a transformative technology. If only WMF had a CTO for such things [1].

Anyway, returning to this specific use case:

* Nobody is saying store the AutoDesc in the Wikidata per-language description field.
* Nobody is saying show the AutoDesc if there is an existing Wikidata description.
* Is anybody against showing AutoDesc, after some refinement and productization [2], in these mobile use cases when there is no Wikidata description?
* I propose the AutoDesc as a quality bar that any edit to a Wikidata description needs to improve on (but again that's a topic beyond this mail thread).

Yours, excitedly,
=S Page

[1] http://grnh.se/30f54b , apply today!
[2] https://bitbucket.org/magnusmanske/autodesc/src/HEAD/www/js/?at=master and https://github.com/dbrant/wikidata-autodesc . It's already a nodejs service, can we append oid and declare victory ? :-)

On Wed, Aug 19, 2015 at 2:57 AM, Magnus Manske magnusman...@googlemail.com wrote:

Oh, and as for examples, random-paging just got me this: https://en.wikipedia.org/wiki/Jules_Malou

Manual description: Belgian politician
Automatic description: Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂

I know which one I'd prefer...

On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske magnusman...@googlemail.com wrote:

Thank you Dmitry! Well phrased and to the point! As for templating, that might be the worst of both worlds; without the flexibility and over-time improvement of automatic descriptions, but making it harder for people to enter (compared to free-style text). We have a Visual Editor on Wikipedia for a reason :-)

On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant dbr...@wikimedia.org wrote:

My thoughts, as ever(!), are as follows:

- The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man.
- Auto-generated descriptions work for current articles, and *all future articles*. They automatically adapt to updated data. They automatically become more accurate as new data is added.
- When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time.

As for Brian's suggestion: It would be a step forward; we can even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully-automatic descriptions, because that's the ideal that semi-automatic descriptions can only approach.
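To make Dmitry's point about descriptions that "automatically adapt to updated data" concrete, here is a minimal sketch in Python. It is not how autodesc.js works; the `Item` class, its fields, and the phrasing rules are assumptions made up for illustration, loosely modelled on the Jules Malou example quoted above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical, simplified stand-in for the statements on a Wikidata item."""
    nationality: str = ""
    occupations: List[str] = field(default_factory=list)
    positions: List[str] = field(default_factory=list)
    born: Optional[int] = None
    died: Optional[int] = None


def join_words(words: List[str]) -> str:
    """Join as 'a', 'a and b', or 'a, b, and c'."""
    if len(words) <= 2:
        return " and ".join(words)
    return ", ".join(words[:-1]) + ", and " + words[-1]


def describe(item: Item) -> str:
    """Build a short English description from whatever statements exist."""
    parts: List[str] = []
    if item.occupations:
        parts.append(f"{item.nationality} {join_words(item.occupations)}".strip())
    parts.extend(item.positions)
    text = join_words(parts)
    if item.born is not None and item.died is not None:
        text += f" ({item.born}–{item.died})"
    return text.strip()


# With sparse data the description is short...
malou = Item(nationality="Belgian", occupations=["politician"])
print(describe(malou))

# ...and it grows by itself as statements are added, with no text edited by hand.
malou.occupations.append("lawyer")
malou.positions += [
    "Prime Minister of Belgium",
    "member of the Chamber of Representatives of Belgium",
]
malou.born, malou.died = 1810, 1886
print(describe(malou))
```

The second call prints the same wording as the automatic description quoted above (minus the gender symbol), but only because the toy rules were written to match that one example. Real coverage across item types and languages is exactly the effort under debate in this thread.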
On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle bgers...@wikimedia.org wrote: Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)? I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available. On Tuesday, August 18, 2015, Monte Hurd mh...@wikimedia.org wrote: Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions: - Courage Under Fire, *1996 film about a Gulf War friendly-fire incident* - Pebasiconcha immanis, *largest known species of land snail, extinct* - List of Kenyan writers, *notable Kenyan authors* - Solar eclipse of December 14, 1917, *annular eclipse which lasted 77 seconds* - Natchaug Forest Lumber Shed, *historic Civilian Conservation Corps
Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data
On Wed, Aug 19, 2015 at 11:19 PM Monte Hurd mh...@wikimedia.org wrote:

No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally over night, with a little programming and linguistic effort. ... This is a force multiplier of volunteer effort with a factor of 250. And we ignore that ... why, exactly?

Not ignoring. In fact, if the auto-generated descriptions near the quality of human curated descriptions, I'm totally and wholeheartedly on board that their use should be strongly considered. I just disagree that closing the quality gap will involve "little programming and linguistic effort". I lean more toward the "massive programming and linguistic effort" end of the spectrum. Specifically, I think it will take massive effort to make the auto-generated descriptions so good that an average person would say, hey, these auto-generated descriptions are better than the human curated descriptions in the examples I posted.

You are confusing (in the literal meaning of the word, fusing together) several issues into one here, which you then call "better". I see at least five distinct types of "better":

1. A description exists, vs. it does not. In that aspect, automatic descriptions will always be better than manual ones.

2. One description is more complete than the other. From what I see in random examples, this is already the case for many biographical items that have a lot of statements. I have actually considered cutting them back a little, because even these short descriptions can get quite extensive.

3. Context-aware, specifically, the context where the description is shown. This one goes to the automatic descriptions. AutoDesc already can generate plain text, links to Wikidata, links to a specific Wikipedia where there are articles, and use plain text/redlinks/Wikidata links otherwise. It can generate Wikitext, with some infoboxes. It could easily generate HTML blurbs with a thumbnail if there is an image, and so on. This is in contrast to plain text for manual descriptions.

4. Linguistic/style. Manual descriptions CAN be better phrased than automatic ones, but can also be worse. Automatic descriptions are unimaginative, but consistent. Here is where I probably beg to differ from most other people on this thread: I firmly believe that a description, even if it is slightly wrong grammatically, is preferable to no description, as long as humans still can understand what is meant. If the German description gets the gender of "moon" wrong, so what? (I don't think it does, but just for the sake of argument.) Eventually, someone will implement a fix for that. Maybe we'll have gender for things per language as statements at some point, which would be useful beyond autodesc.

5. To the point. That is where manual descriptions have their only advantage in the long run. Even from a lot of statements, it is hard for an algorithm to figure out why exactly that person, that thing, that event are important. Sometimes it is something obscure, something that does not fit well into statements, or is hidden among them. And there, and only there, do manual descriptions make sense, as I have always maintained.

I am well aware of the limitations of automatic descriptions. I can also see that perfection will never be reached, that the algorithms will never be finished. Like Wikipedia.

___
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l
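Point 3 above (the same description rendered differently depending on where it is shown) can be sketched as follows. This is an illustration in Python, not AutoDesc's actual code: the `Ref` type, the three render functions, and the entity IDs `Q_TOWN` and `Q_COUNTRY` are placeholders invented here, not real Wikidata identifiers.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional, Union


@dataclass
class Ref:
    """A mention of an entity inside a description (hypothetical structure)."""
    label: str
    qid: str                       # placeholder ID, not a real Wikidata item
    article: Optional[str] = None  # title on the local Wikipedia, if one exists


Segment = Union[str, Ref]


def render_plain(segments: List[Segment]) -> str:
    """Plain text: entity mentions collapse to their labels."""
    return "".join(s if isinstance(s, str) else s.label for s in segments)


def render_wikitext(segments: List[Segment]) -> str:
    """Wikitext: link to the local article if any, else to Wikidata via d:."""
    out = []
    for s in segments:
        if isinstance(s, str):
            out.append(s)
        elif s.article:
            out.append(f"[[{s.article}|{s.label}]]")
        else:
            out.append(f"[[d:{s.qid}|{s.label}]]")
    return "".join(out)


def render_html(segments: List[Segment]) -> str:
    """HTML: every entity mention links to its Wikidata page."""
    out = []
    for s in segments:
        if isinstance(s, str):
            out.append(escape(s))
        else:
            url = f"https://www.wikidata.org/wiki/{escape(s.qid)}"
            out.append(f'<a href="{url}">{escape(s.label)}</a>')
    return "".join(out)


# The E-1027 description from this thread, with made-up entity references.
desc = [
    "villa in ",
    Ref("Roquebrune-Cap-Martin", "Q_TOWN", article="Roquebrune-Cap-Martin"),
    ", ",
    Ref("France", "Q_COUNTRY"),
]
print(render_plain(desc))
print(render_wikitext(desc))
print(render_html(desc))
```

The design choice worth noting is that the description is kept as structured segments until the last moment, so each client picks the rendering it needs. A manual description stored as a plain string cannot be re-rendered this way.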
Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data
Those were literally the first 10 random articles I encountered which didn't have descriptions.

The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man.

It's not a straw man at all - it's a baseline to move the discussion away from the abstract. We need to start looking at real examples. One of my main concerns is that "a lot more development" is actually an understatement, as many of the optimizations will be language dependent.

On Wed, Aug 19, 2015 at 2:57 AM, Magnus Manske magnusman...@googlemail.com wrote:

Oh, and as for examples, random-paging just got me this: https://en.wikipedia.org/wiki/Jules_Malou

Manual description: Belgian politician
Automatic description: Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂

I know which one I'd prefer...

On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske magnusman...@googlemail.com wrote:

Thank you Dmitry! Well phrased and to the point! As for templating, that might be the worst of both worlds; without the flexibility and over-time improvement of automatic descriptions, but making it harder for people to enter (compared to free-style text). We have a Visual Editor on Wikipedia for a reason :-)

On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant dbr...@wikimedia.org wrote:

My thoughts, as ever(!), are as follows:

- The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man.
- Auto-generated descriptions work for current articles, and *all future articles*. They automatically adapt to updated data. They automatically become more accurate as new data is added.
- When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time. As for Brian's suggestion: It would be a step forward; we can even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully-automatic descriptions, because that's the ideal that semi-automatic descriptions can only approach. On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle bgers...@wikimedia.org wrote: Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)? I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available. On Tuesday, August 18, 2015, Monte Hurd mh...@wikimedia.org wrote: Ok, so I just did what I proposed. 
I went to random enwiki articles and described the first ten I found which didn't already have descriptions:

- Courage Under Fire, *1996 film about a Gulf War friendly-fire incident*
- Pebasiconcha immanis, *largest known species of land snail, extinct*
- List of Kenyan writers, *notable Kenyan authors*
- Solar eclipse of December 14, 1917, *annular eclipse which lasted 77 seconds*
- Natchaug Forest Lumber Shed, *historic Civilian Conservation Corps post-and-beam building*
- Sun of Jamaica (album), *debut 1980 studio album by Goombay Dance Band*
- E-1027, *modernist villa in France by architect Eileen Gray*
- Daingerfield State Park, *park in Morris County, Texas, USA, bordering Lake Daingerfield*
- Todo Lo Que Soy-En Vivo, *2014 Live album by Mexican pop singer Fey*
- 2009 UEFA Regions' Cup, *6th UEFA Regions' Cup, won by Castile and Leon*

And here are the respective descriptions from Magnus' (quite excellent) autodesc.js:

- Courage Under Fire, *1996 film by Edward Zwick, produced by John Davis and David T. Friendly from United States of America*
- Pebasiconcha immanis, *species of Mollusca*
- List of Kenyan writers, *Wikimedia list article*
- Solar eclipse of December 14, 1917, *solar eclipse*
- Natchaug Forest Lumber Shed, *Construction in Connecticut, United States of America*
- Sun of Jamaica (album), *album*
- E-1027, *villa in Roquebrune-Cap-Martin, France*
- Daingerfield State Park, *state park and state park of a state of the United States in Texas, United States of America*
- Todo Lo Que Soy-En Vivo, *live album by Fey*
- 2009 UEFA Regions' Cup, *none*

Thoughts? Just trying to make my own bold assertions falsifiable :)

On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd
Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data
No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally over night, with a little programming and linguistic effort. ... This is a force multiplier of volunteer effort with a factor of 250. And we ignore that ... why, exactly?

Not ignoring. In fact, if the auto-generated descriptions near the quality of human curated descriptions, I'm totally and wholeheartedly on board that their use should be strongly considered. I just disagree that closing the quality gap will involve "little programming and linguistic effort". I lean more toward the "massive programming and linguistic effort" end of the spectrum. Specifically, I think it will take massive effort to make the auto-generated descriptions so good that an average person would say, hey, these auto-generated descriptions are better than the human curated descriptions in the examples I posted. But I may, of course, be wrong!

On Wed, Aug 19, 2015 at 1:27 PM, S Page sp...@wikimedia.org wrote:

My hero Magnus Manske noted:

The situation, for most languages, is this: No manual descriptions, on basically any item. And that will remain so for the (near) future. Automatic descriptions can change that, literally over night, with a little programming and linguistic effort. ... This is a force multiplier of volunteer effort with a factor of 250. And we ignore that ... why, exactly?

The potential of AutoDesc to attain a world in which every single person on the planet is given free access to the sum of all human knowledge is so enormous that it should be the entire movement's top project. I nearly wrote a career-limiting e-mail rant to WMF-all on that subject last night.

In this e-mail thread we're talking about it in the limited scope of Wikidata descriptions in search on mobile web beta, where the mobile client presents a useful signpost for *existing* articles, in an emblem on lead images and in search results.
That's important but we're missing the forest for a single tree when discussing such a transformative technology. If only WMF had a CTO for such things [1]. Anyway, returning to this specific use case: * Nobody is saying store the AutoDesc in the Wikidata per-language description field. * Nobody is saying show the AutoDesc if there is an existing Wikidata description. * Is anybody against showing AutoDesc, after some refinement and productization [2], in these mobile use cases when there is no Wikidata description? * I propose the AutoDesc as a quality bar that any edit to a Wikidata description needs to improve on (but again that's a topic beyond this mail thread). Yours, excitedly, =S Page [1] http://grnh.se/30f54b , apply today! [2] https://bitbucket.org/magnusmanske/autodesc/src/HEAD/www/js/?at=master and https://github.com/dbrant/wikidata-autodesc . It's already a nodejs service, can we append oid and declare victory ? :-) On Wed, Aug 19, 2015 at 2:57 AM, Magnus Manske magnusman...@googlemail.com wrote: Oh, and as for examples, random-paging just got me this: https://en.wikipedia.org/wiki/Jules_Malou Manual description: Belgian politician Automatic description: Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂ I know which one I'd prefer... On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske magnusman...@googlemail.com wrote: Thank you Dmitry! Well phrased and to the point! As for templating, that might be the worst of both worlds; without the flexibility and over-time improvement of automatic descriptions, but making it harder for people to enter (compared to free-style text). We have a Visual Editor on Wikipedia for a reason :-) On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant dbr...@wikimedia.org wrote: My thoughts, as ever(!), are as follows: - The tool that generates the descriptions deserves a lot more development. 
Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man. - Auto-generated descriptions work for current articles, and *all future articles*. They automatically adapt to updated data. They automatically become more accurate as new data is added. - When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time. As for Brian's suggestion: It would be a step forward; we can even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully-automatic descriptions, because that's the ideal that semi-automatic descriptions can only approach. On Tue,
Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data
True about algorithms never being finished, but aren't we essentially stuck with the first run output, unless I misunderstand how you envision this working? (assuming you don't want to over-write non-blank descriptions the next time you improve and re-run the process)
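One way to read S Page's earlier bullets (do not store the AutoDesc, do not show it when a Wikidata description exists) is that the automatic text is produced at display time rather than written back. Under that assumption, the "first run" concern would not arise. A minimal Python sketch follows; the function name and the two generator versions are hypothetical and not taken from any existing client code.

```python
from typing import Callable, Optional


def description_for_display(manual: Optional[str],
                            generate: Callable[[], str]) -> str:
    """Prefer the stored manual description; otherwise generate one on the fly.

    Nothing is written back, so the stored field stays blank and a better
    generator takes effect on the next request.
    """
    if manual and manual.strip():
        return manual.strip()
    return generate()


# Two hypothetical versions of a generator for the same item.
def generator_v1() -> str:
    return "solar eclipse"


def generator_v2() -> str:
    return "annular solar eclipse"


# No manual description: the output tracks whichever generator is deployed.
print(description_for_display(None, generator_v1))
print(description_for_display(None, generator_v2))

# A manual description always wins, regardless of the generator.
print(description_for_display("annular eclipse which lasted 77 seconds", generator_v2))
```

If the output were instead stored in the description field, Monte's objection would apply as stated, because a later run could no longer tell generated text from human text.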
Re: [WikimediaMobile] What people think about Wikidata descriptions in search on mobile web beta, and a question about arbitrary access of Wikidata data
Oh, and as for examples, random-paging just got me this: https://en.wikipedia.org/wiki/Jules_Malou Manual description: Belgian politician Automatic description: Belgian politician and lawyer, Prime Minister of Belgium, and member of the Chamber of Representatives of Belgium (1810–1886) ♂ I know which one I'd prefer... On Wed, Aug 19, 2015 at 10:50 AM Magnus Manske magnusman...@googlemail.com wrote: Thank you Dmitry! Well phrased and to the point! As for templating, that might be the worst of both worlds; without the flexibility and over-time improvement of automatic descriptions, but making it harder for people to enter (compared to free-style text). We have a Visual Editor on Wikipedia for a reason :-) On Wed, Aug 19, 2015 at 4:07 AM Dmitry Brant dbr...@wikimedia.org wrote: My thoughts, as ever(!), are as follows: - The tool that generates the descriptions deserves a lot more development. Magnus' tool is very much a prototype, and represents a tiny glimpse of what's possible. Looking at its current output is a straw man. - Auto-generated descriptions work for current articles, and *all future articles*. They automatically adapt to updated data. They automatically become more accurate as new data is added. - When you edit the descriptions yourself, you're not really making a meaningful contribution to the *data* that underpins the given Wikidata entry; i.e. you're not contributing any new information. You're simply paraphrasing the first sentence or two of the Wikipedia article. That can't possibly be a productive use of contributors' time. As for Brian's suggestion: It would be a step forward; we can even invent a whole template-type syntax for transcluding bits of actual data into the description. But IMO, that kind of effort would still be better spent on fully-automatic descriptions, because that's the ideal that semi-automatic descriptions can only approach. 
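For comparison, the semi-automatic idea discussed above (Brian's interpolated data points, Dmitry's "template-type syntax") could look roughly like this in Python. The `{field}` placeholder syntax and the helper names are assumptions for illustration only; nobody in the thread specified a syntax.

```python
from string import Formatter
from typing import Dict, Set


def referenced_fields(template: str) -> Set[str]:
    """Names of the data points a template interpolates."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, data: Dict[str, str]) -> str:
    """Fill the template from the item's current data."""
    return template.format_map(data)


def snapshot(template: str, data: Dict[str, str]) -> Dict[str, str]:
    """Record the values the description depended on when it was written."""
    return {name: data[name] for name in referenced_fields(template)}


def is_outdated(saved: Dict[str, str], data: Dict[str, str]) -> bool:
    """True if any referenced data point has changed since the snapshot."""
    return any(data.get(name) != value for name, value in saved.items())


template = "{ordinal} UEFA Regions' Cup, won by {winner}"
data = {"ordinal": "6th", "winner": "Castile and Leon", "host": "unknown"}

print(render(template, data))
saved = snapshot(template, data)

data["host"] = "changed"           # not referenced by the template
print(is_outdated(saved, data))    # False

data["winner"] = "someone else"    # referenced by the template
print(is_outdated(saved, data))    # True
```

This shows both halves of Brian's suggestion: the curated wording is kept, and a change to a referenced data point flags the description for review rather than silently going stale.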
On Tue, Aug 18, 2015 at 10:36 PM, Brian Gerstle bgers...@wikimedia.org wrote: Could there be a way to have our nicely curated description cake and eat it too? For example, interpolating data into the description and/or marking data points which are referenced in the description (so as to mark it as outdated when they change)? I appreciate the potential benefits of generated descriptions (and other things), but Monte's examples might have swayed me towards human curated—when available. On Tuesday, August 18, 2015, Monte Hurd mh...@wikimedia.org wrote: Ok, so I just did what I proposed. I went to random enwiki articles and described the first ten I found which didn't already have descriptions: - Courage Under Fire, *1996 film about a Gulf War friendly-fire incident* - Pebasiconcha immanis, *largest known species of land snail, extinct* - List of Kenyan writers, *notable Kenyan authors* - Solar eclipse of December 14, 1917, *annular eclipse which lasted 77 seconds* - Natchaug Forest Lumber Shed, *historic Civilian Conservation Corps post-and-beam building* - Sun of Jamaica (album), *debut 1980 studio album by Goombay Dance Band* - E-1027, *modernist villa in France by architect Eileen Gray* - Daingerfield State Park, *park in Morris County, Texas, USA, bordering Lake Daingerfield* - Todo Lo Que Soy-En Vivo, *2014 Live album by Mexican pop singer Fey* - 2009 UEFA Regions' Cup, *6th UEFA Regions' Cup, won by Castile and Leon* And here are the respective descriptions from Magnus' (quite excellent) autodesc.js: - Courage Under Fire, *1996 film by Edward Zwick, produced by John Davis and David T. 
Friendly from United States of America* - Pebasiconcha immanis, *species of Mollusca* - List of Kenyan writers, *Wikimedia list article* - Solar eclipse of December 14, 1917, *solar eclipse* - Natchaug Forest Lumber Shed, *Construction in Connecticut, United States of America* - Sun of Jamaica (album), *album* - E-1027, *villa in Roquebrune-Cap-Martin, France* - Daingerfield State Park, *state park and state park of a state of the United States in Texas, United States of America* - Todo Lo Que Soy-En Vivo, *live album by Fey* - 2009 UEFA Regions' Cup, *none* Thoughts? Just trying to make my own bold assertions falsifiable :) On Tue, Aug 18, 2015 at 6:32 PM, Monte Hurd mh...@wikimedia.org wrote: The whole human-vs-extracted descriptions quality question could be fairly easy to test I think: - Pick, some number of articles at random. - Run them through a description extraction script. - Have a human describe the same articles with, say, the app interface I demo'ed. If nothing else this exercise could perhaps make what's thus far been a wildly abstract discussion more concrete. On Tue, Aug 18, 2015 at 6:17 PM, Monte Hurd mh...@wikimedia.org wrote: If having the most elegant description extraction mechanism was the goal I would totally agree ;) On Tue, Aug 18, 2015 at 5:19 PM, Dmitry Brant dbr...@wikimedia.org wrote:
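The comparison step of the test Monte proposes above can be sketched with the ten pairs already posted in this thread. The Python below only tabulates that data; the two metrics (coverage and mean word count) are assumptions chosen for illustration and say nothing about which description a reader would prefer.

```python
from typing import List, Optional, Tuple

# (article, human description, autodesc.js output) as posted in this thread;
# None stands for the "none" result.
PAIRS: List[Tuple[str, str, Optional[str]]] = [
    ("Courage Under Fire",
     "1996 film about a Gulf War friendly-fire incident",
     "1996 film by Edward Zwick, produced by John Davis and David T. Friendly "
     "from United States of America"),
    ("Pebasiconcha immanis",
     "largest known species of land snail, extinct",
     "species of Mollusca"),
    ("List of Kenyan writers", "notable Kenyan authors", "Wikimedia list article"),
    ("Solar eclipse of December 14, 1917",
     "annular eclipse which lasted 77 seconds",
     "solar eclipse"),
    ("Natchaug Forest Lumber Shed",
     "historic Civilian Conservation Corps post-and-beam building",
     "Construction in Connecticut, United States of America"),
    ("Sun of Jamaica (album)",
     "debut 1980 studio album by Goombay Dance Band",
     "album"),
    ("E-1027",
     "modernist villa in France by architect Eileen Gray",
     "villa in Roquebrune-Cap-Martin, France"),
    ("Daingerfield State Park",
     "park in Morris County, Texas, USA, bordering Lake Daingerfield",
     "state park and state park of a state of the United States in Texas, "
     "United States of America"),
    ("Todo Lo Que Soy-En Vivo",
     "2014 Live album by Mexican pop singer Fey",
     "live album by Fey"),
    ("2009 UEFA Regions' Cup",
     "6th UEFA Regions' Cup, won by Castile and Leon",
     None),
]


def coverage(pairs: List[Tuple[str, str, Optional[str]]]) -> float:
    """Fraction of articles for which the generator produced any description."""
    return sum(1 for _, _, auto in pairs if auto) / len(pairs)


def mean_words(texts: List[Optional[str]]) -> float:
    """Average word count over the descriptions that exist."""
    present = [t for t in texts if t]
    return sum(len(t.split()) for t in present) / len(present)


print(f"auto coverage: {coverage(PAIRS):.0%}")
print(f"mean words, human: {mean_words([h for _, h, _ in PAIRS]):.1f}")
print(f"mean words, auto:  {mean_words([a for _, _, a in PAIRS]):.1f}")
```

A fuller version of the test would still need the random sampling step and, more importantly, human judgments of quality, which counts like these cannot replace.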