[WikimediaMobile] CommonsMetadata API returning HTML?

2014-12-08 Thread Dan Garry
Greetings, Multimedia Team!

*Background:* The Mobile Apps Team is working on restyling the way the first
fold of content is presented in the Wikipedia app. The image at
http://i.imgur.com/dxqfJKd.png shows what this looks like. Having a
high-resolution image so prominently at the top of the page will likely
drive a lot of clicks, so we're working on a lightweight image viewer to
deal with file pages, which are poorly styled monstrosities in the mobile
app. We're going to use the CommonsMetadata API to help us out. :-)

*Problem:* The CommonsMetadata API can sometimes return HTML [1]. Having
HTML in the API response is a bit problematic for us. Native apps make next
to no use of HTML when creating links or layouts, so we have to strip the
HTML from every API response, lest the raw markup be displayed to the user
as plain text. In the short term this is fine; we can strip it and throw the
information away. But in the long run it'd be better if the API didn't
return HTML.

*Our ask:* Can the CommonsMetadata API please not return HTML in its
responses? :-)

Thanks,
Dan

[1]: Run this query
https://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&format=xml&iiprop=extmetadata&iilimit=10&titles=File%3ACommon%20Kingfisher%20Alcedo%20atthis.jpg
and look at the artist key. The API response has an HTML link in it.

-- 
Dan Garry
Associate Product Manager, Mobile Apps
Wikimedia Foundation
___
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l


Re: [WikimediaMobile] CommonsMetadata API returning HTML?

2014-12-08 Thread Dan Garry
Sorry, the example query I provided was incorrect. Use this instead:
https://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&format=jsonfm&iiprop=extmetadata&iilimit=10&titles=File%3ACommon%20Kingfisher%20Alcedo%20atthis.jpg
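For reference, a query like the one above can also be assembled in code. This is a minimal Java sketch, assuming the standard MediaWiki action API parameters that appear in the URL (action, prop, format, iiprop, iilimit, titles); the class and method names ExtMetadataQuery and build are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: assembles an imageinfo/extmetadata query like the one in the thread.
class ExtMetadataQuery {
    static String build(String apiBase, String fileTitle, String format) {
        // URLEncoder does form encoding (spaces become "+"); switch to %20 to match the
        // URLs quoted in the thread. A literal "+" in the title is already encoded as %2B.
        String title = URLEncoder.encode(fileTitle, StandardCharsets.UTF_8).replace("+", "%20");
        return apiBase
                + "?action=query"
                + "&prop=imageinfo"
                + "&format=" + format
                + "&iiprop=extmetadata"
                + "&iilimit=10"
                + "&titles=" + title;
    }

    public static void main(String[] args) {
        System.out.println(build("https://en.wikipedia.org/w/api.php",
                "File:Common Kingfisher Alcedo atthis.jpg", "jsonfm"));
    }
}
```

With format=json instead of jsonfm the response is raw JSON rather than pretty-printed HTML, which is what an app would actually consume; the artist key in extmetadata is where the HTML link shows up.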

Thanks,
Dan

-- 
Dan Garry
Associate Product Manager, Mobile Apps
Wikimedia Foundation


[WikimediaMobile] Fwd: CommonsMetadata API returning HTML?

2014-12-08 Thread Florian Schmidt
Forwarding to mobile-l; I forgot to press Reply All. :)

-----Original Message-----
From: Florian Schmidt [mailto:florian.schmidt.wel...@t-online.de] 
Sent: Monday, 8 December 2014 20:46
To: 'Dan Garry'
Subject: RE: [WikimediaMobile] CommonsMetadata API returning HTML?

First: that looks great! Maybe good input for mobile web, too. :)

I remember a similar problem with the mobile media viewer (it also shows the
author of an image, if the CommonsMetadata API returns one). The problem,
IIRC, is that the HTML is already in the input (the templates used on Commons
to describe images, from which the information is extracted), so
CommonsMetadata would have to strip the HTML out, just as the app does
locally. I don't know what the best way would be. :/

Kind regards
Florian

From: mobile-l-boun...@lists.wikimedia.org 
[mailto:mobile-l-boun...@lists.wikimedia.org] On Behalf Of Dan Garry
Sent: Monday, 8 December 2014 20:31
To: multime...@lists.wikimedia.org; mobile-l
Subject: Re: [WikimediaMobile] CommonsMetadata API returning HTML?



Re: [WikimediaMobile] CommonsMetadata API returning HTML?

2014-12-08 Thread Dan Garry
On 8 December 2014 at 12:52, Derk-Jan Hartman d.j.hart...@gmail.com wrote:

 Welcome to the problem of 'there is no structured metadata for files' :)

 This is a garbage in, garbage out problem and probably when you start
 filtering you will break attribution requirements (more than the community
 will appreciate).


I figured. :-(

So, given that we can't do anything meaningful with the HTML in a native
app, we only have three options:

   - Display the raw HTML directly to the user
   - Try to parse the HTML for interesting information and update the
   relevant view's properties using native code
   - Strip any and all HTML tags that are given to us in the JSON

The first two don't sound workable at all to me: the first is unworkable
from a product standpoint, and the second is an absolutely gigantic can of
worms. So I guess we'll be stripping the HTML until this is fixed. :-)
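A minimal Java sketch of the third option, assuming naive regex-based stripping is acceptable for short metadata snippets; HtmlTagStripper and strip are hypothetical names. It drops every tag, decodes a few common entities, and collapses whitespace. Note that it discards link targets entirely, which is exactly the attribution concern Derk-Jan raises above.

```java
import java.util.regex.Pattern;

// Hypothetical helper for option 3: discard every HTML tag and keep only the text.
class HtmlTagStripper {
    // Naive: matches from '<' to the next '>', so a '>' inside an attribute value
    // or an HTML comment can confuse it. Good enough for simple metadata snippets.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String strip(String html) {
        String text = TAG.matcher(html).replaceAll("");
        // Decode a few common entities; "&amp;" goes last so "&amp;lt;" ends up as "&lt;".
        text = text.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&nbsp;", " ")
                .replace("&amp;", "&");
        // Collapse the whitespace left behind by removed block-level tags.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(strip(
                "<p><a href=\"//commons.wikimedia.org/wiki/User:Example\">Example</a> &amp; others</p>"));
    }
}
```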

Thanks,
Dan

-- 
Dan Garry
Associate Product Manager, Mobile Apps
Wikimedia Foundation


Re: [WikimediaMobile] CommonsMetadata API returning HTML?

2014-12-08 Thread Jon Robson
It would actually be great to get this problem fixed rather than add
yet more band-aids to it.
What can we actually do to start moving towards structured metadata for
files? What needs to happen? Can we lean on Wikidata in any way?


-- 
Jon Robson
* http://jonrobson.me.uk
* https://www.facebook.com/jonrobson
* @rakugojon



Re: [WikimediaMobile] [Multimedia] CommonsMetadata API returning HTML?

2014-12-08 Thread Gergo Tisza
Hi Dan!

On Mon, Dec 8, 2014 at 11:29 AM, Dan Garry dga...@wikimedia.org wrote:

 *Background:* The Mobile Apps Team is working on restyling the way the
 first fold of content is presented in the Wikipedia app. You
 can see this image http://i.imgur.com/dxqfJKd.png to see what this
 looks like.


That looks awesome, can't wait to see it live! Any chance of something like
this eventually hitting the desktop site? :-)

 Having a high-resolution image so prominently at the top of the page will
 likely drive a lot of clicks, so we're working on a lightweight image
 viewer to deal with file pages, which are poorly styled monstrosities on
 the mobile app. We're going to use the CommonsMetadata API to help us out.
 :-)


Keep in mind that there is no guarantee the API output is an accurate
representation of the file page (because of missing machine-readable template
markup, etc.; for example, CommonsMetadata can't figure out the license name
for about 5% of MediaViewer pageviews), so you'll still need a link to the
raw file page somewhere.

 *Problem:* The CommonsMetadata API can sometimes return HTML [1]. Having
 HTML in the API response is a bit problematic for us. Native apps make next
 to no use of HTML when creating links or layouts, so we have to strip the
 HTML from every API response, lest it be displayed as plaintext to the
 user. In the short term this is fine, we can strip it and throw the
 information away. But in the long run it'd be better if the API didn't
 return HTML.


In the long run CommonsMetadata should die in a fire, together with the
Commons paradigm of storing information in license parameters.
You can see the related plans at Commons:Structured data
https://commons.wikimedia.org/wiki/Commons:Structured_data; these include
migrating most information to plaintext (file descriptions will probably
remain rich text).

In the not-so-long run, some HTML markup is fairly important. Links can be
necessary for attribution, and paragraphs make long descriptions more
readable; removing lists and tables makes some descriptions unreadable (map
legends tend to use tables, for example). So I think the API would be much
less useful if it started stripping HTML. (It does that already in a few
cases where the intent is clear, such as stripping the enclosing <p>
generated by MediaWiki, or stripping certain kinds of purely presentational
markup such as creator templates
https://commons.wikimedia.org/wiki/Template:Creator, but that only works
when the source and intent of the markup are known.)

We could add an API parameter to provide a plaintext version, but that
would split the cache (both Varnish and memcached). Not a huge deal, but
tag stripping is very easy, so if you don't need anything more specific
than that, I would say it is simpler to do it on the client side. If more
complex logic is needed (e.g. turning <ul>s into star lists), it makes
sense to do that in the API instead of forcing each client to reimplement
it, but I am not sure how generic such a text representation would be.
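To make the "turn <ul>s into star lists" idea concrete, here is a rough client-side sketch in Java. It assumes flat, non-nested lists and regex-level processing only; StarListFormatter and toPlainText are hypothetical names and not part of CommonsMetadata.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of turning <ul> lists into "star lists" in plain text.
// Assumes flat (non-nested) lists; anything fancier needs a real parser.
class StarListFormatter {
    // "<li" followed by '>' or whitespace, so "<link>" is not matched.
    private static final Pattern LI_OPEN = Pattern.compile("(?i)<li(\\s[^>]*)?>");
    private static final Pattern LI_CLOSE_OR_UL = Pattern.compile("(?i)</li>|</?ul(\\s[^>]*)?>");
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    static String toPlainText(String html) {
        // Each list item starts a new line prefixed with "* ".
        String s = LI_OPEN.matcher(html).replaceAll("\n* ");
        // List containers and closing </li> tags carry no text of their own.
        s = LI_CLOSE_OR_UL.matcher(s).replaceAll("");
        // Remaining markup (links, bold, etc.) is dropped, keeping only its text.
        s = ANY_TAG.matcher(s).replaceAll("");
        return s.strip();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText(
                "Legend:<ul><li>Red: rivers</li><li>Blue: <b>lakes</b></li></ul>"));
    }
}
```

Tables, which Gergo notes are common in map legends, are deliberately out of scope here; they would need a layout decision rather than a simple prefix.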

 So, given that we can't do anything meaningful with the HTML in a native
 app, that means we only have three options:

    - Display the raw HTML directly to the user
    - Try to parse the HTML for interesting information and update the
    relevant view's properties using native code
    - Strip any and all HTML tags that are given to us in the JSON

 The first two aren't sounding workable at all to me; the first is
 unworkable from a product standpoint, and the second is an absolutely
 gigantic can of worms. So I guess we'll be stripping the HTML until such
 time that this is fixed. :-)


I'm not sure some limited HTML parsing is that bad. The low-hanging fruit
is links (MediaViewer currently strips everything else, and most of the
time that works decently), and those are never nested, so they can be
processed by a trivial SAX parser, for which all platforms surely have
libraries.
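A minimal Java sketch of the SAX approach described above, using the JDK's built-in javax.xml.parsers SAX API. It assumes the fragment is well-formed XML once wrapped in a root element; HTML-only entities such as &nbsp; or unclosed tags will make it throw, so real CommonsMetadata output may need an HTML-tolerant parser. LinkExtractor and its fields are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: collects the plain text of a fragment plus every <a href> in it.
class LinkExtractor extends DefaultHandler {
    static final class Link {
        final String text;
        final String href;

        Link(String text, String href) {
            this.text = text;
            this.href = href;
        }
    }

    final StringBuilder plainText = new StringBuilder();
    final List<Link> links = new ArrayList<>();
    private String currentHref; // non-null while inside an <a> that has an href
    private final StringBuilder currentLinkText = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // The default factory is not namespace-aware, so the tag name arrives in qName.
        if (qName.equalsIgnoreCase("a")) {
            currentHref = attributes.getValue("href");
            currentLinkText.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks, so always append.
        plainText.append(ch, start, length);
        if (currentHref != null) {
            currentLinkText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Links are never nested, so a single "current link" slot is enough.
        if (qName.equalsIgnoreCase("a") && currentHref != null) {
            links.add(new Link(currentLinkText.toString(), currentHref));
            currentHref = null;
        }
    }

    static LinkExtractor parse(String fragment) {
        LinkExtractor handler = new LinkExtractor();
        // Wrapping the fragment gives XML the single root element it requires.
        InputSource source = new InputSource(new StringReader("<root>" + fragment + "</root>"));
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse fragment as XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        LinkExtractor result = parse(
                "<a href=\"https://commons.wikimedia.org/wiki/User:Example\">Example</a> (own work)");
        System.out.println(result.plainText);
        for (Link link : result.links) {
            System.out.println(link.text + " -> " + link.href);
        }
    }
}
```

The same event-driven pattern (start element, characters, end element) is what NSXMLParser offers on iOS, so the logic should port directly; only the parser API changes.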


Re: [WikimediaMobile] [Multimedia] CommonsMetadata API returning HTML?

2014-12-08 Thread Dan Garry
Hey Gergo,

Responses in-line.

On 8 December 2014 at 15:03, Gergo Tisza gti...@wikimedia.org wrote:

 Hi Dan!

 On Mon, Dec 8, 2014 at 11:29 AM, Dan Garry dga...@wikimedia.org wrote:

 *Background:* The Mobile Apps Team is working on restyling the way the
 first fold of content is presented in the Wikipedia app. You
 can see this image http://i.imgur.com/dxqfJKd.png to see what this
 looks like.


 That looks awesome, can't wait to see it live! Any chance of something
 like this eventually hitting the desktop site? :-)


Hah, complicated question! I'd love to see that happen, but unfortunately
it seems unlikely in the near future. :-(

 Keep in mind that there is no guarantee the API output is an accurate
 representation of the file page (lack of machine-readable template markup
 etc. - for example, CommonsMetadata can't figure out the license name for
 about 5% of the MediaViewer pageviews), so you'll still need a link to the
 raw file page somewhere.


Fortunately, we knew this going in! We'll be dumping a link to the file
page into the overflow menu. :-)


 In the long run CommonsMetadata should die in a fire, together with the
 Commons paradigm of storing information in license parameters.
 You can see the related plans at Commons:Structured data
 https://commons.wikimedia.org/wiki/Commons:Structured_data; these
 include migrating most information to plaintext (file descriptions will
 probably remain rich text).


Yay! Looking forward to this. \o/


 In the not so long run, some HTML markup is fairly important. Links can be
 necessary for the attribution, paragraphs for making long descriptions more
 readable; removing lists and tables makes some descriptions unreadable (map
 legends tend to use tables, for example). So I think the API would be much
 less useful if it started stripping HTML. (It does that already in a few
 cases where the intent is clear, such as stripping the enclosing <p>
 generated by MediaWiki, or stripping certain kinds of purely presentational
 markup such as creator templates
 https://commons.wikimedia.org/wiki/Template:Creator, but that only
 works when the source and intent of the markup is known.)


Given that this API is hopefully going to soon die a painful death, it
probably just makes sense for us to strip the HTML ourselves rather than
making you deal with that.

Unfortunately, tables are going to be an issue. On Android, we get some
limited HTML parsing for free using the Html class [1], but the native
TextView class doesn't support displaying tables. On iOS, it's worse,
because we don't get *any* HTML parsing for free, and we actually have to
strip the HTML manually too.

In the interests of keeping this simple, we'll probably be able to handle
links on Android, but not on iOS. And tables will probably just be totally
stripped.

Thanks for your help!

Dan


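[1]: On Android, this apparently does the trick where the HTML only
contains links:
import android.text.Html;
import android.text.method.LinkMovementMethod;
// Html.fromHtml turns <a href> tags into clickable URLSpans
textView.setText(Html.fromHtml(htmlStringWithLinks));
// LinkMovementMethod makes those spans respond to taps
textView.setMovementMethod(LinkMovementMethod.getInstance());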

-- 
Dan Garry
Associate Product Manager, Mobile Apps
Wikimedia Foundation


Re: [WikimediaMobile] [Multimedia] CommonsMetadata API returning HTML?

2014-12-08 Thread Dan Garry
On 8 December 2014 at 18:05, Monte Hurd mh...@wikimedia.org wrote:


 On iOS, it's worse, because we don't get *any* HTML parsing for free,
 and we actually have to strip the HTML manually too

 Oops, Dan I may have misspoken - on iOS we can strip html w/NSXMLParser
 which is SAX style. What we don't get for free is labels which can render
 html links like the android ones you showed me.


Okay, thanks for clarifying! Still, we will have to omit links for
simplicity. :-)

Dan

-- 
Dan Garry
Associate Product Manager, Mobile Apps
Wikimedia Foundation


Re: [WikimediaMobile] [Multimedia] CommonsMetadata API returning HTML?

2014-12-08 Thread Monte Hurd
Ya could always do a UIWebView for the descriptions but that just seems icky :)




Re: [WikimediaMobile] [Multimedia] CommonsMetadata API returning HTML?

2014-12-08 Thread Monte Hurd
Bahaha!!


 On Dec 8, 2014, at 10:17 PM, Dan Garry dga...@wikimedia.org wrote:
 
 Darth WebView: Your native code is weak old man.
 Objecti-Cee Kenobi: You can't win, WebView. If you strike me down, I shall 
 become more native than you could possibly imagine.
 
 ;-)
 
 Dan
 