Re: [WikimediaMobile] Thumbnail filename URL extraction rules

2014-12-05 Thread Monte Hurd
Max showed me how to get the file page url from the api. So if all we have
is the image name we can get the file page url automagically. I attached a
sample query to:
https://trello.com/c/cXEMxGb3/8-5-retrieve-file-metadata-from-commonsmetadata-api-and-display-it-in-the-panel

On Fri, Dec 5, 2014 at 3:52 PM, Brion Vibber bvib...@wikimedia.org wrote:

 Per request in the meeting, thought I'd stick it on the public list for
 reference. :)

 As I recall there should be three possible URL formats for images embedded
 in img tags in wiki pages or returned as thumbnails via the API:

 http(s)?://upload.wikimedia.org/(project)/(subdomain)/(hash1)/(hash2)/(base-filename)
 ^ original-size images

 http(s)?://upload.wikimedia.org/(project)/(subdomain)/thumb/(hash1)/(hash2)/(base-filename)/(size)px(possible-other-options)-(base-filename)(.render-extension)?
 ^ thumbnails

 http(s)?://upload.wikimedia.org/(project)/(subdomain)/thumb/(hash1)/(hash2)/(base-filename)/(size)px(possible-other-options)-thumbnail.(render-extension)
 ^ this last is used in cases where the filename is very, very long and we
 can't actually prepend all the options to the filename (happens mostly in
 South Asian languages, where UTF-8 takes 3 bytes per letter)

 * project: 'wikipedia' in all cases we need to handle; local files on
 Wiktionary etc will have it separate but we don't use these.
 * subdomain: language 'en' etc for Wikipedias, subproject for special-case
 wikis like Commons/'commons'
 * hash1: first hex digit of the md5 hash of the filename (you don't need to
 use this here, consider it opaque)
 * hash2: first 2 hex digits of the md5 hash of the filename
 * base-filename: the base filename -- you want this! This is the raw
 filename for files served at original size; thumbnails will use it as a
 directory component.
 * render-extension: files other than PNG, GIF, and JPEG are rendered to
 one of those, usually PNG. So you'll see things like .svg.png at times --
 but never .png.png. These only appear on thumbnails.
 * size: thumbnails are always given with the pixel size.
 * possible-other-options: Note that other options may include a page
 number for PDF, DjVu, or TIFF files, or a time position for video
 thumbnails. To avoid parsing that stuff out, consider using the
 subdirectory base name on thumbnails if possible.
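
For illustration, here is a minimal Python sketch of the extraction described above. It takes the directory component on thumbnails and the last path component on originals, so it never has to parse the size or the other options. The function name `base_filename` is hypothetical, and the sketch assumes filenames never contain a slash. It accepts the `thumb` segment either before or after the hash directories, since that ordering is worth checking against live URLs.

```python
from urllib.parse import urlsplit, unquote


def base_filename(url):
    """Return the base filename from an upload.wikimedia.org image URL.

    Returns None when the URL does not look like one of the three formats.
    Hypothetical helper: the name and the None fallback are assumptions.
    """
    parts = urlsplit(url)
    if parts.hostname != "upload.wikimedia.org":
        return None
    # Split first, then percent-decode each segment separately.
    segments = [unquote(s) for s in parts.path.split("/") if s]
    # Original size: project / subdomain / hash1 / hash2 / base-filename
    if len(segments) == 5:
        return segments[4]
    # Thumbnail: two extra segments ("thumb" and the rendered file name).
    # The base filename is the directory component just before the last one.
    if len(segments) == 7 and "thumb" in (segments[2], segments[4]):
        return segments[5]
    return None
```

Because the directory component is used, the long-filename form ending in `-thumbnail.(render-extension)` needs no special case.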

 -- brion

 ___
 Mobile-l mailing list
 Mobile-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/mobile-l




Re: [WikimediaMobile] Thumbnail filename URL extraction rules

2014-12-05 Thread Monte Hurd
Oh, here's the query for the record:

http://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=imageinfo&format=json&iiprop=url&iiurlwidth=55&titles=File%3AWiki.png

You should see a descriptionurl field in the results with the file page url.
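
The same query can be sketched in Python without the sandbox. This is only an illustration under assumptions: the `/w/api.php` endpoint, the helper names, and the response layout (`query` → `pages` → `imageinfo`) are not from this thread, so check them against a real response before relying on them. The sketch only builds the URL and reads a decoded response; it makes no network call.

```python
from urllib.parse import urlencode

# Assumed API endpoint for en.wikipedia.org (not stated in the thread).
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def imageinfo_query_url(file_title, thumb_width=55):
    """Build the imageinfo query URL using the parameters from the sandbox link."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "format": "json",
        "iiprop": "url",
        "iiurlwidth": thumb_width,
        "titles": file_title,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def description_urls(response):
    """Collect every descriptionurl from a decoded JSON response.

    Assumes the layout query -> pages -> <page id> -> imageinfo (a list).
    """
    urls = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for info in page.get("imageinfo", []):
            if "descriptionurl" in info:
                urls.append(info["descriptionurl"])
    return urls
```

Calling `imageinfo_query_url("File:Wiki.png")` gives a URL with the same parameters as the sandbox link, and `description_urls` can then be applied to the decoded JSON body.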

On Fri, Dec 5, 2014 at 4:23 PM, Monte Hurd mh...@wikimedia.org wrote:

 Max showed me how to get the file page url from the api. So if all we have
 is the image name we can get the file page url automagically. I attached a
 sample query to:
 https://trello.com/c/cXEMxGb3/8-5-retrieve-file-metadata-from-commonsmetadata-api-and-display-it-in-the-panel



Re: [WikimediaMobile] Thumbnail filename URL extraction rules

2014-12-05 Thread Jon Robson
For reference: as a learning experiment in alpha, mobile web is
generating infoboxes from Wikidata.
See the example at
https://en.m.wikipedia.org/wiki/Albert%20Einstein?mobileaction=alpha

We are interested in converting filenames into thumbnail URLs.
The Wikidata API returns just the file title, so we are currently
using an md5 library to construct the thumbnail URL. It would be nice
to have a better way of doing this.
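
To make the md5 approach concrete, here is a rough Python sketch (not the mobile web code). It assumes spaces in the title are replaced with underscores before hashing and that the `thumb` segment sits before the hash directories, as in URLs served by upload.wikimedia.org at the time of writing. The function name and its defaults are hypothetical. It does not handle first-letter capitalization, the extra render extension for formats such as SVG, or the `-thumbnail` form for very long names.

```python
import hashlib
from urllib.parse import quote


def thumbnail_url(filename, width, project="wikipedia", subdomain="commons"):
    """Guess a thumbnail URL from a bare file title (hypothetical helper).

    Assumes a PNG, GIF, or JPEG file, so no render extension is appended.
    """
    # Assumption: the stored name uses underscores instead of spaces.
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    hash1, hash2 = digest[0], digest[:2]  # first 1 and first 2 hex digits
    encoded = quote(name)
    return (
        f"https://upload.wikimedia.org/{project}/{subdomain}/thumb/"
        f"{hash1}/{hash2}/{encoded}/{width}px-{encoded}"
    )
```

This is exactly the kind of client-side guessing an API-provided thumbnail URL would avoid, since any change to the URL layout on the server breaks it.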

I opened a bug [1] but it's not clear what the path forward is for this...

[1] https://phabricator.wikimedia.org/T76827

On Fri, Dec 5, 2014 at 4:24 PM, Monte Hurd mh...@wikimedia.org wrote:
 Oh, here's the query for the record:

 http://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=imageinfo&format=json&iiprop=url&iiurlwidth=55&titles=File%3AWiki.png

 You should see a descriptionurl field in the results with the file page url.





-- 
Jon Robson
* http://jonrobson.me.uk
* https://www.facebook.com/jonrobson
* @rakugojon
