I'm convinced that using embedded resources is the better solution. Thanks, Nick.

@Matt, I wasn't aware that we had already had a discussion about the metadata structure. Interesting.
We could adapt the TIKA-90 title and description. I hope to take the initiative on this work.

Hong-Thai

-----Original Message-----
From: Nick Burch [mailto:apa...@gagravarr.org]
Sent: Thursday, 9 January 2014 15:25
To: dev@tika.apache.org
Subject: RE: Extract thumbnail from openxml office files

On Thu, 9 Jan 2014, Hong-Thai Nguyen wrote:
> I agree with you that metadata is not the best place to store
> thumbnail result. Until now, our metadata is simple map with
> key:values. This structure is not really flexible in some cases.

Currently, we have four kinds of "things" that we return for content:
 * Type
 * Metadata
 * Content, as xhtml
 * Any resources embedded in it (eg nested documents, images etc)

I'm not disputing that our Metadata setup could use some more work to make it richer (within reason!). What I'm not sure of is that an expanded metadata system is the right place to put thumbnails and full-page renderings. Those feel a lot more like embedded resources to me.

> Another example is for our future thumbnail. If we can have a metadata
> 'thumbnail' with hierarchical structure like:
>
> Thumbnail:
>   Dimension
>     Width
>     Length
>   MimeType
>   Extension
>   Pages
>   Description

If we returned the thumbnail as an embedded resource, you'd get the type + full metadata on the image (not just width/length), along with extension etc. If we had a common naming scheme for them, possibly with some custom metadata keys, we could return the page number it applies to, along with whether it's a thumbnail or a full size rendering (some formats have one, the other, or both).

Are you able to explain how your scheme would be simpler and easier to use than returning them as embedded resources?

Nick