Hi Antoni,

> The roadmap doesn't give much detail about the intended vocabularies. 
> Dublin core is great, but what else? Joerg? What other kinds of metadata 
> information would you like to extract with Tika, and what vocabularies would 
> you like to use to express them?
>
> At Adobe, you'll likely want Tika to transparently get the XMP metadata from 
> the docs (using whatever vocabularies you use to express whatever info you 
> need) into your metadata-processing software, that already "understands" the 
> semantics of those XMP properties and values. What data would you like to 
> have Tika transform to common vocabularies and what vocabularies will that be?

Your description of how we handle metadata at Adobe is correct.
Regarding the intended vocabularies, I think we have to distinguish between 
"common", file-format-neutral metadata and data that is specific to a given 
format or purpose.
For the common metadata, the proposal was to use the vocabulary defined in 
the ISO XMP specification, Part One, section 8 (see [1]). That vocabulary is 
essentially Dublin Core with additional elements from IPTC and the Adobe Media 
Management namespace.
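
To illustrate the "common" layer, here is a minimal sketch of mapping 
format-specific keys onto Dublin Core / XMP basic properties. The source keys, 
the table contents and the function name to_common are assumptions made for 
this example; this is not Tika's API, and the real mapping would follow the 
specification in [1].

```python
# Hypothetical mapping table: format-specific key -> common XMP-style property.
# The keys on the left are illustrative (loosely modelled on a PDF info
# dictionary); the table is an assumption, not an authoritative mapping.
COMMON_MAPPING = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
    "Creator": "xmp:CreatorTool",
}


def to_common(raw):
    """Split raw metadata into (common, format_specific) dictionaries.

    Keys found in COMMON_MAPPING are renamed to their common property;
    everything else is passed through unchanged as format-specific data.
    """
    common = {}
    format_specific = {}
    for key, value in raw.items():
        target = COMMON_MAPPING.get(key)
        if target is not None:
            common[target] = value
        else:
            format_specific[key] = value
    return common, format_specific
```

Anything without an entry in the table stays in the format-specific part, 
which matches the distinction made above.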

Apart from the core properties, the general idea is to extract as much metadata 
from resources as possible. Which vocabulary that data is mapped to 
really depends on the use case (i.e. file format and purpose), I think. Here, a 
pragmatic approach that uses established standards wherever possible is 
preferable. Unfortunately, the established standards often overlap or define 
contradictory mappings, and that's where the pragmatic aspect comes into play :)
The Media Annotation Working Group [2] has made a good attempt at a 
vocabulary into which all sorts of information can be mapped, but 
unfortunately they leave out a lot of the information a developer needs to actually 
use it. From my point of view, a more usable recommendation, at least for 
common image formats, has been defined by the Metadata Working Group [3].
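
One pragmatic way to deal with overlapping sources is a fixed precedence 
order per property. The sketch below only illustrates that idea: the source 
names, the order and the function name resolve are assumptions for this 
example and are not taken from the recommendations in [2] or [3].

```python
# Assumed precedence order, highest priority first. Purely illustrative.
DEFAULT_PRECEDENCE = ["xmp", "iptc", "exif"]


def resolve(candidates, precedence=DEFAULT_PRECEDENCE):
    """Pick one value for a property from several overlapping sources.

    candidates maps a source name (e.g. "xmp") to the value that source
    provides. The first non-empty value in precedence order wins; None is
    returned if no listed source has a value.
    """
    for source in precedence:
        value = candidates.get(source)
        if value:  # skip missing or empty values
            return value
    return None
```

For example, resolve({"exif": "Jane", "xmp": ""}) falls back to the Exif value 
because the XMP value is empty.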

Does this answer your questions?
Regards
Jörg

[1] http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=57421
[1b] http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/XMPSpecificationPart1.pdf
[2] http://www.w3.org/TR/2012/REC-mediaont-10-20120209/
[3] http://metadataworkinggroup.com/
