2012/04/25 Joerg Ehrlich wrote:
Hi,

I have put a proposal of a roadmap for the metadata features in Tika on the 
wiki:
http://wiki.apache.org/tika/MetadataRoadmap

The proposal is based on a discussion around this topic I have had with Jukka.
Please review and feel free to edit the wiki for the discussion. I will also 
update the wiki according to the discussion.

BTW, how do I attach an image on that wiki? The documentation mentions the 
"attachment" link, which I am not able to find.

My 2c.

The proposal is great. At last, after five years, there is a way to squeeze some sort of semantics into Tika metadata that actually looks doable without having to rewrite the library from scratch.

The roadmap seems clear on the to-dos required from the coding POV. The XMP data model, while more limited than full RDF, will likely be enough. However, the roadmap doesn't give much detail about the intended vocabularies. Dublin Core is great, but what else? Joerg, what other kinds of metadata would you like to extract with Tika, and which vocabularies would you like to use to express them?

At Adobe, you'll likely want Tika to transparently pass the XMP metadata from the docs (using whatever vocabularies you use to express whatever info you need) into your metadata-processing software, which already "understands" the semantics of those XMP properties and values. Which data would you like Tika to transform to common vocabularies, and which vocabularies would those be?

Antoni Myłka
antoni.my...@gmail.com
