I think that, besides namespaces, one of the issues Jörg is trying to tackle is 
structured metadata; the extra time and effort he refers to is dealing with 
serialization of structured data to and from a HashMap.

For example, I may have metadata similar to:

Contact 1
|-- First Name
|-- Last Name
|-- Email
|-- Address
    |-- Street
    |-- City
    ...
Contact 2
|-- First Name
|-- Last Name
|-- Email
|-- Address
    |-- Street
    |-- City
    ...

which could be modeled in a HashMap<String, String[]>, but would be better 
handled by a structured store, be that XMP or something else.
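To make the trade-off concrete, here is a hedged sketch (not actual Tika code; the key names and the colon-path convention are illustrative assumptions) of what flattening that contact tree into a HashMap<String, String[]> looks like. Nested fields become path-style keys, and the two contacts collapse into parallel value arrays, so the grouping between a contact's fields survives only by array position:

```java
import java.util.HashMap;
import java.util.Map;

public class FlattenedContacts {
    public static void main(String[] args) {
        // Hypothetical flattening: nested fields become path-style keys,
        // and repeated structures (Contact 1, Contact 2) collapse into
        // parallel value arrays, losing the explicit grouping per contact.
        Map<String, String[]> metadata = new HashMap<>();
        metadata.put("Contact:FirstName",    new String[] {"Ada", "Alan"});
        metadata.put("Contact:LastName",     new String[] {"Lovelace", "Turing"});
        metadata.put("Contact:Address:City", new String[] {"London", "Wilmslow"});

        // A reader must rely on array position to reassemble each contact:
        String[] first = metadata.get("Contact:FirstName");
        String[] city  = metadata.get("Contact:Address:City");
        System.out.println(first[0] + " lives in " + city[0]);
    }
}
```

A structured store would keep each contact's fields together instead of relying on this positional convention.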

We could consider replacing the underlying HashMap in Metadata with a 
structured store while leaving methods like:
   public String Metadata.get(Property property)
intact. The implementation could make a best guess when the requested property 
is nested within a structure, and we could add methods like:
   public Object Metadata.getStructured(Property property)
for when a user wants the entire structured object.

That approach should maintain backwards compatibility for existing 
implementations while allowing for structured and namespaced metadata.
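As a rough illustration of that idea (a minimal sketch only, assuming nested maps as a stand-in for the structured store; the String-path accessors and class name are hypothetical, not actual Tika APIs), the flat get() contract can be preserved on top of a structured backing store:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: Metadata keeps its flat scalar accessor, backed by a
// structured store, while a new accessor exposes whole sub-trees.
public class StructuredMetadata {
    // Structured-store stand-in: nested maps keyed by field name.
    private final Map<String, Object> store = new HashMap<>();

    public void set(String name, Object value) {
        store.put(name, value);
    }

    // Existing-style accessor: best-guess scalar lookup, descending
    // into structures along "Parent:Child" paths.
    public String get(String name) {
        Object current = store;
        for (String part : name.split(":")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return (current instanceof String) ? (String) current : null;
    }

    // New-style accessor: returns the entire structured object.
    public Object getStructured(String name) {
        return store.get(name);
    }

    public static void main(String[] args) {
        StructuredMetadata md = new StructuredMetadata();
        Map<String, Object> address = new HashMap<>();
        address.put("City", "London");
        Map<String, Object> contact = new HashMap<>();
        contact.put("FirstName", "Ada");
        contact.put("Address", address);
        md.set("Contact", contact);

        System.out.println(md.get("Contact:Address:City"));
        System.out.println(md.getStructured("Contact") instanceof Map);
    }
}
```

Existing callers keep using the flat lookup unchanged, while new code can ask for the whole structure.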

Just a thought,

Ray


On Apr 26, 2012, at 11:37 AM, Mattmann, Chris A (388J) wrote:

> Hi Jörg,
> 
> Thanks for your email, comments below:
> 
> On Apr 26, 2012, at 3:35 AM, Joerg Ehrlich wrote:
> 
>> Hi Chris,
>> 
>> Those are all valid points and I agree that you could do everything with a 
>> Hashmap. 
>> Having the parsers fill the Metadata class and its Hashmap with all needed 
>> information which is then consumed by an XMP component sitting on top of 
>> Tika-Core is definitely an interesting solution which would keep Tika-Core 
>> clean of any dependencies and give the ability to introduce new XMP related 
>> APIs in a least intrusive way.
>> But from my point of view it is also about how much time and effort you 
>> would like to spend implementing and testing code in the Metadata class when 
>> you have something tested and stable that is already available for exactly 
>> that purpose. 
> 
> Well I think our Metadata object is fairly well tested and implemented atm, 
> so I'm not sure what
> extra time and effort we're talking about here? The only extra time and 
> effort I see is in adding
> this XMP extension to it.
> 
>> Another thought that just comes to my mind is that a lot of file formats 
>> already use XMP as one or even the only metadata container and you would 
>> then end up filling the metadata map with the data from the file's XMP and 
>> converting it back to XMP later on, compared to just being able to parse it 
>> as is and having most of the metadata available right away. 
> 
> Yep in tika-xmp (new module) this might be less efficient, but it will 
> maintain a lot of familiarity with folks who
> are used to maintaining the existing Metadata object internals and models/etc.
> 
> Anyways, feel free to push forward, I am just letting you know I am against 
> changing the
> internals of the Metadata model, at least at the moment :) At the same time 
> your enthusiasm
> is great and all I can say is you are doing great and push forward and we'll 
> see where we 
> get...
> 
> Cheers,
> Chris
