Yes, that is exactly my biggest concern. 
Another good example is region metadata, such as that produced by face 
detection (taken from the MWG guidance, V2):
<mwg-rs:Regions rdf:parseType="Resource">
      <mwg-rs:AppliedToDimensions stDim:w="4288" stDim:h="2848" stDim:unit="pixel"/>
      <mwg-rs:RegionList>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <mwg-rs:Area stArea:x="0.5" stArea:y="0.5" stArea:w="0.06" stArea:h="0.09" stArea:unit="normalized"/>
            <mwg-rs:Type>Face</mwg-rs:Type>
            <mwg-rs:Title>John Doe</mwg-rs:Title>
          </rdf:li>
        ...

I also definitely meant to keep the current Metadata class API, while doing a 
best-guess mapping to the internal structured data representation, which should 
at least work well for the common set of properties.

But as Chris said, let's get started with step 1 and then for step 2 we can 
start with an extra XMP module for the XMP output. I will update the wiki 
tomorrow.
Thanks for taking the time to discuss this.

Regards
Jörg


-----Original Message-----
From: Ray Gauss II [mailto:ray.ga...@alfresco.com] 
Sent: Thursday, 26 April 2012 18:03
To: dev@tika.apache.org
Subject: Re: [metadata] roadmap proposal available on the wiki

I think that, besides the namespaces, one of the issues Jörg is trying to 
tackle is structured metadata. The extra time and effort he refers to is the 
work of serializing structured data to and from a HashMap.

For example I may have metadata similar to:

Contact 1
|-- First Name
|-- Last Name
|-- Email
|-- Address
    |-- Street
    |-- City
    ...
Contact 2
|-- First Name
|-- Last Name
|-- Email
|-- Address
    |-- Street
    |-- City
    ...

which could be modeled in a HashMap<String, String[]>, but would be better 
handled by a structured store, be that XMP or something else.
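
To illustrate the flattening problem, here is a minimal sketch (not Tika code; 
the class name, the key paths, and the sample values are all made up for 
illustration). When both contacts share the same flat keys, the link between 
a value and the contact it belongs to is lost as soon as one contact lacks a 
field:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: contacts flattened into a Map<String, String[]>.
class FlatContactsSketch {

    // Appends one value to the array stored under a flat key.
    static void add(Map<String, String[]> map, String key, String value) {
        String[] old = map.get(key);
        if (old == null) {
            map.put(key, new String[] { value });
        } else {
            String[] grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = value;
            map.put(key, grown);
        }
    }

    // Builds the flat map for two made-up contacts.
    static Map<String, String[]> flatten() {
        Map<String, String[]> flat = new LinkedHashMap<>();
        // Contact 1 has no city.
        add(flat, "Contact/First Name", "Ann");
        // Contact 2 has a city.
        add(flat, "Contact/First Name", "Bob");
        add(flat, "Contact/Address/City", "Boston");
        return flat;
    }

    public static void main(String[] args) {
        Map<String, String[]> flat = flatten();
        // Two first names but only one city: the map alone cannot tell
        // us which contact the city belongs to.
        System.out.println(flat.get("Contact/First Name").length);
        System.out.println(flat.get("Contact/Address/City").length);
    }
}
```

Index-qualified keys (for example a per-contact prefix) would avoid the 
ambiguity, but that amounts to hand-rolling a structured store on top of 
string keys.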

We could consider replacing the underlying HashMap in Metadata with a 
structured store while leaving methods like:
   public String Metadata.get(Property property)
intact. Such a method could make a best guess when the requested property 
sits inside a structure. We could then add methods like:
   public Object Metadata.getStructured(Property property)
for when a user wants the entire structured object.

That approach should be able to maintain backwards compatibility for existing 
implementations and allow for structured and namespaced metadata.
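
As a rough sketch of that idea (not the actual Tika API): the class below is 
hypothetical, uses plain String names instead of Tika's Property type, and 
assumes nested maps as the structured store. The "best guess" here is simply 
the first node with a matching name in depth-first order; a real 
implementation would likely need a smarter rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a Metadata-like class backed by a nested map.
class StructuredMetadataSketch {
    private final Map<String, Object> root = new LinkedHashMap<>();

    // Stores a leaf value under a path such as "Contact 1", "Address", "City".
    @SuppressWarnings("unchecked")
    public void set(String value, String... path) {
        if (path.length == 0) {
            throw new IllegalArgumentException("path must not be empty");
        }
        Map<String, Object> node = root;
        for (int i = 0; i < path.length - 1; i++) {
            Object child = node.get(path[i]);
            if (!(child instanceof Map)) {
                child = new LinkedHashMap<String, Object>();
                node.put(path[i], child);
            }
            node = (Map<String, Object>) child;
        }
        node.put(path[path.length - 1], value);
    }

    // Flat-style lookup: returns the first matching node if it is a
    // simple value, or null if nothing matches or the match is a structure.
    public String get(String name) {
        Object found = find(root, name);
        return found instanceof String ? (String) found : null;
    }

    // Structured lookup: returns the first matching node, which may be
    // a nested Map or a simple String value.
    public Object getStructured(String name) {
        return find(root, name);
    }

    // Depth-first search for the first entry with the given name.
    private static Object find(Map<String, Object> node, String name) {
        if (node.containsKey(name)) {
            return node.get(name);
        }
        for (Object child : node.values()) {
            if (child instanceof Map) {
                @SuppressWarnings("unchecked")
                Object hit = find((Map<String, Object>) child, name);
                if (hit != null) {
                    return hit;
                }
            }
        }
        return null;
    }
}
```

With this sketch, a caller that only knows the flat API can still ask for 
"City" and get a value, while a caller that wants the whole "Address" 
structure asks getStructured for it.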

Just a thought,

Ray


On Apr 26, 2012, at 11:37 AM, Mattmann, Chris A (388J) wrote:

> Hi Jörg,
> 
> Thanks for your email, comments below:
> 
> On Apr 26, 2012, at 3:35 AM, Joerg Ehrlich wrote:
> 
>> Hi Chris,
>> 
>> Those are all valid points and I agree that you could do everything with a 
>> Hashmap. 
>> Having the parsers fill the Metadata class and its Hashmap with all needed 
>> information which is then consumed by an XMP component sitting on top of 
>> Tika-Core is definitely an interesting solution which would keep Tika-Core 
>> clean of any dependencies and give the ability to introduce new XMP related 
>> APIs in a least intrusive way.
>> But from my point of view it is also about how much time and effort you 
>> would like to spend implementing and testing code in the Metadata class when 
>> you have something tested and stable that is already available for exactly 
>> that purpose. 
> 
> Well I think our Metadata object is fairly well tested and implemented 
> atm, so I'm not sure what extra time and effort we're talking about 
> here? The only extra time and effort I see is in adding this XMP extension to 
> it.
> 
>> Another thought that just comes to my mind is that a lot of file formats 
>> already use XMP as one or even the only metadata container and you would 
>> then end up filling the metadata map with the data from the file's XMP and 
>> converting it back to XMP later on, compared to just being able to parse it 
>> as is and having most of the metadata available right away. 
> 
> Yep in tika-xmp (new module) this might be less efficient, but it will 
> maintain a lot of familiarity with folks who are used to maintaining the 
> existing Metadata object internals and models/etc.
> 
> Anyways, feel free to push forward, I am just letting you know I am 
> against changing the internals of the Metadata model, at least at the 
> moment :) At the same time your enthusiasm is great and all I can say 
> is you are doing great and push forward and we'll see where we get...
> 
> Cheers,
> Chris
