[ 
https://issues.apache.org/jira/browse/TIKA-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132808#comment-13132808
 ] 

Chris A. Mattmann commented on TIKA-759:
----------------------------------------

+1 to this, Jukka!

In OODT-ville, for many years we've had something called a "Profile", see:

http://svn.apache.org/repos/asf/oodt/trunk/profile/src/main/java/org/apache/oodt/profile/Profile.java

A Profile is a metadata description of a resource, with three sets of 
attributes:

* housekeeping information about the Profile itself (its ID, creation time, 
etc.)
* information about the data that the Profile points to (the Dublin Core set 
of information plus some modifications), housed in 
http://svn.apache.org/repos/asf/oodt/trunk/profile/src/main/java/org/apache/oodt/profile/ResourceAttributes.java
* domain-specific metadata, which we keep as a set of ProfileElements, housed 
in 
http://svn.apache.org/repos/asf/oodt/trunk/profile/src/main/java/org/apache/oodt/profile/ProfileElement.java
and its subclasses, RangedProfileElement.java and 
EnumeratedProfileElement.java. ProfileElements correspond to ISO-11179 style 
elements and carry information such as valid values, ranges, and min/max.
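
To make the three-part structure concrete, here is a rough Java sketch. It is 
not the actual OODT code: the class names (ProfileSketch, ElementSketch, 
RangedElementSketch, EnumeratedElementSketch), the fields, and the accepts() 
method are all made up for illustration, and the real classes linked above 
carry considerably more detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the three attribute sets described above.
// None of these names or fields are taken from the real OODT classes.
class ProfileSketch {
    // 1. Housekeeping information about the profile itself.
    final String id;
    final String createdTime;

    // 2. Dublin Core style information about the resource it points to.
    final String title;
    final String format;

    // 3. Domain-specific metadata elements.
    final List<ElementSketch> elements = new ArrayList<>();

    ProfileSketch(String id, String createdTime, String title, String format) {
        this.id = id;
        this.createdTime = createdTime;
        this.title = title;
        this.format = format;
    }
}

// Base type for a domain-specific element (assumed shape).
abstract class ElementSketch {
    final String name;

    ElementSketch(String name) {
        this.name = name;
    }

    // Returns true if the given value is valid for this element.
    abstract boolean accepts(String value);
}

// An element whose valid values are numbers within [min, max].
class RangedElementSketch extends ElementSketch {
    final double min;
    final double max;

    RangedElementSketch(String name, double min, double max) {
        super(name);
        this.min = min;
        this.max = max;
    }

    @Override
    boolean accepts(String value) {
        try {
            double v = Double.parseDouble(value);
            return v >= min && v <= max;
        } catch (NumberFormatException e) {
            return false; // non-numeric values are never in range
        }
    }
}

// An element whose valid values come from a fixed set.
class EnumeratedElementSketch extends ElementSketch {
    final Set<String> validValues;

    EnumeratedElementSketch(String name, String... validValues) {
        super(name);
        this.validValues = new HashSet<>(Arrays.asList(validValues));
    }

    @Override
    boolean accepts(String value) {
        return validValues.contains(value);
    }
}
```

The point of the split is that the first two sets are fixed fields while the 
third is an open-ended list, so new domain vocabularies can be added without 
touching the Profile type itself.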

Not saying we should adopt the above. Our OODT stuff is bloated in some areas, 
and could be reduced, but just thought I'd pass it along for some inspiration! 
:-)
                
> Better handling of content type metadata
> ----------------------------------------
>
>                 Key: TIKA-759
>                 URL: https://issues.apache.org/jira/browse/TIKA-759
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, mime
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>
> Currently we use the "Content-Type" metadata key for storing (and looking up) 
> the media type of a document. This is simple enough and works well especially 
> with HTTP, but not too well in line with XMP or other metadata standards like 
> Dublin Core. So as an improvement I propose the following:
> * Switch to "dc:format" as the standard metadata key for the content type
> * Keep the existing "Content-Type" key for backwards compatibility with 
> existing clients
> * Make the Metadata class aware of such aliases
> * Add getFormat() and setFormat() utility methods to Metadata to simplify 
> client code and to make the exact metadata key more of an implementation 
> detail in Tika
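
For illustration only, the alias idea in the issue description could look 
roughly like the Java sketch below. This is not Tika's Metadata class: 
AliasAwareMetadata, its single-valued map storage, and the way aliases are 
resolved are assumptions. Only the two key names and the getFormat() and 
setFormat() method names come from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an alias-aware metadata store; not Tika's actual API.
class AliasAwareMetadata {
    static final String FORMAT = "dc:format";          // proposed standard key
    static final String CONTENT_TYPE = "Content-Type"; // legacy key kept as an alias

    // Maps an alias to the canonical key it stands for.
    private final Map<String, String> aliases = new HashMap<>();
    // Values are stored under canonical keys only.
    private final Map<String, String> values = new HashMap<>();

    AliasAwareMetadata() {
        aliases.put(CONTENT_TYPE, FORMAT);
    }

    // Resolves an alias to its canonical key; other keys pass through unchanged.
    private String canonical(String key) {
        return aliases.getOrDefault(key, key);
    }

    void set(String key, String value) {
        values.put(canonical(key), value);
    }

    String get(String key) {
        return values.get(canonical(key));
    }

    // Utility methods that hide the exact key from client code.
    String getFormat() {
        return get(FORMAT);
    }

    void setFormat(String mediaType) {
        set(FORMAT, mediaType);
    }
}
```

With this shape, existing clients that read or write "Content-Type" see the 
same value as new clients that use "dc:format" or the utility methods.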

