[ https://issues.apache.org/jira/browse/TIKA-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132808#comment-13132808 ]
Chris A. Mattmann commented on TIKA-759:
----------------------------------------

+1 to this Jukka! In OODT-ville, for many years we've had something called a "Profile", see:

http://svn.apache.org/repos/asf/oodt/trunk/profile/src/main/java/org/apache/oodt/profile/Profile.java

A Profile is a metadata description of a resource with 3 different sets of attributes:

* housekeeping information about the Profile (its ID, created time, etc.)
* information about the data that the Profile points to (this is the Dublin Core set of information + some mods, and is housed in the http://svn.apache.org/repos/asf/oodt/trunk/profile/src/main/java/org/apache/oodt/profile/ResourceAttributes.java file)
* domain-specific metadata, which we keep as a set of ProfileElements (housed in the http://svn.apache.org/repos/asf/oodt/trunk/profile/src/main/java/org/apache/oodt/profile/ProfileElement.java) and its sub-classes, RangedProfileElement.java and EnumeratedProfileElement.java. ProfileElements correspond to ISO-11179 style elements, with information about, e.g., valid values, ranges, min/max, etc.

Not saying we should adopt the above. Our OODT stuff is bloated in some areas, and could be reduced, but just thought I'd pass it along for some inspiration! :-)

> Better handling of content type metadata
> ----------------------------------------
>
>                 Key: TIKA-759
>                 URL: https://issues.apache.org/jira/browse/TIKA-759
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata, mime
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>
> Currently we use the "Content-Type" metadata key for storing (and looking up)
> the media type of a document. This is simple enough and works well especially
> with HTTP, but not too well in line with XMP or other metadata standards like
> Dublin Core.
> So as an improvement I propose the following:
> * Switch to "dc:format" as the standard metadata key for the content type
> * Keep the existing "Content-Type" key for backwards compatibility with
>   existing clients
> * Make the Metadata class aware of such aliases
> * Add getFormat() and setFormat() utility methods to Metadata to simplify
>   client code and to make the exact metadata key more of an implementation
>   detail in Tika
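
For illustration only, the alias idea in the quoted proposal could be sketched roughly as below. This is not Tika's actual Metadata class: the names AliasedMetadata, AliasedMetadataDemo and canonical() are hypothetical, and the sketch assumes a single value per key and that aliases resolve to one canonical key ("dc:format") under which the value is stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an alias-aware metadata holder; NOT Tika's real API.
class AliasedMetadata {
    // Canonical key proposed in the issue, and the legacy key kept as an alias.
    static final String FORMAT = "dc:format";
    static final String CONTENT_TYPE = "Content-Type";

    // Maps each alias to its canonical key.
    private final Map<String, String> aliases = new HashMap<>();
    // Values are stored under canonical keys only.
    private final Map<String, String> values = new HashMap<>();

    AliasedMetadata() {
        aliases.put(CONTENT_TYPE, FORMAT);
    }

    // Resolve an alias to its canonical key; other keys pass through unchanged.
    private String canonical(String key) {
        return aliases.getOrDefault(key, key);
    }

    String get(String key) {
        return values.get(canonical(key));
    }

    void set(String key, String value) {
        values.put(canonical(key), value);
    }

    // Convenience accessors so client code need not know the exact key.
    String getFormat() {
        return get(FORMAT);
    }

    void setFormat(String mediaType) {
        set(FORMAT, mediaType);
    }
}

class AliasedMetadataDemo {
    public static void main(String[] args) {
        AliasedMetadata metadata = new AliasedMetadata();
        // A legacy client writes through the old key...
        metadata.set("Content-Type", "text/html");
        // ...and the value is visible through the new accessor.
        System.out.println(metadata.getFormat()); // prints "text/html"

        // Writing through the new accessor is visible under the old key too.
        metadata.setFormat("application/pdf");
        System.out.println(metadata.get("Content-Type")); // prints "application/pdf"
    }
}
```

Storing the value once under the canonical key, rather than duplicating it under both names, keeps the two keys from drifting apart; whether Tika would prefer that over writing both keys is a design question this sketch does not settle.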