[ https://issues.apache.org/jira/browse/TIKA-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284968#comment-13284968 ]

Ray Gauss II commented on TIKA-930:
-----------------------------------

Creator:
"The DublinCore creator is usually considered the creator of the intellectual 
property, not the creator of the file"

If we're talking about developers who deal with metadata frequently, or 
librarians, taxonomists, etc., then I'd agree, but the average developer may 
not read it that way.  I don't have any data to back that up, but I don't 
think we can assume everyone knows that DublinCore.CREATOR is meant to be the 
IP creator.  Knowing who created a file can also be quite useful, so we should 
have separate properties for the two.
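
As a rough illustration of the distinction, the sketch below keeps the two 
creators under separate keys.  It is a stand-alone model and does not use 
Tika classes; the class name, the describe() helper, the key strings and the 
sample names are all hypothetical, chosen only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CreatorKeysSketch {
    // Hypothetical key names for illustration only; not existing Tika constants.
    static final String IP_CREATOR = "dc:creator";     // author of the intellectual property
    static final String FILE_CREATOR = "file:creator"; // person who produced this file

    /** Builds a metadata map that records both creators under separate keys. */
    static Map<String, String> describe(String ipCreator, String fileCreator) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put(IP_CREATOR, ipCreator);
        metadata.put(FILE_CREATOR, fileCreator);
        return metadata;
    }

    public static void main(String[] args) {
        // Example: a scanned novel, where the novelist wrote the work
        // and an archivist produced the file. Names are made up.
        Map<String, String> metadata = describe("Jane Novelist", "Sam Archivist");
        System.out.println(metadata);
    }
}
```

With a single key, one of the two names would have to be dropped or the 
meaning of the value would depend on who wrote the parser.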

Rating:
I don't have a strong opinion here, but we should elicit conversation on it 
from the group.  This should probably be a separate issue.

Geographic:
We're not really concerned with how the geo data is stored in the file, just 
how we want to present the metadata key and value to users, as generically as 
possible.  This page seems to list several other formats that might conceivably 
be used with Tika: http://en.wikipedia.org/wiki/Geotagging
                
> Consolidation of Some Tika Core Properties
> ------------------------------------------
>
>                 Key: TIKA-930
>                 URL: https://issues.apache.org/jira/browse/TIKA-930
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>    Affects Versions: 1.2
>            Reporter: Ray Gauss II
>
> There are a few properties in TikaCoreProperties which overlap.  I think we 
> should minimize ambiguity by consolidating each overlapping group into a 
> single composite property that has the clearest name, references the most 
> general specification as its primary property, and lists the other 
> properties and deprecated strings as its secondaries.
> Here's the proposed pseudo-code for the changes:
>
> Remove TikaCoreProperties.SUBJECT
> TikaCoreProperties.KEYWORDS <- DublinCore.SUBJECT,
>     { Office.KEYWORDS, MSOffice.KEYWORDS, Metadata.SUBJECT }
>
> Remove TikaCoreProperties.DATE
> TikaCoreProperties.CREATION_DATE <- DublinCore.DATE,
>     { Office.CREATION_DATE, MSOffice.CREATION_DATE, Metadata.DATE }
>
> Remove TikaCoreProperties.MODIFIED
> TikaCoreProperties.SAVE_DATE <- DublinCore.MODIFIED,
>     { Office.SAVE_DATE, MSOffice.LAST_SAVED, Metadata.MODIFIED, "Last-Modified" }
>
> and an example of the Java changes:
> {code:title=TikaCoreProperties.java *Before*}
>     /**
>      * @see DublinCore#SUBJECT
>      */
>     public static final Property SUBJECT = Property.composite(DublinCore.SUBJECT,
>             new Property[] { Property.internalText(Metadata.SUBJECT) });
>
>     /**
>      * @see Office#KEYWORDS
>      */
>     public static final Property KEYWORDS = Property.composite(Office.KEYWORDS,
>             new Property[] { Property.internalTextBag(MSOffice.KEYWORDS) });
> {code}
> would become
> {code:title=TikaCoreProperties.java *After*}
>     /**
>      * @see DublinCore#SUBJECT
>      * @see Office#KEYWORDS
>      */
>     public static final Property KEYWORDS = Property.composite(DublinCore.SUBJECT,
>             new Property[] {
>                 Office.KEYWORDS,
>                 Property.internalTextBag(MSOffice.KEYWORDS),
>                 Property.internalText(Metadata.SUBJECT)
>             });
> {code}
> Since this would require a bit of refactoring for parsers that use the 
> properties being removed, I thought it best to get some feedback before 
> working up a full patch.
> Does this seem like a reasonable approach?
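
The consolidation described in the quoted issue can be modelled outside of 
Tika with a few lines of Java.  The sketch below is not Tika's Property or 
Metadata implementation: the Composite class, the set() helper and the key 
strings are hypothetical stand-ins, and it assumes that a value set through a 
composite is stored under the primary key and under every secondary key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone model of a composite property; NOT Tika's Property class. */
public class CompositePropertySketch {

    /** A primary key plus the secondary (legacy) keys kept in sync with it. */
    static final class Composite {
        final String primary;
        final List<String> secondaries;

        Composite(String primary, String... secondaries) {
            this.primary = primary;
            this.secondaries = Arrays.asList(secondaries);
        }
    }

    /** Stores the value under the primary key and under every secondary key. */
    static void set(Map<String, String> metadata, Composite property, String value) {
        metadata.put(property.primary, value);
        for (String key : property.secondaries) {
            metadata.put(key, value);
        }
    }

    public static void main(String[] args) {
        // Placeholder key strings standing in for DublinCore.SUBJECT,
        // Office.KEYWORDS, MSOffice.KEYWORDS and Metadata.SUBJECT.
        Composite keywords =
                new Composite("primary-subject", "office-keywords", "legacy-keywords", "legacy-subject");

        Map<String, String> metadata = new LinkedHashMap<>();
        set(metadata, keywords, "tika, metadata");

        // One write is visible under all four keys, so old readers keep working.
        System.out.println(metadata);
    }
}
```

Under that assumption, a parser only needs to set the consolidated property 
once, while code that still reads a deprecated key sees the same value.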

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
