Ray Gauss II created TIKA-930:
---------------------------------

             Summary: Consolidation of Some Tika Core Properties
                 Key: TIKA-930
                 URL: https://issues.apache.org/jira/browse/TIKA-930
             Project: Tika
          Issue Type: Improvement
          Components: metadata
    Affects Versions: 1.2
            Reporter: Ray Gauss II

A few properties in TikaCoreProperties overlap. I think we should minimize ambiguity by consolidating each overlapping group into a single composite property that has the clearest name, references the most general specification as its primary property, and lists the other properties and deprecated strings as its secondaries.

Here's the proposed pseudo-code for the changes:

    Remove TikaCoreProperties.SUBJECT
    TikaCoreProperties.KEYWORDS <- DublinCore.SUBJECT,
        { Office.KEYWORDS, MSOffice.KEYWORDS, Metadata.SUBJECT }

    Remove TikaCoreProperties.DATE
    TikaCoreProperties.CREATION_DATE <- DublinCore.DATE,
        { Office.CREATION_DATE, MSOffice.CREATION_DATE, Metadata.DATE }

    Remove TikaCoreProperties.MODIFIED
    TikaCoreProperties.SAVE_DATE <- DublinCore.MODIFIED,
        { Office.SAVE_DATE, MSOffice.LAST_SAVED, Metadata.MODIFIED, "Last-Modified" }

and an example of the Java changes:

{code:title=TikaCoreProperties.java *Before*}
    /**
     * @see DublinCore#SUBJECT
     */
    public static final Property SUBJECT = Property.composite(DublinCore.SUBJECT,
            new Property[] { Property.internalText(Metadata.SUBJECT) });

    /**
     * @see Office#KEYWORDS
     */
    public static final Property KEYWORDS = Property.composite(Office.KEYWORDS,
            new Property[] { Property.internalTextBag(MSOffice.KEYWORDS) });
{code}

would become

{code:title=TikaCoreProperties.java *After*}
    /**
     * @see DublinCore#SUBJECT
     * @see Office#KEYWORDS
     */
    public static final Property KEYWORDS = Property.composite(DublinCore.SUBJECT,
            new Property[] {
                Office.KEYWORDS,
                Property.internalTextBag(MSOffice.KEYWORDS),
                Property.internalText(Metadata.SUBJECT) });
{code}

Since this would require a bit of refactoring for parsers that use the properties being removed, I thought it best to get some feedback before working up a full patch.

Does this seem like a reasonable approach?
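
To make the intended effect of a consolidated composite property a little more concrete, here is a minimal, self-contained sketch. It assumes that setting a composite property writes the value under its primary key and under each secondary key. The classes `Composite` and `Store` are hypothetical stand-ins, not Tika's `Property` or `Metadata`, and the key strings are assumed values chosen for readability.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Tika's classes.
final class CompositePropertySketch {

    /** A composite property: one primary key plus any number of secondary keys. */
    static final class Composite {
        final String primary;
        final List<String> secondaries;

        Composite(String primary, String... secondaries) {
            this.primary = primary;
            this.secondaries = Arrays.asList(secondaries);
        }
    }

    /** A minimal key/value store that fans a composite write out to every key. */
    static final class Store {
        private final Map<String, String> values = new LinkedHashMap<>();

        void set(Composite property, String value) {
            values.put(property.primary, value);
            for (String key : property.secondaries) {
                values.put(key, value);
            }
        }

        String get(String key) {
            return values.get(key);
        }
    }

    // Mirrors the proposed KEYWORDS consolidation; all key strings are assumptions.
    static final Composite KEYWORDS = new Composite(
            "dc:subject",    // primary, standing in for DublinCore.SUBJECT
            "meta:keyword",  // standing in for Office.KEYWORDS
            "Keywords",      // standing in for MSOffice.KEYWORDS
            "subject");      // standing in for Metadata.SUBJECT

    public static void main(String[] args) {
        Store store = new Store();
        store.set(KEYWORDS, "tika, metadata");
        // A single write is visible under the primary key and under a legacy key.
        System.out.println(store.get("dc:subject"));
        System.out.println(store.get("Keywords"));
    }
}
```

Under that assumption, a parser would only need to set the one consolidated property, and clients still reading the older keys would continue to see the value.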