Ray Gauss II created TIKA-930:
---------------------------------
Summary: Consolidation of Some Tika Core Properties
Key: TIKA-930
URL: https://issues.apache.org/jira/browse/TIKA-930
Project: Tika
Issue Type: Improvement
Components: metadata
Affects Versions: 1.2
Reporter: Ray Gauss II
A few properties in TikaCoreProperties overlap. I think we should minimize
ambiguity by consolidating each overlapping group into a single composite
property that has the clearest name, references the most general specification
as its primary property, and lists the other properties and deprecated strings
as its secondaries.
Here's the proposed pseudo-code for the changes:
Remove TikaCoreProperties.SUBJECT
TikaCoreProperties.KEYWORDS <- DublinCore.SUBJECT, { Office.KEYWORDS,
MSOffice.KEYWORDS, Metadata.SUBJECT }
Remove TikaCoreProperties.DATE
TikaCoreProperties.CREATION_DATE <- DublinCore.DATE, { Office.CREATION_DATE,
MSOffice.CREATION_DATE, Metadata.DATE }
Remove TikaCoreProperties.MODIFIED
TikaCoreProperties.SAVE_DATE <- DublinCore.MODIFIED, { Office.SAVE_DATE,
MSOffice.LAST_SAVED, Metadata.MODIFIED, "Last-Modified" }
and an example of the Java changes:
{code:title=TikaCoreProperties.java *Before*}
/**
 * @see DublinCore#SUBJECT
 */
public static final Property SUBJECT =
    Property.composite(DublinCore.SUBJECT,
        new Property[] { Property.internalText(Metadata.SUBJECT) });

/**
 * @see Office#KEYWORDS
 */
public static final Property KEYWORDS = Property.composite(Office.KEYWORDS,
    new Property[] { Property.internalTextBag(MSOffice.KEYWORDS) });
{code}
would become
{code:title=TikaCoreProperties.java *After*}
/**
 * @see DublinCore#SUBJECT
 * @see Office#KEYWORDS
 */
public static final Property KEYWORDS =
    Property.composite(DublinCore.SUBJECT,
        new Property[] {
            Office.KEYWORDS,
            Property.internalTextBag(MSOffice.KEYWORDS),
            Property.internalText(Metadata.SUBJECT)
        });
{code}
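To illustrate why the secondaries matter, here is a minimal, self-contained sketch of the idea. It does not use Tika itself: the class CompositePropertySketch, its set and demo methods, and the key strings are hypothetical placeholders, and it assumes that setting a composite property writes the value under the primary name and under every secondary name, so code that still reads an older key keeps working.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a composite property: one primary name plus
// any number of secondary names. This is NOT Tika's Property class.
final class CompositePropertySketch {
    private final String primary;
    private final List<String> secondaries;

    CompositePropertySketch(String primary, String... secondaries) {
        this.primary = primary;
        this.secondaries = Arrays.asList(secondaries);
    }

    // Assumed behavior: one write fans out to the primary name and to
    // every secondary name.
    void set(Map<String, String> metadata, String value) {
        metadata.put(primary, value);
        for (String name : secondaries) {
            metadata.put(name, value);
        }
    }

    // Placeholder key strings standing in for DublinCore.SUBJECT,
    // Office.KEYWORDS, MSOffice.KEYWORDS and Metadata.SUBJECT.
    static final CompositePropertySketch KEYWORDS =
        new CompositePropertySketch(
            "dc:subject", "meta:keyword", "Keywords", "subject");

    // Sets the consolidated property once and returns the resulting map.
    static Map<String, String> demo() {
        Map<String, String> metadata = new LinkedHashMap<>();
        KEYWORDS.set(metadata, "tika, metadata");
        return metadata;
    }

    public static void main(String[] args) {
        // Each entry shows one name under which the value is now readable.
        for (Map.Entry<String, String> e : demo().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Under that assumption, a parser only needs to set the consolidated property once, and consumers that look up any of the older names still find the value.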
Since this would require a bit of refactoring for parsers that use the
properties being removed, I thought it best to get some feedback before working
up a full patch.
Does this seem like a reasonable approach?