[ https://issues.apache.org/jira/browse/TIKA-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ray Gauss II updated TIKA-845:
------------------------------
    Attachment: xml-check-multi-value-existing.diff

Patch to check for an existing value in multi-valued fields.

> Check for Existing Value in Multi-Value Fields in XML Metadata Handler
> ----------------------------------------------------------------------
>
>                 Key: TIKA-845
>                 URL: https://issues.apache.org/jira/browse/TIKA-845
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Ray Gauss II
>             Fix For: 1.1
>
>         Attachments: xml-check-multi-value-existing.diff
>
>
> The XML abstract metadata handler should check for an existing value in
> multi-valued fields as well as in simple text fields.
> Similar metadata may be stored in multiple fields in the source, and a
> developer may choose to map several source fields to the same Tika field.
> In that case, no check is currently made for duplicates among the existing
> delimited values.
> For example, a developer may want to dump any values contained in legacy IPTC
> keywords and dc:subject into Tika keywords. If IPTC keywords =
> ['rock','tree','dog'] and dc:subject = ['rock','tree','K9'], then currently
> Tika keywords = ['rock','tree','dog','rock','tree','K9'] instead of the
> desired ['rock','tree','dog','K9'].
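
The requested behavior can be illustrated with a minimal, self-contained sketch. This is not the attached patch and it does not use Tika's own classes: `MultiValueFields`, `addDistinct`, and `getValues` are hypothetical names chosen for illustration, and the sketch assumes that duplicates are detected by exact, case-sensitive string equality.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata store; not Tika's Metadata class.
class MultiValueFields {
    // Field name -> values in the order they were first added.
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Appends the value to the named field unless that exact value is already present.
    void addDistinct(String name, String value) {
        List<String> values = fields.computeIfAbsent(name, k -> new ArrayList<>());
        if (!values.contains(value)) {
            values.add(value);
        }
    }

    // Returns a read-only view of the field's values, or an empty list if the field is unset.
    List<String> getValues(String name) {
        List<String> values = fields.get(name);
        return values == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(values);
    }
}
```

Under these assumptions, adding 'rock', 'tree', 'dog' and then 'rock', 'tree', 'K9' to the same field through `addDistinct` leaves it holding 'rock', 'tree', 'dog', 'K9', which is the desired result described in the issue. The linear `contains` check keeps insertion order and is adequate for the short value lists typical of keyword fields.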