[ https://issues.apache.org/jira/browse/TIKA-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191805#comment-13191805 ]
Ray Gauss II commented on TIKA-845:
-----------------------------------

I was following precedent there and not actually even calling that code, since ElementMetadataHandler correctly stores the values as a multivalued object, but you're right and your changes look spot on.

> Check for Existing Value in Multi-Value Fields in XML Metadata Handler
> ----------------------------------------------------------------------
>
>                 Key: TIKA-845
>                 URL: https://issues.apache.org/jira/browse/TIKA-845
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Ray Gauss II
>             Fix For: 1.1
>
>         Attachments: xml-check-multi-value-existing.diff
>
>
> The XML Abstract metadata handler should check for an existing value for
> multi-valued fields as well as simple text fields.
> Similar metadata may be stored in multiple fields in the source, and a
> developer may choose to map several source fields to the same tika field, in
> which case no check is made for duplicates of existing delimited values.
> For example, a developer may want to dump any values contained in legacy IPTC
> keywords and dc:subject into tika keywords. If IPTC keywords =
> ['rock','tree','dog'] and dc:subject = ['rock','tree','K9'] then currently
> tika keywords = ['rock','tree','dog','rock','tree','K9'] instead of the
> desired ['rock','tree','dog','K9'].
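
The behaviour requested in the issue description can be illustrated with a minimal Java sketch. It is not the attached patch and does not use Tika's classes: `KeywordMerge` and `mergeDistinct` are hypothetical names, and plain `java.util` collections stand in for the metadata object. The sketch assumes that duplicates are detected by exact, case-sensitive string equality and that first-seen order should be preserved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Tika.
final class KeywordMerge {

    // Appends each incoming value to the existing values unless an equal
    // value is already present. LinkedHashSet keeps first-seen order.
    public static List<String> mergeDistinct(List<String> existing, List<String> incoming) {
        Set<String> merged = new LinkedHashSet<>(existing);
        merged.addAll(incoming);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Values taken from the example in the issue description.
        List<String> iptcKeywords = Arrays.asList("rock", "tree", "dog");
        List<String> dcSubject = Arrays.asList("rock", "tree", "K9");

        // prints [rock, tree, dog, K9]
        System.out.println(mergeDistinct(iptcKeywords, dcSubject));
    }
}
```

A set-based merge is only one way to get this result; a handler that receives values one at a time could instead check the values already stored for the field before adding each new one.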