[ 
https://issues.apache.org/jira/browse/TIKA-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191228#comment-13191228
 ] 

Nick Burch commented on TIKA-845:
---------------------------------

I think the current logic isn't quite correct. Rather than ending up with a 
proper multivalued metadata object, we end up with a single string of 
comma-separated values, which seems wrong to me.

I've fixed up the logic, which allows for what looks to be a cleaner way to 
check for duplicates.
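
As a rough illustration of why a multivalued store makes the duplicate check 
simpler, here is a minimal sketch. It is not the code committed for this issue: 
SimpleMetadata and addIfAbsent are hypothetical names, standing in for a 
metadata object that keeps each value as a separate entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multivalued metadata store; not Tika's own class.
class SimpleMetadata {
    // Each field name maps to an ordered list of separate values.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Adds the value as its own entry unless the field already holds it.
    // Returns true if the value was added, false if it was a duplicate.
    boolean addIfAbsent(String name, String value) {
        List<String> existing = values.computeIfAbsent(name, k -> new ArrayList<>());
        if (existing.contains(value)) {
            return false;
        }
        existing.add(value);
        return true;
    }

    // Returns a copy of the values stored for the field (empty if none).
    List<String> getValues(String name) {
        List<String> stored = values.get(name);
        return stored == null ? new ArrayList<String>() : new ArrayList<String>(stored);
    }
}
```

With separate entries the check is an exact comparison per value. With a single 
comma-joined string, the same check would need the string to be split first, 
and a plain substring test could wrongly match part of a longer value.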

I've also fixed up the single unit test that depended on the old comma 
concatenation, DcXMLParserTest, to now check for the correct multivalued 
approach.

I've committed this in r1234873.
                
> Check for Existing Value in Multi-Value Fields in XML Metadata Handler
> ----------------------------------------------------------------------
>
>                 Key: TIKA-845
>                 URL: https://issues.apache.org/jira/browse/TIKA-845
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Ray Gauss II
>             Fix For: 1.1
>
>         Attachments: xml-check-multi-value-existing.diff
>
>
> The XML Abstract metadata handler should check for an existing value for 
> multi-valued fields as well as simple text fields.
> Similar metadata may be stored in multiple fields in the source and a 
> developer may choose to map several source fields to the same tika field, in 
> which case no check is made for duplicates of existing delimited values.
> For example, a developer may want to dump any values contained in legacy IPTC 
> keywords and dc:subject into tika keywords.  If IPTC keywords = 
> ['rock','tree','dog'] and dc:subject = ['rock','tree','K9'] then currently 
> tika keywords = ['rock','tree','dog','rock','tree','K9'] instead of the 
> desired ['rock','tree','dog','K9'].
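
The keyword example in the issue description can be worked through with a 
short sketch. This only illustrates the desired result and is not the attached 
patch: KeywordMerge and mergeDistinct are hypothetical names, and an 
order-preserving set is just one possible way to drop the repeats.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only.
class KeywordMerge {
    // Merges two value lists, keeping first-seen order and dropping duplicates.
    static List<String> mergeDistinct(List<String> first, List<String> second) {
        Set<String> merged = new LinkedHashSet<>(first);
        merged.addAll(second);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> iptcKeywords = Arrays.asList("rock", "tree", "dog");
        List<String> dcSubject = Arrays.asList("rock", "tree", "K9");
        // Prints [rock, tree, dog, K9]
        System.out.println(mergeDistinct(iptcKeywords, dcSubject));
    }
}
```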

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
