[ https://issues.apache.org/jira/browse/TIKA-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920002#comment-13920002 ]
Alexandre Madurell commented on TIKA-1252:
------------------------------------------

Hi, [~talli...@apache.org]

I will check on the PDFBox board (and open the issue if appropriate and/or link to this one too).

Your proposal sounds good. I don't think author-related information would appear anywhere else (unless on a custom metadata tag). And we could check on both values and keep just one if they're equal, or both if they differ.

I'm about to tackle the replacement of the <rdf:Bag> with the <rdf:Seq> on a collection of 540+ documents, so... what's 7 more gonna hurt? :D

While I let the machine work on the 540+, it would be my pleasure to contribute manually with the other 7 (and the least I can do after having you find a solution for our issue!). I'll check that out and create 4.x through 10.x documents with Acrobat (11.x trial expired on my office's notebook, but I'll install it on another PC and get an 11.x document too).

Thanks again!

> Tika is not indexing all authors of a PDF
> -----------------------------------------
>
>                 Key: TIKA-1252
>                 URL: https://issues.apache.org/jira/browse/TIKA-1252
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.4
>         Environment: Ubuntu 12.04 (x64) Solr 4.6.0 (Amazon Web Services, Bitnami Stack)
>            Reporter: Alexandre Madurell
>         Attachments: Sample (Acrobat 4.x).pdf, Sample (Acrobat 5.x).pdf, Sample-One-Author.pdf, Sample-Two-Authors.pdf, Sample.pdf, Sample.xmp, XMP-Import-with-Seq.jpg
>
>
> When submitting a PDF with this information in its XMP metadata:
> ...
>     <dc:creator>
>       <rdf:Bag>
>         <rdf:li>Author 1</rdf:li>
>         <rdf:li>Author 2</rdf:li>
>       </rdf:Bag>
>     </dc:creator>
> ...
> Only the first one appears in the collection:
> ...
>         "author":["Author 1"],
>         "author_s":"Author 1",
> ...
> In spite of having set the field to multiValued in the Solr schema:
> <field name="author" type="text_general" indexed="true" stored="true" multiValued="true"/>
> Let me know if there's any further specific information I could provide.
> Thanks in advance!

--
This message was sent by Atlassian JIRA
(v6.2#6252)
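
For illustration only, the "keep just one if they're equal, or both if they differ" idea from the comment could be sketched roughly like this in Python. This is not Tika's or PDFBox's code. It assumes that "both values" means the author stored in the PDF document information plus the dc:creator list from the XMP packet, and the names `xmp_creators`, `merge_authors` and `SAMPLE` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def xmp_creators(xmp_xml):
    """Return every dc:creator entry, whether it sits in an rdf:Bag or an rdf:Seq."""
    root = ET.fromstring(xmp_xml)
    authors = []
    for creator in root.iter("{%s}creator" % DC_NS):
        # Collect all rdf:li items regardless of the container element around them.
        for li in creator.iter("{%s}li" % RDF_NS):
            text = (li.text or "").strip()
            if text:
                authors.append(text)
    return authors


def merge_authors(info_author, xmp_authors):
    """Combine the document-info author (assumed input) with the XMP authors,
    keeping one copy of equal values and all values that differ."""
    merged = []
    candidates = ([info_author] if info_author else []) + list(xmp_authors)
    for name in candidates:
        if name not in merged:
            merged.append(name)
    return merged


# Sample packet modelled on the fragment quoted in the issue description.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description>
    <dc:creator>
      <rdf:Bag>
        <rdf:li>Author 1</rdf:li>
        <rdf:li>Author 2</rdf:li>
      </rdf:Bag>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    print(merge_authors("Author 1", xmp_creators(SAMPLE)))
```

With the sample above, a document-info author equal to the first XMP entry is kept only once, while a different name would be kept alongside the XMP entries.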