[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830677#comment-17830677 ]
Hudson commented on TIKA-4171:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1571 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1571/])
TIKA-4171 -- fix regression when field names are missing in the XFAExtractor (#1679) (tallison: [https://github.com/apache/tika/commit/b9ab4813ed16f53a0bf3aa61883da2cebdf7f3a1])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/XFAExtractor.java


> Tika server only returns last value for PDFs that have multiple of the same key
> -------------------------------------------------------------------------------
>
>                 Key: TIKA-4171
>                 URL: https://issues.apache.org/jira/browse/TIKA-4171
>             Project: Tika
>          Issue Type: Bug
>          Components: tika-server
>            Reporter: Cassandra Xia
>            Priority: Major
>             Fix For: 3.0.0-BETA, 2.9.2
>
>         Attachments: 20230801-5207_QF20-270 East River Solar Form 556 recert FINAL.pdf, 876503.pdf, example-output.txt, screenshot.png, testPDF_XFA_govdocs1_258578.pdf.html
>
>
> Thanks for the great work on Tika server, it is the only OSS that can handle
> Adobe's protected form format that FERC uses.
> One problem that I'm hitting is that the FERC form that I am parsing has
> multiple values for the same key name, e.g. in the screenshot below line 1-7
> all have the same key name. When Tika Server parses this PDF, it only returns
> the value in row 7 (losing the previous 6 values).
> My hunch is that somewhere in Tika Server, the values are getting stored in
> some dictionary object, so the final value is the only survivor. Would it be
> possible to return the extra values as a list from Tika Server?
> Example PDF attached - thank you for taking a look!
> !https://mail.google.com/mail/u/0?ui=2&ik=ee87dc4bd1&attid=0.0.7&permmsgid=msg-f:1782641700487887488&th=18bd372e8760fa80&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ9qEkw6kZ9yBDfMBOUuvFB1Tk8Pti0rRvReEq-eWUoJQxLA6rZ0TQvWCsKUySaDPjjrSi-IiyKseDYpFGzF44A3iSaFw9sOanoBdFMNEZciDnaGhsUFvLSIH_0&disp=emb&realattid=ii_lmdun7ff6!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
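
The reporter's hunch in the quoted issue (values stored in a dictionary, so only the last one survives) can be illustrated with a small standalone sketch. This is not Tika's code and does not reflect how XFAExtractor is implemented; the class name, the method names, and the field name "row" are all hypothetical and exist only to show the difference between overwriting a key and collecting its values in a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Tika.
public class MultiValueFieldSketch {

    // Each field is a {name, value} pair. Storing into Map<String, String>
    // overwrites earlier values, so only the last value per name survives.
    static Map<String, String> lastValueWins(List<String[]> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] field : fields) {
            out.put(field[0], field[1]);
        }
        return out;
    }

    // Storing into Map<String, List<String>> appends instead of overwriting,
    // so repeated names keep every value in document order.
    static Map<String, List<String>> keepAllValues(List<String[]> fields) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String[] field : fields) {
            out.computeIfAbsent(field[0], k -> new ArrayList<>()).add(field[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three made-up form rows that share one field name.
        List<String[]> fields = List.of(
                new String[] {"row", "value 1"},
                new String[] {"row", "value 2"},
                new String[] {"row", "value 3"});

        System.out.println(lastValueWins(fields));
        System.out.println(keepAllValues(fields));
    }
}
```

With the first method the repeated name collapses to a single entry; with the second, all values are returned as a list, which is the behaviour the reporter asks for.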