[ 
https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830677#comment-17830677
 ] 

Hudson commented on TIKA-4171:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1571 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1571/])
TIKA-4171 -- fix regression when field names are missing in the XFAExtractor 
(#1679) (tallison: 
[https://github.com/apache/tika/commit/b9ab4813ed16f53a0bf3aa61883da2cebdf7f3a1])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/XFAExtractor.java


> Tika server only returns last value for PDFs that have multiple of the same 
> key
> -------------------------------------------------------------------------------
>
>                 Key: TIKA-4171
>                 URL: https://issues.apache.org/jira/browse/TIKA-4171
>             Project: Tika
>          Issue Type: Bug
>          Components: tika-server
>            Reporter: Cassandra Xia
>            Priority: Major
>             Fix For: 3.0.0-BETA, 2.9.2
>
>         Attachments: 20230801-5207_QF20-270 East River Solar Form 556 recert 
> FINAL.pdf, 876503.pdf, example-output.txt, screenshot.png, 
> testPDF_XFA_govdocs1_258578.pdf.html
>
>
> Thanks for the great work on Tika server, it is the only OSS that can handle 
> Adobe's protected form format that FERC uses. 
> One problem that I'm hitting is that the FERC form that I am parsing has 
> multiple values for the same key name, e.g. in the screenshot below, lines 1-7 
> all have the same key name. When Tika Server parses this PDF, it returns only 
> the value in row 7 (losing the previous 6 values).
> My hunch is that somewhere in Tika Server, the values are getting stored in 
> some dictionary object, so the final value is the only survivor. Would it be 
> possible to return the extra values as a list from Tika Server? 
> Example PDF attached - thank you for taking a look!
> [embedded screenshot]
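
The reporter's hunch in the description above (values stored in a dictionary-like object, so only the last one survives) can be illustrated with a minimal Java sketch. This is not Tika's actual code: the class name, the method names, and the sample field names are hypothetical and only show the general overwrite-versus-accumulate behavior of a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from XFAExtractor or tika-server.
class RepeatedKeySketch {

    // A plain Map holds one value per key, so each put() with a
    // repeated key replaces the value stored before it.
    public static Map<String, String> collectLastOnly(String[][] fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String[] field : fields) {
            out.put(field[0], field[1]);
        }
        return out;
    }

    // Keeping a list per key preserves every value in encounter order.
    public static Map<String, List<String>> collectAll(String[][] fields) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String[] field : fields) {
            out.computeIfAbsent(field[0], k -> new ArrayList<>()).add(field[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample data: three form rows sharing one key name.
        String[][] fields = {
            {"amount", "row 1"},
            {"amount", "row 2"},
            {"amount", "row 7"},
        };
        System.out.println(collectLastOnly(fields)); // {amount=row 7}
        System.out.println(collectAll(fields));      // {amount=[row 1, row 2, row 7]}
    }
}
```

Whether the fix for this issue takes this shape is not stated in this message; the sketch only shows why a single-valued map would drop rows 1-6.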



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
