[ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153851#comment-15153851 ]
Maruan Sahyoun commented on TIKA-1857:
--------------------------------------

Sorry for my delay in answering your question. May I propose the following strategy:

a) for static XFA, if there is datasets.data, use that content for the field values; otherwise extract from the AcroForm.
b) for dynamic XFA, scrape/extract info from the XFA.

Why a different proposal for a) from yours? Adobe Reader/Acrobat use the information from datasets.data for the field values over the possibly differing content in the AcroForm (which might happen if the form has been filled out with an XFA-aware processor and afterwards was amended with a non-XFA-aware processor).

> Enhance PDFParser to extract text from XFA forms
> ------------------------------------------------
>
> Key: TIKA-1857
> URL: https://issues.apache.org/jira/browse/TIKA-1857
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Pascal Essiembre
> Priority: Trivial
> Labels: patch
> Fix For: 1.13
>
> Attachments: 041617_filled_out.pdf, xfa_in_govdocs1.txt
>
>
> Extract text from PDF Forms (XFA). Information about XFA:
> https://en.wikipedia.org/wiki/XFA
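As a rough illustration of the a)/b) strategy proposed in the comment, here is a minimal Java sketch. It is not Tika's or PDFBox's API: the class XfaFieldStrategy and its methods are hypothetical. It assumes the caller has already determined whether the XFA is static or dynamic, extracted the datasets packet as an XML string (null if absent), and read the AcroForm field values into a map. For dynamic XFA it only covers data values, not other text in the form template.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper illustrating the proposed field-value strategy (not a Tika/PDFBox API). */
public class XfaFieldStrategy {

    // Namespace of the <xfa:datasets> packet and its <xfa:data> child.
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    /**
     * Returns leaf element name -> trimmed text under <xfa:data>, or null if there is
     * no <xfa:data> element. Repeated names overwrite each other (a simplification).
     */
    static Map<String, String> readDataValues(String datasetsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPEs so an untrusted PDF cannot trigger XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(datasetsXml.getBytes(StandardCharsets.UTF_8)));
            NodeList data = doc.getElementsByTagNameNS(XFA_DATA_NS, "data");
            if (data.getLength() == 0) {
                return null; // no datasets.data present
            }
            Map<String, String> values = new LinkedHashMap<String, String>();
            for (Node n = data.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    collectLeaves((Element) n, values);
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed XFA datasets packet", e);
        }
    }

    // Depth-first walk: elements without element children are treated as field values.
    private static void collectLeaves(Element e, Map<String, String> out) {
        boolean hasChildElement = false;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collectLeaves((Element) n, out);
            }
        }
        if (!hasChildElement) {
            out.put(e.getLocalName(), e.getTextContent().trim());
        }
    }

    /**
     * a) static XFA: use datasets.data if present, otherwise the AcroForm values.
     * b) dynamic XFA: use only what the XFA provides.
     */
    static Map<String, String> fieldValues(boolean dynamicXfa, String datasetsXml,
                                           Map<String, String> acroFormValues) {
        Map<String, String> xfaValues = datasetsXml == null ? null : readDataValues(datasetsXml);
        if (dynamicXfa) {
            return xfaValues == null ? new LinkedHashMap<String, String>() : xfaValues;
        }
        return xfaValues != null ? xfaValues : acroFormValues;
    }
}
```

Preferring datasets.data for static forms follows the rationale in the comment: if a non-XFA-aware tool later edited the AcroForm, the two copies can disagree, and Adobe Reader/Acrobat show the XFA value.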