[ https://issues.apache.org/jira/browse/TIKA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019500#comment-18019500 ]
Sandeep Kulkarni commented on TIKA-4482:
----------------------------------------
After upgrading to 3.2.2, we are seeing failures like the following for many
PDF files:
{noformat}
Caused by: java.lang.IllegalArgumentException: Unrecognized property 'http://javax.xml.XMLConstants/property/accessExternalDTD'
	at com.ctc.wstx.api.CommonConfig.reportUnknownProperty(CommonConfig.java:167)
	at com.ctc.wstx.api.CommonConfig.setProperty(CommonConfig.java:158)
	at com.ctc.wstx.api.ReaderConfig.setProperty(ReaderConfig.java:35)
	at com.ctc.wstx.stax.WstxInputFactory.setProperty(WstxInputFactory.java:402)
	at org.apache.tika.utils.XMLReaderUtils.getXMLInputFactory(XMLReaderUtils.java:305)
{noformat}
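The trace indicates that `XMLReaderUtils.getXMLInputFactory` passes `XMLConstants.ACCESS_EXTERNAL_DTD` to `WstxInputFactory.setProperty`, and Woodstox rejects that JAXP property with an `IllegalArgumentException`. As a rough sketch only (not Tika's actual code or fix), one way to tolerate this is to set the standard StAX hardening properties unconditionally and treat the JAXP property as optional. The class and method names (`SafeStaxConfig`, `trySetProperty`, `newSecureInputFactory`) are hypothetical; the sketch relies only on the documented `XMLInputFactory.setProperty` contract that unsupported properties throw `IllegalArgumentException`.

```java
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper class, for illustration only; not part of Tika.
public class SafeStaxConfig {

    /**
     * Sets a StAX property, but treats IllegalArgumentException (which
     * XMLInputFactory.setProperty throws for unsupported properties, and
     * which Woodstox throws for ACCESS_EXTERNAL_DTD per the trace above)
     * as "not supported" instead of failing.
     * Returns true if the implementation accepted the property.
     */
    static boolean trySetProperty(XMLInputFactory factory, String name, Object value) {
        try {
            factory.setProperty(name, value);
            return true;
        } catch (IllegalArgumentException e) {
            // This StAX implementation does not recognize the property; skip it.
            return false;
        }
    }

    /** Builds an input factory with DTDs and external entities disabled. */
    static XMLInputFactory newSecureInputFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX properties defined on XMLInputFactory itself.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // JAXP property: the JDK's built-in StAX accepts it, Woodstox may reject it.
        trySetProperty(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return factory;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = newSecureInputFactory();
        // Shows which StAX implementation XMLInputFactory.newInstance() resolved to.
        System.out.println(factory.getClass().getName());
    }
}
```

Running `main` prints which StAX implementation was picked up, which can help confirm whether Woodstox (`com.ctc.wstx.stax.WstxInputFactory`) is the one being loaded from the classpath.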
Is this the same error that Michael reported, or a different problem?
The link to the user mailing list thread isn't working for me.
> tika-server and other modules that bring in woodstox can no longer parse a PDF with XFA
> ---------------------------------------------------------------------------------------
>
> Key: TIKA-4482
> URL: https://issues.apache.org/jira/browse/TIKA-4482
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Blocker
>
> Original title: Update stax configuration to account for woodstox not
> handling XMLConstants.ACCESS_EXTERNAL_DTD
>
> On the user list, Michael reports that if woodstox is on the classpath, stax
> parsing fails:
> [https://lists.apache.org/thread/fvvg4lxh301os48kprd8m9sv5wvx98f7]