[ 
https://issues.apache.org/jira/browse/TIKA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019500#comment-18019500
 ] 

Sandeep Kulkarni commented on TIKA-4482:
----------------------------------------

After upgrading to 3.2.2, we are seeing failures on many PDF files with a 
stack trace like the one below:
{noformat}
Caused by: java.lang.IllegalArgumentException: Unrecognized property 'http://javax.xml.XMLConstants/property/accessExternalDTD'
        at com.ctc.wstx.api.CommonConfig.reportUnknownProperty(CommonConfig.java:167)
        at com.ctc.wstx.api.CommonConfig.setProperty(CommonConfig.java:158)
        at com.ctc.wstx.api.ReaderConfig.setProperty(ReaderConfig.java:35)
        at com.ctc.wstx.stax.WstxInputFactory.setProperty(WstxInputFactory.java:402)
        at org.apache.tika.utils.XMLReaderUtils.getXMLInputFactory(XMLReaderUtils.java:305)
{noformat}
Is this the same error that Michael reported, or something else altogether? 
The link to the user mailing list thread is not working for me.
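
For context, the trace shows the exception being raised when XMLReaderUtils.getXMLInputFactory passes the accessExternalDTD key to Woodstox's WstxInputFactory.setProperty. The following is only a minimal Java sketch of one defensive pattern, not the actual Tika fix: SafeStaxConfig, trySetProperty, and newHardenedFactory are hypothetical names, and it assumes the StAX implementation rejects unknown keys with IllegalArgumentException, as in the trace above.

```java
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper, for illustration only; not part of Tika.
final class SafeStaxConfig {

    // Attempts to set a property and reports whether the factory accepted it.
    // Implementations such as Woodstox throw IllegalArgumentException for
    // property names they do not recognize.
    static boolean trySetProperty(XMLInputFactory factory, String name, Object value) {
        try {
            factory.setProperty(name, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Builds a factory with DTD and external entity support switched off,
    // skipping any hardening property the implementation does not know.
    static XMLInputFactory newHardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        trySetProperty(factory, XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        trySetProperty(factory, XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // JAXP 1.5 key; an empty string means no external DTD protocols are allowed.
        trySetProperty(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return factory;
    }
}
```

With a pattern like this, an unsupported key is skipped instead of aborting the parse. Whether silently skipping a hardening property is acceptable is a separate security decision.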

> tika-server and other modules that bring in woodstox can no longer parse a 
> PDF with XFA
> ---------------------------------------------------------------------------------------
>
>                 Key: TIKA-4482
>                 URL: https://issues.apache.org/jira/browse/TIKA-4482
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Blocker
>
> Original title: Update stax configuration to account for woodstox not 
> handling XMLConstants.ACCESS_EXTERNAL_DTD 
>  
> On the user list, Michael reports that if woodstox is on the classpath, stax 
> parsing fails:
> [https://lists.apache.org/thread/fvvg4lxh301os48kprd8m9sv5wvx98f7]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
