[ 
https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137519#comment-15137519
 ] 

Tim Allison commented on TIKA-741:
----------------------------------

bq. Many thanks! I'll upload the fix on our end when I get a chance.

Happy to help.

bq. Given PDFBox 2.0.0 is not out yet, are you open to upgrade Tika code base 
to support that version of PDFBox (replacing support for PDFBox 1.x)?

Once 2.0 is out, yes, I think we'll upgrade pretty quickly. See TIKA-1285 and 
PDFBOX-3058 for our collaboration in support of 2.0 regression testing. My dev 
branch for the integration with Tika is on 
[github|https://github.com/tballison/tika/tree/pdfbox2_0].

bq. like extracting XFA text. I can submit a patch for that as well if you are 
open.

Yes, please!

I also noticed that you have some wrappers around Tika more generally. Again, 
if there's anything that would help Tika more broadly, please send it along. You 
may want to check out our RecursiveParserWrapper; it looks like it has some 
functionality that overlaps with what you're doing.

Happy extraction!  Cheers!


> "Zip bomb" (XML nesting) detection is too strict
> ------------------------------------------------
>
>                 Key: TIKA-741
>                 URL: https://issues.apache.org/jira/browse/TIKA-741
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10
>            Reporter: Erik Hetzner
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 1.0
>
>
> I get "zip bomb" errors from many HTML documents, e.g. 
> http://www.akhbaar.org/wesima_articles/index-20100101-82736.html
> Is there a way that the element nesting level could be made configurable? 30 
> elements just doesn't seem to be enough.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
