[ https://issues.apache.org/jira/browse/TIKA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich closed TIKA-307.
--------------------------------
    Resolution: Fixed

Zip and other type Parsers are much more robust at this point. Can reopen if still an issue.

> Better handling of partial/truncated input data to parsers
> ----------------------------------------------------------
>
>                 Key: TIKA-307
>                 URL: https://issues.apache.org/jira/browse/TIKA-307
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.4
>            Reporter: Ken Krugler
>
> Some parsers (e.g. ZipParser) can hang if they prematurely reach the end of
> the input stream.
> As a way of avoiding this issue, Jukka had suggested the following approach
> on the list:
> The input stream could be wrapped into a decorator that throws a tagged
> IOException when the given size limit has been reached. This assumes that all
> parsers will correctly propagate up an IOException (at the very least).
> Smarter parsers could cleanly close the emitted XHTML stream, potentially
> adding a metadata entry that signifies that the extracted text has been
> truncated.
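
For illustration, the decorator idea from the issue description (wrap the input stream and throw a tagged IOException once a size limit is reached) could look roughly like the sketch below. The class names SizeLimitExceededException and SizeLimitedInputStream, the isCauseOf helper, and the choice of the wrapper instance as the tag are all assumptions made for this example; none of them are taken from the issue or from Tika's code.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical tagged exception: the tag records which wrapper raised it. */
class SizeLimitExceededException extends IOException {
    private final Object tag;

    SizeLimitExceededException(long limit, Object tag) {
        super("Input exceeded the limit of " + limit + " bytes");
        this.tag = tag;
    }

    Object getTag() {
        return tag;
    }
}

/** Hypothetical decorator that lets at most `limit` bytes through. */
class SizeLimitedInputStream extends FilterInputStream {
    private final long limit;
    private long consumed = 0;

    SizeLimitedInputStream(InputStream in, long limit) {
        super(in);
        this.limit = limit;
    }

    /** True only if the exception was thrown by this particular wrapper. */
    boolean isCauseOf(IOException e) {
        return e instanceof SizeLimitExceededException
                && ((SizeLimitExceededException) e).getTag() == this;
    }

    @Override
    public int read() throws IOException {
        if (consumed >= limit) {
            return endOrFail();
        }
        int b = in.read();
        if (b != -1) {
            consumed++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (consumed >= limit) {
            return endOrFail();
        }
        // Never hand out more bytes than the remaining budget allows.
        int allowed = (int) Math.min(len, limit - consumed);
        int n = in.read(buf, off, allowed);
        if (n > 0) {
            consumed += n;
        }
        return n;
    }

    /** At the limit: a genuine end of stream is fine, any further byte is not. */
    private int endOrFail() throws IOException {
        if (in.read() == -1) {
            return -1;
        }
        throw new SizeLimitExceededException(limit, this);
    }
}
```

A caller that created the wrapper can catch the IOException propagated by a parser, check limited.isCauseOf(e), and treat a match as "output truncated" (for example by closing the XHTML output cleanly and recording a metadata entry) rather than as a parse failure. This sketch does not account for skip() or mark()/reset(), which a complete implementation would also need to handle.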