[ https://issues.apache.org/jira/browse/TIKA-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2787.
-------------------------------
    Fix Version/s: 2.0.0
       Resolution: Fixed

> Make WriteLimitReachedException public and not subclass of SAXException
> -----------------------------------------------------------------------
>
>                 Key: TIKA-2787
>                 URL: https://issues.apache.org/jira/browse/TIKA-2787
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.19.1
>            Reporter: Dmitry Goldenberg
>            Priority: Major
>             Fix For: 2.0.0
>
>
> The point of setting a limit on text extraction is to get back up to N
> extracted characters. We just got tripped up by the fact that Tika throws an
> exception once the limit has been reached.
> This, in and of itself, is not a major hindrance, especially since the error
> message clearly states that the extracted text is "however, available".
> OK, but why is WriteLimitReachedException private? Why not make it public so
> it can be caught explicitly when the parse() method is called, and why not
> add it to the signature of the parse method? I don't think it should extend
> SAXException, either; just throw it cleanly as is.
> Right now, our code works around this with a cumbersome check on the
> exception message:
> {code:java}
> ContentHandler handler = new BodyContentHandler(limit); // <-- e.g. set to 1000000
> try {
>     parser.parse(dataStream, handler, metadata, parseCtx);
> } catch (IOException | TikaException ex) {
>     throw ex;
> } catch (SAXException ex) {
>     String message = (ex.getMessage() == null) ? "" : ex.getMessage();
>     if (!message.contains("Your document contained more than")) {
>         throw new TikaException("Tika error has occurred.", ex);
>     } else {
>         log.warn("TE limit reached on file {}.", filePath);
>     }
> }
> // Keep the extracted text regardless of WriteLimitReachedException
> String text = handler.toString();
> {code}
>  
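A minimal, JDK-only sketch of the kind of API the report asks for: a public exception type that the caller detects by type (walking the cause chain) instead of by matching message text, while keeping the partial text. `LimitReachedException`, `LimitedTextHandler` and `WriteLimitDemo` are hypothetical names for illustration, not Tika classes. Note that the exception here still extends `SAXException`: `ContentHandler` callbacks such as `characters()` are declared to throw only `SAXException`, so a different checked exception could not be thrown from them. In Tika 1.x, `WriteOutContentHandler` has a public `isWriteLimitReached(Throwable)` method that plays roughly the role of `isLimitReached` below, if you construct the `WriteOutContentHandler` yourself and wrap it in a `BodyContentHandler`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical public exception signalling that the write limit was hit. */
class LimitReachedException extends SAXException {
    LimitReachedException(int limit) {
        super("Write limit of " + limit + " characters reached; partial text is available");
    }

    /** True if t, or any exception in its cause chain, is a LimitReachedException. */
    static boolean isLimitReached(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof LimitReachedException) {
                return true;
            }
        }
        return false;
    }
}

/** Hypothetical handler that buffers at most `limit` characters of text. */
class LimitedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int limit;

    LimitedTextHandler(int limit) {
        this.limit = limit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        int room = limit - text.length();
        if (length <= room) {
            text.append(ch, start, length);
        } else {
            // Keep whatever still fits, then stop the parse.
            text.append(ch, start, room);
            throw new LimitReachedException(limit);
        }
    }

    @Override
    public String toString() {
        return text.toString();
    }
}

class WriteLimitDemo {
    public static void main(String[] args) throws Exception {
        LimitedTextHandler handler = new LimitedTextHandler(5);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader("<doc>Hello, world</doc>")), handler);
        } catch (SAXException ex) {
            if (!LimitReachedException.isLimitReached(ex)) {
                throw ex; // a genuine parse error, not the limit
            }
            System.out.println("Write limit reached; keeping partial text.");
        }
        System.out.println(handler); // the text captured before the limit
    }
}
```

Running `WriteLimitDemo` should print the warning line followed by `Hello`, the five characters captured before the limit was hit. The type check replaces the fragile `message.contains(...)` test in the workaround above.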



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
