[ 
https://issues.apache.org/jira/browse/TIKA-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting resolved TIKA-645.
--------------------------------

    Resolution: Fixed

The problem was that when the parser used TikaInputStream.getFile(), no 
bytes were recorded as being read from the stream, so the SecureContentHandler 
couldn't figure out where all the output was coming from.

In revision 1124788 I changed the logic a bit so that when the stream is based 
on a file, the SecureContentHandler class looks at the total size of the input 
file instead of the number of bytes read from the input stream.
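
For illustration only, here is a rough sketch of the idea, not the actual code 
from that revision. The class name, method names and both threshold constants 
are made up for this example. It only shows why a zero byte count makes any 
output look suspicious, and how falling back to the file length avoids that.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-in for the check described above; not Tika's real code.
class InputSizeSketch {

    // Assumed values, chosen only for this example.
    static final long OUTPUT_THRESHOLD = 1_000; // ignore small outputs
    static final long MAX_RATIO = 100;          // output chars per input byte

    // Use the file length when the stream is file based, otherwise fall
    // back to the number of bytes actually read from the stream.
    static long effectiveInputSize(long bytesRead, File backingFile) {
        return backingFile != null ? backingFile.length() : bytesRead;
    }

    // Flag the document when the output is large compared to the input.
    static boolean looksSuspicious(long outputChars, long inputSize) {
        return outputChars > OUTPUT_THRESHOLD
                && outputChars > inputSize * MAX_RATIO;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sketch", ".bin");
        file.deleteOnExit();
        Files.write(file.toPath(), new byte[2048]);

        long outputChars = 50_000;
        long bytesRead = 0; // the parser went straight to the file

        // Old behaviour: zero bytes read, so any large output is flagged.
        System.out.println(
                looksSuspicious(outputChars, effectiveInputSize(bytesRead, null)));
        // New behaviour: the file length is used as the input size.
        System.out.println(
                looksSuspicious(outputChars, effectiveInputSize(bytesRead, file)));
    }
}
```

With the assumed thresholds, the first check flags the document and the second 
does not, since 50,000 output characters is well within 100 times 2,048 bytes.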

> Parsers can't get at an underlying TikaInputStream to get the file if they 
> wanted one
> -------------------------------------------------------------------------------------
>
>                 Key: TIKA-645
>                 URL: https://issues.apache.org/jira/browse/TIKA-645
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>            Assignee: Jukka Zitting
>             Fix For: 1.0
>
>
> Spotted this with the office parser, but it should be general. The user 
> creates a TikaInputStream, and passes that off to the parser framework. The 
> Parser that is called may wish to spot that the input is a file-backed 
> TikaInputStream, and take a shortcut to use the file instead of the 
> InputStream.
> However, what the parser gets is a TaggedInputStream wrapping a 
> CountingInputStream wrapping the original TikaInputStream. As such, it can't 
> get at the file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
