[ 
https://issues.apache.org/jira/browse/TIKA-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052191#comment-18052191
 ] 

ASF GitHub Bot commented on TIKA-4623:
--------------------------------------

tballison merged PR #2534:
URL: https://github.com/apache/tika/pull/2534




> Improve rewind performance on generic InputStreams in 4.x
> ---------------------------------------------------------
>
>                 Key: TIKA-4623
>                 URL: https://issues.apache.org/jira/browse/TIKA-4623
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> On TIKA-4619, we made TikaInputStream rewindable. The benefits of 
> rewindability:
>  * It can go beyond 2 GB.
>  * It does not interfere with a parser's or detector's need to mark/reset at 
> non-zero offsets.
> There are now three types of backing InputStream used by TikaInputStream: 
> file, byte array, and generic. With generic, we buffer to memory and then 
> spool to disk at a certain threshold.
> The one downside of this setup is that we buffer the generic InputStream to 
> memory even when mark/reset might be sufficient.
> On this ticket, we'll look into adding an "enableRewind()" call to 
> TikaInputStream. This would be a no-op for file- and byte-array-backed 
> streams (because those are already rewindable). For generic streams, it would 
> allow a basic BufferedInputStream for most file formats, which require only 
> mark/reset and do not need rewindability. This would put the responsibility 
> on the digester/detector/parser to know when an InputStream needs to be 
> rewindable.
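
The opt-in idea described in the issue can be illustrated with a minimal, self-contained sketch. This is not Tika's implementation: the class name RewindSketch, the rewind() method, and the rule that enableRewind() must be called before any bytes are read are all assumptions made for illustration. The sketch also keeps the cache entirely in memory, whereas the issue describes spooling to disk past a threshold.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch (not TikaInputStream): a generic stream gets a plain
 * BufferedInputStream by default and only caches everything it reads once a
 * caller opts in via enableRewind().
 */
final class RewindSketch extends InputStream {
    private final InputStream source;     // cheap mark/reset via BufferedInputStream
    private ByteArrayOutputStream cache;  // non-null once rewind is enabled
    private ByteArrayInputStream replay;  // non-null while replaying cached bytes
    private long consumed;                // bytes pulled from the source so far

    RewindSketch(InputStream raw) {
        this.source = new BufferedInputStream(raw);
    }

    /** Assumed contract: call before reading; repeated calls are no-ops. */
    void enableRewind() {
        if (cache != null) {
            return;
        }
        if (consumed > 0) {
            throw new IllegalStateException("enableRewind() must precede reads");
        }
        cache = new ByteArrayOutputStream();
    }

    /** Returns to offset 0 by replaying every byte cached so far. */
    void rewind() {
        if (cache == null) {
            throw new IllegalStateException("enableRewind() was not called");
        }
        replay = new ByteArrayInputStream(cache.toByteArray());
    }

    @Override
    public int read() throws IOException {
        if (replay != null) {
            int cached = replay.read();
            if (cached >= 0) {
                return cached;
            }
            replay = null; // cached bytes exhausted; continue with the source
        }
        int b = source.read();
        if (b >= 0) {
            consumed++;
            if (cache != null) {
                cache.write(b); // remember the byte so it can be replayed
            }
        }
        return b;
    }

    // Without enableRewind(), callers still get ordinary mark/reset.
    @Override
    public boolean markSupported() {
        return cache == null;
    }

    @Override
    public void mark(int readLimit) {
        if (cache == null) {
            source.mark(readLimit);
        }
    }

    @Override
    public void reset() throws IOException {
        if (cache != null) {
            throw new IOException("use rewind() once rewind is enabled");
        }
        source.reset();
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}
```

In this sketch, a detector that only peeks at a header would call mark(), read, and reset() and never pay for the cache, while a digester that must re-read the whole stream would call enableRewind() first and rewind() afterwards.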



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
