[
https://issues.apache.org/jira/browse/TIKA-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052191#comment-18052191
]
ASF GitHub Bot commented on TIKA-4623:
--------------------------------------
tballison merged PR #2534:
URL: https://github.com/apache/tika/pull/2534
> Improve rewind performance on generic InputStreams in 4.x
> ---------------------------------------------------------
>
> Key: TIKA-4623
> URL: https://issues.apache.org/jira/browse/TIKA-4623
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> On TIKA-4619, we made TikaInputStream rewindable. The benefits of
> rewindability:
> * Can go beyond 2 GB.
> * Does not interfere with parsers/detectors that need to mark/reset at
> non-zero offsets.
> There are now three types of backing InputStream used by TikaInputStream:
> file, byte array, and generic. With a generic stream, we buffer to memory
> and then spool to disk once a size threshold is reached.
> The one downside of this setup is that we buffer the generic InputStream to
> memory even when mark/reset might be sufficient.
> On this ticket, we'll look into adding an "enableRewind()" call to
> TikaInputStream. This would be a no-op for file- and byte-array-backed
> streams (because those are already rewindable). For generic streams, it
> would let us use a basic BufferedInputStream for most file formats, which
> need only mark/reset and not full rewindability. This would put the
> responsibility on the digester/detector/parser to know when an InputStream
> needs to be rewindable.
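A minimal Java sketch of the proposed enableRewind() behavior follows. It is for illustration only. Everything in it is hypothetical: RewindableSketch, its ofFile/ofBytes/ofGeneric factories, rewind() and isRewindable() are not Tika's TikaInputStream API.

The sketch makes two assumptions:
* enableRewind() is called before any bytes are consumed.
* A generic stream is buffered entirely in memory. This stands in for the memory-then-disk spooling described in the ticket.

Apart from that, generic streams get only BufferedInputStream mark/reset. File- and byte-array-backed streams are rewindable from the start.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// HYPOTHETICAL sketch of the enableRewind() idea from TIKA-4623; NOT Tika's TikaInputStream.
final class RewindableSketch extends InputStream {
    private final Path file;     // set only for file-backed streams
    private byte[] bytes;        // byte-array-backed, or generic after enableRewind()
    private InputStream current; // the stream that reads are delegated to
    private long position;       // bytes consumed since the last (re)open
    private long markPosition;   // position recorded by the last mark()

    private RewindableSketch(Path file, byte[] bytes, InputStream current) {
        this.file = file;
        this.bytes = bytes;
        this.current = current;
    }

    static RewindableSketch ofFile(Path path) throws IOException {
        return new RewindableSketch(path, null, Files.newInputStream(path));
    }

    static RewindableSketch ofBytes(byte[] data) {
        return new RewindableSketch(null, data, new ByteArrayInputStream(data));
    }

    // Generic streams start with only BufferedInputStream mark/reset support.
    static RewindableSketch ofGeneric(InputStream in) {
        return new RewindableSketch(null, null, new BufferedInputStream(in));
    }

    boolean isRewindable() { return file != null || bytes != null; }

    // No-op for file- and byte-array-backed streams; buffers a generic one.
    void enableRewind() throws IOException {
        if (isRewindable()) {
            return;
        }
        if (position != 0) {
            throw new IllegalStateException("enableRewind() must be called before reading");
        }
        // Assumption: buffer everything in memory; the ticket's design instead
        // spools to disk once a size threshold is reached.
        bytes = current.readAllBytes();
        current.close();
        current = new ByteArrayInputStream(bytes);
        markPosition = 0;
    }

    // Reopens at offset 0, independent of any mark() a parser may have set.
    void rewind() throws IOException {
        if (!isRewindable()) {
            throw new IllegalStateException("call enableRewind() first");
        }
        current.close();
        current = (file != null) ? Files.newInputStream(file) : new ByteArrayInputStream(bytes);
        position = 0;
        markPosition = 0;
    }

    @Override
    public int read() throws IOException {
        int b = current.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = current.read(buf, off, len);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    @Override
    public void mark(int readLimit) {
        current.mark(readLimit);
        markPosition = position;
    }

    @Override
    public void reset() throws IOException {
        current.reset();
        position = markPosition;
    }

    @Override public boolean markSupported() { return current.markSupported(); }
    @Override public void close() throws IOException { current.close(); }
}
```

The sketch keeps rewind() separate from mark()/reset(). This mirrors the ticket's point that full rewindability should not interfere with parsers or detectors that mark and reset at non-zero offsets.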
--
This message was sent by Atlassian Jira
(v8.20.10#820010)