[
https://issues.apache.org/jira/browse/TIKA-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052210#comment-18052210
]
Hudson commented on TIKA-4623:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #1167 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/1167/])
TIKA-4623 -- for general updates, don't buffer unless enableRewind has been set
(#2534) (github:
[https://github.com/apache/tika/commit/5f9a808ac316ca09699484d207e66614ed7fef5e])
* (edit) tika-core/src/main/java/org/apache/tika/io/CachingInputStream.java
* (edit) tika-core/src/test/java/org/apache/tika/io/TikaInputStreamTest.java
* (edit) tika-core/src/main/java/org/apache/tika/io/TikaInputSource.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/multiple/AbstractMultipleParser.java
* (edit) tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/main/java/org/apache/tika/parser/pkg/PackageParser.java
* (edit) docs/spooling.adoc
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-crypto-module/src/main/java/org/apache/tika/parser/crypto/TSDParser.java
* (edit) tika-core/src/main/java/org/apache/tika/io/FileSource.java
* (edit) tika-core/src/main/java/org/apache/tika/io/CachingSource.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/zip/utils/ZipSalvager.java
* (edit) tika-core/src/main/java/org/apache/tika/io/ByteArraySource.java
> Improve rewind performance on generic InputStreams in 4.x
> ---------------------------------------------------------
>
> Key: TIKA-4623
> URL: https://issues.apache.org/jira/browse/TIKA-4623
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> On TIKA-4619, we made TikaInputStream rewindable. The benefits of
> rewindability:
> * Can go beyond 2 GB.
> * Does not interfere with a parser's or detector's need to mark/reset at
> non-zero offsets.
> There are now three types of backing input stream used by TikaInputStream:
> file, byte array, and generic. With generic, we buffer to memory and then
> spool to disk at a certain threshold.
> The one downside of this setup is that we buffer to memory for the generic
> input stream even when mark/reset might be sufficient.
> On this ticket, we'll look into adding an "enableRewind()" call in
> TikaInputStream. This would be a no-op for file-backed and byte-array-backed
> streams (because those are already rewindable). For generic streams, it would
> allow a basic BufferedInputStream for most file formats, which need only
> mark/reset and not full rewindability. This would put the responsibility on
> the digester/detector/parser to know when an input stream needs to be
> rewindable.
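
As a rough illustration of the distinction the description draws, the sketch below uses only JDK classes (not Tika's API) to show the case where mark/reset on a plain BufferedInputStream is enough: peek at a short header, reset, and let the next consumer read from the start. The class name MarkResetSketch and the helper peek are hypothetical and exist only for this example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetSketch {

    // Hypothetical helper: read up to n bytes from the current position,
    // then return the stream to that position via mark/reset.
    public static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(n); // the mark stays valid for at least n bytes read
        try {
            return in.readNBytes(n);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7 rest of the document"
                .getBytes(StandardCharsets.US_ASCII);

        // Stand-in for a "generic" stream: no file or byte array is exposed
        // to the consumer, only a buffered wrapper that supports mark/reset.
        InputStream generic =
                new BufferedInputStream(new ByteArrayInputStream(data));

        // A detector-style consumer looks at the first few bytes only.
        byte[] header = peek(generic, 5);
        System.out.println(new String(header, StandardCharsets.US_ASCII)); // %PDF-

        // A parser-style consumer still sees the stream from the beginning.
        byte[] all = generic.readAllBytes();
        System.out.println(all.length == data.length); // true
    }
}
```

A consumer that has to re-read the whole stream from offset zero after consuming it cannot rely on a bounded mark like this; that is the case the proposed enableRewind() call is meant to cover.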
--
This message was sent by Atlassian Jira
(v8.20.10#820010)