[https://issues.apache.org/jira/browse/TIKA-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052210#comment-18052210]

Hudson commented on TIKA-4623:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #1167 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/1167/])
TIKA-4623 -- for general updates, don't buffer unless enableRewind has been set 
(#2534) (github: 
[https://github.com/apache/tika/commit/5f9a808ac316ca09699484d207e66614ed7fef5e])
* (edit) tika-core/src/main/java/org/apache/tika/io/CachingInputStream.java
* (edit) tika-core/src/test/java/org/apache/tika/io/TikaInputStreamTest.java
* (edit) tika-core/src/main/java/org/apache/tika/io/TikaInputSource.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/multiple/AbstractMultipleParser.java
* (edit) tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/main/java/org/apache/tika/parser/pkg/PackageParser.java
* (edit) docs/spooling.adoc
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-crypto-module/src/main/java/org/apache/tika/parser/crypto/TSDParser.java
* (edit) tika-core/src/main/java/org/apache/tika/io/FileSource.java
* (edit) tika-core/src/main/java/org/apache/tika/io/CachingSource.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-zip-commons/src/main/java/org/apache/tika/zip/utils/ZipSalvager.java
* (edit) tika-core/src/main/java/org/apache/tika/io/ByteArraySource.java
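
The commit touches CachingInputStream and docs/spooling.adoc, which relate to how a generic stream is cached: buffer in memory, then spool to disk at a threshold. As an illustration only, here is a minimal sketch of that memory-then-disk idea. It is NOT Tika's implementation; the class name SpoolingCache, its methods, and the threshold semantics are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch (not Tika's API): keeps bytes in memory until the
 * next write would exceed the threshold, then moves everything to a temp file.
 */
final class SpoolingCache implements AutoCloseable {
    private final int threshold;              // max bytes held in memory (assumed knob)
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path file;                        // non-null once spooled to disk
    private OutputStream fileOut;
    private long size;

    SpoolingCache(int threshold) {
        this.threshold = threshold;
    }

    void write(byte[] buf, int off, int len) throws IOException {
        if (file == null && size + len > threshold) {
            // Threshold crossed: copy the in-memory bytes to a temp file, then drop them.
            file = Files.createTempFile("spool-", ".bin");
            fileOut = Files.newOutputStream(file);
            memory.writeTo(fileOut);
            memory = null;
        }
        if (file == null) {
            memory.write(buf, off, len);
        } else {
            fileOut.write(buf, off, len);
        }
        size += len;
    }

    boolean isOnDisk() {
        return file != null;
    }

    long size() {
        return size;
    }

    /** Returns every cached byte; a real cache would stream them instead. */
    byte[] readAll() throws IOException {
        if (file == null) {
            return memory.toByteArray();
        }
        fileOut.flush();
        return Files.readAllBytes(file);
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
        }
        if (file != null) {
            Files.deleteIfExists(file);
        }
    }
}
```

With a threshold of 8 bytes, writing "hello" stays in memory; writing it a second time pushes the total to 10 bytes and triggers the spool to disk, after which readAll() still returns "hellohello".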


> Improve rewind performance on generic InputStreams in 4.x
> ---------------------------------------------------------
>
>                 Key: TIKA-4623
>                 URL: https://issues.apache.org/jira/browse/TIKA-4623
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>
> On TIKA-4619, we made TikaInputStream rewindable. The benefits of
> rewindability:
>  * Rewinding can go beyond 2 GB.
>  * It does not interfere with parsers/detectors that need to mark/reset at
>    non-zero offsets.
> There are now three types of backing InputStream used by TikaInputStream:
> file, byte array, and generic. With generic, we buffer to memory and then
> spool to disk once a threshold is reached.
> The one downside of this setup is that we buffer to memory for a generic
> InputStream even when mark/reset alone might be sufficient.
> On this ticket, we'll look into adding an "enableRewind()" call to
> TikaInputStream. This would be a no-op for file- and byte-array-backed
> streams (because those are already rewindable). What it would allow is a
> basic BufferedInputStream for the many file formats that need only
> mark/reset and do not need full rewindability. This would put the
> responsibility on the digester/detector/parser to know when an InputStream
> needs to be rewindable.
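
The opt-in design described above can be sketched roughly as follows. This is a hypothetical illustration, not Tika's code: the class OptInRewindStream and its rewind() method are invented names, and enableRewind() here only mirrors the idea from the ticket. By default, reads pass through a BufferedInputStream, so mark/reset works and nothing extra is retained. Once enableRewind() is called (assumed to be required at offset 0), every byte read is cached so the stream can be replayed from the start. For brevity the cache is an in-memory ByteArrayOutputStream; the ticket describes spooling to disk past a threshold instead.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of opt-in rewinding for a generic InputStream (not Tika's API). */
final class OptInRewindStream extends InputStream {
    private final BufferedInputStream in;  // plain mark/reset support by default
    private ByteArrayOutputStream cache;   // non-null once rewind is enabled
    private byte[] replay;                 // snapshot of cached bytes being replayed
    private int replayPos;
    private long position;                 // bytes consumed through this stream
    private long markPos;

    OptInRewindStream(InputStream source) {
        this.in = new BufferedInputStream(source);
    }

    /** Assumed contract: must be called before the first read. */
    void enableRewind() {
        if (position != 0) {
            throw new IllegalStateException("enableRewind() must be called at offset 0");
        }
        if (cache == null) {
            cache = new ByteArrayOutputStream();
        }
    }

    /** Returns to offset 0; only legal after enableRewind(). */
    void rewind() {
        if (cache == null) {
            throw new IllegalStateException("rewind not enabled");
        }
        replay = cache.toByteArray();
        replayPos = 0;
        position = 0;
    }

    @Override
    public int read() throws IOException {
        // Serve previously cached bytes first after a rewind.
        if (replay != null && replayPos < replay.length) {
            position++;
            return replay[replayPos++] & 0xFF;
        }
        int b = in.read();
        if (b >= 0) {
            position++;
            if (cache != null) {
                cache.write(b); // only pay for caching when rewind was requested
            }
        }
        return b;
    }

    // Plain mode: delegate mark/reset to BufferedInputStream.
    @Override
    public boolean markSupported() {
        return cache == null;
    }

    @Override
    public synchronized void mark(int readLimit) {
        if (cache == null) {
            in.mark(readLimit);
            markPos = position;
        }
    }

    @Override
    public synchronized void reset() throws IOException {
        if (cache != null) {
            throw new IOException("use rewind() once rewind is enabled");
        }
        in.reset();
        position = markPos;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

In this sketch a detector that only peeks at the first few bytes can use mark/reset with no extra buffering, while a parser that must re-read the whole stream calls enableRewind() up front and later rewind().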



--
This message was sent by Atlassian Jira
(v8.20.10#820010)