[
https://issues.apache.org/jira/browse/TIKA-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051821#comment-18051821
]
Tim Allison commented on TIKA-4618:
-----------------------------------
May or may not help?
> Improve spooling strategies in 4.x
> ----------------------------------
>
> Key: TIKA-4618
> URL: https://issues.apache.org/jira/browse/TIKA-4618
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> On TIKA-4474, there's a request to spool zip-based doc formats. With solid
> state drives, there just isn't the performance hit that there once was. We'd
> probably be better off spooling "random-access" file formats (zip-based,
> OLE, PDF and ?).
>
> I'm not sure whether we should add a simple "pre-detection" step to augment
> "maybeSpool" in the AutoDetectParser, or just beef up the detectors and
> allow configuration there so that the zip detector runs the strategy.
>
> The idea would be to use the underlying file if it exists. If it doesn't,
> check whether the stream is shorter than a threshold (default = 100 KB?);
> if so, don't spool, otherwise spool.
> If anyone has any thoughts on the cleanest design, please offer input.
> cc [~manish003]
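
For discussion, the decision described in the issue could be sketched roughly as below. This is only an illustration under stated assumptions: the class SpoolDecision, the Strategy enum and the decide method are hypothetical names, not existing Tika APIs, and the sketch assumes the caller passes a stream that supports mark/reset (for example a BufferedInputStream). It treats "less than a threshold" as strictly fewer bytes than the threshold.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Tika.
final class SpoolDecision {

    enum Strategy { USE_FILE, KEEP_IN_MEMORY, SPOOL }

    // 100 KB, the tentative default floated in the issue description.
    static final int DEFAULT_THRESHOLD_BYTES = 100 * 1024;

    /**
     * Picks a strategy without consuming the stream.
     *
     * @param underlyingFile backing file if one is known, otherwise null
     * @param stream         stream that supports mark/reset
     * @param thresholdBytes streams shorter than this are not spooled
     */
    static Strategy decide(Path underlyingFile, InputStream stream, int thresholdBytes)
            throws IOException {
        // 1. Prefer the underlying file when it exists.
        if (underlyingFile != null && Files.isRegularFile(underlyingFile)) {
            return Strategy.USE_FILE;
        }
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        // 2. Peek at most thresholdBytes bytes, then rewind.
        stream.mark(thresholdBytes);
        try {
            byte[] head = stream.readNBytes(thresholdBytes);
            // Hitting end-of-stream early means the stream is shorter than the threshold.
            return head.length < thresholdBytes ? Strategy.KEEP_IN_MEMORY : Strategy.SPOOL;
        } finally {
            stream.reset();
        }
    }
}
```

Whether a check like this belongs in a pre-detection step or inside the detectors is exactly the open design question; the sketch only pins down the file/threshold logic.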