[ 
https://issues.apache.org/jira/browse/TIKA-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051821#comment-18051821
 ] 

Tim Allison commented on TIKA-4618:
-----------------------------------

May or may not help?

> Improve spooling strategies in 4.x
> ----------------------------------
>
>                 Key: TIKA-4618
>                 URL: https://issues.apache.org/jira/browse/TIKA-4618
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> On TIKA-4474, there's a request to spool zip-based doc formats. With solid 
> state drives, there just isn't the performance hit that there once was. We'd 
> probably be generally better off spooling "random-access" file formats 
> (zip-based, OLE, PDF, and others?).
>  
> I'm not sure whether we should add a simple "pre-detection" step to augment 
> "maybeSpool" in the AutoDetectParser, or just beef up the detectors 
> and allow configuration there so that the zip detector runs the strategy.
>  
> The idea would be to use the underlying file if it exists. If it doesn't, 
> check whether the stream is smaller than a threshold (default = 100 KB?); if 
> so, don't spool. Otherwise, spool.
> If anyone has any thoughts on the cleanest design, please offer input.
> cc [~manish003] 
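
For discussion, the decision described in the issue (use the underlying file if there is one, otherwise spool only when the stream reaches a size threshold) might be sketched roughly as below. This is plain JDK code, not Tika's API: the class SpoolSketch, the Decision enum, the decide method and the mark/reset peeking approach are all hypothetical names and choices made for illustration, and the 100 KB default is just the value floated in the description.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;

// Hypothetical sketch only; none of these names exist in Tika.
final class SpoolSketch {

    enum Decision { USE_EXISTING_FILE, KEEP_IN_MEMORY, SPOOL_TO_DISK }

    // Assumed default, taken from the "100kb?" suggestion in the issue.
    static final int DEFAULT_THRESHOLD = 100 * 1024;

    /**
     * Decides how to handle a stream without consuming it.
     * The stream must support mark/reset (e.g. a BufferedInputStream).
     */
    static Decision decide(Path underlyingFile, InputStream stream, int threshold)
            throws IOException {
        // 1. If the caller already has a file on disk, use it directly.
        if (underlyingFile != null) {
            return Decision.USE_EXISTING_FILE;
        }
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        // 2. Peek at up to `threshold` bytes, then rewind.
        stream.mark(threshold);
        try {
            byte[] buf = new byte[8192];
            int total = 0;
            while (total < threshold) {
                int n = stream.read(buf, 0, Math.min(buf.length, threshold - total));
                if (n < 0) {
                    // Hit end of stream before the threshold: small enough to skip spooling.
                    return Decision.KEEP_IN_MEMORY;
                }
                total += n;
            }
            // 3. At least `threshold` bytes are available: spool.
            return Decision.SPOOL_TO_DISK;
        } finally {
            stream.reset();
        }
    }
}
```

A caller would wrap a raw stream in a BufferedInputStream before calling decide(null, stream, SpoolSketch.DEFAULT_THRESHOLD). Whether this check belongs in a pre-detection step or inside the detectors is exactly the open design question above; the sketch takes no position on that.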



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
