[ 
https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078338#comment-17078338
 ] 

Boris Petrov commented on TIKA-2849:
------------------------------------

[~tallison] - We hit the same problem that the original issue was about, but 
this time during parsing. These two stack frames:

{noformat}
org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:132)
org.apache.tika.parser.external.ExternalParser.parse(ExternalParser.java:222)
{noformat}

On the latest Tika (1.24), both of these copy the file. Could the same fix be 
applied to them? If not, my previous question remains very relevant: for us, 
copying the whole file is horrible and we have to protect against it 
happening. So an option to tell Tika not to do it (just blow up or return an 
empty string or something) is very important. Or, at the very least, a way of 
knowing in advance whether Tika will copy or not.
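
To illustrate the "just blow up" behaviour on the caller side: the sketch below 
is not a Tika option or API. It is a plain java.io wrapper that throws once 
more than a fixed number of bytes has been pulled from the source stream, so an 
attempt to spool a huge stream to disk would fail early instead of completing. 
The class name ByteBudgetInputStream and the budget value are hypothetical, and 
how Tika reports the resulting IOException is an assumption that would need 
checking.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical guard (not part of Tika): fails fast once more than
 * maxBytes have been read from the wrapped stream.
 */
class ByteBudgetInputStream extends FilterInputStream {
    private final long maxBytes;
    private long consumed;

    ByteBudgetInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count(n);
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count(skipped);
        return skipped;
    }

    // Report mark/reset as unsupported so that every byte is counted exactly
    // once; callers that need mark/reset have to buffer on top of this stream.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public synchronized void mark(int readLimit) {
        // intentionally a no-op
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    private void count(long n) throws IOException {
        consumed += n;
        if (consumed > maxBytes) {
            throw new IOException("byte budget of " + maxBytes + " exceeded");
        }
    }
}
```

The caller would wrap the network stream before handing it over, for example 
tika.detect(new ByteBudgetInputStream(stream, budget), name), with the budget 
set comfortably above what detection or metadata extraction needs but far 
below the file size. This only limits the damage; it does not say in advance 
whether a given parser will copy.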

What do you think?

> TikaInputStream copies the input stream locally
> -----------------------------------------------
>
>                 Key: TIKA-2849
>                 URL: https://issues.apache.org/jira/browse/TIKA-2849
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Boris Petrov
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.21
>
>
> When doing "tika.detect(stream, name)" and the stream is a "TikaInputStream", 
> execution gets to "TikaInputStream#getPath" which does a "Files.copy(in, 
> path, REPLACE_EXISTING);" which is very, very bad. This input stream could 
> be, as in our case, an input stream from a network file which is tens or 
> hundreds of gigabytes large. Copying it locally is a huge waste of resources 
> to say the least. Why does it do that and can I make it not do it? Or is this 
> something that has to be fixed in Tika?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
