[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080195#comment-17080195 ]
Luís Filipe Nassif commented on TIKA-2849:
------------------------------------------

Hi [~boris-petrov],

There are a number of Tika parsers that need a java.io.File because Tika's dependencies require one. Looking at the current sources, I found that File is needed by the parsers for rar, 7z, pst, mp4, jpg, tif, webp, sqlite, and maybe others... Currently there is no way to know whether a parser will spool the stream or not.

However, my organization has a project with a hard requirement to run a search tool on computers/cellphones with very limited resources in the field, and we prefer to receive an IOException("File size larger than max spool limit") from parsers instead of waiting too long in dangerous places or exhausting the computer's resources and crashing the app...

[~tallison],

What do you think of a new TikaInputStream constructor that takes the spool limit, or a setMaxSpoolSize() method to set this limit? If the limit is reached, TikaInputStream should throw the IOException above. If approved, I can code that; it is simple.

> TikaInputStream copies the input stream locally
> -----------------------------------------------
>
>                 Key: TIKA-2849
>                 URL: https://issues.apache.org/jira/browse/TIKA-2849
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Boris Petrov
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.21
>
>
> When doing "tika.detect(stream, name)" and the stream is a "TikaInputStream",
> execution gets to "TikaInputStream#getPath" which does a "Files.copy(in,
> path, REPLACE_EXISTING);" which is very, very bad. This input stream could
> be, as in our case, an input stream from a network file which is tens or
> hundreds of gigabytes large. Copying it locally is a huge waste of resources
> to say the least. Why does it do that and can I make it not do it? Or is this
> something that has to be fixed in Tika?
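
As a rough illustration of the spool limit proposed in the comment above, here is a minimal, standalone sketch of a size-capped copy to a temporary file. It is not Tika code and does not use TikaInputStream: the class name SpoolLimitSketch, the helper spoolToTempFile, and the parameter maxSpoolBytes are hypothetical names chosen for this example. Only the exception message is taken from the comment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not part of Tika: spools a stream to disk with a hard size cap.
final class SpoolLimitSketch {

    // Copies `in` to a new temp file and returns its path.
    // Throws IOException as soon as more than maxSpoolBytes have been read,
    // and removes the partially written temp file in that case.
    static Path spoolToTempFile(InputStream in, long maxSpoolBytes) throws IOException {
        Path tmp = Files.createTempFile("spool-sketch-", ".tmp");
        boolean completed = false;
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
                if (total > maxSpoolBytes) {
                    // Message taken from the comment; the check itself is an assumption.
                    throw new IOException("File size larger than max spool limit");
                }
                out.write(buf, 0, n);
            }
            completed = true;
            return tmp;
        } finally {
            // The output stream is already closed here, so the file can be deleted safely.
            if (!completed) {
                Files.deleteIfExists(tmp);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A small input fits under the limit and is spooled normally.
        Path p = spoolToTempFile(new ByteArrayInputStream(new byte[100]), 1024);
        System.out.println("Spooled " + Files.size(p) + " bytes");
        Files.delete(p);

        // A larger input fails fast instead of filling the disk.
        try {
            spoolToTempFile(new ByteArrayInputStream(new byte[2048]), 1024);
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing during the copy, rather than checking a declared length up front, means the cap also holds for streams whose size is unknown in advance, such as the network stream described in the issue.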