[ https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245572#comment-17245572 ]
Isabelle Giguere commented on TIKA-3203:
----------------------------------------

I upgraded Tika to 1.25 in our application, and deleted the ugly workaround from our code. I can confirm the fix works. Thank you.

> MP4Parser temporary files are not deleted from Tomcat temp folder
> -----------------------------------------------------------------
>
>                 Key: TIKA-3203
>                 URL: https://issues.apache.org/jira/browse/TIKA-3203
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>         Environment: CentOS 7.8
> Tomcat webapp
> OpenJDK JRE 11.0.5
>            Reporter: Isabelle Giguere
>            Priority: Major
>             Fix For: 1.25
>
>
> In our application, Tika is used as part of a Tomcat webapp. Tomcat sets its
> temp folder ($CATALINA_HOME/temp) as "java.io.tmpdir". The MP4Parser creates
> files in java.io.tmpdir.
> The files created by the MP4Parser are never deleted from temp/. Example:
> MediaDataBox10544109451805035303org.mp4parser.boxes.iso14496.part12.MediaDataBox@77cb1ee8
> Oddly, there are no errors in the logs. Nothing about files that cannot be
> deleted or cannot be found.
> Other processes in our application need to create other files in temp/, so
> we can't simply delete everything in that folder.
> I assume from TIKA-1040, TIKA-1361, and TIKA-3084 that most file deletion
> issues in the MP4Parser have been fixed. This may be a little gremlin in
> CentOS or in Tomcat ... ?
> I have tried using TemporaryResources (i.e., replacing the
> "TikaInputStream.get" call in the code below with
> TikaInputStream.get(InputStream, TemporaryResources)) to put the parser's
> temporary files in a folder that we can control, but to no avail. Tika's
> MP4Parser "parse" method initializes a new instance of TemporaryResources,
> so the TemporaryResources that I created is never used. The default
> TemporaryResources would use java.io.tmpdir anyway, right?
> So, why aren't these files deleted?
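For anyone who cannot move to 1.25 yet, a targeted cleanup of the leftover files is one possible stopgap. The sketch below is only an illustration. It is not the workaround mentioned in the comment above and it is not part of Tika: the class StaleMediaDataBoxCleaner, the method deleteStale, and the age threshold are all hypothetical. It assumes the leftover files can be recognized by the "MediaDataBox" name prefix seen in the example file name, and it deletes only those, leaving the other files in temp/ alone.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Tika or of the reporter's application.
public class StaleMediaDataBoxCleaner {

    // Assumption: leftover MP4Parser files start with "MediaDataBox",
    // as in the example file name quoted in the issue.
    private static final String GLOB = "MediaDataBox*";

    /**
     * Deletes regular files directly inside tempDir whose name matches GLOB
     * and whose last modification is older than minAge.
     * Returns the number of files deleted.
     */
    public static int deleteStale(Path tempDir, Duration minAge) throws IOException {
        Instant cutoff = Instant.now().minus(minAge);
        int deleted = 0;
        try (DirectoryStream<Path> candidates = Files.newDirectoryStream(tempDir, GLOB)) {
            for (Path candidate : candidates) {
                boolean oldEnough =
                        Files.getLastModifiedTime(candidate).toInstant().isBefore(cutoff);
                if (Files.isRegularFile(candidate) && oldEnough
                        && Files.deleteIfExists(candidate)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

It could be called periodically, for example as StaleMediaDataBoxCleaner.deleteStale(Paths.get(System.getProperty("java.io.tmpdir")), Duration.ofHours(1)). The age threshold is there so that files belonging to a parse that is still in progress are not removed.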
> And, while we are on the subject, there should be a way to set a temporary
> files folder that parsers (and the parsers' dependencies) actually use. How
> can a user-defined TemporaryResources be useful if the parser ignores it?
> Relevant code:
> {code}
> Parser parser = new AutoDetectParser(); // injected by Spring
> Path input = ...; // some mp4 audio file
> Path output = ...;
> final Metadata metadata = new Metadata();
> // try-with-resources closes the writer and both streams in reverse order
> try (InputStream stream = TikaInputStream.get(input, metadata);
>      OutputStream outputStream = new FileOutputStream(output.toFile());
>      OutputStreamWriter outputStreamWriter =
>              new OutputStreamWriter(outputStream, "UTF-8")) {
>     ParseContext parseContext = new ParseContext();
>
>     parser.parse(stream, new BodyContentHandler(outputStreamWriter),
>             metadata, parseContext);
>
>     // do something with the metadata and the output
> }
> {code}
> Note that I also tried to set java.io.tmpdir to another folder,
> programmatically. That had no effect either. Since the application needs to
> use Tomcat's temp folder for other processing, setting java.io.tmpdir on the
> command line is not an option.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)