[ 
https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabelle Giguere updated TIKA-3203:
-----------------------------------
    Description: 
In our application, Tika is used as part of a Tomcat webapp.  Tomcat sets its 
temp folder ($CATALINA_HOME/temp) as "java.io.tmpdir".  The MP4Parser creates 
files in java.io.tmpdir.  

The files created by the MP4Parser are never deleted from temp/.  Ex: 
MediaDataBox10544109451805035303org.mp4parser.boxes.iso14496.part12.MediaDataBox@77cb1ee8

Oddly, the logs show no errors: nothing about files that cannot be deleted 
or cannot be found.

Other processes in our application need to create other files in temp/, so we 
can't simply delete everything in that folder.
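
As a stopgap only (not a fix for the underlying leak), a selective sweep of 
temp/ could be sketched as below.  It uses only the JDK.  The class name 
StaleTempFileSweeper, the method sweep, and the idea of matching on the 
"MediaDataBox" name prefix and a minimum age are assumptions made for 
illustration; they are not part of Tika or mp4parser.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Tika: removes only old files with a known prefix.
final class StaleTempFileSweeper {

    private StaleTempFileSweeper() {
    }

    /**
     * Deletes regular files directly inside dir whose name starts with prefix
     * and whose last-modified time is older than maxAge.
     * Files with other names, and recent files, are left untouched.
     *
     * @return the number of files deleted
     */
    static int sweep(Path dir, String prefix, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        int deleted = 0;
        // The filter keeps only entries whose file name starts with the prefix.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(
                dir, p -> p.getFileName().toString().startsWith(prefix))) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)
                        && Files.getLastModifiedTime(entry).toInstant().isBefore(cutoff)
                        && Files.deleteIfExists(entry)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

A call such as StaleTempFileSweeper.sweep(tempDir, "MediaDataBox", 
Duration.ofHours(1)) would leave other processes' files alone.  The age 
threshold is there to avoid deleting a file that a parse in progress may still 
be using; a safe value depends on the application.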

I assume from TIKA-1040, TIKA-1361, and TIKA-3084 that most file deletion 
issues in the MP4Parser have been fixed.  This may be a little gremlin in 
CentOS or in Tomcat ... ?

I have tried using TemporaryResources (that is, replacing the 
"TikaInputStream.get" call in the code below with 
TikaInputStream.get(InputStream, TemporaryResources)) to put the parser's 
temporary files in a folder that we can control, but to no avail.  Tika's 
MP4Parser "parse" method initializes a new instance of TemporaryResources, so 
the TemporaryResources that I created is never used.  The default 
TemporaryResources would use java.io.tmpdir anyway, right?

So, why aren't these files deleted?

And, while we are on the subject, there should be a way to set a temporary 
files folder that parsers (and their dependencies) actually use.  How can a 
user-defined TemporaryResources be useful if the parser ignores it?


Relevant code:
{code}
Parser parser = new AutoDetectParser(); // injected by Spring

Path input = ...; // some mp4 audio file
Path output = ...;

final Metadata metadata = new Metadata();

try (InputStream stream = TikaInputStream.get(input, metadata);
     OutputStream outputstream = new FileOutputStream(output.toFile());
     OutputStreamWriter outputStreamWriter =
             new OutputStreamWriter(outputstream, "UTF-8")) {

    ParseContext parseContext = new ParseContext();

    parser.parse(stream, new BodyContentHandler(outputStreamWriter),
            metadata, parseContext);

    // do something with the metadata and the output
}
{code}

Note that I also tried to set java.io.tmpdir to another folder, 
programmatically.  That had no effect either.  Since the application needs to 
use Tomcat's temp folder for other processing, setting java.io.tmpdir on the 
command line is not an option.
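
For context, the Javadoc of java.io.File.createTempFile states that 
programmatic changes to java.io.tmpdir are not guaranteed to have any effect 
on the directory it uses, which would be consistent with the observation 
above.  Code that creates its own temporary files can sidestep the property by 
passing a directory explicitly, as in the JDK-only sketch below.  The class 
name ScopedTempFiles and the method createIn are hypothetical, and this does 
not change where a third-party library such as mp4parser writes its files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: creates temp files in a caller-chosen directory
// instead of relying on the java.io.tmpdir system property.
final class ScopedTempFiles {

    private ScopedTempFiles() {
    }

    /** Creates baseDir if needed, then an empty temp file with the given prefix inside it. */
    static Path createIn(Path baseDir, String prefix) throws IOException {
        Files.createDirectories(baseDir);
        return Files.createTempFile(baseDir, prefix, ".tmp");
    }
}
```

Because the directory is an argument, the caller owns the cleanup policy for 
that directory and does not have to share it with other users of Tomcat's 
temp folder.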



> MP4Parser temporary files are not deleted from Tomcat temp folder
> -----------------------------------------------------------------
>
>                 Key: TIKA-3203
>                 URL: https://issues.apache.org/jira/browse/TIKA-3203
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>         Environment: CentOS 7.8
> Tomcat webapp
>            Reporter: Isabelle Giguere
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
