[ 
https://issues.apache.org/jira/browse/TIKA-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Isabelle Giguere updated TIKA-3203:
-----------------------------------
    Description: 
In our application, Tika is used as part of a Tomcat webapp.  Tomcat sets its 
temp folder ($CATALINA_HOME/temp) as "java.io.tmpdir".  The MP4Parser creates 
files in java.io.tmpdir.  

The files created by the MP4Parser are never deleted from temp/.  Example: 
MediaDataBox10544109451805035303org.mp4parser.boxes.iso14496.part12.MediaDataBox@77cb1ee8
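
The file name above is consistent with java.io.File.createTempFile being called with the prefix "MediaDataBox" and an object's default toString() value as the suffix. That is an inference from the name only; it has not been checked against the mp4parser source. A minimal JDK-only sketch of how such a name could arise (the class TempNameDemo and its method are hypothetical):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

final class TempNameDemo {

    /**
     * Creates a temp file in java.io.tmpdir whose name has the same shape as
     * the leftover file. Assumption: prefix "MediaDataBox", suffix taken from
     * the default Object.toString() ("ClassName@hexHash").
     */
    static File createLikeObserved() throws IOException {
        String suffix = new Object().toString();
        // Two-argument createTempFile always uses the java.io.tmpdir directory.
        return File.createTempFile("MediaDataBox", suffix);
    }

    public static void main(String[] args) throws IOException {
        File f = createLikeObserved();
        try {
            System.out.println(f.getAbsolutePath());
        } finally {
            // Nothing removes such a file unless the caller does so explicitly.
            Files.deleteIfExists(f.toPath());
        }
    }
}
```

If the real call looks like this, the file stays in temp/ until some code deletes it explicitly, which would match the absence of errors in the logs.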

Oddly, there are no errors in the logs: nothing about files that cannot be 
deleted or cannot be found.

Other processes in our application need to create other files in temp/, so we 
can't simply delete everything in that folder.
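
Because the folder is shared, any cleanup has to be selective. As a stopgap only (it does not explain why the files are left behind), a periodic job could delete files that match the observed "MediaDataBox" prefix and are older than some threshold. A minimal sketch; the class name StaleTempCleaner, the method deleteStale, and the age threshold are assumptions:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

final class StaleTempCleaner {

    /**
     * Deletes regular files in dir whose names start with prefix and whose
     * last-modified time is older than minAge. Returns the number deleted.
     * The prefix is used in a glob, so it must not contain glob metacharacters.
     */
    static int deleteStale(Path dir, String prefix, Duration minAge) throws IOException {
        Instant cutoff = Instant.now().minus(minAge);
        int deleted = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path p : stream) {
                // The age check avoids removing files that a running parse may still need.
                if (Files.isRegularFile(p)
                        && Files.getLastModifiedTime(p).toInstant().isBefore(cutoff)
                        && Files.deleteIfExists(p)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

It could be called, for instance, as StaleTempCleaner.deleteStale(Paths.get(System.getProperty("java.io.tmpdir")), "MediaDataBox", Duration.ofHours(1)), which leaves files created by other parts of the application untouched.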

I assume from TIKA-1040, TIKA-1361, and TIKA-3084 that most file deletion 
issues in the MP4Parser have been fixed.  This may be a little gremlin in 
CentOS or in Tomcat ... ?

I have tried using TemporaryResources (i.e., replacing the "TikaInputStream.get" 
call in the code below with TikaInputStream.get(InputStream, TemporaryResources)) 
to put the parser's temporary files in a folder that we control, but to no 
avail.  The "parse" method of Tika's MP4Parser initializes a new instance of 
TemporaryResources, so the TemporaryResources that I created is never used.  
The default TemporaryResources would use java.io.tmpdir anyway, right?

So, why aren't these files deleted?

And, while we are on the subject, there should be a way to set a temporary 
files folder that parsers (and their dependencies) actually use.  How can a 
user-defined TemporaryResources be useful if the parser ignores it?
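
To make the requested behavior concrete, here is a JDK-only sketch of a holder that creates temporary files in a caller-chosen folder and deletes them when closed. It is not Tika's TemporaryResources and does not use any Tika API; the class ScopedTempFiles and its methods are hypothetical names used for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

final class ScopedTempFiles implements AutoCloseable {
    private final Path directory;
    private final Deque<Path> created = new ArrayDeque<>();

    /** Uses (and creates if necessary) the folder chosen by the caller. */
    ScopedTempFiles(Path directory) throws IOException {
        this.directory = Files.createDirectories(directory);
    }

    /** Creates a temp file inside the chosen folder and remembers it. */
    Path createTempFile(String prefix, String suffix) throws IOException {
        Path file = Files.createTempFile(directory, prefix, suffix);
        created.push(file);
        return file;
    }

    /** Deletes every file created through this instance, newest first. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!created.isEmpty()) {
            try {
                Files.deleteIfExists(created.pop());
            } catch (IOException e) {
                // Keep going so one failure does not leave the rest behind.
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Used in a try-with-resources block, every file created through the instance is removed when the block exits, and nothing is written to java.io.tmpdir unless that is the folder passed in.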


Relevant code:
{code}
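import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.file.Path;

import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

Parser parser = new AutoDetectParser(); // injected by Spring

Path input = ...; // some mp4 audio file
Path output = ...;

final Metadata metadata = new Metadata();

// All three resources are closed automatically, in reverse order.
try (InputStream stream = TikaInputStream.get(input, metadata);
     OutputStream outputstream = new FileOutputStream(output.toFile());
     OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputstream, "UTF-8")) {

    ParseContext parseContext = new ParseContext();

    // Extracted text goes to the writer; metadata is filled in as a side effect.
    parser.parse(stream, new BodyContentHandler(outputStreamWriter), metadata, parseContext);

    // do something with the metadata and the output
}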
{code}


> MP4Parser temporary files are not deleted from Tomcat temp folder
> -----------------------------------------------------------------
>
>                 Key: TIKA-3203
>                 URL: https://issues.apache.org/jira/browse/TIKA-3203
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>         Environment: CentOS 7.8, Tomcat webapp
>            Reporter: Isabelle Giguere
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
