[ https://issues.apache.org/jira/browse/TIKA-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116817#comment-14116817 ]
Hudson commented on TIKA-1404:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.7 #187 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/187/])
TIKA-1404 The tika-app in server mode needs to close the TikaInputStream when done with it, to avoid leaking temp files (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1621604)
* /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java


> tika-app server leaking temporary files when converting Word97 (doc)
> --------------------------------------------------------------------
>
> Key: TIKA-1404
> URL: https://issues.apache.org/jira/browse/TIKA-1404
> Project: Tika
> Issue Type: Bug
> Components: cli
> Affects Versions: 1.5, 1.7
> Environment: Linux (observed on CentOS 6.5 and SuSE SLES 11)
> Reporter: Lukas Graf
> Assignee: Nick Burch
> Fix For: 1.7
>
> Attachments: simple_word97.doc
>
>
> When converting Word97 documents (*.doc), tika-server reproducibly leaves behind temporary files.
> Steps to reproduce:
> - Start {{tika-app-1.5.jar}} in {{--server}} mode
> - Send a {{*.doc}} file to server for conversion
> - Stop tika-server using CTRL+C or {{kill -15}}
> For example:
> {code}
> lukas@host:~> java -jar tika-app-1.5.jar -v --server --port 8077 --text
> # ...
> lukas@host:/tmp> ls -lah apache-tika-*
> ls: cannot access apache-tika-*: No such file or directory
> lukas@host:/tmp>
> lukas@host:/tmp> netcat 127.0.0.1 8077 < simple_word97.doc
> Simple Word-97 Document
> Lorem Ipsum.
> lukas@host:/tmp> ls -lah apache-tika-*
> -rw-r--r-- 1 lukas users 22K 2014-08-29 15:48 apache-tika-2457738389388821864.tmp
> # after conversion is done, tmp file handles are still open
> lukas@host:/tmp> lsof | grep tika
> java 29857 lukas 32r REG 104,2 28628386 4571740 /home/lukas/tika-app-1.5.jar
> java 29857 lukas 85r REG 104,2    22528 8604717 /tmp/apache-tika-2457738389388821864.tmp
> java 29857 lukas 86r REG 104,2    22528 8604717 /tmp/apache-tika-2457738389388821864.tmp
> # stop tika-server...
> ^C
> lukas@host:~>
> # ...
> lukas@host:/tmp> lsof | grep tika
> lukas@host:/tmp>
> {code}
> No exceptions are thrown, and the plaintext is being extracted correctly from the document, but temporary files are still left behind every single time.
> This obviously is a major issue in a production environment when converting thousands of documents a day. Our temp directories are filling up rapidly, and we had to configure cron jobs to clean up after Tika on most of our production servers.
> I wasn't able to reproduce this issue using {{tika-app-1.5.jar}} in non-server mode. However, booting up a JVM for every single conversion is just too slow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
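The commit message above says the fix was to close the TikaInputStream once the server is done with it, to avoid leaking temp files. The sketch below illustrates that failure mode and its fix in isolation, using only the JDK. It is not Tika code and not the actual change in r1621604: `SpooledStream`, `handleLeaky`, `handleFixed` and `drain` are hypothetical names, and `SpooledStream` merely mimics the `apache-tika-*.tmp` files from the report by copying each request body to a temp file that is deleted only in `close()`.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in (NOT Tika's API): a stream backed by a temp file
// that is only deleted when close() is called.
final class SpooledStream implements Closeable {
    private final Path tempFile;
    private final InputStream in;

    SpooledStream(InputStream source) throws IOException {
        // Spool the request body to disk, like the apache-tika-*.tmp files in the report.
        tempFile = Files.createTempFile("apache-tika-", ".tmp");
        Files.copy(source, tempFile, StandardCopyOption.REPLACE_EXISTING);
        in = Files.newInputStream(tempFile);
    }

    Path path() { return tempFile; }

    InputStream stream() { return in; }

    @Override
    public void close() throws IOException {
        // Release the open handle (what lsof showed), then remove the file itself.
        try {
            in.close();
        } finally {
            Files.deleteIfExists(tempFile);
        }
    }
}

public class ServerLoopSketch {
    // Reads the whole stream, standing in for "parse the document".
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Leaky handler: the stream is never closed, so its temp file outlives the request.
    static Path handleLeaky(byte[] body) throws IOException {
        SpooledStream s = new SpooledStream(new ByteArrayInputStream(body));
        drain(s.stream());
        return s.path();
    }

    // Fixed handler: try-with-resources closes the stream (deleting the temp file)
    // when the block exits, even if parsing throws.
    static Path handleFixed(byte[] body) throws IOException {
        try (SpooledStream s = new SpooledStream(new ByteArrayInputStream(body))) {
            drain(s.stream());
            return s.path();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "Lorem Ipsum.".getBytes(StandardCharsets.UTF_8);
        Path leaked = handleLeaky(doc);
        Path cleaned = handleFixed(doc);
        System.out.println("leaky handler left temp file: " + Files.exists(leaked));   // true
        System.out.println("fixed handler left temp file: " + Files.exists(cleaned)); // false
        // Tidy up the demo's own leak; on Linux an open file can still be deleted.
        Files.deleteIfExists(leaked);
    }
}
```

In a long-running server the leaky handler leaves one temp file (and one open handle) behind per request, which matches the growth described in the report. try-with-resources (Java 7+) guarantees `close()` runs on every exit path; on older JDKs a try/finally achieves the same.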