[ https://issues.apache.org/jira/browse/TIKA-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr updated TIKA-3779:
----------------------------------
    Description:
I wondered where the many "apache-tika-" files in the temp directory came from. It turns out that all (or most) of them are PDF files, so I looked at the PDF parser module. After comparing file sizes and tracking down a file name, I focused on the test {{PDFParserTest.testSortByPosition()}}, where the first two parse tests leave a file behind and the third one doesn't.

The difference is that in the third one, {{PDFParser.parse()}} gets a {{TikaInputStream}} as parameter, and {{TikaInputStream.get()}} returns its parameter. In the first two, it creates a new object, which is never closed, so the resource cleanup is never done.

Adding
{code}
if (!(stream instanceof TikaInputStream)) {
    tstream.close();
}
{code}
fixes this, i.e. no leftover files after running PDFParserTest.

There's a null check in that method, but later the object is used without a null check. So either the null check isn't needed, or there is an NPE risk.

  was:
I wondered where the many "apache-tika-" files in the temp directory came from. It turns out that all (or most) of them are PDF files, so I looked at the PDF parser module. After comparing file sizes and tracking down a file name, I focused on the test {{PDFParserTest.testSortByPosition()}}, where the first two parse tests leave a file behind and the third one doesn't.

The difference is that in the third one, {{PDFParser.parse()}} gets a {{TikaInputStream}} as parameter, and {{TikaInputStream.get()}} returns its parameter. In the first two, it creates a new object, which is never closed, so the resource cleanup is never done.

Adding
{code}
if (!(tstream instanceof TikaInputStream)) {
    tstream.close();
}
{code}
fixes this, i.e. no leftover files after running PDFParserTest.

There's a null check in that method, but later the object is used without a null check. So either the null check isn't needed, or there is an NPE risk.
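
The ownership problem described above can be reproduced in isolation. The following is a minimal sketch only: {{SpoolingStream}}, {{parseLeaky()}} and {{parseFixed()}} are hypothetical stand-ins invented for illustration, not Tika classes or the actual {{PDFParser}} code. It assumes a wrapper whose static {{get()}} returns its argument when that argument is already a wrapper, and whose {{close()}} deletes the temp file it spooled to.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for a temp-file-backed wrapper; NOT Tika's TikaInputStream.
class SpoolingStream extends FilterInputStream {
    private Path tempFile; // created lazily by getPath(), deleted by close()

    private SpoolingStream(InputStream in) {
        super(in);
    }

    // Returns the argument itself if it is already a SpoolingStream,
    // otherwise creates a new wrapper that somebody has to close.
    static SpoolingStream get(InputStream in) {
        return (in instanceof SpoolingStream) ? (SpoolingStream) in : new SpoolingStream(in);
    }

    // Spools the remaining bytes to a temp file, as a parser that needs a file might.
    Path getPath() throws IOException {
        if (tempFile == null) {
            tempFile = Files.createTempFile("sketch-", ".tmp");
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
        }
        return tempFile;
    }

    // Sketch simplification: only removes the temp file and leaves the
    // wrapped stream open, since that stream belongs to the caller.
    @Override
    public void close() throws IOException {
        if (tempFile != null) {
            Files.deleteIfExists(tempFile);
            tempFile = null;
        }
    }
}

class TempLeakSketch {
    // Leaky variant: a wrapper created here is never closed,
    // so its temp file stays behind.
    static Path parseLeaky(InputStream stream) throws IOException {
        SpoolingStream tstream = SpoolingStream.get(stream);
        return tstream.getPath();
    }

    // Fixed variant: close the wrapper only if this method created it,
    // i.e. only if the caller did not pass in a SpoolingStream.
    static Path parseFixed(InputStream stream) throws IOException {
        SpoolingStream tstream = SpoolingStream.get(stream);
        try {
            return tstream.getPath();
        } finally {
            if (!(stream instanceof SpoolingStream)) {
                tstream.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3};

        Path leaked = parseLeaky(new ByteArrayInputStream(data));
        System.out.println("leaky variant left a temp file: " + Files.exists(leaked));
        Files.deleteIfExists(leaked); // tidy up after the demonstration

        Path cleaned = parseFixed(new ByteArrayInputStream(data));
        System.out.println("fixed variant left a temp file: " + Files.exists(cleaned));
    }
}
```

Note that the check has to be made on the caller's {{stream}}, not on {{tstream}}: the wrapper is always an instance of the wrapper type, so testing {{tstream}} would never trigger the cleanup. When the caller passes in its own wrapper, the sketch leaves it open on purpose, because the caller owns it and is expected to close it.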

> Temp file leftover in PDFParser.parse()
> ---------------------------------------
>
>                 Key: TIKA-3779
>                 URL: https://issues.apache.org/jira/browse/TIKA-3779
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>   Affects Versions: 2.4.0
>            Reporter: Tilman Hausherr
>            Priority: Minor
>
> I wondered where the many "apache-tika-" files in the temp directory came
> from. It turns out that all (or most) of them are PDF files, so I looked at
> the PDF parser module. After comparing file sizes and tracking down a file
> name, I focused on the test {{PDFParserTest.testSortByPosition()}}, where the
> first two parse tests leave a file behind and the third one doesn't.
> The difference is that in the third one, {{PDFParser.parse()}} gets a
> {{TikaInputStream}} as parameter, and {{TikaInputStream.get()}} returns its
> parameter. In the first two, it creates a new object, which is never
> closed, so the resource cleanup is never done.
> Adding
> {code}
> if (!(stream instanceof TikaInputStream)) {
>     tstream.close();
> }
> {code}
> fixes this, i.e. no leftover files after running PDFParserTest.
> There's a null check in that method, but later the object is used without a
> null check. So either the null check isn't needed, or there is an NPE risk.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)