[ https://issues.apache.org/jira/browse/TIKA-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-3779:
------------------------------
    Fix Version/s: 2.4.1

> Temp file leftover in PDFParser.parse()
> ---------------------------------------
>
>                 Key: TIKA-3779
>                 URL: https://issues.apache.org/jira/browse/TIKA-3779
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.0
>            Reporter: Tilman Hausherr
>            Assignee: Tim Allison
>            Priority: Minor
>             Fix For: 2.4.1
>
>
> I wondered where the many "apache-tika-" files in the temp directory came
> from. It turns out that all (or most) of them are PDF files, so I looked at
> the PDF parser module. After comparing file sizes and tracing one file name,
> I focused on the test {{PDFParserTest.testSortByPosition()}}, where the first
> two parse calls each leave a file behind and the third does not.
> The difference is that in the third call, {{PDFParser.parse()}} receives a
> {{TikaInputStream}} as its parameter, and {{TikaInputStream.get()}} simply
> returns that parameter. In the first two calls, {{TikaInputStream.get()}}
> creates a new object, which is never closed, so its resource cleanup never
> runs.
> Adding
> {code}
> if (!(stream instanceof TikaInputStream)) {
>     tstream.close();
> }
> {code}
> fixes this, i.e. there are no leftover files after running PDFParserTest.
> There is also a null check in that method, but later the object is used
> without a null check. So either the null check isn't needed, or there is an
> NPE risk.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
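The ownership rule behind the proposed fix can be illustrated with a small standalone sketch. It does not use Tika: `SpoolingStream` is a hypothetical stand-in for `TikaInputStream`, and `parse()` is a hypothetical stand-in for `PDFParser.parse()`. The only behavior carried over from the issue description is that `get()` returns its argument when it already has the wrapper type and creates a new wrapper otherwise. In this sketch, `close()` deletes the temp file and deliberately leaves the wrapped stream open, which is an assumption of the sketch and not a statement about Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempFileCleanupSketch {

    /** Hypothetical stand-in for TikaInputStream: spools its input to a temp file on demand. */
    static final class SpoolingStream extends InputStream {
        private final InputStream in;
        private Path tempFile; // created lazily by getPath()

        private SpoolingStream(InputStream in) {
            this.in = in;
        }

        /** Returns the argument itself if it is already a SpoolingStream, otherwise wraps it. */
        static SpoolingStream get(InputStream in) {
            return (in instanceof SpoolingStream) ? (SpoolingStream) in : new SpoolingStream(in);
        }

        /** Copies the wrapped stream into a temp file on first use and returns its path. */
        Path getPath() throws IOException {
            if (tempFile == null) {
                tempFile = Files.createTempFile("sketch-", ".tmp");
                Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            }
            return tempFile;
        }

        /** Reads pass straight through to the wrapped stream. */
        @Override
        public int read() throws IOException {
            return in.read();
        }

        /** Deletes the temp file, if any. The wrapped stream is left open in this sketch. */
        @Override
        public void close() throws IOException {
            if (tempFile != null) {
                Files.deleteIfExists(tempFile);
                tempFile = null;
            }
        }
    }

    /** Hypothetical parse method; returns the temp file path so callers can inspect it. */
    static Path parse(InputStream stream) throws IOException {
        SpoolingStream spooled = SpoolingStream.get(stream);
        try {
            // A real parser would read the document from this file.
            return spooled.getPath();
        } finally {
            // Close only a wrapper created here; a wrapper passed in belongs to the caller.
            if (!(stream instanceof SpoolingStream)) {
                spooled.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = parse(new ByteArrayInputStream("plain".getBytes()));
        System.out.println("plain stream, temp file left over: " + Files.exists(p));

        try (SpoolingStream own = SpoolingStream.get(new ByteArrayInputStream("wrapped".getBytes()))) {
            Path q = parse(own);
            System.out.println("caller-owned stream, temp file still present: " + Files.exists(q));
        }
    }
}
```

The `instanceof` check on the original argument is what separates the two cases: a wrapper created inside `parse()` has no other owner, so `parse()` must release it, while a wrapper supplied by the caller stays usable until the caller closes it.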