Damien Obrist created JENA-1615:
-----------------------------------

             Summary: Compaction leaks file descriptors
                 Key: JENA-1615
                 URL: https://issues.apache.org/jira/browse/JENA-1615
             Project: Apache Jena
          Issue Type: Bug
          Components: Core, TDB2
    Affects Versions: Jena 3.8.0
         Environment: I reproduced the issue on the following environments:
 * OS / Java:
 ** MacOS 10.13.5, Java 1.8.0_161 (Oracle)
 ** Debian 9.5, Java 1.8.0_181 (OpenJDK)
 * Jena version 3.8.0
 * TDB2 mode: mapped
            Reporter: Damien Obrist
         Attachments: open_files_after_compaction_after_gc.png, 
open_files_after_compaction_before_gc.png, open_files_before_compaction.png

h3. Context

I'm using a TDB2 dataset in a long-running Scala application, in which the 
dataset gets compacted regularly. After each compaction, the application removes 
the {{Data-xxxx}} folder of the previous generation. However, the corresponding 
disk space isn't returned to the OS and is still reported as used by {{df}}. 
Indeed, {{lsof}} shows that the application keeps open 
file descriptors that point to the old generation's files. Only stopping / 
restarting the JVM frees the disk space for good.
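
The {{lsof}} check described above can also be approximated from inside the JVM. The following is a minimal sketch, not part of the original application: it assumes a Linux host (such as the Debian environment), where {{/proc/self/fd}} contains one symbolic link per open descriptor, and it returns an empty list on systems without that directory (for example MacOS). The names {{OpenDescriptors}}, {{targets}} and {{pointingInto}} are hypothetical.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.collection.JavaConverters._
import scala.util.Try

object OpenDescriptors {
  // Assumption: Linux. /proc/self/fd holds one symlink per descriptor open in this JVM.
  private val fdDir: Path = Paths.get("/proc/self/fd")

  /** Targets of all descriptors currently open in this JVM (empty if /proc is unavailable). */
  def targets(): List[String] =
    if (!Files.isDirectory(fdDir)) Nil
    else {
      val stream = Files.list(fdDir)
      try {
        stream.iterator().asScala
          // A descriptor may be closed between listing and reading it, so skip failures.
          .flatMap(fd => Try(Files.readSymbolicLink(fd).toString).toOption)
          .toList
      } finally stream.close()
    }

  /** Targets whose path contains the given fragment, e.g. the name of an old generation folder. */
  def pointingInto(fragment: String): List[String] =
    targets().filter(_.contains(fragment))
}
```

Calling {{OpenDescriptors.pointingInto("Data-0001")}} after removing the old generation would list any descriptors still held on its files. On Linux, the target of a descriptor whose file has been unlinked ends in {{(deleted)}}.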

h3. Reproduction steps

* Connect to an existing TDB2 dataset
  {code:scala}val dataset = TDB2Factory.connectDataset("sample"){code}
* Check open files
  !open_files_before_compaction.png|thumbnail!
* Compact the dataset
  {code:scala}DatabaseMgr.compact(dataset.asDatasetGraph){code}
* Check open files (before garbage collection)
  !open_files_after_compaction_before_gc.png|thumbnail!
* Check open files (after garbage collection)
  !open_files_after_compaction_after_gc.png|thumbnail!

The last screenshot shows that, even after garbage collection, there are still 
open file descriptors pointing to the old generation {{Data-0001}}.
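
The three measurements above can also be taken programmatically. The sketch below is an illustration under assumptions, not the code used for the screenshots: it reads the descriptor count of the current JVM through {{com.sun.management.UnixOperatingSystemMXBean}}, which is only available on Unix-like JVMs (otherwise the count is {{None}}). {{FdProbe}}, {{openFds}} and {{measure}} are hypothetical names, and {{System.gc()}} is only a request to the JVM, so the "after GC" value is best effort.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

object FdProbe {
  /** Number of descriptors open in this JVM, or None if the platform bean is not available. */
  def openFds(): Option[Long] =
    ManagementFactory.getOperatingSystemMXBean match {
      case unix: UnixOperatingSystemMXBean => Some(unix.getOpenFileDescriptorCount)
      case _                               => None
    }

  /** Runs `step` and records the descriptor count before it, after it, and after a GC request. */
  def measure(step: => Unit): Seq[(String, Option[Long])] = {
    val before = openFds()
    step
    val afterStep = openFds()
    System.gc() // a hint only; the JVM may or may not collect immediately
    val afterGc = openFds()
    Seq(
      "before"                -> before,
      "after step, before GC" -> afterStep,
      "after step, after GC"  -> afterGc
    )
  }
}
```

With the dataset from the steps above, {{FdProbe.measure(DatabaseMgr.compact(dataset.asDatasetGraph)).foreach(println)}} would print the three counts. A count only shows that descriptors remain open; {{lsof}} is still needed to see which files they point to.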

h3. Impact

Depending on how disk usage is being reported, this can be quite problematic. 
In our case, we're running on an OpenShift infrastructure with limited storage. 
After only a handful of compactions, the storage is considered full and cannot 
be used anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
