[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643243#comment-16643243
 ] 

ASF GitHub Bot commented on JENA-1615:
--

GitHub user dobrist opened a pull request:

https://github.com/apache/jena/pull/481

JENA-1615 - Compaction leaks file descriptors

Close file channel when closing a block to release open file descriptors

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dobrist/jena JENA-1615

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/481.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #481


commit 199d4f267a70627b6a79f5a3081b37ca88224921
Author: damienobrist 
Date:   2018-10-09T12:12:43Z

JENA-1615: Close file channel when closing a block

This is necessary to release open file descriptors




> Compaction leaks file descriptors
> -
>
> Key: JENA-1615
> URL: https://issues.apache.org/jira/browse/JENA-1615
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core, TDB2
>Affects Versions: Jena 3.8.0
> Environment: I reproduced the issue on the following environments:
>  * OS / Java:
>  ** MacOS 10.13.5
> Java 1.8.0_161 (Oracle)
>  ** Debian 9.5
> Java 1.8.0_181 (OpenJDK)
>  * Jena version 3.8.0
>  * TDB2 mode: mapped
>Reporter: Damien Obrist
>Priority: Major
> Attachments: open_files_after_compaction_after_gc.png, 
> open_files_after_compaction_before_gc.png, open_files_before_compaction.png
>
>
> h3. Context
> I'm using a TDB2 dataset in a long-running Scala application, in which the 
> dataset gets compacted regularly. After compactions, the application removes 
> the {{Data-}} folder of the previous generation. However, the 
> corresponding disk space isn't returned to the OS and is still 
> reported as used by {{df}}. Indeed, {{lsof}} shows that the application 
> keeps open file descriptors that point to the old generation's files. Only 
> stopping / restarting the JVM frees the disk space for good.
> h3. Reproduction steps
>  * Connect to an existing TDB2 dataset
> {code}
> val dataset = TDB2Factory.connectDataset("sample"){code}
>  * Check open files
>   [^open_files_before_compaction.png]
>  * Compact the dataset
>   {code}DatabaseMgr.compact(dataset.asDatasetGraph){code}
>  * Check open files (before garbage collection)
>  [^open_files_after_compaction_before_gc.png]
>  * Check open files (after garbage collection)
>  [^open_files_after_compaction_after_gc.png]
> The last screenshot shows that, even after garbage collection, there are still 
> open file descriptors pointing to the old generation {{Data-0001}}.
> h3. Impact
> Depending on how disk usage is being reported, this can be quite problematic. 
> In our case, we're running on an OpenShift infrastructure with limited 
> storage. After only a handful of compactions, the storage is considered full 
> and cannot be used anymore.
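The {{lsof}} check in the reproduction steps can also be done from inside the JVM. The following is a minimal sketch that assumes Linux: it reads the {{/proc/self/fd}} pseudo-directory, so it returns an empty list on macOS, where {{lsof}} is still needed. The class and method names are hypothetical and not part of Jena. The demo reproduces the symptom without TDB2: a file deleted while a {{FileChannel}} still has it open keeps its descriptor (listed with a "(deleted)" suffix) until the channel is closed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Jena. Linux-only assumption: uses /proc/self/fd.
final class OpenFdCheck {
    private static final Path PROC_FD = Paths.get("/proc/self/fd");

    /** True if the Linux /proc/self/fd listing is available to this JVM. */
    static boolean supported() {
        return Files.isDirectory(PROC_FD);
    }

    /** Targets of this JVM's open file descriptors that lie under dir (empty if unsupported). */
    static List<String> openFilesUnder(Path dir) {
        if (!supported()) {
            return Collections.emptyList();
        }
        String prefix = dir.toAbsolutePath().normalize().toString();
        List<String> result = new ArrayList<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(PROC_FD)) {
            for (Path fd : fds) {
                try {
                    // A deleted-but-open file shows up as "<path> (deleted)".
                    String target = Files.readSymbolicLink(fd).toString();
                    if (target.equals(prefix) || target.startsWith(prefix + "/")) {
                        result.add(target);
                    }
                } catch (IOException e) {
                    // The descriptor was closed while listing (e.g. the stream's own); skip it.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Returns {descriptors under the dir after delete, after close}. */
    static int[] deletedButOpenDemo() {
        try {
            Path dir = Files.createTempDirectory("fd-demo").toRealPath();
            Path file = dir.resolve("data.bin");
            Files.write(file, new byte[4096]);
            FileChannel channel = FileChannel.open(file, StandardOpenOption.READ);
            Files.delete(file);                           // like removing an old Data- folder
            int afterDelete = openFilesUnder(dir).size(); // descriptor still open
            channel.close();                              // only now can the space be reclaimed
            int afterClose = openFilesUnder(dir).size();
            Files.delete(dir);
            return new int[] {afterDelete, afterClose};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = deletedButOpenDemo();
        System.out.println("open after delete: " + r[0] + ", after close: " + r[1]);
    }
}
```

In a long-running application, calling {{openFilesUnder}} on the previous generation's directory after compaction and garbage collection gives the same information as the {{lsof}} screenshots.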



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-09 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643280#comment-16643280
 ] 

Damien Obrist commented on JENA-1615:
-

h3. Investigation

[^open_files_after_compaction_before_gc.png] shows duplicate file descriptors 
for some of the files. Analyzing the Java heap and playing with garbage 
collection, I saw that the first garbage collection after a compaction releases 
some of these file descriptors. This is expected as 
[BlockAccessMapped#_close|https://github.com/apache/jena/blob/3d456654feb2cf7617a85a5245c80b827900076f/jena-db/jena-dboe-base/src/main/java/org/apache/jena/dboe/base/file/BlockAccessMapped.java#L249]
 dereferences its {{MappedByteBuffers}}, whose file descriptors are 
subsequently closed by 
[FileChannelImpl#Unmapper|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/nio/ch/FileChannelImpl.java#l784].
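The JDK documents this behaviour. According to the {{FileChannel.map}} javadoc, closing the channel has no effect on the validity of a mapping. According to the {{MappedByteBuffer}} javadoc, a mapped buffer and its mapping remain valid until the buffer itself is garbage-collected. A minimal sketch in plain JDK code (hypothetical class name, no Jena) showing a mapping that outlives its channel:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Jena.
final class MappingOutlivesChannel {
    /** Writes one byte to a temp file, maps it, closes the channel, then reads via the mapping. */
    static byte readAfterChannelClose(byte value) {
        try {
            Path file = Files.createTempFile("mapped", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[] {value});
            MappedByteBuffer buffer;
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, 1);
            }
            // The channel is closed here, but the mapping is still valid: it is only
            // released once `buffer` becomes unreachable and is garbage-collected.
            return buffer.get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterChannelClose((byte) 42));
    }
}
```

This is why dropping the {{MappedByteBuffers}} only releases their resources at the next garbage collection, while anything held directly by the channel needs an explicit {{close()}}.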

[^open_files_after_compaction_after_gc.png] shows that after garbage collection 
there is still an open file descriptor for each TDB file. These appear to be 
the {{FileChannelImpl}}'s [file descriptor instances|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/nio/ch/FileChannelImpl.java#l49], 
which seem to be [duplicated|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/nio/ch/FileChannelImpl.java#l900] 
when the memory mapping is created. The {{FileChannelImpl}} objects get 
[garbage-collected|https://github.com/apache/jena/blob/3f371dfa952f4af8c2f2511cf4f36e82a56f5789/jena-db/jena-dboe-base/src/main/java/org/apache/jena/dboe/base/file/BlockAccessBase.java#L169],
 but their file descriptors are never closed.
h3. Proposed solution

Based on my investigation, I think the {{BlockAccessBase}}'s {{FileChannel}} 
instances need to be closed before they are dereferenced. This would seem to be 
consistent with what is done in TDB1's 
[BlockAccessBase#_close|https://github.com/apache/jena/blob/0d3928eaf449e7b375038a892a6c9c3b0dd05908/jena-tdb/src/main/java/org/apache/jena/tdb/base/file/BlockAccessBase.java#L152].
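The proposed change amounts to the pattern below. This is a simplified, hypothetical sketch. It is neither the code in the pull request nor Jena's {{BlockAccessBase}}. The class owns a {{FileChannel}} and closes it explicitly in {{close()}} before dropping the reference, so the descriptor is released immediately instead of being left to garbage collection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical, much-simplified block file; NOT Jena's BlockAccessBase.
final class SimpleBlockFile implements AutoCloseable {
    private final int blockSize;
    private FileChannel channel;

    SimpleBlockFile(Path file, int blockSize) throws IOException {
        this.blockSize = blockSize;
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    /** Writes data at the byte offset of the given block. */
    void writeBlock(long blockId, ByteBuffer data) throws IOException {
        channel.write(data, blockId * blockSize);
    }

    @Override
    public void close() throws IOException {
        if (channel == null) {
            return; // already closed: close() is idempotent
        }
        try {
            channel.close(); // release the OS file descriptor now
        } finally {
            channel = null;  // only then drop the reference
        }
    }

    /** Returns {channel open before close(), channel open after close()}. */
    static boolean[] demo() {
        try {
            Path file = Files.createTempFile("blocks", ".dat");
            SimpleBlockFile blocks = new SimpleBlockFile(file, 16);
            blocks.writeBlock(0, ByteBuffer.wrap(new byte[16]));
            FileChannel ch = blocks.channel; // kept only to inspect it afterwards
            boolean before = ch.isOpen();
            blocks.close();
            boolean after = ch.isOpen();
            Files.delete(file);
            return new boolean[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```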

I have created a pull request with the proposed fix: 
[https://github.com/apache/jena/pull/481]

The change seems to fix the issue: following the reproduction steps above with 
a patched Jena SNAPSHOT, all but two of the file descriptors are closed after 
garbage collection: 
[^open_files_after_compaction_after_gc_with_fix.png]

Please let me know what you think and thanks in advance for your feedback!

> Compaction leaks file descriptors
> -
>
> Key: JENA-1615
> URL: https://issues.apache.org/jira/browse/JENA-1615
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core, TDB2
>Affects Versions: Jena 3.8.0
> Environment: I reproduced the issue on the following environments:
>  * OS / Java:
>  ** MacOS 10.13.5
> Java 1.8.0_161 (Oracle)
>  ** Debian 9.5
> Java 1.8.0_181 (OpenJDK)
>  * Jena version 3.8.0
>  * TDB2 mode: mapped
>Reporter: Damien Obrist
>Priority: Major
> Attachments: open_files_after_compaction_after_gc.png, 
> open_files_after_compaction_after_gc_with_fix.png, 
> open_files_after_compaction_before_gc.png, open_files_before_compaction.png
>
>


[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-09 Thread Andy Seaborne (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643660#comment-16643660
 ] 

Andy Seaborne commented on JENA-1615:
-

Hi there - thanks for the clear analysis and for the fix.

{{node-data.bdf}} and {{prefix-data.bdf}} should be closed as well. From code 
inspection, the cause is that {{TransBinaryDataFile}} does not close its state file:

{noformat}
--- a/jena-db/jena-dboe-trans-data/src/main/java/org/apache/jena/dboe/trans/data/TransBinaryDataFile.java
+++ b/jena-db/jena-dboe-trans-data/src/main/java/org/apache/jena/dboe/trans/data/TransBinaryDataFile.java
@@ -217,6 +217,7 @@
 
 @Override
 public void close() {
+stateMgr.close();
 binFile.close() ;
 }
 {noformat}
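The same idea applies to any component that owns more than one file. Below is a hypothetical sketch with made-up names; it is not Jena's {{TransBinaryDataFile}}. Unlike the patch above, it uses {{try}}/{{finally}} so that the data file is still closed if closing the state file throws. That is an extra design choice, not part of the patch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names are made up and not Jena's.
final class TwoResourceClose {
    /** A component that owns a state file and a data file. */
    static final class Component implements Closeable {
        private final Closeable stateFile;
        private final Closeable dataFile;

        Component(Closeable stateFile, Closeable dataFile) {
            this.stateFile = stateFile;
            this.dataFile = dataFile;
        }

        @Override
        public void close() throws IOException {
            try {
                stateFile.close(); // the kind of call the patch adds
            } finally {
                dataFile.close();  // still runs if closing the state file throws
            }
        }
    }

    /** Records the order in which resources are closed, optionally failing the first one. */
    static List<String> demo(boolean stateCloseFails) {
        List<String> events = new ArrayList<>();
        Closeable state = () -> {
            events.add("state");
            if (stateCloseFails) {
                throw new IOException("state close failed");
            }
        };
        Closeable data = () -> events.add("data");
        try {
            new Component(state, data).close();
        } catch (IOException e) {
            events.add("error: " + e.getMessage());
        }
        return events;
    }
}
```

For example, {{demo(true)}} returns {{[state, data, error: state close failed]}}: the data file is closed even though the first close failed.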



[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-10 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644895#comment-16644895
 ] 

Damien Obrist commented on JENA-1615:
-

Hi [~andy.seaborne], thanks for looking into this!

Indeed, these files should be closed as well. Being unfamiliar with the code, I 
struggled when I tried to track them down. Thanks for the fix, I have included 
the change in the pull request.

I can confirm that with these fixes, there are no more open file descriptors 
for the old generation after a compaction (after garbage collection).



[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-11 Thread Andy Seaborne (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646353#comment-16646353
 ] 

Andy Seaborne commented on JENA-1615:
-

Thanks for the confirmation; it's very helpful to have independent verification. 
I'll merge the PR.



[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646356#comment-16646356
 ] 

ASF GitHub Bot commented on JENA-1615:
--

Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/481


> Compaction leaks file descriptors
> -
>
> Key: JENA-1615
> URL: https://issues.apache.org/jira/browse/JENA-1615
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core, TDB2
>Affects Versions: Jena 3.8.0
> Environment: I reproduced the issue on the following environments:
>  * OS / Java:
>  ** MacOS 10.13.5
> Java 1.8.0_161 (Oracle)
>  ** Debian 9.5
> Java 1.8.0_181 (OpenJDK)
>  * Jena version 3.8.0
>  * TDB2 mode: mapped
>Reporter: Damien Obrist
>Priority: Major
> Fix For: Jena 3.10.0
>
> Attachments: open_files_after_compaction_after_gc.png, 
> open_files_after_compaction_after_gc_with_fix.png, 
> open_files_after_compaction_before_gc.png, open_files_before_compaction.png
>
>


[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-12 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647555#comment-16647555
 ] 

Damien Obrist commented on JENA-1615:
-

{quote}I'll merge the PR
{quote}
[~andy.seaborne] thanks a lot, that's great to hear! When is version 3.10.0 
scheduled to be released?

> Compaction leaks file descriptors
> -
>
> Key: JENA-1615
> URL: https://issues.apache.org/jira/browse/JENA-1615
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Core, TDB2
>Affects Versions: Jena 3.8.0
> Environment: I reproduced the issue on the following environments:
>  * OS / Java:
>  ** MacOS 10.13.5
> Java 1.8.0_161 (Oracle)
>  ** Debian 9.5
> Java 1.8.0_181 (OpenJDK)
>  * Jena version 3.8.0
>  * TDB2 mode: mapped
>Reporter: Damien Obrist
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 3.10.0
>
> Attachments: open_files_after_compaction_after_gc.png, 
> open_files_after_compaction_after_gc_with_fix.png, 
> open_files_after_compaction_before_gc.png, open_files_before_compaction.png
>
>


[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-12 Thread Andy Seaborne (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647893#comment-16647893
 ] 

Andy Seaborne commented on JENA-1615:
-

Jena releases roughly every 3-4 months, though this depends on the availability 
of people (volunteers).

You can build the source code (-Pdev will build everything up to Fuseki2 and is 
faster) or try the development snapshots, which are built daily (Maven repository 
[https://repository.apache.org/snapshots]).



[jira] [Commented] (JENA-1615) Compaction leaks file descriptors

2018-10-15 Thread Damien Obrist (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650144#comment-16650144
 ] 

Damien Obrist commented on JENA-1615:
-

[~andy.seaborne] got it, thanks for the pointers!
