[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359903#comment-14359903
 ] 

Chetan Mehrotra commented on OAK-2557:
--

bq. docIdsToDelete.add(prevDoc.getId()); << this

That's important and should be covered by a test. Opened OAK-2623 for that.

bq. The "log" part has a side effect (docIdsToDelete.getIds(), which might open 
new files)

Changed the log part so that it no longer has that side effect.

> VersionGC uses way too much memory if there is a large pile of garbage
> --
>
> Key: OAK-2557
> URL: https://issues.apache.org/jira/browse/OAK-2557
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Affects Versions: 1.0.11
>Reporter: Stefan Egli
>Assignee: Chetan Mehrotra
>Priority: Blocker
> Fix For: 1.1.8, 1.0.13
>
> Attachments: OAK-2557-2.patch, OAK-2557-3.patch, OAK-2557.patch
>
>
> It has been noticed that on a system where revision-gc 
> (VersionGarbageCollector of mongomk) did not run for a few days (due to not 
> interfering with some tests/large bulk operations) that there was such a 
> large pile of garbage accumulating, that the following code
> {code}
> VersionGarbageCollector.collectDeletedDocuments
> {code}
> in the for loop, creates such a large list of NodeDocuments to delete 
> (docIdsToDelete) that it uses up too much memory, causing the JVM's GC to 
> constantly spin in Full-GCs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359898#comment-14359898
 ] 

Chetan Mehrotra commented on OAK-2557:
--

A minor nitpick there :)

The above approach creates a potential security risk, as per [Guava 
Files|http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/Files.html#createTempDir%28%29].
 Those risks do not apply in our use case, though!

{quote}
Use this method instead of File.createTempFile(String, String) when you wish to 
create a directory, not a regular file. A common pitfall is to call 
createTempFile, delete the file and create a directory in its place, but this 
leads a race condition which can be exploited to create security 
vulnerabilities, especially when executable files are to be written into the 
directory. 
{quote}
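
For comparison, a minimal sketch of creating the temporary directory in a single step with {{java.nio.file.Files.createTempDirectory}} (Java 7 and later), which avoids the delete-and-recreate window described in the quote. The class name {{TempDirSketch}} and the prefix {{oak-gc-}} are illustrative assumptions, not taken from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempDirSketch {
    // Creates a fresh, uniquely named directory in one call, so there is no
    // gap between deleting a temp file and creating a directory in its place.
    static Path createWorkDir() {
        try {
            return Files.createTempDirectory("oak-gc-"); // prefix is a placeholder
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would use the returned path as the working directory and delete it once all files inside have been closed and removed.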



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359896#comment-14359896
 ] 

Chetan Mehrotra commented on OAK-2557:
--

bq. maybe it could help us see the progress

That would be useful. Changed the impl a bit with 
http://svn.apache.org/r1666352. Would that meet the requirements?



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358742#comment-14358742
 ] 

Michael Dürig commented on OAK-2557:


You can do

{code}
  // createTempFile needs a prefix and a suffix; the values here are placeholders
  directory = File.createTempFile("oak", ".tmp");
  directory.delete();
  directory.mkdir();
{code}



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358718#comment-14358718
 ] 

Stefan Egli commented on OAK-2557:
--

bq. [~tmueller] I think createTempFile can not create directories (which is 
what we need here).
[right|http://stackoverflow.com/questions/617414/create-a-temporary-directory-in-java],
 I see



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358704#comment-14358704
 ] 

Thomas Mueller commented on OAK-2557:
-

I ran some code coverage tests, and it looks very good to me. The parts not 
covered by "deleteLargeNumber" are:

{noformat}
collectDeletedDocuments
..
docIdsToDelete.add(prevDoc.getId()); << this
..
if (docIdsToDelete.isEmpty()){
return; << this
}
..
if (log.isDebugEnabled() && docIdsToDelete.getSize() < 1000) {
.. << this
}
{noformat}

The "log" part has a side effect (docIdsToDelete.getIds(), which might open new 
files)



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358699#comment-14358699
 ] 

Thomas Mueller commented on OAK-2557:
-

I think large log files are acceptable, maybe it could help us see the progress?



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358694#comment-14358694
 ] 

Chetan Mehrotra commented on OAK-2557:
--

bq. I noticed with debug level logging, the removed entries are logged, but 
only up to 1000

Can do that. But as in the current case, the list might be pretty big (12M 
entries), so I avoided flooding the logs.



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358639#comment-14358639
 ] 

Thomas Mueller commented on OAK-2557:
-

[~egli], I think createTempFile can not create directories (which is what we 
need here).



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358632#comment-14358632
 ] 

Thomas Mueller commented on OAK-2557:
-

I assume the document ids can not contain strange characters (newlines), right?

I noticed with debug level logging, the removed entries are logged, but only up 
to 1000. Maybe it's better to log in all cases? The list is iterated over twice 
in this case (for logging and for removing), but the logic looks good to me. 



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358628#comment-14358628
 ] 

Stefan Egli commented on OAK-2557:
--

Only a small comment/question: is there a reason you're doing the temp file 
creation manually vs using {{java.io.File.createTempFile}} ?



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358601#comment-14358601
 ] 

Thomas Mueller commented on OAK-2557:
-

The StringSort doesn't support newlines, so it's not a generic tool yet. Also, 
performance could probably be improved by pre-sorting and removing duplicates 
in the buffer (before writing), but those are all minor issues.
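
A rough sketch of the suggested optimization: sort the in-memory buffer and drop duplicates before it is written out. This is not the actual StringSort code; the class and method names are hypothetical, and entries are assumed to be non-null and free of newlines.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BufferFlushSketch {
    // Returns the buffer contents sorted by cmp with duplicates removed.
    // After sorting, equal entries are adjacent, so one pass is enough.
    static List<String> sortAndDedupe(List<String> buffer, Comparator<String> cmp) {
        List<String> sorted = new ArrayList<>(buffer);
        sorted.sort(cmp);
        List<String> result = new ArrayList<>();
        String previous = null;
        for (String s : sorted) {
            if (previous == null || cmp.compare(previous, s) != 0) {
                result.add(s);
            }
            previous = s;
        }
        return result;
    }
}
```

The returned list is what would be written to the spill file, so each file is already sorted and smaller before any merge step runs.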



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358595#comment-14358595
 ] 

Marcel Reutegger commented on OAK-2557:
---

Forgot one thing: the patch contains a change in the logback-test.xml. I assume 
this is unintentional and will not be committed.



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358591#comment-14358591
 ] 

Marcel Reutegger commented on OAK-2557:
---

Looks better now. [~tmueller], WDYT?



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358517#comment-14358517
 ] 

Chetan Mehrotra commented on OAK-2557:
--

Thanks for the detailed review [~mreutegg]. Attached is the [updated 
patch|^OAK-2557-3.patch]

bq. Maybe even promote NodeDocIdCollector to a top level class to avoid 
breaking encapsulation?

That makes more sense, as it's a generic class not bound to Document logic. I 
have refactored and moved this class to oak-commons under the sort package. 
[~tmueller] Can you have a quick look, since with this it becomes part of the Oak API?

bq. NodeDocIdCollector.sort() uses a hard coded Comparator when sorting in 
memory instead of the instance passed in the constructor.

Fixed that

bq. NodeDocIdCollector.flushToFile() uses PrintWriter.println(). This method 
does not throw an IOException when the write fails. I think it would be better 
to use BufferedWriter directly.

Did not know that. Refactored to use {{BufferedWriter}}

bq. VersionGCState.close() deletes the directory before resources are 
closed. I think this will fail on Windows based machines.

Thanks for catching that. Would have missed this completely!



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358413#comment-14358413
 ] 

Marcel Reutegger commented on OAK-2557:
---

I have a couple of suggestions:

In VersionGarbageCollector:

{noformat}
if (log.isDebugEnabled() && docIdsToDelete.size < 1000) {
{noformat}

Rather call {{docIdsToDelete.getSize()}}? Maybe even promote NodeDocIdCollector 
to a top level class to avoid breaking encapsulation?

NodeDocIdCollector.sort() uses a hard coded Comparator when sorting in memory 
instead of the instance passed in the constructor.

NodeDocIdCollector.flushToFile() uses {{PrintWriter.println()}}. This method 
does not throw an IOException when the write fails. I think it would be better 
to use BufferedWriter directly.
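
To illustrate the difference, a small self-contained sketch. {{FailingWriter}} is a hypothetical stand-in for a failing disk and is not part of the patch; the rest uses only standard {{java.io}} behavior.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

final class WriterErrorSketch {
    // A Writer that fails on every write, standing in for a full disk.
    static final class FailingWriter extends Writer {
        @Override public void write(char[] buf, int off, int len) throws IOException {
            throw new IOException("simulated write failure");
        }
        @Override public void flush() {}
        @Override public void close() {}
    }

    // PrintWriter swallows the IOException; the failure only shows via checkError().
    static boolean printWriterNeedsCheckError() {
        PrintWriter pw = new PrintWriter(new FailingWriter());
        pw.println("1:/content/foo"); // does not throw
        return pw.checkError();       // flushes and reports whether an error occurred
    }

    // BufferedWriter propagates the IOException to the caller.
    static boolean bufferedWriterThrows() {
        try {
            BufferedWriter bw = new BufferedWriter(new FailingWriter());
            bw.write("1:/content/foo");
            bw.newLine();
            bw.flush(); // the buffered data reaches FailingWriter here
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}
```

With {{PrintWriter}}, a caller that never calls {{checkError()}} would not notice a truncated file; with {{BufferedWriter}} the failure surfaces as an exception.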

VersionGCState.close() deletes the directory before resources are closed. I 
think this will fail on Windows based machines.
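
A minimal sketch of the ordering being asked for, assuming a hypothetical holder class ({{GcWorkDirSketch}}, with a single file {{ids.txt}}) rather than the real VersionGCState: open handles are closed first, and only then are the file and the directory deleted.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class GcWorkDirSketch implements Closeable {
    private final File dir;
    private final File file;
    private final BufferedWriter writer;

    GcWorkDirSketch() throws IOException {
        dir = Files.createTempDirectory("oak-gc-").toFile(); // placeholder prefix
        file = new File(dir, "ids.txt");                     // placeholder file name
        writer = Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8);
    }

    void add(String id) throws IOException {
        writer.write(id);
        writer.newLine();
    }

    File directory() {
        return dir;
    }

    @Override
    public void close() throws IOException {
        // Close the handle first: Windows refuses to delete a file that is still open.
        writer.close();
        Files.deleteIfExists(file.toPath());
        Files.deleteIfExists(dir.toPath());
    }
}
```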



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-12 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358376#comment-14358376
 ] 

Stefan Egli commented on OAK-2557:
--

Discussed offline with [~mreutegg]: created OAK-2613 to follow this up



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-10 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356267#comment-14356267
 ] 

Chetan Mehrotra commented on OAK-2557:
--

Version GC currently ensures that the query it fires is made against the 
Secondary (if present). However, having some throttling in such a background 
task would be a good thing to have. But first we need some notion of a 
{{SystemLoadIndicator}} in Oak which can provide details about the system load, 
say as a percentage (1..100). We can then expose a configurable threshold which 
VersionGC would listen for and adjust its work accordingly.

It can be a JMX bean which emits notifications, and we have our components 
listen to those notifications (or use OSGi SR/Events). That can be used in other 
places like Observation processing, Blob GC, etc.



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-10 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355322#comment-14355322
 ] 

Stefan Egli commented on OAK-2557:
--

We could make the VersionGC play nicely with the existing load on the system: 
it could progress slower if the load is higher and vice versa. One simple 
measure could be: if the observation queue is small (e.g. below 10), then the 
load is low and it could progress at full speed. Otherwise it could add some 
artificial sleeping in between.
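
The idea could be sketched roughly as follows. Everything here is hypothetical: Oak has no {{GcThrottleSketch}} class, and how the observation queue length would be obtained is left open (modelled as an {{IntSupplier}}). The threshold and pause values are placeholders.

```java
import java.util.function.IntSupplier;

final class GcThrottleSketch {
    private final IntSupplier queueLength; // hypothetical source of the observation queue size
    private final int lowLoadThreshold;    // e.g. 10, as suggested above
    private final long pauseMillis;        // artificial sleep applied under load

    GcThrottleSketch(IntSupplier queueLength, int lowLoadThreshold, long pauseMillis) {
        this.queueLength = queueLength;
        this.lowLoadThreshold = lowLoadThreshold;
        this.pauseMillis = pauseMillis;
    }

    // Returns how long to pause before the next batch: 0 when the queue is small.
    long pauseBeforeNextBatch() {
        return queueLength.getAsInt() < lowLoadThreshold ? 0L : pauseMillis;
    }

    // Called between deletion batches; sleeps only when the system looks busy.
    void throttle() throws InterruptedException {
        long pause = pauseBeforeNextBatch();
        if (pause > 0) {
            Thread.sleep(pause);
        }
    }
}
```

The GC loop would call {{throttle()}} between batches, so it runs at full speed on an idle system and backs off while observation is lagging.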



[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352867#comment-14352867
 ] 

Chetan Mehrotra commented on OAK-2557:
--

After an offline discussion with [~mreutegg] we came to the following conclusions:

# Deletion logic has to work like creation logic and has to perform deletion 
bottom-up.
# The current logic also has a potential issue: if the system performing GC 
crashes midway, a parent might have been removed before its child, and such a 
child document can then never be GCed. So as a fix we should first sort the 
batch via {{PathComparator}} in reverse and then perform deletion from child to 
parent. {color:brown}Open a new issue for that{color}
# For large deletions (like the current case) we make use of {{ExternalSort}}, 
where sorting is performed on disk and we read the paths back from a file. This 
would make use of all the support developed for Blob GC in 
{{MarkSweepGarbageCollector}}.

All in all this would not be a simple fix that I initially thought :(
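
The child-before-parent ordering from the second point can be sketched roughly as follows. This is an in-memory illustration only: {{BottomUpDeletion}} and its comparator are hypothetical stand-ins, not Oak's {{PathComparator}} or {{ExternalSort}}, and the real fix would sort on disk for large batches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration: order paths so that children are deleted before parents. */
class BottomUpDeletion {

    /** Depth of a path such as "/x/child501": the number of non-empty segments. */
    static int depth(String path) {
        int d = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                d++;
            }
        }
        return d;
    }

    /** Deepest paths first; paths of equal depth in reverse string order. */
    static final Comparator<String> DEEPEST_FIRST = (a, b) -> {
        int byDepth = Integer.compare(depth(b), depth(a));
        return byDepth != 0 ? byDepth : b.compareTo(a);
    };

    /** Returns a copy of the given paths in the order in which they may be deleted. */
    static List<String> deletionOrder(List<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort(DEEPEST_FIRST);
        return sorted;
    }
}
```

With this ordering, a crash midway can leave a parent without some of its children, but never a child without its parent.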

> VersionGC uses way too much memory if there is a large pile of garbage
> --
>
> Key: OAK-2557
> URL: https://issues.apache.org/jira/browse/OAK-2557
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Affects Versions: 1.0.11
>Reporter: Stefan Egli
>Priority: Blocker
> Fix For: 1.2, 1.0.13
>
> Attachments: OAK-2557.patch
>
>
> It has been noticed that on a system where revision-gc 
> (VersionGarbageCollector of mongomk) did not run for a few days (due to not 
> interfering with some tests/large bulk operations) that there was such a 
> large pile of garbage accumulating, that the following code
> {code}
> VersionGarbageCollector.collectDeletedDocuments
> {code}
> in the for loop, creates such a large list of NodeDocuments to delete 
> (docIdsToDelete) that it uses up too much memory, causing the JVM's GC to 
> constantly spin in Full-GCs.





[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-09 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352819#comment-14352819
 ] 

Marcel Reutegger commented on OAK-2557:
---

The revision GC needs to remove documents starting at the leaves. Otherwise the 
store is left in an inconsistent state, as you noticed. I assume the revision GC 
considers /x/child501 as not yet committed because the commit flag is missing, 
hence it cannot be removed.

> VersionGC uses way too much memory if there is a large pile of garbage
> --
>
> Key: OAK-2557
> URL: https://issues.apache.org/jira/browse/OAK-2557
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Affects Versions: 1.0.11
>Reporter: Stefan Egli
>Priority: Blocker
> Fix For: 1.2, 1.0.13
>
> Attachments: OAK-2557.patch
>
>
> It has been noticed that on a system where revision-gc 
> (VersionGarbageCollector of mongomk) did not run for a few days (due to not 
> interfering with some tests/large bulk operations) that there was such a 
> large pile of garbage accumulating, that the following code
> {code}
> VersionGarbageCollector.collectDeletedDocuments
> {code}
> in the for loop, creates such a large list of NodeDocuments to delete 
> (docIdsToDelete) that it uses up too much memory, causing the JVM's GC to 
> constantly spin in Full-GCs.





[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352738#comment-14352738
 ] 

Chetan Mehrotra commented on OAK-2557:
--

bq. Could it be a side-effect of too many nodes being indexed due to OAK-2559 ?

Nope. The paths in the list belonged to some property index. Probably the GC was 
disabled or had not been run for a long time.

OAK-2559 is related to Lucene, which stores its content in binary files. In the 
worst case I would expect 10k files that need to be deleted, not 12M.

> VersionGC uses way too much memory if there is a large pile of garbage
> --
>
> Key: OAK-2557
> URL: https://issues.apache.org/jira/browse/OAK-2557
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Affects Versions: 1.0.11
>Reporter: Stefan Egli
>Priority: Blocker
> Fix For: 1.2, 1.0.13
>
>
> It has been noticed that on a system where revision-gc 
> (VersionGarbageCollector of mongomk) did not run for a few days (due to not 
> interfering with some tests/large bulk operations) that there was such a 
> large pile of garbage accumulating, that the following code
> {code}
> VersionGarbageCollector.collectDeletedDocuments
> {code}
> in the for loop, creates such a large list of NodeDocuments to delete 
> (docIdsToDelete) that it uses up too much memory, causing the JVM's GC to 
> constantly spin in Full-GCs.





[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-09 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352736#comment-14352736
 ] 

Stefan Egli commented on OAK-2557:
--

Could it be a side-effect of too many nodes being indexed due to OAK-2559 ?

> VersionGC uses way too much memory if there is a large pile of garbage
> --
>
> Key: OAK-2557
> URL: https://issues.apache.org/jira/browse/OAK-2557
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Affects Versions: 1.0.11
>Reporter: Stefan Egli
>Priority: Blocker
> Fix For: 1.2, 1.0.13
>
>
> It has been noticed that on a system where revision-gc 
> (VersionGarbageCollector of mongomk) did not run for a few days (due to not 
> interfering with some tests/large bulk operations) that there was such a 
> large pile of garbage accumulating, that the following code
> {code}
> VersionGarbageCollector.collectDeletedDocuments
> {code}
> in the for loop, creates such a large list of NodeDocuments to delete 
> (docIdsToDelete) that it uses up too much memory, causing the JVM's GC to 
> constantly spin in Full-GCs.





[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352714#comment-14352714
 ] 

Chetan Mehrotra commented on OAK-2557:
--

bq. Is that set becoming too large to cause an OOM

Indeed, that's the case. The array list had 12M entries taking 3 GB of space. So 
the GC logic should perform the deletion in between.

> VersionGC uses way too much memory if there is a large pile of garbage
> --
>
> Key: OAK-2557
> URL: https://issues.apache.org/jira/browse/OAK-2557
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Affects Versions: 1.0.11
>Reporter: Stefan Egli
>Priority: Blocker
> Fix For: 1.2, 1.0.13
>
>
> It has been noticed that on a system where revision-gc 
> (VersionGarbageCollector of mongomk) did not run for a few days (due to not 
> interfering with some tests/large bulk operations) that there was such a 
> large pile of garbage accumulating, that the following code
> {code}
> VersionGarbageCollector.collectDeletedDocuments
> {code}
> in the for loop, creates such a large list of NodeDocuments to delete 
> (docIdsToDelete) that it uses up too much memory, causing the JVM's GC to 
> constantly spin in Full-GCs.





[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-03-02 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342911#comment-14342911
 ] 

Chetan Mehrotra commented on OAK-2557:
--

{{docIdsToDelete}} is just a set of string ids to be deleted? Is that set 
becoming too large to cause an OOM? Do we have access to the heap dump to 
determine the size of the set? We can possibly change the logic to delete once 
we have collected, say, 500 ids, but before that I would like to confirm that 
this is the case here.
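
A rough sketch of the "delete once 500 ids are collected" idea, to show how it bounds memory. {{BatchingDeleter}} and the {{deleteBatch}} callback are hypothetical; the callback stands in for whatever call actually removes documents from the store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers ids and deletes them in fixed-size batches. */
class BatchingDeleter {
    private final int batchSize;
    private final Consumer<List<String>> deleteBatch; // assumed stand-in for the store's remove call
    private final List<String> pending = new ArrayList<>();

    BatchingDeleter(int batchSize, Consumer<List<String>> deleteBatch) {
        this.batchSize = batchSize;
        this.deleteBatch = deleteBatch;
    }

    /** Buffers one id and flushes as soon as the batch is full. */
    void add(String id) {
        pending.add(id);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Deletes whatever is buffered; call once more after the last id. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        deleteBatch.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

At most {{batchSize}} ids are held in memory at any time, regardless of how much garbage has accumulated.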

> VersionGC uses way too much memory if there is a large pile of garbage
> --
>
> Key: OAK-2557
> URL: https://issues.apache.org/jira/browse/OAK-2557
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Affects Versions: 1.0.11
>Reporter: Stefan Egli
>Priority: Blocker
> Fix For: 1.2, 1.0.12
>
>
> It has been noticed that on a system where revision-gc 
> (VersionGarbageCollector of mongomk) did not run for a few days (due to not 
> interfering with some tests/large bulk operations) that there was such a 
> large pile of garbage accumulating, that the following code
> {code}
> VersionGarbageCollector.collectDeletedDocuments
> {code}
> in the for loop, creates such a large list of NodeDocuments to delete 
> (docIdsToDelete) that it uses up too much memory, causing the JVM's GC to 
> constantly spin in Full-GCs.





[jira] [Commented] (OAK-2557) VersionGC uses way too much memory if there is a large pile of garbage

2015-02-28 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341522#comment-14341522
 ] 

Stefan Egli commented on OAK-2557:
--

Here's a stacktrace that keeps showing up when this particular VM is running 
into low memory/full-gc-loops:

{code}
"pool-5-thread-5" prio=10 tid=0x01413800 nid=0x1aa5 runnable 
[0x7f247599f000]
   java.lang.Thread.State: RUNNABLE
	at java.util.HashMap.addEntry(HashMap.java:884)
	at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427)
	at java.util.HashMap.put(HashMap.java:505)
	at org.bson.BasicBSONObject.put(BasicBSONObject.java:288)
	at org.bson.BasicBSONCallback._put(BasicBSONCallback.java:184)
	at org.bson.BasicBSONCallback.gotLong(BasicBSONCallback.java:132)
	at org.bson.BasicBSONDecoder.decodeElement(BasicBSONDecoder.java:222)
	at org.bson.BasicBSONDecoder._decode(BasicBSONDecoder.java:154)
	at org.bson.BasicBSONDecoder.decode(BasicBSONDecoder.java:132)
	at com.mongodb.DefaultDBDecoder.decode(DefaultDBDecoder.java:62)
	at com.mongodb.Response.<init>(Response.java:85)
	at com.mongodb.DBPort$1.execute(DBPort.java:141)
	at com.mongodb.DBPort$1.execute(DBPort.java:135)
	at com.mongodb.DBPort.doOperation(DBPort.java:164)
	- locked <0x00064f6049d0> (a com.mongodb.DBPort)
	at com.mongodb.DBPort.call(DBPort.java:135)
	at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:292)
	at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:271)
	at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:237)
	at com.mongodb.QueryResultIterator.getMore(QueryResultIterator.java:137)
	at com.mongodb.QueryResultIterator.hasNext(QueryResultIterator.java:127)
	at com.mongodb.DBCursor._hasNext(DBCursor.java:551)
	at com.mongodb.DBCursor.hasNext(DBCursor.java:571)
	at com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:43)
	at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.collectDeletedDocuments(VersionGarbageCollector.java:95)
	at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.gc(VersionGarbageCollector.java:79)
	at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService$2.run(DocumentNodeStoreService.java:472)
	at org.apache.jackrabbit.oak.spi.state.RevisionGC$1.call(RevisionGC.java:68)
	at org.apache.jackrabbit.oak.spi.state.RevisionGC$1.call(RevisionGC.java:64)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}

> VersionGC uses way too much memory if there is a large pile of garbage
> --
>
> Key: OAK-2557
> URL: https://issues.apache.org/jira/browse/OAK-2557
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Affects Versions: 1.0.11
>Reporter: Stefan Egli
>
> It has been noticed that on a system where revision-gc 
> (VersionGarbageCollector of mongomk) did not run for a few days (due to not 
> interfering with some tests/large bulk operations) that there was such a 
> large pile of garbage accumulating, that the following code
> {code}
> VersionGarbageCollector.collectDeletedDocuments
> {code}
> in the for loop, creates such a large list of NodeDocuments to delete 
> (docIdsToDelete) that it uses up too much memory, causing the JVM's GC to 
> constantly spin in Full-GCs.


