[jira] Commented: (JCR-2524) Reduce memory usage of DocIds
[ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847282#action_12847282 ]

Marcel Reutegger commented on JCR-2524:
---------------------------------------

Removed System.out debug calls in test class.

svn revision: 925141

> Reduce memory usage of DocIds
> -----------------------------
>
>          Key: JCR-2524
>          URL: https://issues.apache.org/jira/browse/JCR-2524
>      Project: Jackrabbit Content Repository
>   Issue Type: Improvement
>   Components: jackrabbit-core
>     Reporter: Marcel Reutegger
>     Priority: Minor
>      Fix For: 2.1.0
>
>  Attachments: JCR-2524.patch, JCR-2524.patch
>
>
> Implementations of DocIds are used to cache parent child relations of nodes
> in the index. Usually there are a lot of duplicate objects because a DocId
> instance is used to identify the parent of a node in the index. That is,
> sibling nodes will all have DocIds with the same value. Currently a new DocId
> instance is created for each node. Caching the most recently used DocIds and
> reusing them might help to reduce the memory usage. Furthermore there are
> DocIds that could be represented with a short instead of an int when possible.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
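The issue description proposes caching recently used DocIds so that sibling nodes can share one parent instance. A minimal sketch of that idea follows. It is not the code from JCR-2524.patch: the class name SketchDocId, the valueOf factory, and the cache size of 1024 are all hypothetical, and the sketch assumes single-threaded use.

```java
/**
 * Hypothetical stand-in for an immutable DocId that wraps a document number.
 * Not the Jackrabbit class; names and cache size are assumptions.
 */
final class SketchDocId {
    // Direct-mapped cache of recently used instances (size must be a power of two).
    private static final SketchDocId[] CACHE = new SketchDocId[1024];

    final int docNumber;

    private SketchDocId(int docNumber) {
        this.docNumber = docNumber;
    }

    /** Returns a cached instance when the slot still holds this document number. */
    static SketchDocId valueOf(int docNumber) {
        int slot = docNumber & (CACHE.length - 1);
        SketchDocId cached = CACHE[slot];
        if (cached == null || cached.docNumber != docNumber) {
            // Miss: create a new instance and overwrite the slot.
            cached = new SketchDocId(docNumber);
            CACHE[slot] = cached;
        }
        return cached;
    }

    public static void main(String[] args) {
        // Two siblings with the same parent share one instance.
        SketchDocId parentOfFirst = SketchDocId.valueOf(42);
        SketchDocId parentOfSecond = SketchDocId.valueOf(42);
        System.out.println(parentOfFirst == parentOfSecond);
    }
}
```

Siblings tend to be indexed close together, so even a small cache of this kind should see many hits; whether that holds for a given repository would need to be measured.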
[jira] Commented: (JCR-2524) Reduce memory usage of DocIds
[ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841737#action_12841737 ]

Marcel Reutegger commented on JCR-2524:
---------------------------------------

hmm, you are right. should have looked more closely at what the memory analyzer reported.

here's another idea:

- use int arrays and create PlainDocIds on the fly (possibly using cached instances)
- a special value in the int array marks the existence of a UUIDDocId; these are held in a separate map
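The two points in this comment can be sketched roughly as follows. This is an illustration only, not the Jackrabbit implementation: the class ParentArraySketch, its method names, and the sentinel values -1 and -2 are hypothetical, and UUIDs are held as plain strings to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical parent lookup: one int per node, plus a side map for UUID-based parents. */
final class ParentArraySketch {
    static final int NO_PARENT = -1;    // assumed sentinel: parent not known
    static final int UUID_PARENT = -2;  // assumed sentinel: parent is stored in the map

    private final int[] parents;
    private final Map<Integer, String> uuidParents = new HashMap<>();

    ParentArraySketch(int maxDoc) {
        parents = new int[maxDoc];
        Arrays.fill(parents, NO_PARENT);
    }

    /** Stores a parent that is identified by its document number. */
    void setPlainParent(int child, int parentDoc) {
        if (parentDoc < 0) {
            throw new IllegalArgumentException("parentDoc must be >= 0");
        }
        uuidParents.remove(child);
        parents[child] = parentDoc;
    }

    /** Stores a parent that is only known by UUID; the array holds just the marker. */
    void setUuidParent(int child, String parentUuid) {
        parents[child] = UUID_PARENT;
        uuidParents.put(child, parentUuid);
    }

    /** Returns the parent document number, or NO_PARENT if none is stored as a plain number. */
    int plainParent(int child) {
        int value = parents[child];
        return value >= 0 ? value : NO_PARENT;
    }

    /** Returns the parent UUID, or null if the parent is not UUID-based. */
    String uuidParent(int child) {
        return parents[child] == UUID_PARENT ? uuidParents.get(child) : null;
    }

    public static void main(String[] args) {
        ParentArraySketch sketch = new ParentArraySketch(4);
        sketch.setPlainParent(1, 0);
        sketch.setUuidParent(2, "hypothetical-uuid");
        System.out.println(sketch.plainParent(1));
        System.out.println(sketch.uuidParent(2));
    }
}
```

With this layout the common case costs four bytes per node and no object at all; a PlainDocId would only be materialized when a caller asks for one.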
[jira] Commented: (JCR-2524) Reduce memory usage of DocIds
[ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840074#action_12840074 ]

Thomas Mueller commented on JCR-2524:
-------------------------------------

> Caching the most recently used DocIds and reusing them might help to reduce the
> memory usage

+1

> DocIds that could be represented with a short instead of an int

According to my test, this will not reduce memory usage:
http://h2database.com/p.html#da4c6a321d0dc84a2b7b96cdbf468a47

For the Sun JVM (JDK 1.5, 32 bit), objects with one field of type boolean, byte, short, char, int, or long all need 16 bytes. A small BigInteger uses 56 bytes, a small BigDecimal uses 32 bytes (probably re-uses the same BigInteger internally), and a String uses 24 bytes. A plain Object uses 8 bytes.

For JDK 1.6, 32 bit and 64 bit, it's a bit different: 20 bytes for a plain Object, 24 bytes for an object with one field of any type from boolean to long.

For JDK 1.5, 64 bit, it's different again: 16 bytes for a plain Object, 24 bytes for an object with one field of any type from boolean to long.
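To make the consequence of these numbers concrete, the sketch below estimates the heap needed for 300'000 nodes (the node count reported elsewhere in this thread) using the 16-byte figure for JDK 1.5, 32 bit. The 4-byte reference size is an assumption for a 32-bit JVM, array headers are ignored, and the class and method names are hypothetical.

```java
/** Back-of-the-envelope heap estimates; all sizes are inputs, nothing is measured here. */
final class DocIdMemoryEstimate {

    /** One object per node plus one reference to it in an array. */
    static long objectPerNode(long nodes, int objectBytes, int referenceBytes) {
        return nodes * (objectBytes + referenceBytes);
    }

    /** One int slot per node, no object. */
    static long intArray(long nodes) {
        return nodes * 4L;
    }

    public static void main(String[] args) {
        long nodes = 300_000L;
        // A one-field object needs 16 bytes whether the field is a short or an int,
        // so switching the field type does not change this estimate.
        System.out.println(objectPerNode(nodes, 16, 4));
        System.out.println(intArray(nodes));
    }
}
```

Under these assumptions the per-object layout costs 20 bytes per node regardless of field width, while a plain int array costs 4 bytes per node, which is why shrinking the field from int to short is not expected to help.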
[jira] Commented: (JCR-2524) Reduce memory usage of DocIds
[ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839855#action_12839855 ]

Marcel Reutegger commented on JCR-2524:
---------------------------------------

Forgot to mention that the proposed patch reduces the memory usage to about a third.
[jira] Commented: (JCR-2524) Reduce memory usage of DocIds
[ https://issues.apache.org/jira/browse/JCR-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839717#action_12839717 ]

Marcel Reutegger commented on JCR-2524:
---------------------------------------

Some memory stats from a real-life system: a fully populated DocId cache for 300'000 nodes consumes about 6MB of heap.