[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112977#comment-17112977 ]

Nitin Gupta commented on OAK-9052:
----------------------------------

A test fails for me:

{code:java}
[ERROR] Tests run: 6, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.169 s <<< FAILURE! - in org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest
[ERROR] simpleTraversal(org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest) Time elapsed: 0.047 s <<< ERROR!
java.lang.IllegalStateException: java.io.IOException: Unable to delete file: target\test
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.newFlatFileStore(FlatFileStoreIteratorTest.java:52)
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.simpleTraversal(FlatFileStoreIteratorTest.java:61)
Caused by: java.io.IOException: Unable to delete file: target\test
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.newFlatFileStore(FlatFileStoreIteratorTest.java:52)
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.simpleTraversal(FlatFileStoreIteratorTest.java:61)

[ERROR] invalidOrderAccess(org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest) Time elapsed: 0 s <<< ERROR!
java.lang.IllegalStateException: java.io.IOException: Unable to delete file: target\test
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.newFlatFileStore(FlatFileStoreIteratorTest.java:52)
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.invalidOrderAccess(FlatFileStoreIteratorTest.java:96)
Caused by: java.io.IOException: Unable to delete file: target\test
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.newFlatFileStore(FlatFileStoreIteratorTest.java:52)
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.invalidOrderAccess(FlatFileStoreIteratorTest.java:96)
{code}

This seems to be specific to the Windows environment.

> Reindexing using --doc-traversal-mode may need a lot of memory
> --------------------------------------------------------------
>
> Key: OAK-9052
> URL: https://issues.apache.org/jira/browse/OAK-9052
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing, mongomk
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Priority: Major
> Fix For: 1.28.0
> Attachments: fileSizeOverTime.png
>
> Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For
> aggregation, there is a limit on memory usage, by default around 100 MB.
> Depending on the content structure, this limit can be exceeded.
> It would be good to find a way to avoid a memory limit, for example using a
> temporary storage (a file, or a persistent key/value store).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105112#comment-17105112 ]

Julian Reschke commented on OAK-9052:
-------------------------------------

trunk: [r1877625|http://svn.apache.org/r1877625] [r1877497|http://svn.apache.org/r1877497]
[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102318#comment-17102318 ]

Thomas Mueller commented on OAK-9052:
-------------------------------------

http://svn.apache.org/r1877497
[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101789#comment-17101789 ]

Thomas Mueller commented on OAK-9052:
-------------------------------------

I added a "file size over time" diagram (fileSizeOverTime.png). It shows two spikes at the beginning; after that, the file stays within 1 MB.

Compaction is done every minute. I think that's fine, except that it's not needed if the file is smaller than 10 MB.
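
The refinement suggested in this comment (skip compaction while the file is small) could be sketched roughly as follows. This is only an illustration, not what the patch does: the class CompactionPolicy, its method names, and the choice to restart the timer even when compaction is skipped are all assumptions. Only the one-minute interval and the 10 MB threshold come from the comment.

```java
/**
 * Hypothetical policy (not part of Oak): compact on a fixed interval,
 * but only once the file has reached a minimum size.
 */
final class CompactionPolicy {
    private final long intervalMillis;
    private final long minFileSizeBytes;
    private long lastCheckMillis;

    CompactionPolicy(long intervalMillis, long minFileSizeBytes, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.minFileSizeBytes = minFileSizeBytes;
        this.lastCheckMillis = startMillis;
    }

    /**
     * Returns true if the file should be compacted now.
     * The timer restarts whenever the interval has elapsed, even if the
     * file is too small, so a small file is re-checked one interval later.
     */
    boolean shouldCompact(long nowMillis, long fileSizeBytes) {
        if (nowMillis - lastCheckMillis < intervalMillis) {
            return false; // interval has not elapsed yet
        }
        lastCheckMillis = nowMillis;
        return fileSizeBytes >= minFileSizeBytes;
    }
}
```

A caller would construct it with new CompactionPolicy(60_000L, 10L * 1024 * 1024, System.currentTimeMillis()) and ask shouldCompact(...) before each compaction attempt.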
[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100829#comment-17100829 ]

Thomas Mueller commented on OAK-9052:
-------------------------------------

https://github.com/oak-indexing/jackrabbit-oak/pull/154

With the memory setting "0" (the default value), a temporary file is created for the linked list, so that heap memory usage is constant (around 30 MB, I guess). Internally, a persistent key-value store, the H2 MVStore, is used (the same one that the MongoMK uses for the persistent cache).

Every minute, the file is compacted (configurable using the "oak.indexer.linkedList.compactMillis" system property).

It's possible to use the old behavior by setting the system property "oak.indexer.memLimitInMB" to 100.
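
To make the two settings concrete, here is a minimal sketch of how they might be read. The property names are the ones given in the comment. The class LinkedListSettings and its method are hypothetical, and the defaults (0 for the memory limit, 60000 ms for compaction) are assumptions derived from "the default value" and "every minute" in the comment, not taken from the Oak source.

```java
/** Hypothetical holder for the two settings named in the comment (not an Oak class). */
final class LinkedListSettings {
    // Property names as given in the comment.
    static final String MEM_LIMIT_PROP = "oak.indexer.memLimitInMB";
    static final String COMPACT_MILLIS_PROP = "oak.indexer.linkedList.compactMillis";

    final int memLimitInMB;
    final long compactMillis;

    LinkedListSettings() {
        // Assumed defaults: 0 selects the file-backed list, compaction once per minute.
        this.memLimitInMB = Integer.getInteger(MEM_LIMIT_PROP, 0);
        this.compactMillis = Long.getLong(COMPACT_MILLIS_PROP, 60_000L);
    }

    /** True when the linked list should be backed by a temporary file instead of the heap. */
    boolean usePersistentList() {
        return memLimitInMB == 0;
    }
}
```

Under these assumptions, passing -Doak.indexer.memLimitInMB=100 to the JVM makes usePersistentList() return false, which corresponds to the old in-memory behavior.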
[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099871#comment-17099871 ]

Thomas Mueller commented on OAK-9052:
-------------------------------------

Data structure:

* FlatFileBufferLinkedList is used in the second phase and contains a list of NodeStateEntry objects.
* NodeStateEntry.nodeState is a LazyChildrenNodeState for entries in memory, but can be a DocumentNodeState when reading from MongoDB (in the first phase).
* NodeStateEntry objects can be (de-)serialized using the NodeStateEntryWriter / NodeStateEntryReader. That is usually only used in the first phase.
* The temp file is stored in temp/flat-file-store/sort-work-dir/sortInBatch...flatfile (by default using compression).
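
For readers unfamiliar with the problem, the following sketch shows the general shape of a memory-bounded buffer of entries and why a fixed limit can be exceeded. It is not the actual FlatFileBufferLinkedList: the classes BufferedEntry and BoundedEntryBuffer, the per-entry size estimate, and the choice to throw when the limit is hit are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a NodeStateEntry: a path plus an estimated memory size. */
final class BufferedEntry {
    final String path;
    final long estimatedBytes;

    BufferedEntry(String path, long estimatedBytes) {
        this.path = path;
        this.estimatedBytes = estimatedBytes;
    }
}

/** Hypothetical FIFO buffer with a memory budget (not the Oak implementation). */
final class BoundedEntryBuffer {
    private final Deque<BufferedEntry> entries = new ArrayDeque<>();
    private final long limitBytes;
    private long usedBytes;

    BoundedEntryBuffer(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Appends an entry; fails if the estimated budget would be exceeded. */
    void add(BufferedEntry e) {
        if (usedBytes + e.estimatedBytes > limitBytes) {
            throw new IllegalStateException(
                    "Memory limit of " + limitBytes + " bytes exceeded at " + e.path);
        }
        entries.addLast(e);
        usedBytes += e.estimatedBytes;
    }

    /** Removes the oldest entry and releases its share of the budget. */
    BufferedEntry remove() {
        BufferedEntry e = entries.removeFirst();
        usedBytes -= e.estimatedBytes;
        return e;
    }

    int size() {
        return entries.size();
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

With a structure like this, a node with many large descendants that all have to be buffered at once can exhaust the budget regardless of how the limit is chosen, which is the motivation for moving the list to temporary storage.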