[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory

2020-05-21 Thread Nitin Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112977#comment-17112977
 ] 

Nitin Gupta commented on OAK-9052:
--

A test fails for me:
{code:java}
[ERROR] Tests run: 6, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.169 s <<< FAILURE! - in org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest
[ERROR] simpleTraversal(org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest)  Time elapsed: 0.047 s  <<< ERROR!
java.lang.IllegalStateException: java.io.IOException: Unable to delete file: target\test
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.newFlatFileStore(FlatFileStoreIteratorTest.java:52)
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.simpleTraversal(FlatFileStoreIteratorTest.java:61)
Caused by: java.io.IOException: Unable to delete file: target\test
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.newFlatFileStore(FlatFileStoreIteratorTest.java:52)
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.simpleTraversal(FlatFileStoreIteratorTest.java:61)

[ERROR] invalidOrderAccess(org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest)  Time elapsed: 0 s  <<< ERROR!
java.lang.IllegalStateException: java.io.IOException: Unable to delete file: target\test
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.newFlatFileStore(FlatFileStoreIteratorTest.java:52)
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.invalidOrderAccess(FlatFileStoreIteratorTest.java:96)
Caused by: java.io.IOException: Unable to delete file: target\test
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.newFlatFileStore(FlatFileStoreIteratorTest.java:52)
    at org.apache.jackrabbit.oak.index.indexer.document.flatfile.FlatFileStoreIteratorTest.invalidOrderAccess(FlatFileStoreIteratorTest.java:96)
{code}
 

This seems to be specific to the Windows environment.

> Reindexing using --doc-traversal-mode may need a lot of memory
> --
>
> Key: OAK-9052
> URL: https://issues.apache.org/jira/browse/OAK-9052
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing, mongomk
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.28.0
>
> Attachments: fileSizeOverTime.png
>
>
> Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For 
> aggregation, there is a limit on memory usage, by default around 100 MB. 
> Depending on the content structure, this limit can be exceeded. 
> It would be good to find a way to avoid a memory limit, for example using a 
> temporary storage (a file, or a persistent key/value store).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory

2020-05-12 Thread Julian Reschke (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105112#comment-17105112
 ] 

Julian Reschke commented on OAK-9052:
-

trunk: [r1877625|http://svn.apache.org/r1877625] 
[r1877497|http://svn.apache.org/r1877497]



[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory

2020-05-08 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102318#comment-17102318
 ] 

Thomas Mueller commented on OAK-9052:
-

http://svn.apache.org/r1877497



[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory

2020-05-07 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101789#comment-17101789
 ] 

Thomas Mueller commented on OAK-9052:
-

I added a "file size over time" diagram. It looks like there are two spikes at 
the beginning, and then the file size stays within 1 MB. Compacting is done 
every minute. I think that's fine, except that it isn't needed if the file is 
smaller than 10 MB.
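
To illustrate the point about skipping compaction for small files, here is a 
minimal sketch. It is not the actual Oak code: the class name, method name, and 
constants are hypothetical. Only the one-minute interval and the 10 MB 
threshold come from the comment above.

```java
// Hypothetical sketch, not Oak's implementation.
final class CompactionCheck {

    // One minute between compactions, as described in the comment.
    static final long COMPACT_INTERVAL_MILLIS = 60_000L;

    // Assumed threshold: files smaller than 10 MB are not compacted.
    static final long MIN_COMPACT_SIZE_BYTES = 10L * 1024 * 1024;

    // Compact only when the interval has elapsed AND the file is large enough.
    static boolean shouldCompact(long millisSinceLastCompact, long fileSizeBytes) {
        return millisSinceLastCompact >= COMPACT_INTERVAL_MILLIS
                && fileSizeBytes >= MIN_COMPACT_SIZE_BYTES;
    }

    public static void main(String[] args) {
        // A 1 MB file is skipped even though a minute has passed.
        System.out.println(shouldCompact(60_000L, 1L * 1024 * 1024));
        // A 50 MB file is compacted once a minute has passed.
        System.out.println(shouldCompact(60_000L, 50L * 1024 * 1024));
    }
}
```

With such a check, the steady state shown in the diagram (file size within 1 MB) 
would never trigger a compaction; only the early spikes could.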



[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory

2020-05-06 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100829#comment-17100829
 ] 

Thomas Mueller commented on OAK-9052:
-

https://github.com/oak-indexing/jackrabbit-oak/pull/154

With the memory setting "0" (the default value), a temporary file is created 
for the linked list, so that heap memory usage is constant (around 30 MB, I 
guess). Internally, a persistent key-value store, the H2 MVStore, is used (the 
same one used by the MongoMK for the persistent cache). Every minute, the 
file is compacted (configurable using the 
"oak.indexer.linkedList.compactMillis" system property).

It's possible to use the old behavior by setting the system property 
"oak.indexer.memLimitInMB" to 100.

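As a rough illustration of how the two system properties above could be read, 
here is a minimal sketch. It is not taken from the pull request: the class and 
method names are hypothetical. The defaults (0 for the memory setting, one 
minute for compaction) follow the description above. Treating every non-zero 
value as "keep the list in memory" is an assumption, since the comment only 
mentions the values 0 and 100.

```java
// Hypothetical helper, not Oak's actual code.
final class IndexerListSettings {

    static final String MEM_LIMIT_PROP = "oak.indexer.memLimitInMB";
    static final String COMPACT_PROP = "oak.indexer.linkedList.compactMillis";

    // Defaults as described in the comment: "0" and "every minute".
    static final int DEFAULT_MEM_LIMIT_MB = 0;
    static final long DEFAULT_COMPACT_MILLIS = 60_000L;

    static int memLimitInMB() {
        return Integer.getInteger(MEM_LIMIT_PROP, DEFAULT_MEM_LIMIT_MB);
    }

    static long compactMillis() {
        return Long.getLong(COMPACT_PROP, DEFAULT_COMPACT_MILLIS);
    }

    // Assumption: 0 selects the temporary file; any other value keeps
    // the list in memory with that limit (the old behavior).
    static boolean usesTempFile() {
        return memLimitInMB() == 0;
    }

    public static void main(String[] args) {
        System.out.println("memLimitInMB=" + memLimitInMB()
                + ", compactMillis=" + compactMillis()
                + ", usesTempFile=" + usesTempFile());
    }
}
```

System properties are passed to the JVM with -D, for example 
-Doak.indexer.memLimitInMB=100 to get the old in-memory behavior.
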


[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory

2020-05-05 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099871#comment-17099871
 ] 

Thomas Mueller commented on OAK-9052:
-

Data structure:
* FlatFileBufferLinkedList is used in the second phase and contains a list of 
NodeStateEntry objects.
* NodeStateEntry.nodeState is a LazyChildrenNodeState for entries in memory, 
but can be a DocumentNodeState when reading from MongoDB (in the first phase).
* NodeStateEntry objects can be (de-)serialized using the NodeStateEntryWriter 
/ NodeStateEntryReader. That is usually only used in the first phase.
* The temp file is stored in 
temp/flat-file-store/sort-work-dir/sortInBatch...flatfile (by default using 
compression).
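
To make the memory limit from the issue description concrete, here is a 
simplified sketch of a buffer that tracks an estimated size per entry and 
fails once a limit is exceeded. It is not Oak's FlatFileBufferLinkedList: all 
names are hypothetical, the Entry class only stands in for NodeStateEntry, and 
an ArrayDeque is used instead of a linked list for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified stand-in for a memory-limited entry buffer.
final class BoundedEntryBuffer {

    // Stand-in for NodeStateEntry: a path plus an estimated memory footprint.
    static final class Entry {
        final String path;
        final long estimatedBytes;

        Entry(String path, long estimatedBytes) {
            this.path = path;
            this.estimatedBytes = estimatedBytes;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long limitBytes;
    private long usedBytes;

    BoundedEntryBuffer(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Appends an entry; fails if the estimated total would exceed the limit.
    void add(Entry entry) {
        if (usedBytes + entry.estimatedBytes > limitBytes) {
            throw new IllegalStateException("Memory limit of " + limitBytes
                    + " bytes exceeded at " + entry.path);
        }
        entries.addLast(entry);
        usedBytes += entry.estimatedBytes;
    }

    // Removes the oldest entry and releases its share of the budget.
    Entry removeFirst() {
        Entry entry = entries.removeFirst();
        usedBytes -= entry.estimatedBytes;
        return entry;
    }

    long usedBytes() {
        return usedBytes;
    }

    int size() {
        return entries.size();
    }
}
```

A buffer like this fails as soon as the entries it has to hold at once add up 
to more than the limit, which is the situation the issue description refers to.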
