I modified the importer logic to use a custom nodeType similar to SlingFolder (i.e. no orderable child nodes), and the following are the results:
Segment MK
------------------

05:30:31 {benchmark} ~/git/apache/jackrabbit-oak$ java -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar benchmark --wikipedia=/home/chetanm/data/oak/fowiki-20130213-pages-articles.xml --port=27018 WikipediaImport Oak-Segment
Apache Jackrabbit Oak 0.7-SNAPSHOT
Oak-Segment: importing /home/chetanm/data/oak/fowiki-20130213-pages-articles.xml...
Added 1000 pages in 6 seconds (6.34ms/page)
Added 2000 pages in 8 seconds (4.45ms/page)
Added 3000 pages in 11 seconds (3.67ms/page)
Added 4000 pages in 13 seconds (3.29ms/page)
Added 5000 pages in 15 seconds (3.04ms/page)
Added 6000 pages in 17 seconds (2.88ms/page)
Added 7000 pages in 19 seconds (2.81ms/page)
Added 8000 pages in 22 seconds (2.77ms/page)
Added 9000 pages in 24 seconds (2.76ms/page)
Added 10000 pages in 27 seconds (2.75ms/page)
Added 11000 pages in 30 seconds (2.75ms/page)
Added 12000 pages in 32 seconds (2.69ms/page)
Imported 12148 pages in 86 seconds (7.14ms/page)

Mongo MK
----------------

05:32:21 {benchmark} ~/git/apache/jackrabbit-oak$ java -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar benchmark --wikipedia=/home/chetanm/data/oak/fowiki-20130213-pages-articles.xml --port=27018 WikipediaImport Oak-Mongo
Apache Jackrabbit Oak 0.7-SNAPSHOT
Oak-Mongo: importing /home/chetanm/data/oak/fowiki-20130213-pages-articles.xml...
Added 1000 pages in 4 seconds (4.84ms/page)
Added 2000 pages in 7 seconds (3.53ms/page)
Added 3000 pages in 9 seconds (3.33ms/page)
Added 4000 pages in 12 seconds (3.14ms/page)
Added 5000 pages in 14 seconds (2.93ms/page)
Added 6000 pages in 18 seconds (3.02ms/page)
Added 7000 pages in 22 seconds (3.16ms/page)
Added 8000 pages in 26 seconds (3.33ms/page)
Added 9000 pages in 29 seconds (3.30ms/page)
Added 10000 pages in 34 seconds (3.49ms/page)
Added 11000 pages in 53 seconds (4.88ms/page)
Added 12000 pages in 70 seconds (5.84ms/page)
Imported 12148 pages in 72 seconds (5.99ms/page)

This includes some cache-related changes done by Thomas today. Both tests pass with no OOM.

Further, with nt:unstructured nodes I was getting an error with MongoMK about the document size exceeding the limit, which I think was due to keeping multiple revisioned copies of the :childOrder array. This would be addressed going forward by moving older revisions to a separate node, or removing them altogether if possible.

Chetan Mehrotra

On Tue, Feb 26, 2013 at 4:12 PM, Marcel Reutegger <mreut...@adobe.com> wrote:
> > I didn't analyze the results, but could the problem be orderable child
> > nodes? Currently, oak-core stores a property ":childOrder".
>
> no, the problem is how oak-core detects changes between two node
> state revisions. for a node with many child nodes in two revisions,
> oak-core currently loads all children in both revisions to find out
> which child nodes were added, removed, changed or didn't change at all.
>
> I'm currently working on this issue in KernelNodeState by leveraging
> MK.diff(). right now it simply checks if there are differences, but
> doesn't make use of the information. this should bring the cost down
> to O(N), where N is the number of modified child nodes.
>
> Please note this requires a correct implementation of MK.diff()!
>
> Regards
> Marcel
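For readers following along: the difference Marcel describes can be sketched in plain Java. This is a toy illustration only, not Oak's actual KernelNodeState or MicroKernel API; the class and method names are made up. The naive comparison touches every child in both revisions, while a diff-driven comparison only touches the names the diff reports.

```java
import java.util.*;

// Toy sketch (not Oak's real API): child nodes modeled as name -> revision maps.
public class ChildDiffSketch {

    // Naive approach: O(total children) -- loads and compares every child
    // in both revisions, which is what made large flat hierarchies slow.
    static Set<String> naiveChanged(Map<String, String> before,
                                    Map<String, String> after) {
        Set<String> changed = new TreeSet<>();
        for (String name : after.keySet()) {
            if (!Objects.equals(before.get(name), after.get(name))) {
                changed.add(name); // added or modified
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                changed.add(name); // removed
            }
        }
        return changed;
    }

    // Diff-driven approach: O(modified children) -- a MicroKernel-style
    // diff already names the changed children, so only those are inspected.
    static Set<String> diffDriven(Set<String> diffNames,
                                  Map<String, String> before,
                                  Map<String, String> after) {
        Set<String> changed = new TreeSet<>();
        for (String name : diffNames) {
            if (!Objects.equals(before.get(name), after.get(name))) {
                changed.add(name);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        for (int i = 0; i < 10000; i++) before.put("page-" + i, "rev1");
        Map<String, String> after = new HashMap<>(before);
        after.put("page-42", "rev2");   // modified
        after.put("page-new", "rev2");  // added
        after.remove("page-7");         // removed

        Set<String> naive = naiveChanged(before, after);
        // The diff-driven path inspects only the three reported names,
        // yet finds the same result as scanning all 10000 children.
        Set<String> viaDiff = diffDriven(
                Set.of("page-42", "page-new", "page-7"), before, after);
        System.out.println(naive.equals(viaDiff)); // prints "true"
        System.out.println(naive); // prints "[page-42, page-7, page-new]"
    }
}
```

Both paths report the same added/removed/modified children; the saving is purely in how many children must be loaded to find them, which is why a correct MK.diff() matters.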