I modified the importer logic to use a custom nodeType similar to
SlingFolder (no orderable child nodes); the results follow.

Segment MK
------------------

05:30:31 {benchmark} ~/git/apache/jackrabbit-oak$ java -DOAK-652=true -jar
oak-run/target/oak-run-0.7-SNAPSHOT.jar  benchmark
--wikipedia=/home/chetanm/data/oak/fowiki-20130213-pages-articles.xml
--port=27018 WikipediaImport Oak-Segment
Apache Jackrabbit Oak 0.7-SNAPSHOT
Oak-Segment: importing
/home/chetanm/data/oak/fowiki-20130213-pages-articles.xml...
Added 1000 pages in 6 seconds (6.34ms/page)
Added 2000 pages in 8 seconds (4.45ms/page)
Added 3000 pages in 11 seconds (3.67ms/page)
Added 4000 pages in 13 seconds (3.29ms/page)
Added 5000 pages in 15 seconds (3.04ms/page)
Added 6000 pages in 17 seconds (2.88ms/page)
Added 7000 pages in 19 seconds (2.81ms/page)
Added 8000 pages in 22 seconds (2.77ms/page)
Added 9000 pages in 24 seconds (2.76ms/page)
Added 10000 pages in 27 seconds (2.75ms/page)
Added 11000 pages in 30 seconds (2.75ms/page)
Added 12000 pages in 32 seconds (2.69ms/page)
Imported 12148 pages in 86 seconds (7.14ms/page)

Mongo MK
----------------

05:32:21 {benchmark} ~/git/apache/jackrabbit-oak$ java -DOAK-652=true -jar
oak-run/target/oak-run-0.7-SNAPSHOT.jar  benchmark
--wikipedia=/home/chetanm/data/oak/fowiki-20130213-pages-articles.xml
--port=27018 WikipediaImport Oak-Mongo
Apache Jackrabbit Oak 0.7-SNAPSHOT
Oak-Mongo: importing
/home/chetanm/data/oak/fowiki-20130213-pages-articles.xml...
Added 1000 pages in 4 seconds (4.84ms/page)
Added 2000 pages in 7 seconds (3.53ms/page)
Added 3000 pages in 9 seconds (3.33ms/page)
Added 4000 pages in 12 seconds (3.14ms/page)
Added 5000 pages in 14 seconds (2.93ms/page)
Added 6000 pages in 18 seconds (3.02ms/page)
Added 7000 pages in 22 seconds (3.16ms/page)
Added 8000 pages in 26 seconds (3.33ms/page)
Added 9000 pages in 29 seconds (3.30ms/page)
Added 10000 pages in 34 seconds (3.49ms/page)
Added 11000 pages in 53 seconds (4.88ms/page)
Added 12000 pages in 70 seconds (5.84ms/page)
Imported 12148 pages in 72 seconds (5.99ms/page)


These runs include some cache-related changes Thomas made today. Both
tests pass with no OOM.

Further, with nt:unstructured nodes I was getting an error with MongoMK
about the document size exceeding the limit, which I think was due to
keeping multiple revisioned copies of the :childOrder array. Going forward
this could be addressed by moving older revisions to a separate node, or
removing them altogether if possible.
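To illustrate the document-growth issue, here is a rough back-of-the-envelope sketch. It assumes (hypothetically) that each revisioned copy of :childOrder stores the full list of child names; the child count and average name length are made-up numbers, not measurements from the import above. MongoDB's BSON document size limit of 16 MB is the one hard constraint.

```java
public class ChildOrderGrowth {
    public static void main(String[] args) {
        long mongoDocLimit = 16L * 1024 * 1024; // MongoDB's BSON document size limit
        int childCount = 10_000;                // assumed child nodes under one parent
        int avgNameBytes = 20;                  // assumed average child-name length

        // Each revisioned copy of :childOrder repeats every child name.
        long bytesPerCopy = (long) childCount * avgNameBytes;
        long copiesUntilLimit = mongoDocLimit / bytesPerCopy;

        System.out.println("Bytes per :childOrder copy: " + bytesPerCopy);       // 200000
        System.out.println("Copies until the 16 MB limit: " + copiesUntilLimit); // 83
    }
}
```

So under these assumptions a parent document blows past the limit after fewer than a hundred retained revisions, which is consistent with hitting the error during a bulk import.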

Chetan Mehrotra


On Tue, Feb 26, 2013 at 4:12 PM, Marcel Reutegger <mreut...@adobe.com> wrote:

> > I didn't analyze the results, but could the problem be orderable child
> > nodes? Currently, oak-core stores a property ":childOrder".
>
> No, the problem is how oak-core detects changes between two node
> state revisions. For a node with many child nodes in two revisions,
> oak-core currently loads all children in both revisions to find out
> which child nodes were added, removed, changed, or didn't change at all.
>
> I'm currently working on this issue in KernelNodeState by leveraging
> MK.diff(). Right now it simply checks whether there are differences, but
> doesn't make use of the information. This should bring the cost down
> to O(N), where N is the number of modified child nodes.
>
> Please note this requires a correct implementation of MK.diff()!
>
> Regards
>  Marcel
>
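The difference Marcel describes can be sketched as follows. The class and method names are hypothetical, not the actual oak-core or MicroKernel API: the naive path compares full child maps of both revisions (cost proportional to the total number of children), while the diff-based path consumes a journal of only the modified entries, which is O(N) in the number of modifications.

```java
import java.util.*;

public class ChildDiffSketch {
    // Naive approach: load ALL children of both revisions and compare.
    // Cost is O(total children), even if only a few changed.
    static Map<String, String> naiveDiff(Map<String, String> before,
                                         Map<String, String> after) {
        Map<String, String> changes = new HashMap<>();
        for (String name : after.keySet()) {
            if (!before.containsKey(name)) {
                changes.put(name, "added");
            } else if (!before.get(name).equals(after.get(name))) {
                changes.put(name, "changed");
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                changes.put(name, "removed");
            }
        }
        return changes;
    }

    // Diff-based approach: the kernel reports only the modified children,
    // so the work is O(N) in the number of modifications.
    static Map<String, String> diffBased(List<String[]> journal) {
        Map<String, String> changes = new HashMap<>();
        for (String[] entry : journal) {
            changes.put(entry[0], entry[1]); // name -> added/changed/removed
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of("a", "v1", "b", "v1", "c", "v1");
        Map<String, String> after  = Map.of("a", "v1", "b", "v2", "d", "v1");

        // What a kernel diff would report: only the three modifications.
        List<String[]> journal = List.of(
                new String[]{"b", "changed"},
                new String[]{"c", "removed"},
                new String[]{"d", "added"});

        System.out.println(naiveDiff(before, after).equals(diffBased(journal))); // true
    }
}
```

Both paths produce the same change set; the diff-based one just never touches the unchanged children, which is what makes the large-flat-node case cheap.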
