Hi,

On Tue, Feb 26, 2013 at 2:19 PM, Chetan Mehrotra
<chetan.mehro...@gmail.com> wrote:
> I modified the importer logic to use a custom nodeType similar to
> SlingFolder (no orderable nodes) and following are the results
Thanks! It indeed looks like the orderability is an issue here. With the
oak:unstructured type I added in OAK-657 and a few more improvements and
fixes to the SegmentMK, I can now also import the Simplified English
wiki, with 167k pages:

    $ java -Xmx500m -DOAK-652=true -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
          benchmark --wikipedia=simplewiki-20130214-pages-articles.xml --cache=200 \
          WikipediaImport Oak-Segment
    Apache Jackrabbit Oak 0.7-SNAPSHOT
    Oak-Segment: Wikipedia import benchmark
    Importing simplewiki-20130214-pages-articles.xml...
    Added 1000 pages in 1 seconds (1.35ms/page)
    [...]
    Added 167000 pages in 467 seconds (2.80ms/page)
    Imported 167404 pages in 1799 seconds (10.75ms/page)

The speed of transient operations slows down slightly over time, mostly
because initially everything is cached and cache misses become more
frequent later on.

Note the new --cache option that can be used to control the size (in MB)
of the segment cache. Ideally, for better comparison, we'd also make it
control the cache used by the MongoMK.

There are still a few problems, most notably the fact that the index
update hook operates directly on the plain MemoryNodeBuilder used by the
current SegmentMK, so it won't benefit from the automatic purging of
large change-sets and thus ends up requiring lots of memory during the
massive final save() call. Something like a SegmentNodeBuilder with
internal purge logic similar to what we already prototyped in
KernelNodeState should solve that issue.

The other big issue is the large amount of time spent processing the
commit hooks. The one-hook approach I outlined earlier should help us
there.

BR,

Jukka Zitting
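
PS: for anyone unfamiliar with the purge idea mentioned above, here is a
rough sketch of the pattern. All names here (PurgingBuilder, flush) are
illustrative only, not actual Oak API; the real SegmentNodeBuilder would
flush pending changes out to segments rather than to an in-memory list:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of purge-on-threshold: the builder counts pending
// transient updates and flushes them once a limit is reached, so a large
// import never holds the entire change-set in memory until save().
public class PurgingBuilder {

    private static final int PURGE_LIMIT = 1000; // updates kept in memory

    private final Map<String, String> pending = new HashMap<>();
    private final List<Map<String, String>> flushed = new ArrayList<>();

    public void setProperty(String path, String value) {
        pending.put(path, value);
        if (pending.size() >= PURGE_LIMIT) {
            flush(); // automatic purge keeps memory use bounded
        }
    }

    // Writes the pending change-set out and clears the in-memory buffer.
    private void flush() {
        flushed.add(new HashMap<>(pending));
        pending.clear();
    }

    public int pendingSize() {
        return pending.size();
    }

    public int flushCount() {
        return flushed.size();
    }

    public static void main(String[] args) {
        PurgingBuilder builder = new PurgingBuilder();
        for (int i = 0; i < 2500; i++) {
            builder.setProperty("/page" + i, "content");
        }
        // 2500 updates with a limit of 1000: two automatic purges happened,
        // and only 500 updates are still held in memory at save() time.
        System.out.println(builder.flushCount() + " " + builder.pendingSize());
    }
}
```

Since the index update hook bypasses this path today, it sees the whole
change-set at once, which is exactly the memory problem described above.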