Current update:

1. Tommaso provided a patch (OAK-1702) to disable compression, and that also helps quite a bit.
2. Currently we store the full tokenized text in the Lucene index [1], which makes fetching document fields slower. With that storage disabled, the numbers improve quite a bit. Storing the text was added as part of OAK-319 to support MLT (MoreLikeThis).
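Why points 1 and 2 interact can be shown with a toy model. In Lucene 4.1 and later, stored fields are compressed in chunks (LZ4 by default), so reading even one small field such as the path means decompressing the data around it, including any large stored full text. The sketch below only illustrates that cost and is not Lucene or Oak code: it assumes `java.util.zip` Deflate as a stand-in for LZ4 (the JDK has no LZ4), compresses one document per record instead of per chunk, and `StoredFieldsSketch`, `writeCompressed` and `readPath` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Toy model only: Deflate stands in for Lucene's LZ4, one record per document.
final class StoredFieldsSketch {

    // Result of a path lookup: the path plus how many bytes had to be inflated.
    static final class ReadResult {
        final String path;
        final int inflatedBytes;

        ReadResult(String path, int inflatedBytes) {
            this.path = path;
            this.inflatedBytes = inflatedBytes;
        }
    }

    // Writes a document's stored fields (path, plus full text when non-null)
    // and compresses the whole record as one unit.
    static byte[] writeCompressed(String path, String fullText) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeUTF(path);
            byte[] text = fullText == null
                    ? new byte[0]
                    : fullText.getBytes(StandardCharsets.UTF_8);
            out.writeInt(text.length);
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back only the path. Because the record was compressed as a whole,
    // everything in it, including any stored full text, is inflated first.
    static ReadResult readPath(byte[] compressed) {
        try (InflaterInputStream in =
                new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] raw = in.readAllBytes();
            DataInputStream fields = new DataInputStream(new ByteArrayInputStream(raw));
            return new ReadResult(fields.readUTF(), raw.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String body = "lorem ipsum dolor sit amet ".repeat(2000);
        ReadResult stored = readPath(writeCompressed("/content/doc1", body));
        ReadResult notStored = readPath(writeCompressed("/content/doc1", null));
        System.out.println(stored.path + ": inflated " + stored.inflatedBytes + " bytes with full text stored");
        System.out.println(notStored.path + ": inflated " + notStored.inflatedBytes + " bytes without it");
    }
}
```

For the same path lookup, `main` reports 54019 inflated bytes with the full text stored versus 19 without it. Real Lucene chunks hold several documents, so these figures say nothing about the benchmark numbers, but the direction is the same: less stored data means less to decompress per fetched document, and no compression means no decompression at all.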
# FullTextSearchTest        C  min  10%  50%  90%  max     N
Oak-Tar (codec)             1    9    9   10   12   41  5664
Oak-Tar (codec,mlt off)     1    7    8    8   10   21  6921

I will look into this further.

Chetan Mehrotra
[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java#L44

On Wed, Apr 9, 2014 at 7:15 PM, Alex Parvulescu <alex.parvule...@gmail.com> wrote:
> Aside from the compression issue, there was another one related to the
> 'order by' clause. I saw Collections.sort taking up as far as 23% of the
> perf.
>
> I removed the order by temporarily so it doesn't get in the way of the
> Lucene stuff, but I think the QueryEngine should skip ordering results in
> this case.
>
>
> On Wed, Apr 9, 2014 at 3:31 PM, Tommaso Teofili
> <tommaso.teof...@gmail.com> wrote:
>
>> I'm looking into the Lucene codecs right now.
>>
>> Tommaso
>>
>>
>> 2014-04-09 15:20 GMT+02:00 Alex Parvulescu <alex.parvule...@gmail.com>:
>>
>> > Profiling the result shows that quite a bit of time goes in
>> > org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I
>> > think is part of Lucene 4.x and not present in 3.x. Any idea if I can
>> > disable compression?
>> >
>> > +1 I noticed that too, we should try to disable compression and compare
>> > results.
>> >
>> > alex
>> >
>> >
>> > On Wed, Apr 9, 2014 at 3:16 PM, Chetan Mehrotra
>> > <chetan.mehro...@gmail.com> wrote:
>> >
>> > > On Wed, Apr 9, 2014 at 5:14 PM, Jukka Zitting <jukka.zitt...@gmail.com>
>> > > wrote:
>> > > > Is that a common use case? To better simulate a normal usage scenario
>> > > > I'd make the benchmark fetch up to N results (where N is configurable,
>> > > > with default something like 20) and access the path and the title
>> > > > property of the matching nodes.
>> > >
>> > > I changed the logic of benchmark in http://svn.apache.org/r1585962.
>> > > With that JR2 slows down a bit
>> > >
>> > > # FullTextSearchTest  C  min  10%  50%  90%  max      N
>> > > Oak-Tar               1   34   35   36   39   60   1639
>> > > Jackrabbit            1    5    5    6    7   68  10038
>> > >
>> > > Profiling the result shows that quite a bit of time goes in
>> > > org.apache.lucene.codecs.compressing.LZ4.decompress() (40%). This I
>> > > think is part of Lucene 4.x and not present in 3.x. Any idea if I can
>> > > disable compression?
>> > >
>> > > Chetan Mehrotra
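Jukka's suggestion in this thread (fetch up to N results, N configurable with a default of about 20, and read each hit's path and title) amounts to a loop that stops early. This is a hedged illustration only, not the benchmark code committed in r1585962: `Hit`, `consume`, `syntheticHits` and the `title` map key are hypothetical stand-ins for the JCR/Oak result API, and the default of 20 is taken from the suggestion rather than from the benchmark.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class ResultConsumerSketch {

    // Hypothetical stand-in for a search hit; not the JCR or Oak query API.
    static final class Hit {
        final String path;
        final Map<String, String> properties;

        Hit(String path, Map<String, String> properties) {
            this.path = path;
            this.properties = properties;
        }
    }

    // Assumed default, following the "something like 20" suggestion.
    static final int DEFAULT_MAX_RESULTS = 20;

    // Reads at most maxResults hits and touches each hit's path and "title"
    // property, the way a page of search results would be rendered.
    static List<String> consume(Iterator<Hit> hits, int maxResults) {
        List<String> rendered = new ArrayList<>();
        while (rendered.size() < maxResults && hits.hasNext()) {
            Hit hit = hits.next();
            rendered.add(hit.path + " | " + hit.properties.get("title"));
        }
        return rendered;
    }

    // Builds n synthetic hits for demonstration.
    static List<Hit> syntheticHits(int n) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Map<String, String> props = new HashMap<>();
            props.put("title", "Title " + i);
            hits.add(new Hit("/content/doc" + i, props));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> page = consume(syntheticHits(100).iterator(), DEFAULT_MAX_RESULTS);
        System.out.println(page.size() + " hits read, first: " + page.get(0));
    }
}
```

If the underlying result iterator is lazy, hits past the first N are never materialized, so a benchmark shaped like this measures the cost of fetching stored fields for one page of results rather than for every match.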