Re: Slow full text query performance and Lucene Index handling in Oak

Thomas Mueller Wed, 09 Apr 2014 00:18:07 -0700

Hi,

Do we still have the option to store the Lucene files in the file system?
If so, maybe we could run the test with that option and see if it
improves performance. I'm not suggesting this is a solution; it's just one
step towards analyzing things better. And it might be easy to do.


Regards,
Thomas



On 08/04/14 17:51, "Chetan Mehrotra" <chetan.mehro...@gmail.com> wrote:

>Hi,
>
>As part of OAK-1702 [1] I have added a benchmark to compare the
>performance of full text queries in Oak with JR2.
>
>Based on the approach taken (which might be wrong), I get the following
>numbers:
>
>Apache Jackrabbit Oak 0.21.0-SNAPSHOT
># FullTextSearchTest               C     min     10%     50%     90%     max       N
>Oak-Mongo                          1      58      71     101     119     287     610
>Oak-Mongo-FDS                      1      50      51      52      58     184    1106
>Oak-Tar                            1      39      40      40      44      64    1459
>Oak-Tar-FDS                        1      53      54      55      64     197    1030
>Jackrabbit                         1       4       4       5       6     231   11385
>
>This shows that JR2 performs a lot better for full text queries, and
>that subsequent queries are much faster once Lucene has warmed up.
>
>Looking at the current usage of Lucene in Oak and the way we store and
>access the Lucene indexes [2], I have a couple of doubts:
>
>1. Multiple IndexSearcher instances - The current implementation
>creates a new IndexSearcher for every Lucene query, as the OakDirectory
>used is bound to the NodeState of the executing JCR session. In
>comparison, JR2 probably had a singleton IndexSearcher which was shared
>across all query execution paths. This could cause performance issues,
>as Lucene is effectively used in a stateless way and has to perform
>initialization for every call. As per [3], the IndexSearcher must be
>shared.
>
>2. Index Access - Currently we have a custom OakDirectory which provides
>access to Lucene indexes stored in the NodeStore. Even with the
>SegmentStore, which uses memory mapped files, the random access used by
>Lucene would probably be a lot slower with OakDirectory than with the
>default Lucene MMapDirectory. For small setups where the Lucene index
>can be accommodated on each node, I think it would be better if the
>index is accessed from the file system.
>
>Are the above concerns valid, and should we take another look at how we
>are using Lucene in Oak?
>
>Chetan Mehrotra
>[1] https://issues.apache.org/jira/browse/OAK-1702
>[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/OakDirectory.java
>[3] http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
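
Point 1 in the quoted message (a new IndexSearcher per query versus one
shared instance) can be illustrated with a small, library-independent Java
sketch. It only shows the sharing idea: keep one expensive "searcher" per
index revision and reuse it until the revision changes. The class
SharedSearcherCache, the revision strings, and the opener function are all
hypothetical names introduced for illustration; none of this is Oak or
Lucene API, and a real implementation would also have to close the old
searcher's reader once no query is using it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch: reuse one expensive "searcher" per index revision
 * instead of opening a new one for every query. Not Oak or Lucene API.
 */
public class SharedSearcherCache<S> {

    /** Pairs a searcher with the index revision it was opened on. */
    private static final class Entry<T> {
        final String revision;
        final T searcher;

        Entry(String revision, T searcher) {
            this.revision = revision;
            this.searcher = searcher;
        }
    }

    // Assumption: opening a searcher for a revision is the expensive step.
    private final Function<String, S> opener;
    private Entry<S> current; // guarded by "this"

    public SharedSearcherCache(Function<String, S> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher, reopening only when the revision changed. */
    public synchronized S acquire(String revision) {
        if (current == null || !current.revision.equals(revision)) {
            current = new Entry<>(revision, opener.apply(revision));
        }
        return current.searcher;
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        SharedSearcherCache<String> cache = new SharedSearcherCache<>(rev -> {
            opened.incrementAndGet(); // count how often the costly open runs
            return "searcher@" + rev;
        });

        cache.acquire("r1"); // opens a searcher
        cache.acquire("r1"); // reuses it, no second open
        cache.acquire("r2"); // index changed, so reopen

        System.out.println("searchers opened: " + opened.get()); // prints "searchers opened: 2"
    }
}
```

Three queries against two index revisions open only two searchers here,
whereas the per-query approach described in point 1 would open three. Lucene
itself ships a SearcherManager class for sharing searchers across threads;
whether it fits the way OakDirectory is tied to a NodeState is a separate
question that this sketch does not answer.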
