[ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316248#comment-15316248
 ] 

Chetan Mehrotra commented on OAK-4412:
--------------------------------------

Interesting work, Tomek! Before I dig deeper into the patch, I need a better 
understanding of the proposed approach.

*Use of CommitHook*

Changes made by a CommitHook may be rolled back in case of a conflict or if the 
branch is rebased. Further, a CommitHook can be invoked concurrently, which 
would cause issues with Lucene indexing, as it is single-threaded by design. The 
patch appears to make the CommitHook synchronous, which would have an adverse 
impact on writes. Instead, I think it would be better to rely on an Observer 
that listens only for local changes and updates the index in the observer 
callback. This would ensure that the index sees only committed changes and does 
not impact writes significantly.
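
To make the suggestion concrete, here is a minimal sketch of an observer that 
indexes only local, already committed changes. The types below 
(CommittedChange, CommitObserver, LocalIndexObserver) are hypothetical 
stand-ins written for illustration; they are not Oak's Observer or CommitInfo 
API, and a plain list stands in for the Lucene in-memory index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in (not an Oak type): a change that has already been
// committed, flagged as external when it originated on another cluster node.
final class CommittedChange {
    final String path;
    final boolean external;

    CommittedChange(String path, boolean external) {
        this.path = path;
        this.external = external;
    }
}

// Assumed contract for this sketch: called only after a commit has succeeded,
// so the callback never sees changes that could later be rolled back.
interface CommitObserver {
    void contentChanged(CommittedChange change);
}

// Updates a local index from local commits only.
final class LocalIndexObserver implements CommitObserver {
    // A list of paths stands in for the in-memory index.
    private final List<String> indexedPaths = new ArrayList<>();

    @Override
    public synchronized void contentChanged(CommittedChange change) {
        if (change.external) {
            return; // remote changes are left to the async index
        }
        // synchronized keeps index updates strictly one at a time
        indexedPaths.add(change.path);
    }

    synchronized List<String> indexedPaths() {
        return new ArrayList<>(indexedPaths);
    }
}
```

Because the update happens in the observer callback rather than in the commit 
path, a slow index update in this sketch delays only the observer, not the 
writer.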

The downside of this approach is that the indexes would lag a bit behind their 
sync property index counterparts, but that can be offset somewhat with sticky 
sessions. Consider the following flow:
# User U1 accesses cluster node N1 and performs an update to property "foo", 
which has a property index.
# In a subsequent gesture the request hits N1 again and performs a query. With 
a (sync) property index, the expectation is that the nodes updated in #1 are 
visible to the query. If we switch to the default "async" index, that would 
fail. However, if we switch to "hybrid", the in-memory index would include 
that update and the result would be as expected.

This would work if there are sticky sessions at a higher level (session here 
means user session), which is a reasonable expectation for an eventually 
consistent deployment.
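
The flow above can be sketched as a toy model of a single cluster node. 
Everything here is an assumption made for illustration: HybridQuerySketch and 
its methods are hypothetical names, sets of paths stand in for the Lucene 
indexes, and the async cycle is reduced to a single method call.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of one cluster node holding an async index plus a local
// in-memory index. Hypothetical names; not Oak code.
final class HybridQuerySketch {
    // Paths committed on this node (stands in for the repository).
    private final Set<String> committed = new LinkedHashSet<>();
    // Persisted index, updated only by the background async cycle.
    private final Set<String> asyncIndex = new LinkedHashSet<>();
    // Local in-memory index, updated right after each local commit.
    private final Set<String> memoryIndex = new LinkedHashSet<>();

    // Step 1 of the flow: a local commit lands in the in-memory index at once.
    void localCommit(String path) {
        committed.add(path);
        memoryIndex.add(path);
    }

    // Background job: the async index catches up, then the local index is purged.
    void asyncCycle() {
        asyncIndex.addAll(committed);
        memoryIndex.clear();
    }

    // Pure "async" behaviour: recent local updates are missing until the next cycle.
    Set<String> queryAsyncOnly() {
        return new LinkedHashSet<>(asyncIndex);
    }

    // "hybrid" behaviour: union of the async index and the in-memory index.
    Set<String> queryHybrid() {
        Set<String> result = new LinkedHashSet<>(asyncIndex);
        result.addAll(memoryIndex);
        return result;
    }
}
```

In this model, a query issued on the same node right after localCommit finds 
the path through queryHybrid but not through queryAsyncOnly, which is the case 
sticky sessions are meant to cover.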

And a minor suggestion: it would be better if the patch avoided significant code 
displacement. Instead of re-indenting the code, introducing a new method that 
delegates to the old one would make the change easier to understand (without 
the noise).

> Lucene-memory property index
> ----------------------------
>
>                 Key: OAK-4412
>                 URL: https://issues.apache.org/jira/browse/OAK-4412
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Tomek Rękawek
>            Assignee: Tomek Rękawek
>             Fix For: 1.6
>
>         Attachments: OAK-4412.patch
>
>
> When running Oak in a cluster, each write operation is expensive. After 
> performing some stress-tests with a geo-distributed Mongo cluster, we've 
> found out that updating property indexes is a large part of the overall 
> traffic.
> The asynchronous index would be an answer here (as the index update won't be 
> made in the client request thread), but AEM requires the updates to be 
> visible immediately in order to work properly.
> The idea here is to enhance the existing asynchronous Lucene index with a 
> synchronous, locally-stored counterpart that will persist only the data since 
> the last Lucene background reindexing job.
> The new index can be stored in memory or (if necessary) in MMAPed local 
> files. Once the "main" Lucene index is being updated, the local index will be 
> purged.
> Queries will use a union of results from the {{lucene}} and 
> {{lucene-memory}} indexes.
> The {{lucene-memory}} index, as a locally stored entity, will be updated using 
> an observer, so it'll get both local and remote changes.
> The original idea was suggested by [~chetanm] in the discussion for 
> OAK-4233.



