[ 
https://issues.apache.org/jira/browse/OAK-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179017#comment-16179017
 ] 

Chetan Mehrotra edited comment on OAK-6535 at 9/26/17 4:56 AM:
---------------------------------------------------------------

This feature is now ready for review.

* On GitHub - see [here|https://github.com/chetanmeh/jackrabbit-oak/compare/trunk...chetanmeh:OAK-6535]
* As a single patch - see [here|^OAK-6535-v1.diff]
* See the [wiki|https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes] for more background

h2. Implementation Details

*Indexing*
{{LuceneIndexEditor}} now supports a {{PropertyUpdateCallback}}, which is 
invoked for each change to an indexed property. For this feature we provide a 
{{PropertyIndexUpdateCallback}}, which performs the property index update 
according to the property index type.

For a non-unique sync index it uses {{ContentMirrorStoreStrategy}}, and for a 
unique index it uses {{UniqueIndexStoreStrategy}}. See the wiki for the storage 
format.

For non-unique indexes it disables the default pruning.

For a unique index, each index entry also stores a timestamp (as epoch time) in 
{{jcr:created}}. Note that it is not of type Calendar.
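
To make the two storage modes concrete, here is a minimal in-memory sketch in Java. It does not use the Oak API: the class {{SyncIndexUpdateSketch}}, its method signature and the map-based storage are all hypothetical stand-ins, and rejecting a duplicate value is an assumption about how a unique index would behave. It only illustrates that non-unique entries map a value to many paths, while a unique entry maps a value to a single path plus a creation time in epoch millis.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these types are Oak classes.
final class SyncIndexUpdateSketch {

    /** Unique-index entry: the indexed path plus its creation time as epoch millis (a long, not a Calendar). */
    static final class UniqueEntry {
        final String path;
        final long createdMillis;

        UniqueEntry(String path, long createdMillis) {
            this.path = path;
            this.createdMillis = createdMillis;
        }
    }

    // Non-unique: one value can point to many paths.
    final Map<String, Set<String>> nonUnique = new HashMap<>();
    // Unique: one value points to exactly one path, with a timestamp.
    final Map<String, UniqueEntry> unique = new HashMap<>();

    /** Called once per indexed property value that was added at the given path. */
    void propertyAdded(String path, String value, boolean uniqueIndex, long nowMillis) {
        if (uniqueIndex) {
            UniqueEntry existing = unique.get(value);
            // Assumed behaviour: a second path with the same value is rejected.
            if (existing != null && !existing.path.equals(path)) {
                throw new IllegalStateException("Uniqueness constraint violated for value " + value);
            }
            unique.put(value, new UniqueEntry(path, nowMillis));
        } else {
            nonUnique.computeIfAbsent(value, k -> new TreeSet<>()).add(path);
        }
    }
}
```

The timestamp kept on each unique entry is what a later cleanup pass can compare against the indexing checkpoint.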

*Query*
On the query side, {{IndexPlanner}} checks whether the index definition 
supports sync indexes. If it does, the planner determines which sync index can 
be used. For a given query only one of the sync indexes can be used, chosen by 
the following rules:

* If any unique index is found, it is given preference
* If multiple non-unique sync indexes are found, the first one is used

In case of a unique index the entryCount is set to 1 so that this index reports 
close to the lowest cost.

After planning, {{LucenePropertyIndex}} checks whether the planner has 
identified any sync index. If so, it returns a concatenated iterator in which 
the iterator provided by the property index (via {{HybridPropertyIndexLookup}}) 
comes first.
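
The selection rule and the concatenated iterator can be sketched in plain Java as follows. This is not the planner code from the patch: {{SyncIndexSelectionSketch}}, {{SyncProperty}}, {{select}} and {{concat}} are hypothetical names used only for illustration, and the candidate list is assumed to be in the order the planner encountered the indexes.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; none of these types are Oak classes.
final class SyncIndexSelectionSketch {

    /** Stand-in for a sync property definition that matched the query. */
    static final class SyncProperty {
        final String name;
        final boolean unique;

        SyncProperty(String name, boolean unique) {
            this.name = name;
            this.unique = unique;
        }
    }

    /** Prefers the first unique candidate; otherwise the first candidate; null if there are none. */
    static SyncProperty select(List<SyncProperty> candidates) {
        for (SyncProperty p : candidates) {
            if (p.unique) {
                return p;
            }
        }
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    /** Returns all elements of 'first' (property index results), then all elements of 'second'. */
    static <T> Iterator<T> concat(final Iterator<T> first, final Iterator<T> second) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return first.hasNext() || second.hasNext();
            }

            @Override
            public T next() {
                if (first.hasNext()) {
                    return first.next();
                }
                if (second.hasNext()) {
                    return second.next();
                }
                throw new NoSuchElementException();
            }
        };
    }
}
```

For example, {{concat(propertyResults.iterator(), luceneResults.iterator())}} yields every property index result before any Lucene result, which matches the ordering described above.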

*Cleanup*

This feature configures a {{PropertyIndexCleaner}} job which is triggered 
periodically (by default every 10 min) and does the following:

# First, for non-unique sync indexes, it changes the head bucket if there is 
any change in the current head bucket state. This change is merged
# For non-unique sync indexes, it cleans up old orphaned buckets
# For unique indexes, it scans the index entries and removes those whose 
{{jcr:created}} is older than the lastIndexTo time of the index's indexer lane, 
i.e. entries which have already been moved to the Lucene index. In doing this 
it also applies a threshold, which defaults to 1 hr
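
The unique index cleanup step can be sketched as below. This is a simplified in-memory model, not the {{PropertyIndexCleaner}} code: {{UniqueEntryCleanupSketch}} and its {{clean}} method are hypothetical, the index is modelled as a map from entry key to creation time in epoch millis, and the way the threshold is applied (subtracting it from the lastIndexTo time to get a cutoff) is an assumption based on the description above.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the PropertyIndexCleaner implementation.
final class UniqueEntryCleanupSketch {

    /** Default safety threshold of 1 hour, in milliseconds. */
    static final long DEFAULT_THRESHOLD_MILLIS = TimeUnit.HOURS.toMillis(1);

    /**
     * Removes entries created before (lastIndexedToMillis - thresholdMillis)
     * and returns the number of removed entries.
     */
    static int clean(Map<String, Long> createdByKey, long lastIndexedToMillis, long thresholdMillis) {
        // Assumed rule: only entries older than the checkpoint minus the threshold are safe to drop.
        long cutoff = lastIndexedToMillis - thresholdMillis;
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = createdByKey.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < cutoff) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

With the default threshold, an entry created 30 minutes before the lastIndexTo time is kept, while one created two hours before it is removed.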

*Misc Points*

# Supports relative properties
# -Supports non root indexes- Pending OAK-6714

h2. Benchmark

The benchmark can be run via

{noformat}
java -DhybridIndexEnabled=true -DindexingMode=nrt -DsyncIndexing=true -jar oak-benchmark*.jar benchmark HybridIndexTest Oak-Segment-Tar-DS
{noformat}

Here:
* hybridIndexEnabled=true, syncIndexing=true - Enables this feature, i.e. the 'foo' property is indexed in hybrid mode
* hybridIndexEnabled=true, syncIndexing=false - Enables just the NRT mode
* hybridIndexEnabled=false, syncIndexing=false - Enables pure property index mode

{noformat}
# HybridIndexTest    C   min   10%   50%   90%   max      N  Searcher  Mutator  Indexed
Oak-Segment-Tar-DS   1     4     6     7     9   527   7992   5385539    39400    49890  #nrt,oakCodec,sync
Oak-Segment-Tar-DS   1     4     6     7    10   114   7462   6834075    34220    46362  #property
Oak-Segment-Tar-DS   1     4     5     6     8   508   9063   4439786    47797    56844  #nrt,oakCodec
numOfIndexes: 10, refreshDeltaMillis: 1000, asyncInterval: 5, queueSize: 1000, hybridIndexEnabled: true, indexingMode: nrt, useOakCodec: true, cleanerIntervalInSecs: 10, syncIndexing: true
{noformat}


h2. Pending Stuff

*Open Items*

# Support for nodetype index
# Support for reference index 

*Points to discuss*

Apart from the current implementation design, the following aspects need to be 
discussed:

# Frequency of the cleaner job - Currently it is scheduled to run every 10 mins
# Threshold for unique index cleanup - Currently entries are removed 1 hr after 
they make it into the persisted Lucene index. This is required because the time 
recorded in an index entry is not the same as the time at which the commit is 
made. So it is possible that lastIndexTo refers to T1 while an entry created at 
T0 (T0 < T1) was actually persisted to the repository at time T2 (T2 > T1). The 
threshold ensures that we do not remove entries which have not yet made it to 
the persisted Lucene index

[~tmueller] [~catholicon] [~teofili] Please review the patch. I will keep this 
open for this week so that you have time to look at it. I plan to merge next week.



> Synchronous Lucene Property Indexes
> -----------------------------------
>
>                 Key: OAK-6535
>                 URL: https://issues.apache.org/jira/browse/OAK-6535
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: lucene, property-index
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>         Attachments: OAK-6535-v1.diff
>
>
> Oak 1.6 added support for Lucene Hybrid Index (OAK-4412), which enables near 
> real time (NRT) support for Lucene based indexes. It also had limited 
> support for sync indexes. This feature aims to take that further and 
> enable support for sync property indexes.
> More details at 
> https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
