[ 
https://issues.apache.org/jira/browse/OAK-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179017#comment-16179017
 ] 

Chetan Mehrotra edited comment on OAK-6535 at 9/25/17 1:24 PM:
---------------------------------------------------------------

This feature is now ready for review

* On GitHub - see [here|https://github.com/chetanmeh/jackrabbit-oak/compare/trunk...chetanmeh:OAK-6535]
* As a single patch - see [here|^OAK-6535-v1.diff]
* See the [wiki|https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes] for more background

h2. Implementation Details

*Indexing*
{{LuceneIndexEditor}} now supports a {{PropertyUpdateCallback}}, which is
invoked for each change to an indexed property. For this feature we provide a
{{PropertyIndexUpdateCallback}}, which updates the property index according to
the property index type.

For a non-unique sync index it uses {{ContentMirrorStoreStrategy}}, and for a
unique index it uses {{UniqueIndexStoreStrategy}}. See the wiki for the storage
format.

For non-unique indexes it disables the default pruning.

For a unique index, each index entry also stores a timestamp (as epoch time) in
{{jcr:created}}. Note that it is not of type Calendar.
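
The dispatch described above can be pictured with a minimal, self-contained sketch. It does not use the Oak API: {{SyncIndexSketch}}, its {{Kind}} enum and the map-based storage are hypothetical stand-ins for the callback and the two store strategies, and the use of epoch milliseconds is an assumption (the comment only says "epoch time"). The real storage format is described on the wiki.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a sync property index; not Oak code. */
final class SyncIndexSketch {
    enum Kind { UNIQUE, NON_UNIQUE }

    /** Entry of a unique index: the owning path plus its creation time. */
    static final class UniqueEntry {
        final String path;
        final long created; // epoch millis (assumed unit); a plain long, not a Calendar

        UniqueEntry(String path, long created) {
            this.path = path;
            this.created = created;
        }
    }

    // Non-unique index: value -> paths of all nodes carrying that value.
    private final Map<String, Set<String>> nonUnique = new HashMap<>();
    // Unique index: value -> the single entry allowed for that value.
    private final Map<String, UniqueEntry> unique = new HashMap<>();

    /** Invoked once per indexed property change, mirroring the callback idea. */
    void propertyUpdated(Kind kind, String path, String value, long nowMillis) {
        if (kind == Kind.UNIQUE) {
            UniqueEntry existing = unique.get(value);
            if (existing != null && !existing.path.equals(path)) {
                throw new IllegalStateException("Uniqueness violated for value " + value);
            }
            unique.put(value, new UniqueEntry(path, nowMillis));
        } else {
            nonUnique.computeIfAbsent(value, v -> new LinkedHashSet<>()).add(path);
        }
    }

    Set<String> lookupNonUnique(String value) {
        return nonUnique.getOrDefault(value, Collections.emptySet());
    }

    UniqueEntry lookupUnique(String value) {
        return unique.get(value);
    }
}
```

In this sketch a second node claiming an already indexed unique value is rejected immediately, which is the kind of guarantee a synchronous unique index is meant to give.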

*Query*
On the query side, {{IndexPlanner}} checks whether the index definition supports
sync indexes. If it does, the planner determines which sync index can be used.
Only one of the sync indexes can be used for a given query, chosen by the
following rules:

* If any unique index is found, it is given preference
* If multiple non-unique sync indexes are found, the first one is used

For a unique index the entryCount is set to 1, so that this index reports
close to the lowest cost.

After planning, {{LucenePropertyIndex}} checks whether the planner has
identified a sync index. If so, it returns a concatenated iterator in which the
iterator provided by the property index (via {{HybridPropertyIndexLookup}})
comes first.
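
As a rough illustration of the selection rule and the concatenated result, here is a small sketch in plain Java. {{SyncQuerySketch}}, {{SyncIndex}}, {{choose}} and {{concat}} are hypothetical names introduced for this example and are not part of the patch; the sketch also does not attempt de-duplication between the two result sources, which this comment does not describe.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of sync index selection and result concatenation; not Oak code. */
final class SyncQuerySketch {

    /** Minimal description of a candidate sync index. */
    static final class SyncIndex {
        final String property;
        final boolean unique;

        SyncIndex(String property, boolean unique) {
            this.property = property;
            this.unique = unique;
        }
    }

    /** A unique index wins; otherwise the first candidate is used; null if there is none. */
    static SyncIndex choose(List<SyncIndex> candidates) {
        for (SyncIndex candidate : candidates) {
            if (candidate.unique) {
                return candidate;
            }
        }
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    /** Returns all elements of {@code first}, followed by all elements of {@code second}. */
    static <T> Iterator<T> concat(final Iterator<? extends T> first,
                                  final Iterator<? extends T> second) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return first.hasNext() || second.hasNext();
            }

            @Override
            public T next() {
                // Drain the property index results before the Lucene results.
                if (first.hasNext()) {
                    return first.next();
                }
                return second.next(); // throws NoSuchElementException when exhausted
            }
        };
    }
}
```

Passing the property index results as {{first}} and the Lucene results as {{second}} gives the ordering described above, with the synchronous results returned before the asynchronous ones.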

*Cleanup*

This feature configures a {{PropertyIndexCleaner}} job, which is triggered
periodically (default frequency: every 10 min) and does the following:

# First, for non-unique sync indexes, it changes the head bucket if the state
of the current head bucket has changed. This change is then merged
# For non-unique sync indexes, it cleans up old orphaned buckets
# For unique indexes, it scans the index entries and removes those whose
{{jcr:created}} is older than the lastIndexTo time of the index's indexing lane,
i.e. entries that have already been moved to the Lucene index. In doing this
it also applies a threshold, which defaults to 1 hr
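
The unique index step can be sketched as follows. This is an illustration only: {{UniqueCleanupSketch}} and its map of value to creation time are hypothetical, timestamps are assumed to be epoch milliseconds, and the threshold is assumed to be subtracted from the last indexed time, so that an entry is removed only once it has been covered by the Lucene index for at least the threshold period.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the unique index cleanup step; not Oak code. */
final class UniqueCleanupSketch {
    /** Default threshold of 1 hr, as mentioned in the comment. */
    static final long DEFAULT_THRESHOLD_MILLIS = TimeUnit.HOURS.toMillis(1);

    /**
     * Removes entries created before (lastIndexedTo - thresholdMillis).
     *
     * @param createdByValue indexed value -> creation time in epoch millis (assumed unit)
     * @param lastIndexedTo  time up to which the async lane has indexed, in epoch millis
     * @return number of removed entries
     */
    static int clean(Map<String, Long> createdByValue, long lastIndexedTo, long thresholdMillis) {
        long cutoff = lastIndexedTo - thresholdMillis;
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = createdByValue.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < cutoff) {
                it.remove(); // entry is already served by the persisted Lucene index
                removed++;
            }
        }
        return removed;
    }
}
```

With a last indexed time of 2 hr and the default threshold, an entry created at time 0 is removed, while one created 90 min in is kept until a later run.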

*Misc Points*

# Supports relative properties
# Supports non root indexes

h2. Benchmark

The benchmark can be run via

{noformat}
java -DhybridIndexEnabled=true -DindexingMode=nrt -DsyncIndexing=true -jar oak-benchmark*.jar benchmark HybridIndexTest Oak-Segment-Tar-DS
{noformat}

Here
* hybridIndexEnabled=true, syncIndexing=true - Enables this feature, i.e. the 'foo' property is indexed in hybrid mode
* hybridIndexEnabled=true, syncIndexing=false - Enables just the NRT mode
* hybridIndexEnabled=false, syncIndexing=false - Enables pure property index mode

{noformat}
# HybridIndexTest     C   min   10%   50%   90%   max      N  Searcher  Mutator  Indexed
Oak-Segment-Tar-DS    1     4     6     7     9   527   7992   5385539    39400    49890  #nrt,oakCodec,sync
Oak-Segment-Tar-DS    1     4     6     7    10   114   7462   6834075    34220    46362  #property
Oak-Segment-Tar-DS    1     4     5     6     8   508   9063   4439786    47797    56844  #nrt,oakCodec
numOfIndexes: 10, refreshDeltaMillis: 1000, asyncInterval: 5, queueSize: 1000, hybridIndexEnabled: true, indexingMode: nrt, useOakCodec: true, cleanerIntervalInSecs: 10, syncIndexing: true
{noformat}


h2. Pending Stuff

*Open Items*

# Support for nodetype index
# Support for reference index 

*Points to discuss*

Apart from the current implementation design, the following aspects need to be
discussed:

# Frequency of the cleaner job - currently it is scheduled to run every 10 min
# Threshold for unique index cleanup - currently entries are removed 1 hr after
they make it into the persisted Lucene index

[~tmueller] [~catholicon] [~teofili] Please review the patch. I will keep this
open for this week so that you have time, and plan to merge next week.



> Synchronous Lucene Property Indexes
> -----------------------------------
>
>                 Key: OAK-6535
>                 URL: https://issues.apache.org/jira/browse/OAK-6535
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: lucene, property-index
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>         Attachments: OAK-6535-v1.diff
>
>
> Oak 1.6 added support for Lucene Hybrid Index (OAK-4412). That enables near 
> real time (NRT) support for Lucene based indexes. It also had a limited 
> support for sync indexes. This feature aims to improve that to next level and 
> enable support for sync property indexes.
> More details at 
> https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes


