[ https://issues.apache.org/jira/browse/OAK-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179017#comment-16179017 ]
Chetan Mehrotra edited comment on OAK-6535 at 9/26/17 4:56 AM:
---------------------------------------------------------------

This feature is now ready for review

* On github - See [here|https://github.com/chetanmeh/jackrabbit-oak/compare/trunk...chetanmeh:OAK-6535]
* As single patch - See [here|^OAK-6535-v1.diff]
* See [wiki|https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes] for more background

h2. Implementation Details

*Indexing*

{{LuceneIndexEditor}} now supports a {{PropertyUpdateCallback}} which is invoked for each indexed property change. For this feature we provide a {{PropertyIndexUpdateCallback}} which performs the property index update according to the property index type. For a non unique sync index it uses {{ContentMirrorStoreStrategy}} and for a unique index it uses {{UniqueIndexStoreStrategy}}. See the wiki for the storage format.

For non unique indexes it disables the default pruning.

For a unique index each index entry also stores a timestamp (as epoch time) in {{jcr:created}}. Note that it is not of type Calendar.

*Query*

On the query side {{IndexPlanner}} checks if the definition supports sync indexes. If yes, it determines which sync index can be used. For a query only one of the sync indexes can be used. It follows these rules:
* If any unique index is found then that is given preference
* If multiple non unique sync indexes are found then the first one is used

In case of a unique index the entryCount is set to 1 so that this index reports close to the lowest cost.

After planning, {{LucenePropertyIndex}} checks whether the planner has identified any sync index. If yes, it returns a concatenated iterator in which the iterator provided by the property index (via {{HybridPropertyIndexLookup}}) comes first.

*Cleanup*

This feature configures a {{PropertyIndexCleaner}} job which is triggered periodically (default frequency: every 10 min) and does the following:
# First, change the head bucket if there is any change in the current head bucket state for a non unique sync index. This change is merged
# For non unique sync indexes, clean up old orphan buckets
# For unique indexes, scan the index entries and remove those whose {{jcr:created}} is older than the lastIndexTo time of the index's indexer lane, i.e. entries which have already been moved to the lucene index are removed. In doing this it also keeps a threshold which defaults to 1 hr

*Misc Points*
# Supports relative properties
# -Supports non root indexes- Pending OAK-6714

h2. Benchmark

The benchmark can be run via
{noformat}
java -DhybridIndexEnabled=true -DindexingMode=nrt -DsyncIndexing=true -jar oak-benchmark*.jar benchmark HybridIndexTest Oak-Segment-Tar-DS
{noformat}

Here
* hybridIndexEnabled=true, syncIndexing=true - Enables this feature, i.e. the 'foo' property is indexed in hybrid mode
* hybridIndexEnabled=true, syncIndexing=false - Enables just the NRT mode
* hybridIndexEnabled=false, syncIndexing=false - Enables pure property index mode

{noformat}
# HybridIndexTest     C  min  10%  50%  90%  max     N  Searcher  Mutator  Indexed
Oak-Segment-Tar-DS    1    4    6    7    9  527  7992   5385539    39400    49890  #nrt,oakCodec,sync
Oak-Segment-Tar-DS    1    4    6    7   10  114  7462   6834075    34220    46362  #property
Oak-Segment-Tar-DS    1    4    5    6    8  508  9063   4439786    47797    56844  #nrt,oakCodec

numOfIndexes: 10, refreshDeltaMillis: 1000, asyncInterval: 5, queueSize: 1000, hybridIndexEnabled: true, indexingMode: nrt, useOakCodec: true, cleanerIntervalInSecs: 10, syncIndexing: true
{noformat}

h2. Pending Stuff

*Open Items*
# Support for nodetype index
# Support for reference index

*Points to discuss*

Apart from the current implementation design, the following aspects need to be discussed
# Frequency of the cleaner job - Currently it is scheduled to run every 10 mins
# Threshold for unique index cleanup - Currently entries are removed 1 hr after they make it into the persisted lucene index. This is required because the time recorded in the index entry is not the same as the time at which the commit is made.
So it is possible that, if lastIndexTo refers to T1, an entry created at T0 (T0 < T1) was actually persisted to the repository at time T2 (T2 > T1). This threshold ensures that we do not remove entries which have not yet made it to the persisted lucene index.

[~tmueller] [~catholicon] [~teofili] Please review the patch. I will keep this open for this week so that you get time. I plan to merge next week.

> Synchronous Lucene Property Indexes
> -----------------------------------
>
>                 Key: OAK-6535
>                 URL: https://issues.apache.org/jira/browse/OAK-6535
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: lucene, property-index
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>         Attachments: OAK-6535-v1.diff
>
>
> Oak 1.6 added support for Lucene Hybrid Index (OAK-4412). That enables near
> real time (NRT) support for Lucene based indexes. It also had limited
> support for sync indexes. This feature aims to take that to the next level and
> enable support for sync property indexes.
> More details at
> https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)