[ https://issues.apache.org/jira/browse/OAK-6535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179017#comment-16179017 ]
Chetan Mehrotra edited comment on OAK-6535 at 9/25/17 1:24 PM:
---------------------------------------------------------------

This feature is now ready for review

* On github - See [here|https://github.com/chetanmeh/jackrabbit-oak/compare/trunk...chetanmeh:OAK-6535]
* As single patch - See [here|^OAK-6535-v1.diff]
* See [wiki|https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes] for more background

h2. Implementation Details

*Indexing*

{{LuceneIndexEditor}} now supports a {{PropertyUpdateCallback}} which is invoked for each indexed property change. For this feature we provide a {{PropertyIndexUpdateCallback}} which performs the property index update according to the property index type. For a non unique sync index it uses {{ContentMirrorStoreStrategy}} and for a unique index it uses {{UniqueIndexStoreStrategy}}. See the wiki for the storage format.

For non unique indexes it disables the default pruning.

For a unique index each index entry also stores a timestamp (as epoch time) in {{jcr:created}}. Note that it is not of type Calendar.

*Query*

On the query side {{IndexPlanner}} checks if the definition supports sync indexes. If yes, it determines which sync index can be used. For a query only one of the sync indexes can be used. It follows these rules

* If any unique index is found then that is given preference
* If multiple non unique sync indexes are found then the first one is used

For a unique index the entryCount is set to 1 so that this index reports close to the lowest cost.

After planning, {{LucenePropertyIndex}} checks whether the planner has identified any sync index. If yes, it returns a concatenated iterator in which the iterator provided by the property index (via {{HybridPropertyIndexLookup}}) comes first.

*Cleanup*

This feature configures a {{PropertyIndexCleaner}} job which is triggered periodically (default frequency every 10 min) and does the following

# First change the head bucket if there is any change in the current head bucket state for a non unique sync index. This change is merged
# For a non unique sync index, clean up old orphan buckets
# For a unique index, scan the index entries and remove those whose {{jcr:created}} is older than the lastIndexTo time of the index's indexer lane. That is, entries which have already been moved to the Lucene index are removed. In doing this it also keeps a threshold which defaults to 1 hr

*Misc Points*

# Supports relative properties
# Supports non root indexes

h2. Benchmark

The benchmark can be run via

{noformat}
java -DhybridIndexEnabled=true -DindexingMode=nrt -DsyncIndexing=true -jar oak-benchmark*.jar benchmark HybridIndexTest Oak-Segment-Tar-DS
{noformat}

Here

* hybridIndexEnabled=true, syncIndexing=true - Enables this feature i.e. 'foo' property indexed in hybrid mode
* hybridIndexEnabled=true, syncIndexing=false - Enables just the NRT mode
* hybridIndexEnabled=false, syncIndexing=false - Enables pure property index mode

{noformat}
# HybridIndexTest      C    min    10%    50%    90%    max      N   Searcher   Mutator   Indexed
Oak-Segment-Tar-DS     1      4      6      7      9    527   7992    5385539     39400     49890   #nrt,oakCodec,sync
Oak-Segment-Tar-DS     1      4      6      7     10    114   7462    6834075     34220     46362   #property
Oak-Segment-Tar-DS     1      4      5      6      8    508   9063    4439786     47797     56844   #nrt,oakCodec

numOfIndexes: 10, refreshDeltaMillis: 1000, asyncInterval: 5, queueSize: 1000, hybridIndexEnabled: true, indexingMode: nrt, useOakCodec: true, cleanerIntervalInSecs: 10, syncIndexing: true
{noformat}

h2. Pending Stuff

*Open Items*

# Support for nodetype index
# Support for reference index

*Points to discuss*

Apart from the current implementation design, the following aspects need to be discussed

# Frequency of the cleaner job - Currently it is scheduled to run every 10 mins
# Threshold for unique index cleanup - Currently entries are removed 1 hr after they make it into the persisted Lucene index

[~tmueller] [~catholicon] [~teofili] Please review the patch. I would keep this open for this week so that you get time.
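As a rough illustration of the query and cleanup rules described in the comment above, here is a minimal Java sketch. It is not taken from the patch: {{SyncIndexSketch}}, {{SyncProp}}, {{select}}, {{concat}} and {{removable}} are hypothetical names, and the reading of the 1 hr threshold (an entry becomes removable once its creation time is more than the threshold older than the lane's last indexed time) is an assumption based on the description, not on the actual {{PropertyIndexCleaner}} code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these types are Oak classes.
class SyncIndexSketch {

    /** Stand-in for a sync property definition seen by the planner. */
    static final class SyncProp {
        final String name;
        final boolean unique;

        SyncProp(String name, boolean unique) {
            this.name = name;
            this.unique = unique;
        }
    }

    /** Selection rule: any unique index wins, otherwise the first non unique one. */
    static Optional<SyncProp> select(List<SyncProp> candidates) {
        SyncProp firstNonUnique = null;
        for (SyncProp p : candidates) {
            if (p.unique) {
                return Optional.of(p);
            }
            if (firstNonUnique == null) {
                firstNonUnique = p;
            }
        }
        return Optional.ofNullable(firstNonUnique);
    }

    /** Concatenation: results of the property index come before the Lucene results. */
    static <T> Iterator<T> concat(final Iterator<T> first, final Iterator<T> second) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return first.hasNext() || second.hasNext();
            }

            @Override
            public T next() {
                // Falls through to second.next(), which throws if both are exhausted.
                return first.hasNext() ? first.next() : second.next();
            }
        };
    }

    /**
     * Assumed cleanup rule for unique entries: removable once the entry was
     * created more than thresholdMillis before the lane's last indexed time.
     */
    static boolean removable(long createdMillis, long lastIndexedToMillis, long thresholdMillis) {
        return createdMillis < lastIndexedToMillis - thresholdMillis;
    }

    public static void main(String[] args) {
        List<SyncProp> candidates = Arrays.asList(
                new SyncProp("foo", false), new SyncProp("id", true));
        System.out.println(select(candidates).map(p -> p.name).orElse("none"));

        long now = System.currentTimeMillis();
        long oneHour = TimeUnit.HOURS.toMillis(1);
        System.out.println(removable(now - 2 * oneHour, now, oneHour));
    }
}
```

Keeping the threshold as a parameter reflects that both the cleaner frequency and the unique index threshold are still listed as points to discuss.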
Plan to merge next week

> Synchronous Lucene Property Indexes
> -----------------------------------
>
>                 Key: OAK-6535
>                 URL: https://issues.apache.org/jira/browse/OAK-6535
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: lucene, property-index
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8
>
>         Attachments: OAK-6535-v1.diff
>
>
> Oak 1.6 added support for Lucene Hybrid Index (OAK-4412). That enables near
> real time (NRT) support for Lucene based indexes. It also had a limited
> support for sync indexes. This feature aims to improve that to next level and
> enable support for sync property indexes.
> More details at
> https://wiki.apache.org/jackrabbit/Synchronous%20Lucene%20Property%20Indexes