[jira] [Commented] (OAK-4566) Multiplexing store support in Lucene Indexes

2016-08-04 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407419#comment-15407419
 ] 

Chetan Mehrotra commented on OAK-4566:
--

My intention here was to ensure that getDefaultReader does not get invoked in
production code, i.e. to have all parts of the production code adapted to work
with mounts and hence not call getDefaultReader. In a normal setup (no mount)
existing code would work as is and no exception would be thrown. But in a setup
where a mount is present and, say, the suggester logic (not yet adapted) tries
to get the default reader, we throw an exception to indicate that this feature
is not yet complete.

Or maybe we just open an issue to track this missing piece and do not throw an
exception. For now I am thinking of removing this check.
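
For illustration, a minimal sketch of the strict check being discussed. It
assumes a simplified, hypothetical stand-in for IndexNode whose readers are
plain strings; the class name is made up for this example and the check is
written without Guava, throwing the same IllegalArgumentException that
Preconditions.checkArgument would.

```java
import java.util.List;

// Hypothetical stand-in for IndexNode; readers are plain strings for brevity.
final class DefaultReaderSketch {
    private final List<String> readers;

    DefaultReaderSketch(List<String> readers) {
        this.readers = readers;
    }

    // Strict variant: works in a setup without mounts (exactly one reader) and
    // fails fast as soon as a second reader, i.e. a mount, is present.
    String getDefaultReader() {
        if (readers.size() != 1) {
            throw new IllegalArgumentException(
                    "Expected exactly one reader but found " + readers.size());
        }
        return readers.get(0);
    }
}
```

With a single reader the call returns it unchanged; with two readers it
throws, which is the signal that a caller has not been adapted to mounts yet.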

> Multiplexing store support in Lucene Indexes
> 
>
> Key: OAK-4566
> URL: https://issues.apache.org/jira/browse/OAK-4566
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.6
>
>
> Similar to OAK-3403 we need to support multiplexing stores in Lucene indexes.
> This can be done by having multiple directories under a given index definition.
> For example, currently the Lucene index files for an index
> /oak:index/assetIndex are stored in node /oak:index/assetIndex/:dir. To support
> multiple indexes which get stored in different stores we can have a structure like
> {noformat}
> /oak:index/assetIndex
>  + :oak:mount1-dir
>  + :dir
> {noformat}
> In the above structure, index content for paths which are part of mount1 would
> be stored in Lucene files under {{:oak:mount1-dir}} while the rest go in the
> default location {{:dir}}
> # *Writing* - At the time of indexing the {{LuceneIndexEditor}} should pick
> up the correct writer, i.e. the one which is mapped to the right directory node
> in the repository
> # *Reading* - For reading we would have one {{IndexSearcher}} per directory
> node; the query would then be executed against both and a joined cursor would
> be made



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4566) Multiplexing store support in Lucene Indexes

2016-08-04 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407347#comment-15407347
 ] 

Alex Parvulescu commented on OAK-4566:
--

Patch looks good as far as I can tell.

One small question is related to the impl of {{IndexNode#getDefaultReader()}}:
why does it need to verify {{Preconditions.checkArgument(readers.size() ==
1)}}? Can't it simply pick up the first entry if the list is non-empty (that
would be the default mount anyway, no)? [0]


[0] 
https://github.com/chetanmeh/jackrabbit-oak/blob/f73df0d4d4288ae90ab29b3ca7e5939b8a14da1c/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/IndexNode.java#L116



[jira] [Commented] (OAK-4566) Multiplexing store support in Lucene Indexes

2016-08-01 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403433#comment-15403433
 ] 

Chetan Mehrotra commented on OAK-4566:
--

With MultiReader from Lucene the query side is now working in the presence of
multiple readers. The pending work is modifying the JMX MBeans and later
modifying the suggester etc. to support it properly. Those can be done later
and we can now start the merge work for this feature.

[~alexparvulescu] It would be helpful if you could review the high-level
changes before I do the merge. The changes can be seen
[here|https://github.com/chetanmeh/jackrabbit-oak/compare/trunk...chetanmeh:OAK-4566],
split into multiple commits. Before modifying key parts, a refactoring commit is
done which just moves code out to a new class without much functional change,
and only then are the changes for the current feature made.

The key parts of the changes are:

*Index Side*

{{LuceneIndexEditorContext}} now makes use of a {{LuceneIndexWriterFactory}} to
construct an instance of {{LuceneIndexWriter}}, which takes care of adding the
{{Document}} created by {{LuceneIndexEditor}} to the actual Lucene index. So far
all this logic was in {{LuceneIndexEditorContext}}, which was refactored with
[39e4867|https://github.com/chetanmeh/jackrabbit-oak/commit/39e486704bfb77dce85bc90dbbaab7fb42e828d1]

To add support for multiple writers configured per mount,
{{MultiplexingIndexWriter}} is introduced, which determines the {{Mount}} for
the path being indexed and then delegates to a {{DefaultIndexWriter}} (which has
a matching :data node configured, like _:oak:mount-private-index-dir_).

One key difference from the approach taken in PropertyIndexes is that instead of
having multiple writers which are bound to different Mounts we have a
{{MultiplexingIndexWriter}} which determines the Mount for the path and then
picks up a {{DefaultIndexWriter}} configured for that Mount. This ensures that
Mount related calculations are minimized per path (done only once).
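
The delegation described above can be sketched roughly as follows. This is
not the Oak code: the class names, the prefix-based mount lookup and the
"default" mount label are all assumptions made for the example, and the
directory names follow the {{:dir}} / {{:oak:mount1-dir}} layout from the issue
description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a writer bound to a single directory node.
final class RecordingWriterSketch {
    final String dirName;
    final List<String> paths = new ArrayList<>();

    RecordingWriterSketch(String dirName) {
        this.dirName = dirName;
    }

    void updateDocument(String path) {
        paths.add(path);
    }
}

// Hypothetical multiplexing writer: resolves the mount once per path, then delegates.
final class MultiplexingWriterSketch {
    static final String DEFAULT_MOUNT = "default"; // assumed label for "no mount"

    private final Map<String, String> mountRoots; // root path -> mount name (assumed)
    private final Map<String, RecordingWriterSketch> writers = new HashMap<>();

    MultiplexingWriterSketch(Map<String, String> mountRoots) {
        this.mountRoots = new LinkedHashMap<>(mountRoots);
    }

    void updateDocument(String path) {
        // The mount is determined exactly once here, not once per candidate writer.
        writerFor(mountOf(path)).updateDocument(path);
    }

    String mountOf(String path) {
        for (Map.Entry<String, String> e : mountRoots.entrySet()) {
            String root = e.getKey();
            if (path.equals(root) || path.startsWith(root + "/")) {
                return e.getValue();
            }
        }
        return DEFAULT_MOUNT;
    }

    RecordingWriterSketch writerFor(String mount) {
        // Writers are created lazily, one per mount, each with its own directory name.
        return writers.computeIfAbsent(mount, m -> new RecordingWriterSketch(
                DEFAULT_MOUNT.equals(m) ? ":dir" : ":oak:" + m + "-dir"));
    }
}
```

The alternative of asking each mount-bound writer in turn whether it accepts
a path would repeat the mount lookup for every writer, which is the cost the
single lookup in {{updateDocument}} avoids.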

*Query Side*

{{IndexNode}} is refactored to make use of {{LuceneIndexReaderFactory}} to
construct multiple instances of {{LuceneIndexReader}}, which are then wrapped in
a {{MultiReader}} (if there is more than one; otherwise the reader is used
directly). So far all this logic was present in {{IndexNode}}, which was moved
out with
[f73df0d4d428|https://github.com/chetanmeh/jackrabbit-oak/commit/f73df0d4d4288ae90ab29b3ca7e5939b8a14da1c]
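
The "wrap only when there is more than one reader" rule might look like the
following sketch. It deliberately avoids the Lucene classes so that it stays
self-contained: {{ReaderSketch}}, {{CompositeReaderSketch}} and
{{ReaderCombinerSketch}} are hypothetical names, and the composite is only
loosely modelled on a reader that sums the document counts of its sub-readers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal reader abstraction; not the Lucene IndexReader API.
interface ReaderSketch {
    int numDocs();
}

// Hypothetical composite that presents several readers as one.
final class CompositeReaderSketch implements ReaderSketch {
    private final List<ReaderSketch> subReaders;

    CompositeReaderSketch(List<ReaderSketch> subReaders) {
        this.subReaders = new ArrayList<>(subReaders);
    }

    @Override
    public int numDocs() {
        int total = 0;
        for (ReaderSketch reader : subReaders) {
            total += reader.numDocs();
        }
        return total;
    }
}

final class ReaderCombinerSketch {
    // A single reader is returned as is; two or more are wrapped in a composite.
    static ReaderSketch combine(List<ReaderSketch> readers) {
        if (readers.isEmpty()) {
            throw new IllegalArgumentException("No readers available");
        }
        return readers.size() == 1
                ? readers.get(0)
                : new CompositeReaderSketch(readers);
    }
}
```

In a setup without mounts this keeps the code path identical to the
pre-existing single-reader case, since no wrapper is introduced.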



[jira] [Commented] (OAK-4566) Multiplexing store support in Lucene Indexes

2016-08-01 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402008#comment-15402008
 ] 

Chetan Mehrotra commented on OAK-4566:
--

Thanks [~teofili] for that tip. Looks like that is what we require and this
would keep things a lot simpler! Will give that a try.



[jira] [Commented] (OAK-4566) Multiplexing store support in Lucene Indexes

2016-08-01 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402005#comment-15402005
 ] 

Tommaso Teofili commented on OAK-4566:
--

I think it may be worth trying Lucene's {{MultiReader}} [1], which wraps
multiple {{IndexReader}} instances, so one may rely on Lucene's built-in
support for that (although it's usually meant for readers coming from the same
{{Directory}}) by simply building the {{IndexSearcher}} on top of the
{{MultiReader}} and avoiding any post-processing on the result set.

[1] : 
https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/index/MultiReader.html



[jira] [Commented] (OAK-4566) Multiplexing store support in Lucene Indexes

2016-08-01 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401983#comment-15401983
 ] 

Chetan Mehrotra commented on OAK-4566:
--

Supporting multiple IndexReader instances on the query side involves 2 things:
* Creating individual iterators for LuceneResultRow for each reader and
combining them
* Handling sorting

The sorting aspect makes things tricky: as the QE would not be doing the sorting
here, we need to ensure that the iterators are merge sorted, with the comparison
done at the LuceneResultRow level. For that there are 2 options:
# O1 - Do the comparison based on reading the value from the PropertyState. The
query also has an associated NodeState which can be used to read the value of
the ordered property, and the comparison is done based on that. Note that the
root NodeState bound to the query would be more recent than the NodeState at
which the index was populated/updated. The node itself might not even exist. In
such a case we might need to rely on the NodeState at which the index update
was detected.
# O2 - Make use of the doc values which are stored in the Lucene index and then
perform the comparison based on the stored value. This would involve accessing
the doc value of the specific property as the iterator is traversed.

Had a discussion with [~teofili] - both approaches are feasible and would need a
performance benchmark to confirm the result.

Note that the actual sorting is still taken care of by Lucene. It's just the
part of merging two iterators that requires a comparison to be performed.
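
As a rough illustration of the merge step only, here is a generic sketch that
combines already-sorted iterators using a priority queue. It assumes each
per-reader iterator is already sorted by the same comparator; the class name
{{MergeSortedSketch}} is hypothetical, and how the comparator obtains its
values (PropertyState as in O1 or doc values as in O2) is left open.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class MergeSortedSketch {

    // Pairs the current head element of a source with the rest of that source.
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;

        Head(T value, Iterator<T> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Lazily merges iterators that are each already sorted according to cmp.
    static <T> Iterator<T> merge(List<Iterator<T>> sources, Comparator<? super T> cmp) {
        PriorityQueue<Head<T>> heap =
                new PriorityQueue<>((a, b) -> cmp.compare(a.value, b.value));
        for (Iterator<T> source : sources) {
            if (source.hasNext()) {
                heap.add(new Head<>(source.next(), source));
            }
        }
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public T next() {
                // remove() throws NoSuchElementException once all sources are drained.
                Head<T> head = heap.remove();
                if (head.rest.hasNext()) {
                    heap.add(new Head<>(head.rest.next(), head.rest));
                }
                return head.value;
            }
        };
    }
}
```

Only one element per source is held in memory at a time, so the comparison
cost is paid once per returned row rather than by re-sorting the combined
result.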

/cc  [~tmueller] [~alex.parvulescu] [~catholicon]



[jira] [Commented] (OAK-4566) Multiplexing store support in Lucene Indexes

2016-07-29 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399195#comment-15399195
 ] 

Chetan Mehrotra commented on OAK-4566:
--

The following parts are done in the branch so far:
* Refactor the index logic and move out the logic related to IndexWriter
creation to a LuceneIndexWriter interface
* Implement multiplexing support on the indexing side, with index data for paths
belonging to different mounts going to different Lucene directories
* Refactor the IndexNode to move out the logic related to creation of
IndexSearcher and related classes to LuceneIndexReader. IndexNode would have
access to a list of LuceneIndexReader instances

Next is to add support for multiplexing on the query side such that
LucenePropertyIndex performs the query against all LuceneIndexReader instances
and provides a combined result.



[jira] [Commented] (OAK-4566) Multiplexing store support in Lucene Indexes

2016-07-26 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393647#comment-15393647
 ] 

Chetan Mehrotra commented on OAK-4566:
--

Feature branch at https://github.com/chetanmeh/jackrabbit-oak/tree/OAK-4566

> Multiplexing store support in Lucene Indexes
> 
>
> Key: OAK-4566
> URL: https://issues.apache.org/jira/browse/OAK-4566
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.6