[jira] [Updated] (OAK-4412) Lucene hybrid index

2017-03-24 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4412:
-
 Labels:   (was: docs-impacting)
Description: 
When running Oak in a cluster, each write operation is expensive. After 
running stress tests against a geo-distributed Mongo cluster, we found that 
updating property indexes accounts for a large part of the overall traffic.

An asynchronous index would be an answer here (the index update is not made 
in the client request thread), but AEM requires updates to be visible 
immediately in order to work properly.

The idea is to complement the existing asynchronous Lucene index with a 
synchronous, locally stored counterpart that holds only the data written 
since the last Lucene background reindexing job.

The new index can be stored in memory or (if necessary) in memory-mapped 
local files. Once the "main" Lucene index has been updated, the local index 
is purged.

Queries will use a union of the results from the {{lucene}} and 
{{lucene-memory}} indexes.
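
The purge-and-union behaviour described above can be illustrated with a 
small, language-neutral model. This is only a sketch under stated 
assumptions: it is not Oak or Lucene code, the class and method names 
({{HybridIndexSketch}}, {{commit}}, {{run_async_job}}) are hypothetical, and 
plain dictionaries stand in for the two indexes.

```python
class HybridIndexSketch:
    """Toy model of an async index paired with a local, synchronous one.

    All names are hypothetical; dictionaries stand in for the two indexes.
    """

    def __init__(self):
        self.main = {}     # term -> set of paths, stands in for "lucene"
        self.local = {}    # term -> set of paths, stands in for "lucene-memory"
        self.pending = []  # changes the background job has not processed yet

    def commit(self, term, path):
        # Synchronous part: the change is visible in the local index at once.
        self.pending.append((term, path))
        self.local.setdefault(term, set()).add(path)

    def run_async_job(self):
        # Background part: fold pending changes into the main index,
        # then purge the local index because main now covers them.
        for term, path in self.pending:
            self.main.setdefault(term, set()).add(path)
        self.pending.clear()
        self.local.clear()

    def query(self, term):
        # Union of the results from both indexes, without duplicates.
        hits = self.main.get(term, set()) | self.local.get(term, set())
        return sorted(hits)
```

In this model a committed change is returned by {{query}} both before and 
after the background job runs; only the index that serves it changes.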

The {{lucene-memory}} index, as a locally stored entity, will be updated by 
an observer, so it receives both local and remote changes.
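
The observer-driven update can be modelled in the same spirit. Again a 
sketch only: the names below ({{ClusterNodeSketch}}, {{LocalIndexObserver}}, 
{{content_changed}}) are hypothetical and do not reproduce Oak's observation 
API. The point is that one callback feeds the local index whether the change 
was committed on this node or arrived from another cluster node.

```python
class LocalIndexObserver:
    """Hypothetical observer that keeps a node-local index up to date."""

    def __init__(self):
        self.local_index = {}  # term -> set of paths
        self.seen = []         # (origin, number of changes), for illustration

    def content_changed(self, changes, is_local):
        # Local and remote changes are indexed the same way.
        self.seen.append(("local" if is_local else "remote", len(changes)))
        for term, path in changes:
            self.local_index.setdefault(term, set()).add(path)


class ClusterNodeSketch:
    """Hypothetical cluster node that notifies observers of every change."""

    def __init__(self):
        self.observers = []

    def add_observer(self, observer):
        self.observers.append(observer)

    def commit_local(self, changes):
        # A commit made on this node.
        self._notify(changes, is_local=True)

    def receive_remote(self, changes):
        # Changes made on another node and picked up by this one.
        self._notify(changes, is_local=False)

    def _notify(self, changes, is_local):
        for observer in self.observers:
            observer.content_changed(changes, is_local)
```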

The original idea was suggested by [~chetanm] in the discussion of 
OAK-4233.

*Feature Docs*
http://jackrabbit.apache.org/oak/docs/query/indexing.html#nrt-indexing

  was:
When running Oak in a cluster, each write operation is expensive. After 
running stress tests against a geo-distributed Mongo cluster, we found that 
updating property indexes accounts for a large part of the overall traffic.

An asynchronous index would be an answer here (the index update is not made 
in the client request thread), but AEM requires updates to be visible 
immediately in order to work properly.

The idea is to complement the existing asynchronous Lucene index with a 
synchronous, locally stored counterpart that holds only the data written 
since the last Lucene background reindexing job.

The new index can be stored in memory or (if necessary) in memory-mapped 
local files. Once the "main" Lucene index has been updated, the local index 
is purged.

Queries will use a union of the results from the {{lucene}} and 
{{lucene-memory}} indexes.

The {{lucene-memory}} index, as a locally stored entity, will be updated by 
an observer, so it receives both local and remote changes.

The original idea was suggested by [~chetanm] in the discussion of 
OAK-4233.


> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Chetan Mehrotra
> Fix For: 1.5.11, 1.6.0
>
> Attachments: hybrid-benchmark.sh, hybrid-result-v1.txt, 
> OAK-4412.patch, OAK-4412-v1.diff



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (OAK-4412) Lucene hybrid index

2016-09-15 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4412:
-
Labels: docs-impacting  (was: )

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Chetan Mehrotra
> Labels: docs-impacting
> Fix For: 1.6, 1.5.11
>
> Attachments: OAK-4412-v1.diff, OAK-4412.patch, hybrid-benchmark.sh, 
> hybrid-result-v1.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-4412) Lucene hybrid index

2016-09-08 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4412:
-
Attachment: hybrid-result-v1.txt
hybrid-benchmark.sh

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Chetan Mehrotra
> Fix For: 1.6
>
> Attachments: OAK-4412-v1.diff, OAK-4412.patch, hybrid-benchmark.sh, 
> hybrid-result-v1.txt





[jira] [Updated] (OAK-4412) Lucene hybrid index

2016-09-06 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4412:
-
Attachment: OAK-4412-v1.diff

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Chetan Mehrotra
> Fix For: 1.6
>
> Attachments: OAK-4412-v1.diff, OAK-4412.patch





[jira] [Updated] (OAK-4412) Lucene hybrid index

2016-08-03 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-4412:
-
Issue Type: New Feature  (was: Improvement)

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Chetan Mehrotra
> Fix For: 1.6
>
> Attachments: OAK-4412.patch





[jira] [Updated] (OAK-4412) Lucene hybrid index

2016-07-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomek Rękawek updated OAK-4412:
---
Attachment: OAK-4412.patch

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Tomek Rękawek
> Fix For: 1.6
>
> Attachments: OAK-4412.patch





[jira] [Updated] (OAK-4412) Lucene hybrid index

2016-07-14 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomek Rękawek updated OAK-4412:
---
Attachment: (was: OAK-4412.patch)

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Tomek Rękawek
> Fix For: 1.6
>
> Attachments: OAK-4412.patch





[jira] [Updated] (OAK-4412) Lucene hybrid index

2016-07-07 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomek Rękawek updated OAK-4412:
---
Due Date: 21/Jul/16

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Tomek Rękawek
> Fix For: 1.6
>
> Attachments: OAK-4412.patch





[jira] [Updated] (OAK-4412) Lucene hybrid index

2016-07-07 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomek Rękawek updated OAK-4412:
---
Attachment: OAK-4412.patch

> Lucene hybrid index
> ---
>
> Key: OAK-4412
> URL: https://issues.apache.org/jira/browse/OAK-4412
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Reporter: Tomek Rękawek
> Assignee: Tomek Rękawek
> Fix For: 1.6
>
> Attachments: OAK-4412.patch


