[jira] [Resolved] (OAK-3421) RDBDocumentStore: force DB2 to use a clustered index

2015-11-09 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke resolved OAK-3421.
-
Resolution: Fixed

trunk: http://svn.apache.org/r1713439
1.2: http://svn.apache.org/r1713448
1.0: http://svn.apache.org/r1713454

> RDBDocumentStore: force DB2 to use a clustered index
> 
>
> Key: OAK-3421
> URL: https://issues.apache.org/jira/browse/OAK-3421
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.2.7, 1.3.10, 1.0.23
>Reporter: Julian Reschke
>Assignee: Julian Reschke
> Fix For: 1.3.11, 1.2.8, 1.0.24
>
> Attachments: OAK-3421.diff, RDBLastRevRecoveryPerfTest.java
>
>
> DB2 by default does not create a clustered index; consider forcing it to
> do so.
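The idea can be sketched with a small helper that appends DB2's CLUSTER keyword to the index DDL. This is an illustrative sketch under assumed names; the table, column, and method here are hypothetical, not RDBDocumentStore's actual code.

```java
// Illustrative sketch only: the table/index names and this helper are
// hypothetical, not RDBDocumentStore's actual code. On DB2, appending the
// CLUSTER keyword asks the database to maintain the index as clustered.
public class ClusteredIndexDdl {
    static String createIndexDdl(String table, String column, boolean isDb2) {
        String ddl = "create index " + table + "_MOD_IDX on " + table
                + " (" + column + ")";
        return isDb2 ? ddl + " CLUSTER" : ddl;
    }
}
```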



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996501#comment-14996501
 ] 

Alex Parvulescu commented on OAK-3092:
--

looks good! +1

how complicated would it be to purge old/unneeded entries from the cache when 
the referencing content is removed? Bookkeeping for the binary ids would be a 
pain, and I'm not sure the gains are worth it; how much volatile binary 
content would end up in this cache anyway?

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.11
>
> Attachments: OAK-3092-v1.patch, OAK-3092-v2.patch
>
>
> Text can be extracted from the same binary multiple times in a given 
> indexing cycle. This can happen for two reasons:
> # Multiple Lucene indexes indexing the same node - A system might have 
> multiple Lucene indexes, e.g. a global Lucene index and an index for a 
> specific nodeType. In a given indexing cycle the same file would be picked 
> up by both index definitions, and both would extract the same text.
> # Aggregation - With index-time aggregation the same file gets picked up 
> multiple times due to aggregation rules.
> To avoid the wasted effort of duplicate text extraction from the same file 
> in a given indexing cycle, it would be better to have an expiring cache 
> which can hold on to extracted text content for some time. The cache should 
> have the following features:
> # A limit on total size.
> # A way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only within a 
> given indexing cycle, it is better to expire the cache entries after some 
> time to avoid hogging memory unnecessarily.
> Such a cache would provide the following benefits:
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}.
> # Avoid expensive IO, especially if binary content has to be fetched from a 
> remote {{BlobStore}}.
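The two features above (a size limit plus timed eviction) can be sketched with stdlib classes only; the actual patch would use Guava's CacheBuilder as the Timed Eviction link suggests, and all names here are illustrative. Time is passed in explicitly to keep the sketch testable.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Stdlib-only sketch of the cache described above (the real patch would use
// Guava's CacheBuilder with maximumWeight/expireAfterWrite; all names here
// are illustrative).
public class ExtractedTextCache {
    private final long maxWeightChars;   // limit on total cached text size
    private final long expiryMillis;     // timed eviction window
    private long weight = 0;
    // access-order map so iteration starts at the least recently used entry
    private final LinkedHashMap<String, Entry> map =
            new LinkedHashMap<>(16, 0.75f, true);

    private static final class Entry {
        final String text;
        final long createdAt;
        Entry(String text, long createdAt) {
            this.text = text;
            this.createdAt = createdAt;
        }
    }

    public ExtractedTextCache(long maxWeightChars, long expiryMillis) {
        this.maxWeightChars = maxWeightChars;
        this.expiryMillis = expiryMillis;
    }

    public synchronized void put(String contentId, String extractedText, long now) {
        Entry old = map.put(contentId, new Entry(extractedText, now));
        if (old != null) {
            weight -= old.text.length();
        }
        weight += extractedText.length();
        // enforce the size limit by dropping least recently used entries
        Iterator<Map.Entry<String, Entry>> it = map.entrySet().iterator();
        while (weight > maxWeightChars && it.hasNext()) {
            weight -= it.next().getValue().text.length();
            it.remove();
        }
    }

    public synchronized String get(String contentId, long now) {
        Entry e = map.get(contentId);
        if (e == null) {
            return null;
        }
        if (now - e.createdAt > expiryMillis) {   // timed eviction
            map.remove(contentId);
            weight -= e.text.length();
            return null;
        }
        return e.text;
    }
}
```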





[jira] [Created] (OAK-3602) Remove the SegmentWriter segmentId from the cleanup set

2015-11-09 Thread Alex Parvulescu (JIRA)
Alex Parvulescu created OAK-3602:


 Summary: Remove the SegmentWriter segmentId from the cleanup set
 Key: OAK-3602
 URL: https://issues.apache.org/jira/browse/OAK-3602
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: segmentmk
Reporter: Alex Parvulescu
Assignee: Alex Parvulescu
Priority: Minor


It looks like the current head's segment id (coming from the SegmentWriter) is 
always passed to the cleanup's referenced segment ids set, even though it 
cannot be cleaned (similar to the TarWriter situation), so I propose removing 
it early from that set.
Benefits include making the cleanup set more stable (the head changes quite 
often, so the cleanup set is more volatile), which will help in figuring out 
whether we still need to clean a specific tar file based on previous cleanup 
runs.
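The proposal amounts to dropping one id from the referenced set before cleanup runs. A minimal sketch, with illustrative names rather than Oak's actual API:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Minimal sketch of the proposal; class and method names are illustrative,
// not Oak's actual API. The segment the SegmentWriter currently writes to
// can never be cleaned, so it is removed from the set up front, which also
// keeps the set stable across head changes.
public class CleanupSetSketch {
    static Set<UUID> referencedForCleanup(Set<UUID> referenced, UUID writerSegmentId) {
        Set<UUID> stable = new HashSet<>(referenced);
        stable.remove(writerSegmentId);
        return stable;
    }
}
```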





[jira] [Updated] (OAK-3421) RDBDocumentStore: force DB2 to use a clustered index

2015-11-09 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-3421:

Fix Version/s: 1.2.8

> RDBDocumentStore: force DB2 to use a clustered index
> 
>
> Key: OAK-3421
> URL: https://issues.apache.org/jira/browse/OAK-3421
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.2.7, 1.3.10, 1.0.23
>Reporter: Julian Reschke
>Assignee: Julian Reschke
> Fix For: 1.3.11, 1.2.8
>
> Attachments: OAK-3421.diff, RDBLastRevRecoveryPerfTest.java
>
>
> DB2 by default does not create a clustered index; consider forcing it to
> do so.





[jira] [Assigned] (OAK-3421) RDBDocumentStore: force DB2 to use a clustered index

2015-11-09 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke reassigned OAK-3421:
---

Assignee: Julian Reschke

> RDBDocumentStore: force DB2 to use a clustered index
> 
>
> Key: OAK-3421
> URL: https://issues.apache.org/jira/browse/OAK-3421
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Reporter: Julian Reschke
>Assignee: Julian Reschke
> Attachments: OAK-3421.diff, RDBLastRevRecoveryPerfTest.java
>
>
> DB2 by default does not create a clustered index; consider forcing it to
> do so.





[jira] [Commented] (OAK-3603) Evaluate skipping cleanup of a subset of tar files

2015-11-09 Thread Alex Parvulescu (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996622#comment-14996622
 ] 

Alex Parvulescu commented on OAK-3603:
--

this depends on a stable set of referenced ids, so we'd need to fix OAK-3602 
first.

> Evaluate skipping cleanup of a subset of tar files
> --
>
> Key: OAK-3603
> URL: https://issues.apache.org/jira/browse/OAK-3603
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>
> Given that tar readers are immutable (we only create new generations of 
> them once they reach a certain threshold of garbage), we can consider 
> coming up with a heuristic for skipping cleanup entirely for consecutive 
> cleanup calls based on the same referenced id set (provided we can make 
> this set more stable, see OAK-2849).
> Example: for a specific input set, a cleanup call on a tar reader might 
> decide that there's not enough garbage (some IO is involved in reading 
> through all existing entries). If the following cleanup cycle has the exact 
> same input, it doesn't make sense to recheck the tar file; we already know 
> cleanup can be skipped. Moreover, we can skip the older tar files too, as 
> their input would also not change. The gains increase with the number of 
> tar files.





[jira] [Updated] (OAK-3421) RDBDocumentStore: force DB2 to use a clustered index

2015-11-09 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-3421:

Fix Version/s: 1.0.24

> RDBDocumentStore: force DB2 to use a clustered index
> 
>
> Key: OAK-3421
> URL: https://issues.apache.org/jira/browse/OAK-3421
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.2.7, 1.3.10, 1.0.23
>Reporter: Julian Reschke
>Assignee: Julian Reschke
> Fix For: 1.3.11, 1.2.8, 1.0.24
>
> Attachments: OAK-3421.diff, RDBLastRevRecoveryPerfTest.java
>
>
> DB2 by default does not create a clustered index; consider forcing it to
> do so.





[jira] [Updated] (OAK-3421) RDBDocumentStore: force DB2 to use a clustered index

2015-11-09 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-3421:

Affects Version/s: 1.3.10
   1.2.7
   1.0.23

> RDBDocumentStore: force DB2 to use a clustered index
> 
>
> Key: OAK-3421
> URL: https://issues.apache.org/jira/browse/OAK-3421
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.2.7, 1.3.10, 1.0.23
>Reporter: Julian Reschke
>Assignee: Julian Reschke
> Attachments: OAK-3421.diff, RDBLastRevRecoveryPerfTest.java
>
>
> DB2 by default does not create a clustered index; consider forcing it to
> do so.





[jira] [Created] (OAK-3603) Evaluate skipping cleanup of a subset of tar files

2015-11-09 Thread Alex Parvulescu (JIRA)
Alex Parvulescu created OAK-3603:


 Summary: Evaluate skipping cleanup of a subset of tar files
 Key: OAK-3603
 URL: https://issues.apache.org/jira/browse/OAK-3603
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: segmentmk
Reporter: Alex Parvulescu
Assignee: Alex Parvulescu


Given that tar readers are immutable (we only create new generations of them 
once they reach a certain threshold of garbage), we can consider coming up 
with a heuristic for skipping cleanup entirely for consecutive cleanup calls 
based on the same referenced id set (provided we can make this set more 
stable, see OAK-2849).

Example: for a specific input set, a cleanup call on a tar reader might 
decide that there's not enough garbage (some IO is involved in reading 
through all existing entries). If the following cleanup cycle has the exact 
same input, it doesn't make sense to recheck the tar file; we already know 
cleanup can be skipped. Moreover, we can skip the older tar files too, as 
their input would also not change. The gains increase with the number of tar 
files.
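The heuristic described above can be sketched as follows; this is a hedged illustration with made-up names, and Oak's real cleanup code is more involved:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hedged illustration of the skip heuristic; names are made up. Per tar
// file we remember the referenced-id set seen by the previous cleanup run
// and skip the file when the next run arrives with the exact same input.
public class CleanupSkipHeuristic {
    private final Map<String, Set<UUID>> lastInputPerTar = new HashMap<>();

    boolean shouldClean(String tarFile, Set<UUID> referencedIds) {
        Set<UUID> previous =
                lastInputPerTar.put(tarFile, new HashSet<>(referencedIds));
        return previous == null || !previous.equals(referencedIds);
    }
}
```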








[jira] [Updated] (OAK-3421) RDBDocumentStore: force DB2 to use a clustered index

2015-11-09 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-3421:

Fix Version/s: 1.3.11

> RDBDocumentStore: force DB2 to use a clustered index
> 
>
> Key: OAK-3421
> URL: https://issues.apache.org/jira/browse/OAK-3421
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.2.7, 1.3.10, 1.0.23
>Reporter: Julian Reschke
>Assignee: Julian Reschke
> Fix For: 1.3.11
>
> Attachments: OAK-3421.diff, RDBLastRevRecoveryPerfTest.java
>
>
> DB2 by default does not create a clustered index; consider forcing it to
> do so.





[jira] [Commented] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997983#comment-14997983
 ] 

Chetan Mehrotra commented on OAK-3092:
--

bq. how complicated would it be to try to purge old/unneeded entries from the 
cache based on the referencing content being removed

[~alexparvulescu] Currently the cached entries are evicted by expiry (default 
5 minutes), which ensures that such extracted text does not waste memory 
beyond the indexing cycle. Would that be sufficient, or are you referring to 
something else?

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.11
>
> Attachments: OAK-3092-v1.patch, OAK-3092-v2.patch
>
>
> Text can be extracted from the same binary multiple times in a given 
> indexing cycle. This can happen for two reasons:
> # Multiple Lucene indexes indexing the same node - A system might have 
> multiple Lucene indexes, e.g. a global Lucene index and an index for a 
> specific nodeType. In a given indexing cycle the same file would be picked 
> up by both index definitions, and both would extract the same text.
> # Aggregation - With index-time aggregation the same file gets picked up 
> multiple times due to aggregation rules.
> To avoid the wasted effort of duplicate text extraction from the same file 
> in a given indexing cycle, it would be better to have an expiring cache 
> which can hold on to extracted text content for some time. The cache should 
> have the following features:
> # A limit on total size.
> # A way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only within a 
> given indexing cycle, it is better to expire the cache entries after some 
> time to avoid hogging memory unnecessarily.
> Such a cache would provide the following benefits:
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}.
> # Avoid expensive IO, especially if binary content has to be fetched from a 
> remote {{BlobStore}}.





[jira] [Commented] (OAK-3606) Improvements for IndexStatsMBean usage

2015-11-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996966#comment-14996966
 ] 

Thierry Ygé commented on OAK-3606:
--

[~tmueller] Let me know if I should also provide this suggested patch as a 
GitHub pull request referencing this issue. Or maybe you can use the patch 
and refactor it if the names of the classes/methods are not satisfying.


> Improvements for IndexStatsMBean usage
> --
>
> Key: OAK-3606
> URL: https://issues.apache.org/jira/browse/OAK-3606
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.3.9
>Reporter: Thierry Ygé
> Attachments: adding_new_MBean.patch, 
> new_mbean_interface_and_implementation.patch
>
>
> When running integration tests, it is common to need to wait for the async 
> indexes to have run, so that the test can successfully validate operations 
> that depend on the search result.
> The current IndexStatsMBean implementation cannot return the start time of 
> the last successful indexing run. It provides a "LastIndexedTime", which is 
> not sufficient to know whether recently made changes are now indexed.
> The idea is to expose the start time as the value of a new attribute (i.e. 
> "StartLastSuccessIndexedTime") on the IndexStatsMBean.
> Then create a new MBean that calculates, from all existing IndexStatsMBeans 
> (as multiple are possible now), the oldest "StartLastSuccessIndexedTime".
> That will allow integration tests to wait until that oldest 
> "StartLastSuccessIndexedTime" is greater than the time they started waiting.
> Attached is a sample patch containing the necessary changes (for Oak core 
> 1.4.0-SNAPSHOT).





[jira] [Created] (OAK-3606) Improvements for IndexStatsMBean usage

2015-11-09 Thread JIRA
Thierry Ygé created OAK-3606:


 Summary: Improvements for IndexStatsMBean usage
 Key: OAK-3606
 URL: https://issues.apache.org/jira/browse/OAK-3606
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core
Affects Versions: 1.3.9
Reporter: Thierry Ygé


When running integration tests, it is common to need to wait for the async 
indexes to have run, so that the test can successfully validate operations 
that depend on the search result.

The current IndexStatsMBean implementation cannot return the start time of 
the last successful indexing run. It provides a "LastIndexedTime", which is 
not sufficient to know whether recently made changes are now indexed.

The idea is to expose the start time as the value of a new attribute (i.e. 
"StartLastSuccessIndexedTime") on the IndexStatsMBean.

Then create a new MBean that calculates, from all existing IndexStatsMBeans 
(as multiple are possible now), the oldest "StartLastSuccessIndexedTime".

That will allow integration tests to wait until that oldest 
"StartLastSuccessIndexedTime" is greater than the time they started waiting.

Attached is a sample patch containing the necessary changes (for Oak core 
1.4.0-SNAPSHOT).
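The waiting logic this enables could look roughly like the following; the interface and names are hypothetical, and the actual MBean API may differ:

```java
import java.util.List;

// Hypothetical sketch of the proposed usage; the interface and names are
// illustrative, not the actual Oak MBean API.
interface IndexStats {
    long getStartLastSuccessIndexedTime();
}

class AsyncIndexWait {
    // Oldest start time across all index lanes: only once this passes a
    // given instant do we know every lane started a successful run after it.
    static long oldestStart(List<? extends IndexStats> stats) {
        long oldest = Long.MAX_VALUE;
        for (IndexStats s : stats) {
            oldest = Math.min(oldest, s.getStartLastSuccessIndexedTime());
        }
        return oldest;
    }

    // Test helper: block until all async indexes have successfully run
    // with a start time later than 'since'.
    static void waitForIndexing(List<? extends IndexStats> stats, long since)
            throws InterruptedException {
        while (oldestStart(stats) <= since) {
            Thread.sleep(100);
        }
    }
}
```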





[jira] [Updated] (OAK-3606) Improvements for IndexStatsMBean usage

2015-11-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/OAK-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thierry Ygé updated OAK-3606:
-
Attachment: new_mbean_interface_and_implementation.patch
adding_new_MBean.patch

> Improvements for IndexStatsMBean usage
> --
>
> Key: OAK-3606
> URL: https://issues.apache.org/jira/browse/OAK-3606
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.3.9
>Reporter: Thierry Ygé
> Attachments: adding_new_MBean.patch, 
> new_mbean_interface_and_implementation.patch
>
>
> When running integration tests, it is common to need to wait for the async 
> indexes to have run, so that the test can successfully validate operations 
> that depend on the search result.
> The current IndexStatsMBean implementation cannot return the start time of 
> the last successful indexing run. It provides a "LastIndexedTime", which is 
> not sufficient to know whether recently made changes are now indexed.
> The idea is to expose the start time as the value of a new attribute (i.e. 
> "StartLastSuccessIndexedTime") on the IndexStatsMBean.
> Then create a new MBean that calculates, from all existing IndexStatsMBeans 
> (as multiple are possible now), the oldest "StartLastSuccessIndexedTime".
> That will allow integration tests to wait until that oldest 
> "StartLastSuccessIndexedTime" is greater than the time they started waiting.
> Attached is a sample patch containing the necessary changes (for Oak core 
> 1.4.0-SNAPSHOT).





[jira] [Created] (OAK-3605) RDBBlob/DocumentStore: reduce class complexity

2015-11-09 Thread Julian Reschke (JIRA)
Julian Reschke created OAK-3605:
---

 Summary: RDBBlob/DocumentStore: reduce class complexity
 Key: OAK-3605
 URL: https://issues.apache.org/jira/browse/OAK-3605
 Project: Jackrabbit Oak
  Issue Type: Technical task
  Components: rdbmk
Reporter: Julian Reschke
Assignee: Julian Reschke
Priority: Minor


- RDBConnectionHandler: move methods unrelated to connection handling elsewhere
- RDBBlob/DocumentStore: extract low-level JDBC related code into separate 
classes





[jira] [Updated] (OAK-3604) RDBDocumentStore: update JDBC drivers for PostgresQL, MySQL, and Derby

2015-11-09 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-3604:

Fix Version/s: 1.3.11

> RDBDocumentStore: update JDBC drivers for PostgresQL, MySQL, and Derby
> --
>
> Key: OAK-3604
> URL: https://issues.apache.org/jira/browse/OAK-3604
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.2.7, 1.3.10, 1.0.23
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Trivial
> Fix For: 1.3.11
>
>






[jira] [Updated] (OAK-3604) RDBDocumentStore: update JDBC drivers for PostgresQL, MySQL, and Derby

2015-11-09 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-3604:

Fix Version/s: 1.0.24

> RDBDocumentStore: update JDBC drivers for PostgresQL, MySQL, and Derby
> --
>
> Key: OAK-3604
> URL: https://issues.apache.org/jira/browse/OAK-3604
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.2.7, 1.3.10, 1.0.23
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Trivial
> Fix For: 1.3.11, 1.2.8, 1.0.24
>
>






[jira] [Updated] (OAK-3604) RDBDocumentStore: update JDBC drivers for PostgresQL, MySQL, and Derby

2015-11-09 Thread Julian Reschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-3604:

Fix Version/s: 1.2.8

> RDBDocumentStore: update JDBC drivers for PostgresQL, MySQL, and Derby
> --
>
> Key: OAK-3604
> URL: https://issues.apache.org/jira/browse/OAK-3604
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.2.7, 1.3.10, 1.0.23
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Trivial
> Fix For: 1.3.11, 1.2.8
>
>






[jira] [Created] (OAK-3604) RDBDocumentStore: update JDBC drivers for PostgresQL, MySQL, and Derby

2015-11-09 Thread Julian Reschke (JIRA)
Julian Reschke created OAK-3604:
---

 Summary: RDBDocumentStore: update JDBC drivers for PostgresQL, 
MySQL, and Derby
 Key: OAK-3604
 URL: https://issues.apache.org/jira/browse/OAK-3604
 Project: Jackrabbit Oak
  Issue Type: Technical task
  Components: rdbmk
Affects Versions: 1.0.23, 1.2.7, 1.3.10
Reporter: Julian Reschke
Assignee: Julian Reschke
Priority: Trivial








[jira] [Commented] (OAK-3111) Reconsider check for max node name length

2015-11-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996762#comment-14996762
 ] 

Tomek Rękawek commented on OAK-3111:


I created a [branch|https://github.com/trekawek/jackrabbit-oak/tree/OAK-3111] 
with the new 
[LongNameTest|https://github.com/trekawek/jackrabbit-oak/blob/OAK-3111/oak-upgrade/src/test/java/org/apache/jackrabbit/oak/upgrade/LongNameTest.java].
 Apparently, migrating a node with a long name may cause an exception. I'll 
look into this.

> Reconsider check for max node name length
> -
>
> Key: OAK-3111
> URL: https://issues.apache.org/jira/browse/OAK-3111
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: upgrade
>Reporter: Julian Sedding
>Priority: Minor
>
> In OAK-2619 the necessity of a check for node name length was briefly 
> discussed. It may be worthwhile to write a test case for upgrading long node 
> names and find out what happens with and without the check.





[jira] [Created] (OAK-3607) Enable caching of extracted text by default

2015-11-09 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-3607:


 Summary: Enable caching of extracted text by default
 Key: OAK-3607
 URL: https://issues.apache.org/jira/browse/OAK-3607
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: lucene
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.3.11


Follow-up issue for OAK-3092, meant to enable that feature by default: the 
cache size would default to 20 MB and the expiry to 5 minutes.

Later we should also enable this in the branches.





[jira] [Commented] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998070#comment-14998070
 ] 

Chetan Mehrotra commented on OAK-3092:
--

Opened OAK-3607 for enabling this feature by default, starting with trunk and 
later to be merged to the branches.

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.11, 1.2.8, 1.0.24
>
> Attachments: OAK-3092-v1.patch, OAK-3092-v2.patch
>
>
> Text can be extracted from the same binary multiple times in a given 
> indexing cycle. This can happen for two reasons:
> # Multiple Lucene indexes indexing the same node - A system might have 
> multiple Lucene indexes, e.g. a global Lucene index and an index for a 
> specific nodeType. In a given indexing cycle the same file would be picked 
> up by both index definitions, and both would extract the same text.
> # Aggregation - With index-time aggregation the same file gets picked up 
> multiple times due to aggregation rules.
> To avoid the wasted effort of duplicate text extraction from the same file 
> in a given indexing cycle, it would be better to have an expiring cache 
> which can hold on to extracted text content for some time. The cache should 
> have the following features:
> # A limit on total size.
> # A way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only within a 
> given indexing cycle, it is better to expire the cache entries after some 
> time to avoid hogging memory unnecessarily.
> Such a cache would provide the following benefits:
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}.
> # Avoid expensive IO, especially if binary content has to be fetched from a 
> remote {{BlobStore}}.





[jira] [Created] (OAK-3608) Compare of node states on branch may be incorrect

2015-11-09 Thread Marcel Reutegger (JIRA)
Marcel Reutegger created OAK-3608:
-

 Summary: Compare of node states on branch may be incorrect
 Key: OAK-3608
 URL: https://issues.apache.org/jira/browse/OAK-3608
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core, documentmk
Affects Versions: 1.2, 1.0
Reporter: Marcel Reutegger
Assignee: Marcel Reutegger
Priority: Minor
 Fix For: 1.3.11


In some cases comparing a branch node state with its base state does not report 
all changes.





[jira] [Created] (OAK-3610) BroadcastTest fails when connected with VPN client

2015-11-09 Thread Marcel Reutegger (JIRA)
Marcel Reutegger created OAK-3610:
-

 Summary: BroadcastTest fails when connected with VPN client
 Key: OAK-3610
 URL: https://issues.apache.org/jira/browse/OAK-3610
 Project: Jackrabbit Oak
  Issue Type: Test
  Components: core, documentmk
Affects Versions: 1.3.10
 Environment: Mac OS X 10.10.5 / Cisco AnyConnect 3.1.09013
Reporter: Marcel Reutegger
Assignee: Thomas Mueller
Priority: Minor
 Fix For: 1.3.11


Failed tests:

broadcastEncryptedUDP(org.apache.jackrabbit.oak.plugins.document.persistentCache.BroadcastTest):
 min: 50 got: 0
  
broadcastUDP(org.apache.jackrabbit.oak.plugins.document.persistentCache.BroadcastTest):
 min: 50 got: 0





[jira] [Created] (OAK-3609) Enable CopyOnWrite by default

2015-11-09 Thread Chetan Mehrotra (JIRA)
Chetan Mehrotra created OAK-3609:


 Summary: Enable CopyOnWrite by default
 Key: OAK-3609
 URL: https://issues.apache.org/jira/browse/OAK-3609
 Project: Jackrabbit Oak
  Issue Type: Task
  Components: lucene
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Priority: Minor
 Fix For: 1.3.11


Task to track enabling the CopyOnWrite feature (OAK-2247) by default.

Later this needs to be enabled by default on the branches as well.





[jira] [Commented] (OAK-2689) Test failure: QueryResultTest.testGetSize

2015-11-09 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-2689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998109#comment-14998109
 ] 

Marcel Reutegger commented on OAK-2689:
---

[~tmueller], can we also merge this change into the 1.2 and 1.0 branches? The 
test also fails there occasionally. See most recent failure on 1.0 branch: 
https://travis-ci.org/apache/jackrabbit-oak/builds/90120482 

> Test failure: QueryResultTest.testGetSize
> -
>
> Key: OAK-2689
> URL: https://issues.apache.org/jira/browse/OAK-2689
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
> Environment: Jenkins, Ubuntu: 
> https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/
>Reporter: Michael Dürig
>Assignee: Thomas Mueller
>  Labels: CI, Jenkins
> Fix For: 1.3.9
>
>
> {{org.apache.jackrabbit.core.query.QueryResultTest.testGetSize}} fails every 
> couple of builds:
> {noformat}
> junit.framework.AssertionFailedError: Wrong size of NodeIterator in result 
> expected:<48> but was:<-1>
>   at junit.framework.Assert.fail(Assert.java:50)
>   at junit.framework.Assert.failNotEquals(Assert.java:287)
>   at junit.framework.Assert.assertEquals(Assert.java:67)
>   at junit.framework.Assert.assertEquals(Assert.java:134)
>   at 
> org.apache.jackrabbit.core.query.QueryResultTest.testGetSize(QueryResultTest.java:47)
> {noformat}
> Failure seen at builds: 29, 39, 59, 61, 114, 117, 118, 120, 139, 142
> See e.g. 
> https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/59/jdk=jdk-1.6u45,label=Ubuntu,nsfixtures=DOCUMENT_NS,profile=unittesting/testReport/junit/org.apache.jackrabbit.core.query/QueryResultTest/testGetSize/





[jira] [Updated] (OAK-3607) Enable caching of extracted text by default

2015-11-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-3607:
-
Labels: candidate_oak_1_0 candidate_oak_1_2  (was: )

> Enable caching of extracted text by default
> ---
>
> Key: OAK-3607
> URL: https://issues.apache.org/jira/browse/OAK-3607
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
>  Labels: candidate_oak_1_0, candidate_oak_1_2
> Fix For: 1.3.11
>
>
> Follow-up issue for OAK-3092, meant to enable that feature by default: the 
> cache size would default to 20 MB and the expiry to 5 minutes.
> Later we should also enable this in the branches.





[jira] [Commented] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998068#comment-14998068
 ] 

Chetan Mehrotra commented on OAK-3092:
--

Changes committed
* trunk - 1713580
* 1.0 - 1713582
* 1.2 - 1713584

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.11, 1.2.8, 1.0.24
>
> Attachments: OAK-3092-v1.patch, OAK-3092-v2.patch
>
>
> It can happen that text is extracted from the same binary multiple times in a 
> given indexing cycle. This can happen for 2 reasons
> # Multiple Lucene indexes indexing the same node - A system might have multiple 
> Lucene indexes e.g. a global Lucene index and an index for a specific nodeType. 
> In a given indexing cycle the same file would be picked up by both index 
> definitions and both would extract the same text
> # Aggregation - With index time aggregation the same file gets picked up multiple 
> times due to aggregation rules
> To avoid the wasted effort of duplicate text extraction from the same file in a 
> given indexing cycle it would be better to have an expiring cache which can 
> hold on to extracted text content for some time. The cache should have the 
> following features
> # Limit on total size
> # Way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only for a given 
> indexing cycle it would be better to expire the cache entries after some time 
> to avoid hogging memory unnecessarily 
> Such a cache would provide the following benefits
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}
> # Avoid expensive IO, especially if binary content has to be fetched from a 
> remote {{BlobStore}}
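The size limit plus timed eviction described above can be illustrated with a minimal stand-in. The actual patch builds on a Guava cache (maximum weight plus timed eviction); the class and method names below are hypothetical, and the defaults of 20 MB / 5 minutes come from OAK-3607.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a size- and time-bounded extracted-text cache.
// Illustrative only: the real implementation uses a Guava cache.
class ExtractedTextCache {
    private static final class Entry {
        final String text;
        final long insertTime;
        Entry(String text, long insertTime) { this.text = text; this.insertTime = insertTime; }
    }

    private final long maxChars;     // limit on total size (default would be ~20 MB)
    private final long expiryMillis; // timed eviction (default would be 5 minutes)
    private final Map<String, Entry> byId = new LinkedHashMap<>(); // insertion order
    private long currentChars;

    ExtractedTextCache(long maxChars, long expiryMillis) {
        this.maxChars = maxChars;
        this.expiryMillis = expiryMillis;
    }

    synchronized void put(String binaryId, String text, long now) {
        evictExpired(now);
        Entry old = byId.put(binaryId, new Entry(text, now));
        if (old != null) currentChars -= old.text.length();
        currentChars += text.length();
        // size bound: evict the oldest entries until under the limit
        Iterator<Entry> it = byId.values().iterator();
        while (currentChars > maxChars && it.hasNext()) {
            currentChars -= it.next().text.length();
            it.remove();
        }
    }

    synchronized String get(String binaryId, long now) {
        evictExpired(now);
        Entry e = byId.get(binaryId);
        return e == null ? null : e.text;
    }

    private void evictExpired(long now) {
        Iterator<Entry> it = byId.values().iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (now - e.insertTime > expiryMillis) {
                currentChars -= e.text.length();
                it.remove();
            }
        }
    }
}
```

A second index definition (or an aggregation rule) asking for the same binary id within the expiry window gets the cached text instead of re-running extraction.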





[jira] [Updated] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-3092:
-
Fix Version/s: 1.0.24
   1.2.8

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.11, 1.2.8, 1.0.24
>
> Attachments: OAK-3092-v1.patch, OAK-3092-v2.patch
>
>
> It can happen that text is extracted from the same binary multiple times in a 
> given indexing cycle. This can happen for 2 reasons
> # Multiple Lucene indexes indexing the same node - A system might have multiple 
> Lucene indexes e.g. a global Lucene index and an index for a specific nodeType. 
> In a given indexing cycle the same file would be picked up by both index 
> definitions and both would extract the same text
> # Aggregation - With index time aggregation the same file gets picked up multiple 
> times due to aggregation rules
> To avoid the wasted effort of duplicate text extraction from the same file in a 
> given indexing cycle it would be better to have an expiring cache which can 
> hold on to extracted text content for some time. The cache should have the 
> following features
> # Limit on total size
> # Way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only for a given 
> indexing cycle it would be better to expire the cache entries after some time 
> to avoid hogging memory unnecessarily 
> Such a cache would provide the following benefits
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}
> # Avoid expensive IO, especially if binary content has to be fetched from a 
> remote {{BlobStore}}





[jira] [Resolved] (OAK-3607) Enable caching of extracted text by default

2015-11-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-3607.
--
Resolution: Fixed

Enabled it with 1713585

> Enable caching of extracted text by default
> ---
>
> Key: OAK-3607
> URL: https://issues.apache.org/jira/browse/OAK-3607
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
>  Labels: candidate_oak_1_0, candidate_oak_1_2
> Fix For: 1.3.11
>
>
> Follow-up issue for OAK-3092, meant to enable that feature by default: the 
> cache size would default to 20 MB and the expiry to 5 mins
> Later we should enable this in the branches as well





[jira] [Resolved] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-3092.
--
Resolution: Fixed

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.11, 1.2.8, 1.0.24
>
> Attachments: OAK-3092-v1.patch, OAK-3092-v2.patch
>
>
> It can happen that text is extracted from the same binary multiple times in a 
> given indexing cycle. This can happen for 2 reasons
> # Multiple Lucene indexes indexing the same node - A system might have multiple 
> Lucene indexes e.g. a global Lucene index and an index for a specific nodeType. 
> In a given indexing cycle the same file would be picked up by both index 
> definitions and both would extract the same text
> # Aggregation - With index time aggregation the same file gets picked up multiple 
> times due to aggregation rules
> To avoid the wasted effort of duplicate text extraction from the same file in a 
> given indexing cycle it would be better to have an expiring cache which can 
> hold on to extracted text content for some time. The cache should have the 
> following features
> # Limit on total size
> # Way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only for a given 
> indexing cycle it would be better to expire the cache entries after some time 
> to avoid hogging memory unnecessarily 
> Such a cache would provide the following benefits
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}
> # Avoid expensive IO, especially if binary content has to be fetched from a 
> remote {{BlobStore}}





[jira] [Updated] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-3092:
-
Attachment: OAK-3092-v2.patch

[updated patch|^OAK-3092-v2.patch] which fixes an OSGi issue around CacheStats 
by inlining the class

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.11
>
> Attachments: OAK-3092-v1.patch, OAK-3092-v2.patch
>
>
> It can happen that text is extracted from the same binary multiple times in a 
> given indexing cycle. This can happen for 2 reasons
> # Multiple Lucene indexes indexing the same node - A system might have multiple 
> Lucene indexes e.g. a global Lucene index and an index for a specific nodeType. 
> In a given indexing cycle the same file would be picked up by both index 
> definitions and both would extract the same text
> # Aggregation - With index time aggregation the same file gets picked up multiple 
> times due to aggregation rules
> To avoid the wasted effort of duplicate text extraction from the same file in a 
> given indexing cycle it would be better to have an expiring cache which can 
> hold on to extracted text content for some time. The cache should have the 
> following features
> # Limit on total size
> # Way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only for a given 
> indexing cycle it would be better to expire the cache entries after some time 
> to avoid hogging memory unnecessarily 
> Such a cache would provide the following benefits
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}
> # Avoid expensive IO, especially if binary content has to be fetched from a 
> remote {{BlobStore}}





[jira] [Commented] (OAK-3598) Export org.apache.jackrabbit.oak.cache package from oak-core

2015-11-09 Thread Davide Giannella (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996229#comment-14996229
 ] 

Davide Giannella commented on OAK-3598:
---

Don't know exactly what's in there, but +1 on [~mreutegg]'s suggestion. Sounds 
like something should go in commons.

> Export org.apache.jackrabbit.oak.cache package from oak-core
> 
>
> Key: OAK-3598
> URL: https://issues.apache.org/jira/browse/OAK-3598
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: cache
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.10
>
>
> For OAK-3092 oak-lucene would need to access classes from the 
> {{org.apache.jackrabbit.oak.cache}} package. For now it's limited to 
> {{CacheStats}}, to expose the cache related statistics.
> This task is meant to determine the steps needed to export the package 
> * Update the pom.xml to export the package
> * Review the current set of classes to see if they need to be cleaned up





[jira] [Updated] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-3092:
-
Attachment: OAK-3092-v1.patch

[patch|^OAK-3092-v1.patch] implementing the above-mentioned approach

* Exposed 2 OSGi configs - the size of the cache and the expiry time for cached entries
* Setting the cache size to 0 disables the cache.
* By default the cache is disabled. Once the feature is validated in actual use 
the default will be changed
* CacheStateMBean is exposed if the cache is enabled

[~alexparvulescu] [~edivad] Can you review the patch?

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.10
>
> Attachments: OAK-3092-v1.patch
>
>
> It can happen that text is extracted from the same binary multiple times in a 
> given indexing cycle. This can happen for 2 reasons
> # Multiple Lucene indexes indexing the same node - A system might have multiple 
> Lucene indexes e.g. a global Lucene index and an index for a specific nodeType. 
> In a given indexing cycle the same file would be picked up by both index 
> definitions and both would extract the same text
> # Aggregation - With index time aggregation the same file gets picked up multiple 
> times due to aggregation rules
> To avoid the wasted effort of duplicate text extraction from the same file in a 
> given indexing cycle it would be better to have an expiring cache which can 
> hold on to extracted text content for some time. The cache should have the 
> following features
> # Limit on total size
> # Way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only for a given 
> indexing cycle it would be better to expire the cache entries after some time 
> to avoid hogging memory unnecessarily 
> Such a cache would provide the following benefits
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}
> # Avoid expensive IO, especially if binary content has to be fetched from a 
> remote {{BlobStore}}





[jira] [Comment Edited] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996247#comment-14996247
 ] 

Chetan Mehrotra edited comment on OAK-3092 at 11/9/15 9:29 AM:
---

[patch|^OAK-3092-v1.patch] implementing the above-mentioned approach

* Exposed 2 OSGi configs - the size of the cache and the expiry time for cached entries
* Setting the cache size to 0 disables the cache.
* By default the cache is disabled. Once the feature is validated in actual use 
the default will be changed
* CacheStateMBean is exposed if the cache is enabled

The patch would need minor tweaks once OAK-3598 is resolved

[~alexparvulescu] [~edivad] Can you review the patch?


was (Author: chetanm):
[patch|^OAK-3092-v1.patch] implementing the above-mentioned approach

* Exposed 2 OSGi configs - the size of the cache and the expiry time for cached entries
* Setting the cache size to 0 disables the cache.
* By default the cache is disabled. Once the feature is validated in actual use 
the default will be changed
* CacheStateMBean is exposed if the cache is enabled

[~alexparvulescu] [~edivad] Can you review the patch?

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.10
>
> Attachments: OAK-3092-v1.patch
>
>
> It can happen that text is extracted from the same binary multiple times in a 
> given indexing cycle. This can happen for 2 reasons
> # Multiple Lucene indexes indexing the same node - A system might have multiple 
> Lucene indexes e.g. a global Lucene index and an index for a specific nodeType. 
> In a given indexing cycle the same file would be picked up by both index 
> definitions and both would extract the same text
> # Aggregation - With index time aggregation the same file gets picked up multiple 
> times due to aggregation rules
> To avoid the wasted effort of duplicate text extraction from the same file in a 
> given indexing cycle it would be better to have an expiring cache which can 
> hold on to extracted text content for some time. The cache should have the 
> following features
> # Limit on total size
> # Way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only for a given 
> indexing cycle it would be better to expire the cache entries after some time 
> to avoid hogging memory unnecessarily 
> Such a cache would provide the following benefits
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}
> # Avoid expensive IO, especially if binary content has to be fetched from a 
> remote {{BlobStore}}





[jira] [Commented] (OAK-3598) Export org.apache.jackrabbit.oak.cache package from oak-core

2015-11-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996348#comment-14996348
 ] 

Chetan Mehrotra commented on OAK-3598:
--

Had a discussion with [~mreutegg]. For now (at least for the merge to the 
branches) we will inline CacheStats in oak-lucene (similar to the way we dealt 
with ImmutableTree in OAK-2270) as a short-term solution. In the longer term we 
would need to clean up the cache package and export it in a supportable manner

> Export org.apache.jackrabbit.oak.cache package from oak-core
> 
>
> Key: OAK-3598
> URL: https://issues.apache.org/jira/browse/OAK-3598
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: cache
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.11
>
>
> For OAK-3092 oak-lucene would need to access classes from the 
> {{org.apache.jackrabbit.oak.cache}} package. For now it's limited to 
> {{CacheStats}}, to expose the cache related statistics.
> This task is meant to determine the steps needed to export the package 
> * Update the pom.xml to export the package
> * Review the current set of classes to see if they need to be cleaned up





[jira] [Updated] (OAK-3598) Export org.apache.jackrabbit.oak.cache package from oak-core

2015-11-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-3598:
-
Issue Type: Improvement  (was: Technical task)
Parent: (was: OAK-3092)

> Export org.apache.jackrabbit.oak.cache package from oak-core
> 
>
> Key: OAK-3598
> URL: https://issues.apache.org/jira/browse/OAK-3598
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: cache
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.11
>
>
> For OAK-3092 oak-lucene would need to access classes from the 
> {{org.apache.jackrabbit.oak.cache}} package. For now it's limited to 
> {{CacheStats}}, to expose the cache related statistics.
> This task is meant to determine the steps needed to export the package 
> * Update the pom.xml to export the package
> * Review the current set of classes to see if they need to be cleaned up





[jira] [Updated] (OAK-3598) Export cache related classes for usage in other oak bundle

2015-11-09 Thread Chetan Mehrotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-3598:
-
Summary: Export cache related classes for usage in other oak bundle  (was: 
Export org.apache.jackrabbit.oak.cache package from oak-core)

> Export cache related classes for usage in other oak bundle
> --
>
> Key: OAK-3598
> URL: https://issues.apache.org/jira/browse/OAK-3598
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: cache
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.11
>
>
> For OAK-3092 oak-lucene would need to access classes from the 
> {{org.apache.jackrabbit.oak.cache}} package. For now it's limited to 
> {{CacheStats}}, to expose the cache related statistics.
> This task is meant to determine the steps needed to export the package 
> * Update the pom.xml to export the package
> * Review the current set of classes to see if they need to be cleaned up





[jira] [Created] (OAK-3601) oak-upgrade CLI tool should validate the directories

2015-11-09 Thread JIRA
Tomek Rękawek created OAK-3601:
--

 Summary: oak-upgrade CLI tool should validate the directories
 Key: OAK-3601
 URL: https://issues.apache.org/jira/browse/OAK-3601
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: upgrade
Affects Versions: 1.3.9
Reporter: Tomek Rękawek


The {{oak-upgrade}} tool accepts a few parameters that require a directory path. 
Sometimes it's the path to the repository home, in other cases to the 
segmentstore or datastore. The tool should use some heuristics to find out 
whether the right directory has been passed and reject invalid values.
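One possible shape for such a heuristic is checking for a marker file that the directory type would contain; the example below guesses at a TarMK segment store via its {{journal.log}} file. The validator class is purely illustrative, not part of oak-upgrade.

```java
import java.io.File;

// Illustrative directory-validation heuristic of the kind the issue asks
// for: a TarMK SegmentStore directory normally contains a journal.log
// file, so its presence is a cheap plausibility check. The class and
// method names here are hypothetical.
class DirectoryValidator {
    static boolean looksLikeSegmentStore(File dir) {
        return dir.isDirectory() && new File(dir, "journal.log").isFile();
    }
}
```

A real implementation would combine several such checks (segment tar files, datastore layout, repository home contents) and print a clear error instead of silently proceeding.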





[jira] [Updated] (OAK-3001) Simplify JournalGarbageCollector using a dedicated timestamp property

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3001:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Simplify JournalGarbageCollector using a dedicated timestamp property
> -
>
> Key: OAK-3001
> URL: https://issues.apache.org/jira/browse/OAK-3001
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Stefan Egli
>Assignee: Stefan Egli
>Priority: Critical
>  Labels: scalability
> Fix For: 1.3.11, 1.2.8
>
>
> This subtask is about spawning out a 
> [comment|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14585733=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14585733]
>  from [~chetanm] re JournalGC:
> {quote}
> Further looking at JournalGarbageCollector ... it would be simpler if you 
> record the journal entry timestamp as an attribute in JournalEntry document 
> and then you can delete all the entries which are older than some time by a 
> simple query. This would avoid fetching all the entries to be deleted on the 
> Oak side
> {quote}
> and a corresponding 
> [reply|https://issues.apache.org/jira/browse/OAK-2829?focusedCommentId=14585870=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14585870]
>  from myself:
> {quote}
> Re querying by timestamp: that would indeed be simpler. With the current set 
> of DocumentStore API however, I believe this is not possible. But: 
> [DocumentStore.query|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/document/DocumentStore.java#L127]
>  comes quite close: it would probably just require the opposite of that 
> method too: 
> {code}
> public <T extends Document> List<T> query(Collection<T> collection,
>   String fromKey,
>   String toKey,
>   String indexedProperty,
>   long endValue,
>   int limit) {
> {code}
> .. or what about generalizing this method to have both a {{startValue}} and 
> an {{endValue}} - with {{-1}} indicating when one of them is not used?
> {quote}
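The simplification under discussion — record a timestamp on each JournalEntry and delete by range — can be sketched with an in-memory stand-in. The real implementation would issue a ranged DocumentStore query; the class below is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the proposed JournalGarbageCollector simplification: each
// journal entry records its creation timestamp, so old entries can be
// removed with a single ranged delete instead of being fetched first.
class JournalGcSketch {
    // timestamp -> journal entry id (in-memory stand-in for the store)
    private final TreeMap<Long, String> entriesByTime = new TreeMap<>();

    void addEntry(long createdMillis, String id) {
        entriesByTime.put(createdMillis, id);
    }

    /** Delete all entries created before (now - maxAge); returns the count removed. */
    int gc(long nowMillis, long maxAgeMillis) {
        Map<Long, String> old = entriesByTime.headMap(nowMillis - maxAgeMillis);
        int removed = old.size();
        old.clear(); // clears through to the backing map
        return removed;
    }

    int size() {
        return entriesByTime.size();
    }
}
```

The key point is that the Oak side never loads the doomed entries; only the count (or nothing at all) needs to travel back from the store.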





[jira] [Updated] (OAK-3436) Prevent missing checkpoint due to unstable topology from causing complete reindexing

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3436:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Prevent missing checkpoint due to unstable topology from causing complete 
> reindexing
> 
>
> Key: OAK-3436
> URL: https://issues.apache.org/jira/browse/OAK-3436
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.11, 1.2.8, 1.0.24
>
> Attachments: AsyncIndexUpdateClusterTest.java, OAK-3436-0.patch
>
>
> Async indexing logic relies on the embedding application to ensure that the async 
> indexing job is run as a singleton in a cluster. For Sling based apps it 
> depends on Sling Discovery support. At times it is seen that if the 
> topology is not stable then different cluster nodes can consider themselves the 
> leader and execute the async indexing job concurrently.
> This can cause problems as the two cluster nodes might not see the same repository 
> state (due to write skew and eventual consistency) and one might remove the 
> checkpoint which the other cluster node is still relying upon. For e.g. consider 
> a 2 node cluster N1 and N2 where both are performing async indexing.
> # Base state - CP1 is the checkpoint for the "async" job
> # N2 starts indexing and indexes the changes from CP1 to CP2, removing CP1. For 
> Mongo the checkpoints are saved in the {{settings}} collection
> # N1 also decides to execute indexing but has not yet seen the latest 
> repository state, so it still thinks that CP1 is the base checkpoint and tries to 
> read it. However CP1 has already been removed from {{settings}} and this makes N1 
> think that the checkpoint is missing, so it decides to reindex everything!
> To avoid this the topology must be stable, but at the Oak level we should still 
> handle such a case and avoid doing a full reindex. So we would need a 
> {{MissingCheckpointStrategy}} similar to the {{MissingIndexEditorStrategy}} 
> done in OAK-2203 
> Possible approaches
> # A1 - Fail the indexing run if the checkpoint is missing - A checkpoint can go 
> missing for both valid and invalid reasons. Need to see what the valid 
> scenarios are where a checkpoint can go missing
> # A2 - When a checkpoint is created also store the creation time. When a 
> checkpoint is found to be missing and it is a *recent* checkpoint then fail the 
> run. For e.g. fail the run while the missing checkpoint is less than an hour 
> old (for a just started instance take the startup time into account)
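Approach A2 can be sketched as a small policy check. This is one hypothetical reading of the heuristic (including how startup time factors in), not Oak's actual implementation.

```java
// Sketch of approach A2: when the base checkpoint referenced by the async
// indexer turns out to be missing, fail the indexing run (instead of
// reindexing) as long as the checkpoint is recent, or the node itself
// started recently and may simply not have caught up yet.
class MissingCheckpointPolicy {
    private final long recentWindowMillis; // e.g. one hour

    MissingCheckpointPolicy(long recentWindowMillis) {
        this.recentWindowMillis = recentWindowMillis;
    }

    /**
     * @param checkpointCreatedMillis creation time stored with the checkpoint
     * @param processStartMillis      when this cluster node started
     * @param nowMillis               current time
     * @return true if the run should fail rather than trigger a full reindex
     */
    boolean failRunInsteadOfReindex(long checkpointCreatedMillis,
                                    long processStartMillis,
                                    long nowMillis) {
        long checkpointAge = nowMillis - checkpointCreatedMillis;
        long uptime = nowMillis - processStartMillis;
        return checkpointAge < recentWindowMillis || uptime < recentWindowMillis;
    }
}
```

Failing the run is cheap and self-healing: once the node observes the up-to-date repository state (or the window elapses for a genuinely stale checkpoint), indexing proceeds normally.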





[jira] [Updated] (OAK-2065) JMX stats for operations being performed in DocumentStore

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2065:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> JMX stats for operations being performed in DocumentStore
> -
>
> Key: OAK-2065
> URL: https://issues.apache.org/jira/browse/OAK-2065
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: mongomk
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: tooling
> Fix For: 1.3.11
>
> Attachments: 
> 0001-OAK-2065-JMX-stats-for-operations-being-performed-in.patch, 
> OAK-2065-1.patch
>
>
> Currently the DocumentStore performs various background operations like
> # Cache consistency check
> # Pushing the lastRev updates
> # Synchronizing the root node version
> We should capture stats like the time taken by the various tasks and expose them 
> over JMX to determine whether those background operations are performing well or 
> not. For example it is important that all work performed in the background task 
> is completed under 1 sec (the default polling interval). If the time taken 
> increases then it would be a cause for concern
> See http://markmail.org/thread/57fax4nyabbubbef
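The per-task timing stats described above could be captured roughly like this before being published through a JMX MBean; this is a minimal sketch and the names are illustrative, not Oak's actual stats classes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of per-task timing stats for a DocumentStore background operation.
// A real version would expose these numbers via a JMX MBean attribute.
class BackgroundOpStats {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong maxNanos = new AtomicLong();

    void record(long elapsedNanos) {
        count.incrementAndGet();
        totalNanos.addAndGet(elapsedNanos);
        maxNanos.accumulateAndGet(elapsedNanos, Math::max);
    }

    double averageMillis() {
        long c = count.get();
        return c == 0 ? 0 : totalNanos.get() / 1_000_000.0 / c;
    }

    /** true if the slowest observed run exceeded the default 1 s polling interval */
    boolean exceededPollingInterval() {
        return maxNanos.get() > 1_000_000_000L;
    }
}
```

Each background task would wrap its body with `System.nanoTime()` calls and feed the elapsed time into `record`; the monitoring side alerts when `exceededPollingInterval()` starts returning true.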





[jira] [Updated] (OAK-2808) Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2808:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Active deletion of 'deleted' Lucene index files from DataStore without 
> relying on full scale Blob GC
> 
>
> Key: OAK-2808
> URL: https://issues.apache.org/jira/browse/OAK-2808
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Thomas Mueller
>  Labels: datastore, performance
> Fix For: 1.3.11
>
> Attachments: OAK-2808-1.patch, copyonread-stats.png
>
>
> With the storing of Lucene index files within the DataStore our usage pattern
> of the DataStore has changed between JR2 and Oak.
> With JR2 the writes were mostly application based i.e. if the application
> stored a pdf/image file then that would be stored in the DataStore. JR2 by
> default would not write its own data to the DataStore. Further, in deployments
> where a large amount of binary content is present, systems tend to
> share the DataStore to avoid duplication of storage. In such cases
> running Blob GC is a non trivial task as it involves a manual step and
> coordination across multiple deployments. Due to this systems tend to
> reduce the frequency of GC.
> Now with Oak, apart from the application, the Oak system itself *actively*
> uses the DataStore to store the index files for Lucene and there the
> churn might be much higher i.e. the frequency of creation and deletion of
> index files is a lot higher. This accelerates the rate of garbage
> generation and thus puts a lot more pressure on the DataStore storage
> requirements.
> Discussion thread http://markmail.org/thread/iybd3eq2bh372zrl





[jira] [Updated] (OAK-3362) Estimate compaction based on diff to previous compacted head state

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3362:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Estimate compaction based on diff to previous compacted head state
> --
>
> Key: OAK-3362
> URL: https://issues.apache.org/jira/browse/OAK-3362
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>Priority: Minor
>  Labels: compaction, gc
> Fix For: 1.3.11
>
>
> Food for thought: try to base the compaction estimation on a diff between the 
> latest compacted state and the current state.
> Pros
> * estimation duration would be proportional to number of changes on the 
> current head state
> * using the size on disk as a reference, we could actually stop the 
> estimation early when we go over the gc threshold.
> * data collected during this diff could in theory be passed as input to the 
> compactor so it could focus on compacting a specific subtree
> Cons
> * need to keep a reference to a previous compacted state. post-startup and 
> pre-compaction this might prove difficult (except maybe if we only persist 
> the revision similar to what the async indexer is doing currently)
> * coming up with a threshold for running compaction might prove difficult
> * diff might be costly, but still cheaper than the current full diff





[jira] [Updated] (OAK-318) Excerpt support

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-318:
-
Fix Version/s: (was: 1.3.10)
   1.3.11

> Excerpt support
> ---
>
> Key: OAK-318
> URL: https://issues.apache.org/jira/browse/OAK-318
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: core, query
>Reporter: Alex Parvulescu
> Fix For: 1.3.11
>
>
> Test class: ExcerptTest.
> Right now I only see parse errors:
> Caused by: java.text.ParseException: Query:
> {noformat}
> testroot/*[jcr:contains(., 'jackrabbit')]/rep:excerpt((*).); expected: 
> {noformat}





[jira] [Updated] (OAK-3368) Speed up ExternalPrivateStoreIT and ExternalSharedStoreIT

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3368:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Speed up ExternalPrivateStoreIT and ExternalSharedStoreIT
> -
>
> Key: OAK-3368
> URL: https://issues.apache.org/jira/browse/OAK-3368
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: tarmk-standby
>Reporter: Marcel Reutegger
>Assignee: Manfred Baedke
> Fix For: 1.3.11
>
>
> Both tests run for more than 5 minutes. Most of the time the tests are 
> somehow stuck in shutting down the server.





[jira] [Updated] (OAK-1819) oak-solr-core test failures on Java 8

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-1819:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> oak-solr-core test failures on Java 8
> -
>
> Key: OAK-1819
> URL: https://issues.apache.org/jira/browse/OAK-1819
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: solr
>Affects Versions: 1.0
> Environment: {noformat}
> Apache Maven 3.1.0 (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 
> 22:15:32-0400)
> Maven home: c:\Program Files\apache-maven-3.1.0
> Java version: 1.8.0, vendor: Oracle Corporation
> Java home: c:\Program Files\Java\jdk1.8.0\jre
> Default locale: en_US, platform encoding: Cp1252
> OS name: "windows 7", version: "6.1", arch: "amd64", family: "dos"
> {noformat}
>Reporter: Jukka Zitting
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: java8, test
> Fix For: 1.3.11
>
>
> The following {{oak-solr-core}} test failures occur when building Oak with 
> Java 8:
> {noformat}
> Failed tests:
>   
> testNativeMLTQuery(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
>  expected: but was:
>   
> testNativeMLTQueryWithStream(org.apache.jackrabbit.oak.plugins.index.solr.query.SolrIndexQueryTest):
>  expected: but was:
> {noformat}
> The cause of this might well be something as simple as the test case 
> incorrectly expecting a specific ordering of search results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-2911) Analyze inter package dependency in oak-core

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2911:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Analyze inter package dependency in oak-core
> 
>
> Key: OAK-2911
> URL: https://issues.apache.org/jira/browse/OAK-2911
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: core
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
>  Labels: modularization, technical_debt
> Fix For: 1.3.11
>
> Attachments: oak-core-jdepend-report.html
>
>
> For better code health the packages should have well-defined inter-package 
> dependencies. It's preferable that the various {{plugin}} packages within 
> oak-core have minimal dependencies on each other and can exist independently.
> The following work needs to be performed
> # Check what the current state is
> # Look into ways to ensure that such dependencies are minimal and, at 
> minimum, contain no cycles
> See 
> * 
> http://stackoverflow.com/questions/3416547/maven-jdepend-fail-build-with-cycles
> * https://github.com/andrena/no-package-cycles-enforcer-rule



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3071) Add a compound index for _modified + _id

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3071:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Add a compound index for _modified + _id
> 
>
> Key: OAK-3071
> URL: https://issues.apache.org/jira/browse/OAK-3071
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: mongomk
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance, resilience
> Fix For: 1.3.11
>
>
> As explained in OAK-1966 diff logic makes a call like
> bq. db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: 
> "3:/content/foo010" }, _modified: { $gte: 1405085300 } }).sort({_id:1})
> For better and deterministic query performance we would need to create a 
> compound index like \{_modified:1, _id:1\}. This index would ensure that 
> Mongo does not have to perform object scan while evaluating such a query.
> Care must be taken that the index is only created by default for fresh 
> setups. For existing setups we should expose a JMX operation which can be 
> invoked by a system admin to create the required index during a maintenance 
> window
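The proposed compound index would be created in the mongo shell roughly as follows (the {{nodes}} collection and field names are taken from the issue text; on an existing large collection this should run in a maintenance window):

```javascript
// Compound index supporting the diff query: a range scan on _id combined
// with the lower bound on _modified, matching the sort on _id.
db.nodes.createIndex({ _modified: 1, _id: 1 });
```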



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3092:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.11
>
> Attachments: OAK-3092-v1.patch
>
>
> It can happen that text is extracted from the same binary multiple times in 
> a given indexing cycle. This can happen for two reasons
> # Multiple Lucene indexes indexing the same node - A system might have 
> multiple Lucene indexes, e.g. a global Lucene index and an index for a 
> specific nodeType. In a given indexing cycle the same file would be picked 
> up by both index definitions and both would extract the same text
> # Aggregation - With index time aggregation the same file gets picked up 
> multiple times due to aggregation rules
> To avoid the wasted effort of duplicate text extraction from the same file 
> in a given indexing cycle it would be better to have an expiring cache which 
> can hold on to extracted text content for some time. The cache should have 
> the following features
> # Limit on total size
> # Way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As the chances of the same file getting picked up are high only for a 
> given indexing cycle it would be better to expire the cache entries after 
> some time to avoid hogging memory unnecessarily 
> Such a cache would provide the following benefits
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on the critical path of the {{indexEditor}}
> # Avoid expensive IO, especially if binary content is to be fetched from a 
> remote {{BlobStore}}
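The size-limited, time-expiring cache described above can be sketched with stdlib types; the attached patch is Guava-based, and the class and method names here are hypothetical illustrations, not Oak's actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an extracted-text cache with a size limit and time-based
// expiry. The caller passes the current time explicitly so the expiry
// logic is easy to exercise; a real implementation would use a Guava
// cache with a weigher and expireAfterAccess.
class ExtractedTextCache {
    private final int maxEntries;
    private final long expiryMillis;

    private static final class Entry {
        final String text;
        final long insertedAt;
        Entry(String text, long insertedAt) {
            this.text = text;
            this.insertedAt = insertedAt;
        }
    }

    // access-ordered LinkedHashMap drops the eldest entry once the
    // configured size limit is exceeded
    private final Map<String, Entry> cache;

    ExtractedTextCache(int maxEntries, long expiryMillis) {
        this.maxEntries = maxEntries;
        this.expiryMillis = expiryMillis;
        this.cache = new LinkedHashMap<String, Entry>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Entry> eldest) {
                return size() > ExtractedTextCache.this.maxEntries;
            }
        };
    }

    synchronized String get(String blobId, long now) {
        Entry e = cache.get(blobId);
        if (e == null || now - e.insertedAt > expiryMillis) {
            cache.remove(blobId);
            return null; // miss: caller extracts the text and calls put()
        }
        return e.text;
    }

    synchronized void put(String blobId, String text, long now) {
        cache.put(blobId, new Entry(text, now));
    }
}
```

On a miss the index editor would run Tika extraction once and populate the cache, so a second index definition (or an aggregation rule) hitting the same blob id within the expiry window reuses the text.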



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3509) Lucene suggestion results should have 1 row per suggestion with appropriate column names

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3509:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Lucene suggestion results should have 1 row per suggestion with appropriate 
> column names
> 
>
> Key: OAK-3509
> URL: https://issues.apache.org/jira/browse/OAK-3509
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Tommaso Teofili
>Priority: Minor
> Fix For: 1.3.11
>
>
> Currently suggest query returns just one row with {{rep:suggest()}} column 
> containing a string that needs to be parsed.
> It'd be better if each suggestion were returned as an individual row with 
> column names such as {{suggestion}}, {{weight}}(???), etc.
> (cc [~teofili])



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3151) Lucene Version should be based on IndexFormatVersion

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3151:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Lucene Version should be based on IndexFormatVersion
> 
>
> Key: OAK-3151
> URL: https://issues.apache.org/jira/browse/OAK-3151
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: technical_debt
> Fix For: 1.3.11
>
>
> Currently, wherever oak-lucene makes a call to Lucene it passes 
> Version.LUCENE_47 as a hardcoded version. To enable easier upgrades of 
> Lucene, and hence a change of defaults for fresh setups, this version should 
> instead be based on {{IndexFormatVersion}}.
> Say
> * For IndexFormatVersion set to V2 (current default) - Lucene version used is 
> LUCENE_47
> * For IndexFormatVersion set to V3 (proposed) - Lucene version used would be 
> per Lucene library version
> If the index is reindexed then it would automatically be updated to the 
> latest revision



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3319) Disabling IndexRule inheritence is not working in all cases

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3319:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Disabling IndexRule inheritence is not working in all cases
> ---
>
> Key: OAK-3319
> URL: https://issues.apache.org/jira/browse/OAK-3319
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.3.11
>
>
> IndexRules are inherited by default, i.e. a rule defined for 
> nt:hierarchyNode is also applicable to nt:folder (nt:folder extends 
> nt:hierarchyNode). Lucene indexing supports an {{inherited}} flag (defaults 
> to true). If this is set to false then inheritance is disabled.
> As per the current implementation, disabling works fine on the indexing 
> side: a node which does not have an explicit indexRule defined is not 
> indexed. However, the same is not working on the query side, i.e. 
> IndexPlanner would still opt in for a given query, ignoring the fact that 
> inheritance is disabled 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3193) Integrate with Felix WebConsole

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3193:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Integrate with Felix WebConsole
> ---
>
> Key: OAK-3193
> URL: https://issues.apache.org/jira/browse/OAK-3193
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: webapp
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.11
>
>
> To allow better debugging support of a repository setup it would be useful 
> if Felix WebConsole were configured with the webapp. This would allow easier 
> access to the OSGi runtime state



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3554) Use write concern of w:majority when connected to a replica set

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3554:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Use write concern of w:majority when connected to a replica set
> ---
>
> Key: OAK-3554
> URL: https://issues.apache.org/jira/browse/OAK-3554
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, mongomk
>Reporter: Chetan Mehrotra
>Assignee: Marcel Reutegger
>  Labels: resilience
> Fix For: 1.3.11
>
>
> Currently, while connecting to MongoDB, MongoDocumentStore relies on the 
> default write concern provided as part of the mongouri. 
> Recently some issues were seen where a Mongo-based Oak was connecting to a 
> 3-member replica set and there were frequent replica state changes due to 
> the use of VMs for Mongo. This caused data loss and corruption of data in 
> Oak.
> To avoid such situations Oak should default to a write concern of majority. 
> If some write concern is specified as part of the mongouri then that should 
> take precedence. This would allow the system admin to make the call on 
> tweaking the write concern if required, and at the same time allows Oak to 
> use a safe write concern.
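For illustration, a mongouri that specifies a write concern explicitly (host, replica set, and database names here are hypothetical) and would therefore take precedence over the proposed majority default:

```
mongodb://mongo1:27017,mongo2:27017,mongo3:27017/oak?replicaSet=rs0&w=majority
```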



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-2556) do intermediate commit during async indexing

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2556:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> do intermediate commit during async indexing
> 
>
> Key: OAK-2556
> URL: https://issues.apache.org/jira/browse/OAK-2556
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.0.11
>Reporter: Stefan Egli
>  Labels: resilience
> Fix For: 1.3.11
>
>
> A recent issue found at a customer unveils a potential issue with the async 
> indexer. Reading AsyncIndexUpdate.updateIndex, it looks like it is doing the 
> entire update of the async indexer *in one go*, i.e. in one commit.
> When, however, there is for some reason a huge diff that the async indexer 
> has to process, the 'one big commit' can become gigantic. In fact there is 
> no limit to the size of the commit.
> So the suggestion is to do intermediate commits while the async indexing is 
> going on. The reason this is acceptable is that with async indexing the 
> index is anyway not 100% up-to-date - so it would not make much of a 
> difference if it committed after every 100 or 1000 changes.
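The intermediate-commit suggestion can be sketched as a simple counter in the indexing loop; this is an illustrative sketch, not Oak's actual AsyncIndexUpdate API (the class and method names below are hypothetical):

```java
// Sketch: instead of one commit for the whole diff, flush the index
// update every BATCH_SIZE changes and once more at the end for the
// remainder. Names are hypothetical, not Oak's real API.
class BatchingIndexer {
    static final int BATCH_SIZE = 100;
    private int pending = 0;
    int commits = 0; // number of intermediate commits performed

    void onChange() {
        if (++pending >= BATCH_SIZE) {
            commit();
        }
    }

    void close() {
        if (pending > 0) {
            commit(); // final commit for the remaining changes
        }
    }

    private void commit() {
        // in the real indexer this would persist the index update
        commits++;
        pending = 0;
    }
}
```

Processing 250 changes this way yields two intermediate commits plus one final commit, so a crash mid-way loses at most BATCH_SIZE changes of indexing work rather than the whole diff.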



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3236) integration test that simulates influence of clock drift

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3236:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> integration test that simulates influence of clock drift
> 
>
> Key: OAK-3236
> URL: https://issues.apache.org/jira/browse/OAK-3236
> Project: Jackrabbit Oak
>  Issue Type: Test
>  Components: core
>Affects Versions: 1.3.4
>Reporter: Stefan Egli
>Assignee: Stefan Egli
> Fix For: 1.3.11
>
>
> Spin-off of OAK-2739 [of this 
> comment|https://issues.apache.org/jira/browse/OAK-2739?focusedCommentId=14693398=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14693398]
>  - i.e. there should be an integration test that showcases the issues with 
> clock drift and why it is a good idea to have a lease check (that refuses to 
> let the document store be used any further once the lease times out locally)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3303) FileStore flush thread can get stuck

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3303:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> FileStore flush thread can get stuck
> 
>
> Key: OAK-3303
> URL: https://issues.apache.org/jira/browse/OAK-3303
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
> Fix For: 1.3.11
>
>
> In some very rare circumstances the flush thread was seen as possibly stuck 
> for a while following a restart of the system. This results in data loss on 
> restart (the system will roll back to the latest persisted revision on 
> restart), and worse, there's no way of extracting the latest head revision 
> using the tar files, so recovery is not (yet) possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-2910) oak-jcr bundle should be usable as a standalone bundle

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2910:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> oak-jcr bundle should be usable as a standalone bundle
> --
>
> Key: OAK-2910
> URL: https://issues.apache.org/jira/browse/OAK-2910
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: jcr
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: modularization, osgi, technical_debt
> Fix For: 1.3.11
>
>
> Currently the oak-jcr bundle needs to be embedded within some other bundle 
> if Oak is to be properly configured in an OSGi environment. We need to 
> revisit this aspect and see what needs to be done to enable Oak to be 
> properly configured without requiring the oak-jcr bundle to be embedded in 
> the repo



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-2478) Move spellcheck config to own configuration node

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2478:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Move spellcheck config to own configuration node
> 
>
> Key: OAK-2478
> URL: https://issues.apache.org/jira/browse/OAK-2478
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Tommaso Teofili
>  Labels: technical_debt
> Fix For: 1.3.11
>
>
> Currently spellcheck configuration is controlled via properties defined on 
> the main config / props node, but it'd be good to have a dedicated place to 
> configure the whole spellcheck feature so as not to mix it up with the 
> configuration of other features / parameters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3406) Configuration to rank exact match suggestions over partial match suggestions

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3406:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Configuration to rank exact match suggestions over partial match suggestions
> 
>
> Key: OAK-3406
> URL: https://issues.apache.org/jira/browse/OAK-3406
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Vikas Saurabh
>Priority: Minor
> Fix For: 1.3.11
>
>
> Currently, a suggestion query ranks the results according to popularity. 
> But, at times, it's intended to have suggested phrases based on exact 
> matches ranked above a more popular suggestion based on a partial match. 
> E.g. a repository might have 1000 docs with {{windows is a very popular 
> OS}} and say 4 with {{win over them}} - it's a useful case to configure 
> suggestions such that for a suggestion query for {{win}}, we'd get {{win 
> over them}} as the higher ranked suggestion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-2835) TARMK Cold Standby inefficient cleanup

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2835:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> TARMK Cold Standby inefficient cleanup
> --
>
> Key: OAK-2835
> URL: https://issues.apache.org/jira/browse/OAK-2835
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: segmentmk, tarmk-standby
>Reporter: Alex Parvulescu
>Assignee: Alex Parvulescu
>Priority: Critical
>  Labels: compaction, gc, production, resilience
> Fix For: 1.3.11
>
> Attachments: OAK-2835.patch
>
>
> Following OAK-2817, it turns out that patching the data corruption issue 
> revealed an inefficiency in the cleanup method. Similar to the online 
> compaction situation, the standby has issues clearing some of the in-memory 
> references to old revisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3159) Extend documentation for SegmentNodeStoreService in http://jackrabbit.apache.org/oak/docs/osgi_config.html#SegmentNodeStore

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3159:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Extend documentation for SegmentNodeStoreService in 
> http://jackrabbit.apache.org/oak/docs/osgi_config.html#SegmentNodeStore
> ---
>
> Key: OAK-3159
> URL: https://issues.apache.org/jira/browse/OAK-3159
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: doc
>Reporter: Konrad Windszus
> Fix For: 1.3.11
>
>
> Currently the documentation at 
> http://jackrabbit.apache.org/oak/docs/osgi_config.html#SegmentNodeStore only 
> documents the properties
> # repository.home and
> # tarmk.size
> All the other properties like customBlobStore, tarmk.mode,  are not 
> documented. Please extend that. Also it would be good if the table could be 
> extended with the supported type for each individual property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-2675) Include change type information in perf logs for diff logic

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2675:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Include change type information in perf logs for diff logic
> ---
>
> Key: OAK-2675
> URL: https://issues.apache.org/jira/browse/OAK-2675
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Reporter: Chetan Mehrotra
>Priority: Minor
>  Labels: observation, performance, resilience, tooling
> Fix For: 1.3.11
>
>
> Currently the diff perf logs in {{NodeObserver}} do not indicate what type 
> of change was processed, i.e. whether the change was an internal one or an 
> external one. 
> Having this information would allow us to determine how the cache is being 
> used. For example, if we see slower numbers even for local changes, that 
> would indicate that there is some issue with the diff cache and it's not 
> being utilized effectively 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3090) Caching BlobStore implementation

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3090:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Caching BlobStore implementation 
> -
>
> Key: OAK-3090
> URL: https://issues.apache.org/jira/browse/OAK-3090
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: blob
>Reporter: Chetan Mehrotra
>  Labels: performance, resilience
> Fix For: 1.3.11
>
>
> Storing binaries in Mongo puts a lot of read pressure on MongoDB. To reduce 
> the read load it would be useful to have a filesystem-based cache of 
> frequently used binaries. 
> This would be similar to CachingFDS (OAK-3005) but would be implemented on 
> top of BlobStore API. 
> Requirements
> * Specify the max binary size which can be cached on file system
> * Limit the size of all binary content present in the cache



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-1695) Document Solr index

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-1695:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Document Solr index
> ---
>
> Key: OAK-1695
> URL: https://issues.apache.org/jira/browse/OAK-1695
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: doc, solr
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
>  Labels: documentation
> Fix For: 1.3.11
>
>
> Provide documentation about the Oak Solr index. That should contain 
> information about the design and how to configure it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3149) SuggestHelper should manage a suggestor per index definition

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3149:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> SuggestHelper should manage a suggestor per index definition
> 
>
> Key: OAK-3149
> URL: https://issues.apache.org/jira/browse/OAK-3149
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Tommaso Teofili
> Fix For: 1.3.11
>
>
> {{SuggestHelper}} currently keeps a static reference to the suggestor and 
> thus has a singleton suggestor for the whole repo. Instead it should be 
> implemented in such a way that a suggestor instance is associated with an 
> index definition. Logically the suggestor instance should be part of 
> IndexNode, similar to how {{IndexSearcher}} instances are managed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3185) Port and refactor jackrabbit-webapp module to Oak

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3185:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Port and refactor jackrabbit-webapp module to Oak 
> --
>
> Key: OAK-3185
> URL: https://issues.apache.org/jira/browse/OAK-3185
> Project: Jackrabbit Oak
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.11
>
>
> As mentioned at [1] we should port the jackrabbit-webapp [2] module to Oak 
> and refactor it to run complete Oak stack. Purpose of this module would be to 
> demonstrate
> # How to embed Oak in standalone web applications which are not based on OSGi
> # Configure various aspect of Oak via config
> h3. Proposed Approach
> # Copy jackrabbit-webapp to Oak repo under oak-webapp
> # Refactor the repository initialization logic to use Oak Pojosr to configure 
> Repository [3]
> # Bonus: configure Felix WebConsole to enable users to see which OSGi 
> services are exposed and which config options are supported
> This would also enable us to document which third-party dependencies are 
> required for getting Oak to work in such environments
> [1] 
> http://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201508.mbox/%3CCAHCW-mkbpS6qSkgFe1h1anFcD-dYWFrcr9xBWx9dpKaxr91Q3Q%40mail.gmail.com%3E
> [2] 
> https://jackrabbit.apache.org/jcr/components/jackrabbit-web-application.html
> [3] https://github.com/apache/jackrabbit-oak/tree/trunk/oak-pojosr



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3450) Configuration to have case insensitive suggestions

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3450:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Configuration to have case insensitive suggestions
> --
>
> Key: OAK-3450
> URL: https://issues.apache.org/jira/browse/OAK-3450
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Vikas Saurabh
>Assignee: Vikas Saurabh
>Priority: Minor
> Fix For: 1.3.11
>
>
> Currently suggestions follow the same case as requested in query parameter. 
> It makes sense to allow for it to be case insensitive. e.g. Asking for 
> suggestions for {{cat}} should give {{Cat is an animal}} as well as 
> {{category needs to be assigned}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3219) Lucene IndexPlanner should also account for number of property constraints evaluated while giving cost estimation

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3219:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Lucene IndexPlanner should also account for number of property constraints 
> evaluated while giving cost estimation
> -
>
> Key: OAK-3219
> URL: https://issues.apache.org/jira/browse/OAK-3219
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
>  Labels: performance
> Fix For: 1.3.11
>
>
> Currently the cost returned by the Lucene index is a function of the number 
> of indexed documents present in the index. If the number of indexed entries 
> is high, this might reduce the chances of this index getting selected if 
> some property index also supports one of the property constraints.
> {noformat}
> /jcr:root/content/freestyle-cms/customers//element(*, 
> cq:Page)[(jcr:content/@title = 'm' or jcr:like(jcr:content/@title, 'm%')) and 
> jcr:content/@sling:resourceType = '/components/page/customer’]
> {noformat}
> Consider above query with following index definition
> * A property index on resourceType
> * A Lucene index for cq:Page with properties {{jcr:content/title}}, 
> {{jcr:content/sling:resourceType}} indexed and also path restriction 
> evaluation enabled
> Now what the two indexes can help in
> # Property index
> ## Path restriction
> ## Property restriction on  {{sling:resourceType}}
> # Lucene index
> ## NodeType restriction
> ## Property restriction on  {{sling:resourceType}}
> ## Property restriction on  {{title}}
> ## Path restriction
> Now cost estimate currently works like this
> * Property index - {{f(indexedValueEstimate, estimateOfNodesUnderGivenPath)}}
> ** indexedValueEstimate - For 'sling:resourceType=foo' its the approximate 
> count for nodes having that as 'foo'
> ** estimateOfNodesUnderGivenPath - Its derived from an approximate estimation 
> of nodes present under given path
> * Lucene Index - {{f(totalIndexedEntries)}}
> As the Lucene cost function is too simple, it does not reflect reality. The 
> following two changes can be made to improve it:
> * Given that the Lucene index can handle more constraints (4) than the 
> property index (2), the cost estimate returned by it should reflect this. 
> This can be done by setting costPerEntry to 1/(no of property restrictions 
> evaluated)
> * Get the count for the queried property value - This is similar to what 
> PropertyIndex does and assumes that Lucene can provide that information at 
> O(1) cost. In case of multiple supported property restrictions this can be 
> the minimum of all
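The proposed costPerEntry change can be expressed as a small cost function; this is a sketch under the issue's assumptions, not the actual IndexPlanner code:

```java
// Sketch of the proposed cost model: dividing the per-entry cost by the
// number of property restrictions the index can evaluate makes an index
// that handles more of the query's constraints proportionally cheaper.
// Class and method names are hypothetical, not Oak's API.
class LuceneCostSketch {
    static double cost(long estimatedEntryCount, int supportedRestrictions) {
        double costPerEntry = 1.0 / Math.max(1, supportedRestrictions);
        return estimatedEntryCount * costPerEntry;
    }
}
```

With the example from the issue, an index evaluating 4 restrictions over the same entry estimate costs half as much as one evaluating 2, so the query engine would prefer it.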



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (OAK-3176) Provide an option to include a configured boost query while searching

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3176:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Provide an option to include a configured boost query while searching
> -
>
> Key: OAK-3176
> URL: https://issues.apache.org/jira/browse/OAK-3176
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: solr
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 1.3.11, 1.2.8
>
>
> For tweaking relevancy it's sometimes useful to include a boost query that 
> gets applied at query time and modifies the ranking accordingly.
> This can also be done by setting it by hand as a default parameter on the 
> /select request handler, but for convenience it'd be good if the Solr 
> instance configuration files didn't have to be touched.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3600) Cache property index definitions

2015-11-09 Thread Joel Richard (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996326#comment-14996326
 ] 

Joel Richard commented on OAK-3600:
---

[~tmueller], can you please share your thoughts about this?

> Cache property index definitions
> 
>
> Key: OAK-3600
> URL: https://issues.apache.org/jira/browse/OAK-3600
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, query
>Affects Versions: 1.3.9
>Reporter: Joel Richard
>  Labels: performance
>
> At the moment, most of the time of the execution plan calculation (often 70%) 
> is spent in PropertyIndex.getCost. Therefore, it would make sense to cache 
> the property index definitions in a map which avoids all 
> unnecessary traversal/repository operations.
> This cache could either be attached to the session or expire after a few 
> seconds, because the problem is not that createPlan is slow in itself, but 
> that it is called too often.





[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-09 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996200#comment-14996200
 ] 

Marcel Reutegger commented on OAK-3559:
---

bq. The original test didn't work on the delayed network, so I modified it to 
create 1000 nodes, rather than 1.

Can you please explain why it didn't work?

bq. It seems that the network latency is the deciding factor for the sequential 
approach

Hmm, that's strange. In my tests for OAK-3554 I was able to reproduce the 
calculated average journal flush wait time of 16ms with the default MongoDB 
journalCommitInterval. I would have expected to see these 16ms added to the 
20ms latency.

> Bulk document updates
> -
>
> Key: OAK-3559
> URL: https://issues.apache.org/jira/browse/OAK-3559
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: core, documentmk, mongomk
>Reporter: Tomek Rękawek
> Fix For: 1.4
>
> Attachments: OAK-3559.patch
>
>
> The {{DocumentStore#createOrUpdate(Collection, UpdateOp)}} method is invoked 
> in a loop in the {{Commit#applyToDocumentStore()}}, once for each changed 
> node. Investigate if it's possible to implement a batch version of the 
> createOrUpdate method, using the MongoDB [Bulk 
> API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk]. It should 
> return all documents before they are modified, so the Commit class can 
> discover conflicts (if there are any).
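The contract sketched above — apply a whole batch of updates in one call while still returning each document's pre-modification state so the caller can detect conflicts — can be illustrated with a small in-memory stand-in. The document and update shapes below are invented for illustration; a real implementation would go through the MongoDB Bulk API instead of a map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BulkUpdateSketch {
    // A "document" here is just a map of property names to values, keyed by id.
    static final Map<String, Map<String, String>> store = new HashMap<>();

    /**
     * Applies all updates in one batch and returns, for each update, the
     * document as it was BEFORE the change (null if it was newly created),
     * so the caller can detect conflicts afterwards.
     */
    static List<Map<String, String>> createOrUpdate(List<Map<String, String>> updates) {
        List<Map<String, String>> previous = new ArrayList<>();
        for (Map<String, String> update : updates) {
            String id = update.get("id");
            Map<String, String> before = store.get(id);
            // defensive copy: the stored document is mutated below
            previous.add(before == null ? null : new HashMap<>(before));
            store.merge(id, new HashMap<>(update),
                    (old, u) -> { old.putAll(u); return old; });
        }
        return previous;
    }
}
```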





[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996258#comment-14996258
 ] 

Tomek Rękawek commented on OAK-3559:


{quote}
>The original test didn't work on the delayed network, so I modified it to 
>create 1000 nodes, rather than 1.

Can you please explain why it didn't work?{quote}

Well, it'd work, but it'd also take a long time. It takes about 45 seconds for 
the sequential code to create 1000 nodes on the delayed network, so it'd take 
about 8 minutes to do a single iteration with 10 000 nodes. I wanted to have at 
least a few iterations during the 5-minute test.

{quote}
>It seems that the network latency is the deciding factor for the sequential 
>approach

Hmm, that's strange. In my tests for OAK-3554 I was able to reproduce the 
calculated average journal flush wait time of 16ms with the default MongoDB 
journalCommitInterval. I would have expected to see these 16ms added to the 
20ms latency.{quote}
That's indeed strange. I compared the sequential CreateManyChildNodesTest on 
non-journaled and journaled mongo (without latency in both cases):
{noformat}
 ### latency: 0ms, sequential (SNAPSHOT) ###
              C  min  10%  50%   90%   max    N
 no journal   1  395  406  450   530  1130  299
 journal      1  813  813  876  1046  1046    4
{noformat}
So, according to the timing results the journaled version is only about 2x 
slower, but on the other hand it was able to do just 4 iterations (rather than 
299). I'll look into the benchmark code. It isn't related to the bulk update, though.

> Bulk document updates
> -
>
> Key: OAK-3559
> URL: https://issues.apache.org/jira/browse/OAK-3559
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: core, documentmk, mongomk
>Reporter: Tomek Rękawek
> Fix For: 1.4
>
> Attachments: OAK-3559.patch
>
>
> The {{DocumentStore#createOrUpdate(Collection, UpdateOp)}} method is invoked 
> in a loop in the {{Commit#applyToDocumentStore()}}, once for each changed 
> node. Investigate if it's possible to implement a batch version of the 
> createOrUpdate method, using the MongoDB [Bulk 
> API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk]. It should 
> return all documents before they are modified, so the Commit class can 
> discover conflicts (if there are any).





[jira] [Commented] (OAK-3586) ConflictException and CommitQueue should support a list of revisions

2015-11-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996259#comment-14996259
 ] 

Tomek Rękawek commented on OAK-3586:


[~mreutegg], is there something more to improve in this patch?

> ConflictException and CommitQueue should support a list of revisions
> 
>
> Key: OAK-3586
> URL: https://issues.apache.org/jira/browse/OAK-3586
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, documentmk
>Reporter: Tomek Rękawek
> Fix For: 1.4
>
>
> OAK-3559 aims at providing a bulk version of 
> {{DocumentStore#createOrUpdate()}}, so the {{Commit}} class can apply many 
> changes at the same time. If there's a conflict detected afterwards, it may 
> involve many documents and revisions. That's why the {{ConflictException}} 
> needs to be extended, so it can contain a revision list rather than a single 
> revision.
> Once the {{ConflictException}} contains revision list, the 
> {{CommitQueue#suspendUntil()}} method should be updated as well, to suspend 
> thread until all revisions from the passed collection are visible and all 
> conflicts are resolved.
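A hypothetical sketch of such an extended exception, using plain strings as a stand-in for Oak's Revision class (the class and accessor names are illustrative, not the actual Oak API):

```java
import java.util.List;

public class ConflictExceptionSketch extends Exception {
    // stand-in for List<Revision>; immutable so callers cannot mutate it
    private final List<String> conflictRevisions;

    public ConflictExceptionSketch(String message, List<String> conflictRevisions) {
        super(message);
        this.conflictRevisions = List.copyOf(conflictRevisions);
    }

    /**
     * All revisions involved in the conflict. A commit queue can suspend the
     * thread until every one of them is visible before retrying the commit.
     */
    public List<String> getConflictRevisions() {
        return conflictRevisions;
    }
}
```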





[jira] [Commented] (OAK-3598) Export org.apache.jackrabbit.oak.cache package from oak-core

2015-11-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996290#comment-14996290
 ] 

Chetan Mehrotra commented on OAK-3598:
--

Hit a problem here - CacheStats depends on 
{{org.apache.jackrabbit.oak.api.jmx.CacheStatsMBean}}, so it would not be 
possible to just move it to oak-commons. We need to think of some other way.

> Export org.apache.jackrabbit.oak.cache package from oak-core
> 
>
> Key: OAK-3598
> URL: https://issues.apache.org/jira/browse/OAK-3598
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: cache
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.10
>
>
> For OAK-3092 oak-lucene would need to access classes from 
> {{org.apache.jackrabbit.oak.cache}} package. For now it's limited to 
> {{CacheStats}} to expose the cache related statistics.
> This task is meant to determine steps needed to export the package 
> * Update the pom.xml to export the package
> * Review current set of classes to see if they need to be reviewed





[jira] [Updated] (OAK-2629) Cleanup Oak Travis jobs

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2629:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Cleanup Oak Travis jobs
> ---
>
> Key: OAK-2629
> URL: https://issues.apache.org/jira/browse/OAK-2629
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: it
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
>  Labels: CI
> Fix For: 1.3.11
>
>
> Since we're moving toward Jenkins, let's remove the Travis jobs for Oak. 





[jira] [Updated] (OAK-2592) Commit does not ensure w:majority

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2592:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Commit does not ensure w:majority
> -
>
> Key: OAK-2592
> URL: https://issues.apache.org/jira/browse/OAK-2592
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core, mongomk
>Reporter: Marcel Reutegger
>Assignee: Marcel Reutegger
>  Labels: resilience
> Fix For: 1.3.11
>
>
> The MongoDocumentStore uses {{findAndModify()}} to commit a transaction. This 
> operation does not allow an application specified write concern and always 
> uses the MongoDB default write concern {{Acknowledged}}. This means a commit 
> may not make it to a majority of a replica set when the primary fails. From a 
> MongoDocumentStore perspective it may appear as if a write was successful and 
> later reverted. See also the test in OAK-1641.
> To fix this, we'd probably have to change the MongoDocumentStore to avoid 
> {{findAndModify()}} and use {{update()}} instead.





[jira] [Updated] (OAK-2618) Improve performance of queries with ORDER BY and multiple OR filters

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2618:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Improve performance of queries with ORDER BY and multiple OR filters
> 
>
> Key: OAK-2618
> URL: https://issues.apache.org/jira/browse/OAK-2618
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Amit Jain
>Assignee: Amit Jain
>  Labels: performance
> Fix For: 1.3.11
>
>
> When multiple OR constraints are specified in an XPath query, it is broken up 
> into a union of multiple clauses. If the query includes an order by clause, 
> the sorting in this case is done by traversing the result set in memory, 
> leading to slow query performance.
> Possible improvements could include:
> * For indexes which can support multiple filters (like lucene, solr) such 
> queries should be efficient and the query engine can pass-thru the query as 
> is.
> ** Possibly needed for other cases as well. So, we can have some sort of 
> capability advertiser for indexes which can hint the query engine 
> and/or
> * Batched merging of the sorted iterators returned for the multiple union 
> queries (possible externally).
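The "batched merging of the sorted iterators" in the last bullet is essentially a k-way merge; a minimal sketch with a min-heap follows (illustrative, not the query engine's actual code):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class SortedUnionSketch {
    /**
     * Merges already-sorted iterators (one per union clause) into a single
     * sorted list, pulling one element at a time from a min-heap instead of
     * materialising and re-sorting the full result set in memory.
     */
    static List<String> mergeSorted(List<Iterator<String>> sources) {
        // heap entries: current head of each source plus the source index
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.comparingByKey());
        for (int i = 0; i < sources.size(); i++) {
            if (sources.get(i).hasNext()) {
                heap.add(Map.entry(sources.get(i).next(), i));
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            Map.Entry<String, Integer> head = heap.poll();
            result.add(head.getKey());
            Iterator<String> src = sources.get(head.getValue());
            if (src.hasNext()) {
                heap.add(Map.entry(src.next(), head.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Iterator<String>> its = List.of(
                List.of("a", "d", "f").iterator(),
                List.of("b", "c", "e").iterator());
        System.out.println(mergeSorted(its)); // [a, b, c, d, e, f]
    }
}
```

The same loop can be batched by polling a fixed number of elements per round, which is what an external merger over the union sub-queries would do.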





[jira] [Updated] (OAK-2891) Use more efficient approach to manage in memory map in LengthCachingDataStore

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2891:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Use more efficient approach to manage in memory map in LengthCachingDataStore
> -
>
> Key: OAK-2891
> URL: https://issues.apache.org/jira/browse/OAK-2891
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: upgrade
>Reporter: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.3.11
>
>
> LengthCachingDataStore introduced in OAK-2882 has an in-memory map for 
> keeping the mapping between blobId and length. This would pose an issue when 
> the number of binaries is very large.
> Instead of an in-memory map we should use some off-heap store like MVStore or 
> MapDB.





[jira] [Updated] (OAK-3253) Support caching in FileDataStoreService

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3253:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Support caching in FileDataStoreService
> ---
>
> Key: OAK-3253
> URL: https://issues.apache.org/jira/browse/OAK-3253
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Affects Versions: 1.3.3
>Reporter: Shashank Gupta
>Assignee: Shashank Gupta
>  Labels: candidate_oak_1_0, candidate_oak_1_2, docs-impacting, 
> features, performance
> Fix For: 1.3.11
>
>
> FDS on SAN/NAS storage is not efficient as it involves network calls. In Oak, 
> indexes are stored on SAN/NAS, and even an idle system does a lot of reads of 
> system-generated data. 
> Enable caching in FDS so that reads are done locally, with async upload to 
> SAN/NAS.
> See [previous 
> discussions|https://issues.apache.org/jira/browse/OAK-3005?focusedCommentId=14700801=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14700801]





[jira] [Updated] (OAK-3215) Solr test often fail with No such core: oak

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3215:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Solr test often fail with  No such core: oak
> 
>
> Key: OAK-3215
> URL: https://issues.apache.org/jira/browse/OAK-3215
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: solr
>Reporter: Chetan Mehrotra
>Assignee: Tommaso Teofili
>Priority: Minor
>  Labels: CI
> Fix For: 1.3.11
>
>
> Often it can be seen that all tests from the oak-solr module fail, and in all 
> such failures the following error is reported: 
> {noformat}
> org.apache.solr.common.SolrException: No such core: oak
>   at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112)
>   at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:118)
>   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
>   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
>   at 
> org.apache.jackrabbit.oak.plugins.index.solr.query.SolrQueryIndexTest.testQueryOnIgnoredExistingProperty(SolrQueryIndexTest.java:330)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> {noformat}
> Most recent failure in 
> https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/325/





[jira] [Updated] (OAK-2797) Closeable aspect of Analyzer should be accounted for

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2797:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Closeable aspect of Analyzer should be accounted for
> 
>
> Key: OAK-2797
> URL: https://issues.apache.org/jira/browse/OAK-2797
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Chetan Mehrotra
>  Labels: technical_debt
> Fix For: 1.3.11
>
>
> Lucene's {{Analyzer}} implements the {{Closeable}} [1] interface and 
> internally holds some persistent resources in ThreadLocal storage.
> So far in oak-lucene we do not take care of closing any analyzer; in fact we 
> use a singleton Analyzer in all cases. Opening this bug to think about this 
> aspect and see if our usage of Analyzer follows best practices.
> [1] 
> http://lucene.apache.org/core/4_7_0/core/org/apache/lucene/analysis/Analyzer.html#close%28%29
> /cc [~teofili] [~alex.parvulescu]





[jira] [Updated] (OAK-1828) Improved SegmentWriter

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-1828:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Improved SegmentWriter
> --
>
> Key: OAK-1828
> URL: https://issues.apache.org/jira/browse/OAK-1828
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: segmentmk
>Reporter: Jukka Zitting
>Assignee: Alex Parvulescu
>Priority: Minor
>  Labels: technical_debt
> Fix For: 1.3.11
>
>
> At about 1kLOC and dozens of methods, the SegmentWriter class is currently a 
> bit too complex for one of the key components of the TarMK. It also uses a 
> somewhat non-obvious mix of synchronized and unsynchronized code to 
> coordinate multiple concurrent threads that may be writing content at the 
> same time. The synchronization blocks are also broader than what really would 
> be needed, which in some cases causes unnecessary lock contention in 
> concurrent write loads.
> To improve the readability and maintainability of the code, and to increase 
> performance of concurrent writes, it would be useful to split part of the 
> SegmentWriter functionality to a separate RecordWriter class that would be 
> responsible for writing individual records into a segment. The 
> SegmentWriter.prepare() method would return a new RecordWriter instance, and 
> the higher-level SegmentWriter methods would use the returned instance for 
> all the work that's currently guarded in synchronization blocks.





[jira] [Updated] (OAK-2722) IndexCopier fails to delete older index directory upon reindex

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2722:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> IndexCopier fails to delete older index directory upon reindex
> --
>
> Key: OAK-2722
> URL: https://issues.apache.org/jira/browse/OAK-2722
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
>  Labels: resilience
> Fix For: 1.3.11
>
>
> {{IndexCopier}} tries to remove the older index directory in case of reindex. 
> This might fail on platforms like Windows if the files are still memory 
> mapped or are locked.
> For deleting directories we would need to take a similar approach to the one 
> used for deleting old index files, i.e. do retries later.
> Due to this following test fails on Windows (Per [~julian.resc...@gmx.de] )
> {noformat}
> Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.07 sec <<< 
> FAILURE!
> deleteOldPostReindex(org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopierTest)
>   Time elapsed: 0.02 sec  <<< FAILURE!
> java.lang.AssertionError: Old index directory should have been removed
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at org.junit.Assert.assertFalse(Assert.java:68)
> at 
> org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopierTest.deleteOldPostReindex(IndexCopierTest.java:160)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> {noformat}
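The retry approach suggested above could look roughly like this; the method name and retry parameters are illustrative, not IndexCopier's actual API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

public class RetryingDeleteSketch {

    /**
     * Tries to delete a directory tree, retrying a few times to cope with
     * files that are still memory-mapped or locked (e.g. on Windows).
     * Returns true once the directory is gone, false if all attempts failed.
     */
    static boolean deleteWithRetries(Path dir, int attempts, long delayMillis) {
        for (int i = 0; i < attempts; i++) {
            try {
                if (!Files.exists(dir)) {
                    return true; // already gone
                }
                // walk depth-first in reverse order so children go before parents
                try (var paths = Files.walk(dir)) {
                    for (Path p : (Iterable<Path>)
                            paths.sorted(Comparator.reverseOrder())::iterator) {
                        Files.delete(p);
                    }
                }
                return true;
            } catch (IOException stillLocked) {
                try {
                    Thread.sleep(delayMillis); // wait, then retry later
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```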





[jira] [Updated] (OAK-2719) Warn about local copy size different than remote copy in oak-lucene with copyOnRead enabled

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2719:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Warn about local copy size different than remote copy in oak-lucene with 
> copyOnRead enabled
> ---
>
> Key: OAK-2719
> URL: https://issues.apache.org/jira/browse/OAK-2719
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>Priority: Minor
>  Labels: resilience
> Fix For: 1.3.11
>
>
> At times following warning is seen in logs
> {noformat}
> 31.03.2015 14:04:57.610 *WARN* [pool-6-thread-7] 
> org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier Found local copy 
> for _0.cfs in 
> NIOFSDirectory@/path/to/index/e5a943cdec3000bd8ce54924fd2070ab5d1d35b9ecf530963a3583d43bf28293/1
>  
> lockFactory=NativeFSLockFactory@/path/to/index/e5a943cdec3000bd8ce54924fd2070ab5d1d35b9ecf530963a3583d43bf28293/1
>  but size of local 1040384 differs from remote 1958385. Content would be read 
> from remote file only
> {noformat}
> The file length check provides a weak check of index file consistency. In 
> some cases this warning is misleading. For example:
> # Index version Rev1 - Task submitted to copy index file F1 
> # Index updated to Rev2 - Directory bound to Rev1 is closed
> # Read is performed with Rev2 for F1 - Here, as the file would be locally 
> created, the size would differ while the copying is in progress
> In such a case the logic should ensure that once the copy is done the local 
> file gets used.





[jira] [Updated] (OAK-3580) Make it possible to use indexes for providing excerpts

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3580:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Make it possible to use indexes for providing excerpts
> --
>
> Key: OAK-3580
> URL: https://issues.apache.org/jira/browse/OAK-3580
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, query, solr
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
> Fix For: 1.3.11
>
> Attachments: OAK-3580.1.patch
>
>
> Currently {{SimpleExcerptProvider}} always provides the excerpt, regardless of 
> the underlying index used for the query, which has the limitation of not 
> working with binaries.
> Because of that it'd be good to leverage existing indexes' capabilities and 
> use their highlighter implementations to provide excerpt support, also because 
> the Lucene and Solr Oak indexes already perform full-text extraction from 
> binaries.





[jira] [Updated] (OAK-2847) Dependency cleanup

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2847:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Dependency cleanup 
> ---
>
> Key: OAK-2847
> URL: https://issues.apache.org/jira/browse/OAK-2847
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Michael Dürig
>Assignee: Vikas Saurabh
>  Labels: technical_debt
> Fix For: 1.3.11
>
>
> Early in the next release cycle we should go through the list of Oak's 
> dependencies and decide whether we have candidates we want to upgrade and 
> remove orphaned dependencies. 





[jira] [Updated] (OAK-937) Query engine index selection tweaks: shortcut and hint

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-937:
-
Fix Version/s: (was: 1.3.10)
   1.3.11

> Query engine index selection tweaks: shortcut and hint
> --
>
> Key: OAK-937
> URL: https://issues.apache.org/jira/browse/OAK-937
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core, query
>Reporter: Alex Parvulescu
>Priority: Minor
>  Labels: performance
> Fix For: 1.3.11
>
>
> This issue covers 2 different changes related to the way the QueryEngine 
> selects a query index:
>  Firstly there could be a way to end the index selection process early via a 
> known constant value: if an index returns a known value token (like -1000) 
> then the query engine would effectively stop iterating through the existing 
> index impls and use that index directly.
>  Secondly it would be nice to be able to specify a desired index (if one is 
> known to perform better) thus skipping the existing selection mechanism (cost 
> calculation and comparison). This could be done via certain query hints [0].
> [0] http://en.wikipedia.org/wiki/Hint_(SQL)





[jira] [Updated] (OAK-1610) Improved default indexing by JCR type in SolrIndexEditor

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-1610:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Improved default indexing by JCR type in SolrIndexEditor
> 
>
> Key: OAK-1610
> URL: https://issues.apache.org/jira/browse/OAK-1610
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: solr
>Reporter: Tommaso Teofili
> Fix For: 1.3.11
>
>
> It'd be good to provide a typed indexing default so that properties of a 
> certain type can be mapped to certain Solr dynamic fields with dedicated 
> types. The infrastructure for doing that is already in place as per 
> OakSolrConfiguration#getFieldNameFor(Type) but the default configuration is 
> not properly set with a good mapping.





[jira] [Updated] (OAK-3286) Persistent Cache improvements

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-3286:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Persistent Cache improvements
> -
>
> Key: OAK-3286
> URL: https://issues.apache.org/jira/browse/OAK-3286
> Project: Jackrabbit Oak
>  Issue Type: Epic
>  Components: cache
>Reporter: Michael Marth
>Priority: Minor
> Fix For: 1.3.11
>
>
> Issue for collecting various improvements to the persistent cache





[jira] [Updated] (OAK-2477) Move suggester specific config to own configuration node

2015-11-09 Thread Davide Giannella (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Giannella updated OAK-2477:
--
Fix Version/s: (was: 1.3.10)
   1.3.11

> Move suggester specific config to own configuration node
> 
>
> Key: OAK-2477
> URL: https://issues.apache.org/jira/browse/OAK-2477
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
>  Labels: technical_debt
> Fix For: 1.3.11
>
>
> Currently the suggester configuration is controlled via properties defined on 
> the main config / props node, but it'd be good if it had its own place to 
> configure the whole suggest feature, so as not to mix it up with the 
> configuration of other features / parameters.





[jira] [Created] (OAK-3600) Cache property index definitions

2015-11-09 Thread Joel Richard (JIRA)
Joel Richard created OAK-3600:
-

 Summary: Cache property index definitions
 Key: OAK-3600
 URL: https://issues.apache.org/jira/browse/OAK-3600
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: core, query
Affects Versions: 1.3.9
Reporter: Joel Richard


At the moment, most of the time of the execution plan calculation (often 70%) 
is spent in PropertyIndex.getCost. Therefore, it would make sense to cache the 
property index definitions in a map which avoids all 
unnecessary traversal/repository operations.

This cache could either be attached to the session or expire after a few 
seconds, because the problem is not that createPlan is slow in itself, but that 
it is called too often.





[jira] [Commented] (OAK-3598) Export org.apache.jackrabbit.oak.cache package from oak-core

2015-11-09 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996240#comment-14996240
 ] 

Chetan Mehrotra commented on OAK-3598:
--

bq. This package sounds rather like general purpose functionality. Wouldn't it 
be better to move it to oak-commons and then export it from there?

Makes sense. oak-commons already has a package 
{{org.apache.jackrabbit.oak.commons.cache}}. For the current requirement I 
would just move the {{CacheStats}} class.

For the others we would need input from [~tmueller].

> Export org.apache.jackrabbit.oak.cache package from oak-core
> 
>
> Key: OAK-3598
> URL: https://issues.apache.org/jira/browse/OAK-3598
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: cache
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.10
>
>
> For OAK-3092 oak-lucene would need to access classes from 
> {{org.apache.jackrabbit.oak.cache}} package. For now it's limited to 
> {{CacheStats}} to expose the cache related statistics.
> This task is meant to determine steps needed to export the package 
> * Update the pom.xml to export the package
> * Review current set of classes to see if they need to be reviewed





[jira] [Commented] (OAK-3598) Export org.apache.jackrabbit.oak.cache package from oak-core

2015-11-09 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996183#comment-14996183
 ] 

Marcel Reutegger commented on OAK-3598:
---

This package sounds rather like general purpose functionality. Wouldn't it be 
better to move it to oak-commons and then export it from there?

> Export org.apache.jackrabbit.oak.cache package from oak-core
> 
>
> Key: OAK-3598
> URL: https://issues.apache.org/jira/browse/OAK-3598
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: cache
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
> Fix For: 1.3.10
>
>
> For OAK-3092 oak-lucene would need to access classes from 
> {{org.apache.jackrabbit.oak.cache}} package. For now it's limited to 
> {{CacheStats}} to expose the cache related statistics.
> This task is meant to determine steps needed to export the package 
> * Update the pom.xml to export the package
> * Review current set of classes to see if they need to be reviewed





[jira] [Created] (OAK-3599) Release Oak 1.3.10

2015-11-09 Thread Davide Giannella (JIRA)
Davide Giannella created OAK-3599:
-

 Summary: Release Oak 1.3.10
 Key: OAK-3599
 URL: https://issues.apache.org/jira/browse/OAK-3599
 Project: Jackrabbit Oak
  Issue Type: Task
Reporter: Davide Giannella
Assignee: Davide Giannella








[jira] [Commented] (OAK-3092) Cache recently extracted text to avoid duplicate extraction

2015-11-09 Thread Davide Giannella (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996399#comment-14996399
 ] 

Davide Giannella commented on OAK-3092:
---

+1

> Cache recently extracted text to avoid duplicate extraction
> ---
>
> Key: OAK-3092
> URL: https://issues.apache.org/jira/browse/OAK-3092
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Chetan Mehrotra
>  Labels: performance
> Fix For: 1.3.11
>
> Attachments: OAK-3092-v1.patch, OAK-3092-v2.patch
>
>
> It can happen that text is extracted from the same binary multiple times in a 
> given indexing cycle. This can happen for 2 reasons:
> # Multiple Lucene indexes indexing same node - A system might have multiple 
> Lucene indexes e.g. a global Lucene index and an index for specific nodeType. 
> In a given indexing cycle same file would be picked up by both index 
> definition and both would extract same text
> # Aggregation - With index-time aggregation the same file gets picked up 
> multiple times due to aggregation rules
> To avoid the wasted effort for duplicate text extraction from same file in a 
> given indexing cycle it would be better to have an expiring cache which can 
> hold on to extracted text content for some time. The cache should have 
> following features
> # Limit on total size
> # Way to expire the content using [Timed 
> Eviction|https://code.google.com/p/guava-libraries/wiki/CachesExplained#Timed_Eviction]
>  - As chances of same file getting picked up are high only for a given 
> indexing cycle it would be better to expire the cache entries after some time 
> to avoid hogging memory unnecessarily 
> Such a cache would provide following benefit
> # Avoid duplicate text extraction - Text extraction is costly and has to be 
> minimized on critical path of {{indexEditor}}
> # Avoid expensive IO specially if binary content are to be fetched from a 
> remote {{BlobStore}}
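A cache with both a size limit and timed eviction, as described above, can be sketched with plain JDK types. The attached patch uses Guava's cache; this stand-in only illustrates the intended semantics, and all names here are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractedTextCacheSketch {
    private record Entry(String text, long createdAt) {}

    private final int maxEntries;
    private final long ttlMillis;
    private final LinkedHashMap<String, Entry> cache;

    ExtractedTextCacheSketch(int maxEntries, long ttlMillis) {
        this.maxEntries = maxEntries;
        this.ttlMillis = ttlMillis;
        // access-order LinkedHashMap drops the least recently used entry
        // once the size limit is exceeded
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<String, Entry> eldest) {
                return size() > ExtractedTextCacheSketch.this.maxEntries;
            }
        };
    }

    /**
     * Returns the cached extracted text for a binary id, or null if the
     * entry is absent or older than the TTL (forcing re-extraction).
     */
    synchronized String get(String binaryId, long now) {
        Entry e = cache.get(binaryId);
        if (e == null) {
            return null;
        }
        if (now - e.createdAt() > ttlMillis) {
            cache.remove(binaryId); // expired: drop it
            return null;
        }
        return e.text();
    }

    synchronized void put(String binaryId, String text, long now) {
        cache.put(binaryId, new Entry(text, now));
    }
}
```

Time is passed in explicitly only to make the sketch testable; a real cache would read the clock itself.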


