[jira] [Resolved] (OAK-8950) DataStore: FileCache should use one cache segment

2020-03-13 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-8950.
-
Resolution: Fixed

> DataStore: FileCache should use one cache segment
> -
>
> Key: OAK-8950
> URL: https://issues.apache.org/jira/browse/OAK-8950
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> The FileCache in the caching data store (Azure, S3) uses the default segment 
> count of 16. The effect of that is:
>  * if the maximum cache size is e.g. 16 GB
>  * and there are e.g. 15 files 1 GB each (total 15 GB),
>  * it can happen that some files are evicted, 
>  * because internally the cache is using 16 segments of 1 GB each,
>  * and by chance 2 files could be in the same segment,
>  * so that one of those files is evicted
> The workaround is to use a really large cache size (e.g. 100 GB if you only 
> want 15 GB of cache), but the drawback is that, if most files are very 
> small, the cache could actually grow to 100 GB.
> The best solution is probably to use only 1 segment. There is a tiny 
> concurrency issue: right now, deleting files is synchronized on the segment. 
> But I think that's not a big problem (to be tested).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OAK-8950) DataStore: FileCache should use one cache segment

2020-03-13 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058695#comment-17058695
 ] 

Thomas Mueller commented on OAK-8950:
-

http://svn.apache.org/r1875151 (trunk)

> DataStore: FileCache should use one cache segment
> -
>
> Key: OAK-8950
> URL: https://issues.apache.org/jira/browse/OAK-8950
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> The FileCache in the caching data store (Azure, S3) uses the default segment 
> count of 16. The effect of that is:
>  * if the maximum cache size is e.g. 16 GB
>  * and there are e.g. 15 files 1 GB each (total 15 GB),
>  * it can happen that some files are evicted, 
>  * because internally the cache is using 16 segments of 1 GB each,
>  * and by chance 2 files could be in the same segment,
>  * so that one of those files is evicted
> The workaround is to use a really large cache size (e.g. 100 GB if you only 
> want 15 GB of cache), but the drawback is that, if most files are very 
> small, the cache could actually grow to 100 GB.
> The best solution is probably to use only 1 segment. There is a tiny 
> concurrency issue: right now, deleting files is synchronized on the segment. 
> But I think that's not a big problem (to be tested).





[jira] [Updated] (OAK-8950) DataStore: FileCache should use one cache segment

2020-03-13 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8950:

Fix Version/s: 1.26.0

> DataStore: FileCache should use one cache segment
> -
>
> Key: OAK-8950
> URL: https://issues.apache.org/jira/browse/OAK-8950
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.26.0
>
>
> The FileCache in the caching data store (Azure, S3) uses the default segment 
> count of 16. The effect of that is:
>  * if the maximum cache size is e.g. 16 GB
>  * and there are e.g. 15 files 1 GB each (total 15 GB),
>  * it can happen that some files are evicted, 
>  * because internally the cache is using 16 segments of 1 GB each,
>  * and by chance 2 files could be in the same segment,
>  * so that one of those files is evicted
> The workaround is to use a really large cache size (e.g. 100 GB if you only 
> want 15 GB of cache), but the drawback is that, if most files are very 
> small, the cache could actually grow to 100 GB.
> The best solution is probably to use only 1 segment. There is a tiny 
> concurrency issue: right now, deleting files is synchronized on the segment. 
> But I think that's not a big problem (to be tested).





[jira] [Commented] (OAK-8950) DataStore: FileCache should use one cache segment

2020-03-12 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057929#comment-17057929
 ] 

Thomas Mueller commented on OAK-8950:
-

Patch for review: [https://github.com/oak-indexing/jackrabbit-oak/pull/63]

> DataStore: FileCache should use one cache segment
> -
>
> Key: OAK-8950
> URL: https://issues.apache.org/jira/browse/OAK-8950
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> The FileCache in the caching data store (Azure, S3) uses the default segment 
> count of 16. The effect of that is:
>  * if the maximum cache size is e.g. 16 GB
>  * and there are e.g. 15 files 1 GB each (total 15 GB),
>  * it can happen that some files are evicted, 
>  * because internally the cache is using 16 segments of 1 GB each,
>  * and by chance 2 files could be in the same segment,
>  * so that one of those files is evicted
> The workaround is to use a really large cache size (e.g. 100 GB if you only 
> want 15 GB of cache), but the drawback is that, if most files are very 
> small, the cache could actually grow to 100 GB.
> The best solution is probably to use only 1 segment. There is a tiny 
> concurrency issue: right now, deleting files is synchronized on the segment. 
> But I think that's not a big problem (to be tested).





[jira] [Created] (OAK-8950) DataStore: FileCache should use one cache segment

2020-03-12 Thread Thomas Mueller (Jira)
Thomas Mueller created OAK-8950:
---

 Summary: DataStore: FileCache should use one cache segment
 Key: OAK-8950
 URL: https://issues.apache.org/jira/browse/OAK-8950
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: blob
Reporter: Thomas Mueller
Assignee: Thomas Mueller


The FileCache in the caching data store (Azure, S3) uses the default segment 
count of 16. The effect of that is:
 * if the maximum cache size is e.g. 16 GB
 * and there are e.g. 15 files 1 GB each (total 15 GB),
 * it can happen that some files are evicted, 
 * because internally the cache is using 16 segments of 1 GB each,
 * and by chance 2 files could be in the same segment,
 * so that one of those files is evicted

The workaround is to use a really large cache size (e.g. 100 GB if you only 
want 15 GB of cache), but the drawback is that, if most files are very 
small, the cache could actually grow to 100 GB.

The best solution is probably to use only 1 segment. There is a tiny 
concurrency issue: right now, deleting files is synchronized on the segment. 
But I think that's not a big problem (to be tested).
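
The eviction effect described above can be illustrated with a small simulation. This is only a sketch and not the FileCache code: the class and method names are made up, each segment is assumed to get an equal share of the maximum size, and the segment of each file is passed in explicitly instead of being derived from a hash.

```java
public class SegmentCountSketch {

    /**
     * Counts evictions when files of equal size are added to a cache whose
     * maximum size is split evenly across its segments.
     * segmentOfFile[i] is the (assumed) segment that file i lands in.
     */
    public static int countEvictions(long maxSize, int segmentCount,
            long fileSize, int[] segmentOfFile) {
        long perSegment = maxSize / segmentCount;
        long[] used = new long[segmentCount];
        int evictions = 0;
        for (int segment : segmentOfFile) {
            used[segment] += fileSize;
            // evict files from this segment until it fits again
            while (used[segment] > perSegment) {
                used[segment] -= fileSize;
                evictions++;
            }
        }
        return evictions;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // 15 files of 1 GB each; the first two happen to share segment 0
        int[] collide = {0, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14};
        // with a single segment, every file is in segment 0
        int[] single = new int[15];
        System.out.println(countEvictions(16 * gb, 16, gb, collide)); // prints 1
        System.out.println(countEvictions(16 * gb, 1, gb, single));   // prints 0
    }
}
```

With 16 segments, one file is evicted although only 15 GB of the 16 GB are in use; with one segment, nothing is evicted.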





[jira] [Commented] (OAK-8898) On querying, IndexReader failed with AlreadyClosedException

2020-03-10 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056052#comment-17056052
 ] 

Thomas Mueller commented on OAK-8898:
-

[~mkataria] I created a branch here: 
[https://github.com/oak-indexing/jackrabbit-oak/tree/OAK-8898]

This makes it possible to reproduce the issue (it is based on your test case).

I also found the root cause, and a possible solution (see 
LucenePropertyIndex.OLD_FACET_PROVIDER). The problem seems to be that the 
reader is used after it is closed, because the reference to the searcher is 
leaked to the LuceneFacetProvider in loadDocs(). I created a 
DelayedLuceneFacetProvider that acquires and releases the searcher when 
needed (acquireIndexNode, release in finally).
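
The acquire / release-in-finally pattern mentioned above could look roughly like this. It is a self-contained sketch with made-up names (IndexNodeStub, getFacets); it does not use the Oak or Lucene classes and is not the code from the branch.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AcquireReleaseSketch {

    /** Hypothetical stand-in for an index node that owns a searcher. */
    public static final class IndexNodeStub {
        private final AtomicInteger references = new AtomicInteger();
        private volatile boolean closed;

        public void acquire() {
            if (closed) {
                throw new IllegalStateException("index node is closed");
            }
            references.incrementAndGet();
        }

        public void release() {
            references.decrementAndGet();
        }

        public void close() {
            closed = true;
        }

        public int openReferences() {
            return references.get();
        }

        public String search(String facet) {
            if (closed) {
                throw new IllegalStateException("reader is closed");
            }
            return "facets for " + facet;
        }
    }

    /**
     * Acquires the node only for the duration of the call, instead of
     * keeping a reference to a searcher that may be closed later.
     */
    public static String getFacets(IndexNodeStub node, String facet) {
        node.acquire();
        try {
            return node.search(facet);
        } finally {
            // always release, even if the search throws
            node.release();
        }
    }
}
```

The point of the sketch is that no reference to the searcher outlives a single getFacets call.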

It would be good if the test can reproduce the issue even without the delays; 
we can discuss this. 

> On querying, IndexReader failed with AlreadyClosedException
> ---
>
> Key: OAK-8898
> URL: https://issues.apache.org/jira/browse/OAK-8898
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Mohit Kataria
>Priority: Major
>
>  This is an intermittent issue, where on querying the code throws 
> AlreadyClosedException.
>  
> {code:java}
> Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexReader 
> is closed
>   at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:262) 
> [org.apache.jackrabbit.oak-lucene:1.10.2]
>   at 
> org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:108)
>  [org.apache.jackrabbit.oak-lucene:1.10.2]
>   at org.apache.lucene.index.IndexReader.document(IndexReader.java:446) 
> [org.apache.jackrabbit.oak-lucene:1.10.2]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getAccessibleSampleCount(StatisticalSortedSetDocValuesFacetCounts.java:169)
>  [org.apache.jackrabbit.oak-lucene:1.10.2]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren0(StatisticalSortedSetDocValuesFacetCounts.java:104)
>  [org.apache.jackrabbit.oak-lucene:1.10.2]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.util.StatisticalSortedSetDocValuesFacetCounts.getTopChildren(StatisticalSortedSetDocValuesFacetCounts.java:70)
>  [org.apache.jackrabbit.oak-lucene:1.10.2]
>   at 
> org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52) 
> [org.apache.jackrabbit.oak-lucene:1.10.2]
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LuceneFacetProvider.getFacets(LucenePropertyIndex.java:1547)
>  [org.apache.jackrabbit.oak-lucene:1.10.2]
>   at 
> org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextResultRow.getFacets(FulltextIndex.java:353)
>  [org.apache.jackrabbit.oak-lucene:1.10.2]
>   at 
> org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextPathCursor$2.getValue(FulltextIndex.java:472)
>  [org.apache.jackrabbit.oak-lucene:1.10.2]
>   ... 237 common frames omitted
> {code}





[jira] [Comment Edited] (OAK-8934) Indexing: filter entries with a regular expression

2020-03-04 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051214#comment-17051214
 ] 

Thomas Mueller edited comment on OAK-8934 at 3/4/20, 1:11 PM:
--

[http://svn.apache.org/r1874786|http://svn.apache.org/r1874786]


was (Author: tmueller):
[http://svn.apache.org/r1874786|http://svn/]

> Indexing: filter entries with a regular expression
> --
>
> Key: OAK-8934
> URL: https://issues.apache.org/jira/browse/OAK-8934
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>  Labels: amrit
>
> We should provide a way to filter the index using a regular expression. For 
> example, only index nodes that contain a reference to another node. (Not a 
> JCR reference, but a reference within the value itself). For example, index a 
> node if one of the properties contains:
> * /content/abc
> *  
> * and so on
> This will allow running a query to find whether /content/abc is referenced. The 
> index and the query will probably need to use a tag, and the cost of the 
> index needs to be high. Otherwise the query engine can't know when this index 
> should be used.





[jira] [Resolved] (OAK-8934) Indexing: filter entries with a regular expression

2020-03-04 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-8934.
-
Resolution: Fixed

> Indexing: filter entries with a regular expression
> --
>
> Key: OAK-8934
> URL: https://issues.apache.org/jira/browse/OAK-8934
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>  Labels: amrit
> Fix For: 1.26.0
>
>
> We should provide a way to filter the index using a regular expression. For 
> example, only index nodes that contain a reference to another node. (Not a 
> JCR reference, but a reference within the value itself). For example, index a 
> node if one of the properties contains:
> * /content/abc
> *  
> * and so on
> This will allow running a query to find whether /content/abc is referenced. The 
> index and the query will probably need to use a tag, and the cost of the 
> index needs to be high. Otherwise the query engine can't know when this index 
> should be used.





[jira] [Updated] (OAK-8934) Indexing: filter entries with a regular expression

2020-03-04 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8934:

Fix Version/s: 1.26.0

> Indexing: filter entries with a regular expression
> --
>
> Key: OAK-8934
> URL: https://issues.apache.org/jira/browse/OAK-8934
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>  Labels: amrit
> Fix For: 1.26.0
>
>
> We should provide a way to filter the index using a regular expression. For 
> example, only index nodes that contain a reference to another node. (Not a 
> JCR reference, but a reference within the value itself). For example, index a 
> node if one of the properties contains:
> * /content/abc
> *  
> * and so on
> This will allow running a query to find whether /content/abc is referenced. The 
> index and the query will probably need to use a tag, and the cost of the 
> index needs to be high. Otherwise the query engine can't know when this index 
> should be used.





[jira] [Commented] (OAK-8934) Indexing: filter entries with a regular expression

2020-03-04 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051214#comment-17051214
 ] 

Thomas Mueller commented on OAK-8934:
-

[http://svn.apache.org/r1874786|http://svn/]

> Indexing: filter entries with a regular expression
> --
>
> Key: OAK-8934
> URL: https://issues.apache.org/jira/browse/OAK-8934
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>  Labels: amrit
>
> We should provide a way to filter the index using a regular expression. For 
> example, only index nodes that contain a reference to another node. (Not a 
> JCR reference, but a reference within the value itself). For example, index a 
> node if one of the properties contains:
> * /content/abc
> *  
> * and so on
> This will allow running a query to find whether /content/abc is referenced. The 
> index and the query will probably need to use a tag, and the cost of the 
> index needs to be high. Otherwise the query engine can't know when this index 
> should be used.





[jira] [Created] (OAK-8934) Indexing: filter entries with a regular expression

2020-03-03 Thread Thomas Mueller (Jira)
Thomas Mueller created OAK-8934:
---

 Summary: Indexing: filter entries with a regular expression
 Key: OAK-8934
 URL: https://issues.apache.org/jira/browse/OAK-8934
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Thomas Mueller
Assignee: Thomas Mueller


We should provide a way to filter the index using a regular expression. For 
example, only index nodes that contain a reference to another node. (Not a JCR 
reference, but a reference within the value itself). For example, index a node 
if one of the properties contains:

* /content/abc
*  
* and so on

This will allow running a query to find whether /content/abc is referenced. The index 
and the query will probably need to use a tag, and the cost of the index needs 
to be high. Otherwise the query engine can't know when this index should be 
used.
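
As an illustration of the idea, a filter like this could decide whether a node is indexed. This is a minimal sketch using java.util.regex; the class name, the method name and the example pattern are assumptions and say nothing about how the feature is configured in Oak.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RegexIndexFilterSketch {

    /**
     * Returns true if at least one property value contains a match for
     * the filter, meaning the node would be added to the index.
     */
    public static boolean shouldIndex(Pattern filter, List<String> propertyValues) {
        for (String value : propertyValues) {
            if (filter.matcher(value).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // hypothetical pattern: any value that mentions a path below /content
        Pattern filter = Pattern.compile("/content/[a-z]+");
        List<String> referencing = Arrays.asList("title", "see /content/abc for details");
        List<String> plain = Arrays.asList("title", "no reference here");
        System.out.println(shouldIndex(filter, referencing)); // prints true
        System.out.println(shouldIndex(filter, plain));       // prints false
    }
}
```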







[jira] [Updated] (OAK-8934) Indexing: filter entries with a regular expression

2020-03-03 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8934:

Labels: amrit  (was: )

> Indexing: filter entries with a regular expression
> --
>
> Key: OAK-8934
> URL: https://issues.apache.org/jira/browse/OAK-8934
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>  Labels: amrit
>
> We should provide a way to filter the index using a regular expression. For 
> example, only index nodes that contain a reference to another node. (Not a 
> JCR reference, but a reference within the value itself). For example, index a 
> node if one of the properties contains:
> * /content/abc
> *  
> * and so on
> This will allow running a query to find whether /content/abc is referenced. The 
> index and the query will probably need to use a tag, and the cost of the 
> index needs to be high. Otherwise the query engine can't know when this index 
> should be used.





[jira] [Resolved] (OAK-8910) Improve OAK Lucene Index Documentation

2020-02-27 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-8910.
-
Fix Version/s: 1.26.0
   Resolution: Fixed

> Improve OAK Lucene Index Documentation
> --
>
> Key: OAK-8910
> URL: https://issues.apache.org/jira/browse/OAK-8910
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Assignee: Thomas Mueller
>Priority: Minor
>  Labels: amrit
> Fix For: 1.26.0
>
> Attachments: OAK-8910.patch
>
>
> Improve [http://jackrabbit.apache.org/oak/docs/query/lucene.html] with the 
> following:
>  * Extend the *analyzers* section including a reference on how to support 
> *stemming* ([http://jackrabbit.apache.org/oak/docs/query/lucene.html])
>  * *supersedes* - does not seem to be documented
>  * *functionName (string)* & *useIfExists (string)* are not listed in the 
> canonical *Index Definition* structure.
>  * *function (string)* is not listed in the canonical *Property Definitions* 
> structure
>  * *weight* - in the canonical structure the default value is -1, but the 
> actual default is 5





[jira] [Commented] (OAK-8910) Improve OAK Lucene Index Documentation

2020-02-27 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046769#comment-17046769
 ] 

Thomas Mueller commented on OAK-8910:
-

http://svn.apache.org/r1874582 (trunk)

> Improve OAK Lucene Index Documentation
> --
>
> Key: OAK-8910
> URL: https://issues.apache.org/jira/browse/OAK-8910
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Assignee: Thomas Mueller
>Priority: Minor
>  Labels: amrit
> Attachments: OAK-8910.patch
>
>
> Improve [http://jackrabbit.apache.org/oak/docs/query/lucene.html] with the 
> following:
>  * Extend the *analyzers* section including a reference on how to support 
> *stemming* ([http://jackrabbit.apache.org/oak/docs/query/lucene.html])
>  * *supersedes* - does not seem to be documented
>  * *functionName (string)* & *useIfExists (string)* are not listed in the 
> canonical *Index Definition* structure.
>  * *function (string)* is not listed in the canonical *Property Definitions* 
> structure
>  * *weight* - in the canonical structure the default value is -1, but the 
> actual default is 5





[jira] [Assigned] (OAK-8910) Improve OAK Lucene Index Documentation

2020-02-27 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller reassigned OAK-8910:
---

Assignee: Thomas Mueller

> Improve OAK Lucene Index Documentation
> --
>
> Key: OAK-8910
> URL: https://issues.apache.org/jira/browse/OAK-8910
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Assignee: Thomas Mueller
>Priority: Minor
>  Labels: amrit
> Attachments: OAK-8910.patch
>
>
> Improve [http://jackrabbit.apache.org/oak/docs/query/lucene.html] with the 
> following:
>  * Extend the *analyzers* section including a reference on how to support 
> *stemming* ([http://jackrabbit.apache.org/oak/docs/query/lucene.html])
>  * *supersedes* - does not seem to be documented
>  * *functionName (string)* & *useIfExists (string)* are not listed in the 
> canonical *Index Definition* structure.
>  * *function (string)* is not listed in the canonical *Property Definitions* 
> structure
>  * *weight* - in the canonical structure the default value is -1, but the 
> actual default is 5





[jira] [Commented] (OAK-7671) [oak-run] Deprecate the datastorecheck command in favor of datastore

2020-02-27 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046750#comment-17046750
 ] 

Thomas Mueller commented on OAK-7671:
-

GitHub currently has some issues according to https://www.githubstatus.com/
The patch looks good to me.

For the method "encodeId", it would be good to add some comments on what it is 
doing and some example input and output. It's very hard to understand right 
now. But this was the case before, and is not related to the patch. If you 
already know some details (maybe by debugging), it would be good to add the 
info. It doesn't need to be a Javadoc:

{noformat}
/**
 * Encode the ... and extract the ...
 * Example:
 *  => ...
 *  => ...
 */
static String encodeId(String line, BlobStoreOptions.Type dsType) {
    // 0102030405... => 01/02/03/0102030405...
    blobId = (blobId.substring(0, 2) + FILE_SEPARATOR.value()
            + blobId.substring(2, 4) + FILE_SEPARATOR.value()
            + blobId.substring(4, 6) + FILE_SEPARATOR.value() + blobId);
    // 0102030405... => 0102-030405...
    blobId = (blobId.substring(0, 4) + DASH + blobId.substring(4));


    if (list.size() > 1) {
        // (this part I don't understand... why list.get(1)? what does it do?)
        return delimJoiner.join(blobId,
                EscapeUtils.unescapeLineBreaks(list.get(1)));
{noformat}
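
To make the two example comments above concrete, here is a small standalone sketch of just those two string transformations. The class and method names are made up, and "/" and "-" are assumed values for FILE_SEPARATOR.value() and DASH; when each variant is used is not covered here.

```java
public class BlobIdEncodingSketch {

    /** 0102030405... => 01/02/03/0102030405... (assuming "/" as separator) */
    public static String toDirectoryPath(String blobId) {
        return blobId.substring(0, 2) + "/"
                + blobId.substring(2, 4) + "/"
                + blobId.substring(4, 6) + "/"
                + blobId;
    }

    /** 0102030405... => 0102-030405... (assuming "-" as the dash) */
    public static String withDash(String blobId) {
        return blobId.substring(0, 4) + "-" + blobId.substring(4);
    }

    public static void main(String[] args) {
        System.out.println(toDirectoryPath("0102030405")); // prints 01/02/03/0102030405
        System.out.println(withDash("0102030405"));        // prints 0102-030405
    }
}
```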


> [oak-run] Deprecate the datastorecheck command in favor of datastore
> 
>
> Key: OAK-7671
> URL: https://issues.apache.org/jira/browse/OAK-7671
> Project: Jackrabbit Oak
>  Issue Type: Task
>  Components: run
>Reporter: Amit Jain
>Assignee: Nitin Gupta
>Priority: Major
> Fix For: 1.26.0
>
>
> With the introduction of \{{datastore}} command which supports both garbage 
> collection as well as consistency check the \{{datastorecheck}} command 
> should be deprecated and delegated internally to use that implementation. 
> Besides some options which are currently not supported by the new command 
> should also be implemented e.g. --ids, --refs





[jira] [Commented] (OAK-8783) Merge index definitions

2020-02-19 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040129#comment-17040129
 ] 

Thomas Mueller commented on OAK-8783:
-

http://svn.apache.org/r1874198 (trunk)

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, 
> OAK-8783-v2.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.
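
The naming in the example above can be sketched as follows. This only derives the name of the merged index and does not merge any definitions; the class and method names are made up, and the rule (take the base name and version of the newer product index, restart the custom counter at 1) is an assumption generalized from the single example in the description.

```java
public class IndexNameMergeSketch {

    /**
     * Derives the merged index name, for example
     * ("asset-2-custom-2", "asset-3") => "asset-3-custom-1".
     */
    public static String mergedName(String customized, String product) {
        int dash = product.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("no version in " + product);
        }
        String base = product.substring(0, dash);
        int productVersion = Integer.parseInt(product.substring(dash + 1));
        if (!customized.startsWith(base + "-") || !customized.contains("-custom-")) {
            throw new IllegalArgumentException(
                    customized + " is not a customized version of " + base);
        }
        // assumed rule: the custom counter restarts at 1 for a new product version
        return base + "-" + productVersion + "-custom-1";
    }

    public static void main(String[] args) {
        System.out.println(mergedName("asset-2-custom-2", "asset-3")); // prints asset-3-custom-1
    }
}
```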





[jira] [Updated] (OAK-8892) Add javadoc to package-info files

2020-02-19 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8892:

Labels: amrit  (was: )

> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Assignee: Thomas Mueller
>Priority: Minor
>  Labels: amrit
> Fix For: 1.26.0
>
> Attachments: OAK-8892.patch
>
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Assigned] (OAK-8892) Add javadoc to package-info files

2020-02-19 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller reassigned OAK-8892:
---

Assignee: Thomas Mueller

> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Assignee: Thomas Mueller
>Priority: Minor
> Fix For: 1.26.0
>
> Attachments: OAK-8892.patch
>
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Resolved] (OAK-8892) Add javadoc to package-info files

2020-02-19 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-8892.
-
Resolution: Fixed

> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Assignee: Thomas Mueller
>Priority: Minor
>  Labels: amrit
> Fix For: 1.26.0
>
> Attachments: OAK-8892.patch
>
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Commented] (OAK-8892) Add javadoc to package-info files

2020-02-19 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040100#comment-17040100
 ] 

Thomas Mueller commented on OAK-8892:
-

Thanks [~reschke]! "svn patch" didn't work as expected... Now hopefully it's 
better:

http://svn.apache.org/r1874197 (trunk)



> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Priority: Minor
> Fix For: 1.26.0
>
> Attachments: OAK-8892.patch
>
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Commented] (OAK-8902) Add support in oak-run to list down blob ids for lucene indexes

2020-02-18 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039035#comment-17039035
 ] 

Thomas Mueller commented on OAK-8902:
-

See my comments.

> Add support in oak-run to list down blob ids for lucene indexes
> ---
>
> Key: OAK-8902
> URL: https://issues.apache.org/jira/browse/OAK-8902
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>






[jira] [Commented] (OAK-8783) Merge index definitions

2020-02-17 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038820#comment-17038820
 ] 

Thomas Mueller commented on OAK-8783:
-

Thanks [~amitjain]! I didn't think about this...

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, 
> OAK-8783-v2.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Commented] (OAK-8892) Add javadoc to package-info files

2020-02-17 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038182#comment-17038182
 ] 

Thomas Mueller commented on OAK-8892:
-

http://svn.apache.org/r1874108

> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Priority: Minor
>  Labels: amrit
> Attachments: OAK-8892.patch
>
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Resolved] (OAK-8892) Add javadoc to package-info files

2020-02-17 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-8892.
-
Resolution: Fixed

> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Priority: Minor
>  Labels: amrit
> Attachments: OAK-8892.patch
>
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Commented] (OAK-8783) Merge index definitions

2020-02-17 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038181#comment-17038181
 ] 

Thomas Mueller commented on OAK-8783:
-

http://svn.apache.org/r1874107 (trunk).
Review is still welcome.

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, 
> OAK-8783-v2.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Commented] (OAK-8892) Add javadoc to package-info files

2020-02-16 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038128#comment-17038128
 ] 

Thomas Mueller commented on OAK-8892:
-

[~reschke] no that was a mistake, I'm sorry... I will remove the export 
versions and try again.

/cc [~amrverma]

> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Priority: Minor
>  Labels: amrit
> Attachments: OAK-8892.patch
>
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Commented] (OAK-8783) Merge index definitions

2020-02-14 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036858#comment-17036858
 ] 

Thomas Mueller commented on OAK-8783:
-

[~ngupta] [~tihom88] [~fabrizio.fort...@gmail.com] could you please review 
OAK-8783-v2.patch ?

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, 
> OAK-8783-v2.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Updated] (OAK-8783) Merge index definitions

2020-02-14 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8783:

Attachment: OAK-8783-v2.patch

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch, 
> OAK-8783-v2.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Resolved] (OAK-8892) Add javadoc to package-info files

2020-02-13 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-8892.
-
Resolution: Fixed

> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Priority: Minor
>  Labels: amrit
> Attachments: OAK-8892.patch
>
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Commented] (OAK-8892) Add javadoc to package-info files

2020-02-13 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036188#comment-17036188
 ] 

Thomas Mueller commented on OAK-8892:
-

Thanks! http://svn.apache.org/r1873977

> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Priority: Minor
>  Labels: amrit
> Attachments: OAK-8892.patch
>
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Commented] (OAK-8711) Queries with facets should not use traversal

2020-02-05 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030682#comment-17030682
 ] 

Thomas Mueller commented on OAK-8711:
-

The attached patch looks good to me. One nitpick: in the test case, you could 
in theory check whether the right index is used by executing "explain select ..." 
and then checking the query plan. But I don't think such a test case is strictly 
needed; I'm fine with what you have right now.


> Queries with facets should not use traversal
> 
>
> Key: OAK-8711
> URL: https://issues.apache.org/jira/browse/OAK-8711
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>  Labels: amrit
> Attachments: OAK-8711.patch
>
>
> Consider a scenario where a query uses facets and the traversal cost is 
> lower than the cost of the index that serves the facet query. This would be 
> problematic.
>  
> In this case we should maybe set the traversal cost to infinity so that 
> traversal is not an option for queries with facets.
>  
> If no index is available to serve the faceted query, we can 
> probably throw an exception with a meaningful message.





[jira] [Commented] (OAK-8892) Add javadoc to package-info files

2020-02-04 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029829#comment-17029829
 ] 

Thomas Mueller commented on OAK-8892:
-

See pull request https://github.com/apache/jackrabbit-oak/pull/175

> Add javadoc to package-info files
> -
>
> Key: OAK-8892
> URL: https://issues.apache.org/jira/browse/OAK-8892
> Project: Jackrabbit Oak
>  Issue Type: Task
>Reporter: Amrit Verma
>Priority: Minor
>  Labels: amrit
>
> Add javadoc to package-info files in all packages of {{oak-lucene}} , 
> {{oak-query-spi}} and {{oak-search}} .





[jira] [Commented] (OAK-8711) Queries with facets should not use traversal

2020-02-04 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029830#comment-17029830
 ] 

Thomas Mueller commented on OAK-8711:
-

See pull request https://github.com/apache/jackrabbit-oak/pull/174

> Queries with facets should not use traversal
> 
>
> Key: OAK-8711
> URL: https://issues.apache.org/jira/browse/OAK-8711
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>  Labels: amrit
>
> Consider a scenario where a query uses facets and the traversal cost is 
> lower than the cost of the index that serves the facet query. This would be 
> problematic.
>  
> In this case we should maybe set the traversal cost to infinity so that 
> traversal is not an option for queries with facets.
>  
> If no index is available to serve the faceted query, we can 
> probably throw an exception with a meaningful message.





[jira] [Updated] (OAK-8711) Queries with facets should not use traversal

2020-01-22 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8711:

Labels: amrit  (was: )

> Queries with facets should not use traversal
> 
>
> Key: OAK-8711
> URL: https://issues.apache.org/jira/browse/OAK-8711
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Nitin Gupta
>Assignee: Nitin Gupta
>Priority: Major
>  Labels: amrit
>
> Consider a scenario where a query uses facets and the traversal cost is 
> lower than the cost of the index that serves the facet query. This would be 
> problematic.
>  
> In this case we should maybe set the traversal cost to infinity so that 
> traversal is not an option for queries with facets.
>  
> If no index is available to serve the faceted query, we can 
> probably throw an exception with a meaningful message.





[jira] [Resolved] (OAK-8854) Improved log message when failed to index an node due to IOException

2020-01-10 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-8854.
-
Resolution: Fixed

> Improved log message when failed to index an node due to IOException
> 
>
> Key: OAK-8854
> URL: https://issues.apache.org/jira/browse/OAK-8854
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.22.0
>
>
> When there is an IOException trying to index the node, there are cases where 
> the root cause (IOException message) is not logged.





[jira] [Updated] (OAK-8854) Improved log message when failed to index an node due to IOException

2020-01-10 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8854:

Fix Version/s: 1.22.0

> Improved log message when failed to index an node due to IOException
> 
>
> Key: OAK-8854
> URL: https://issues.apache.org/jira/browse/OAK-8854
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.22.0
>
>
> When there is an IOException trying to index the node, there are cases where 
> the root cause (IOException message) is not logged.





[jira] [Commented] (OAK-8854) Improved log message when failed to index an node due to IOException

2020-01-10 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012951#comment-17012951
 ] 

Thomas Mueller commented on OAK-8854:
-

http://svn.apache.org/r1872603
http://svn.apache.org/r1872604


> Improved log message when failed to index an node due to IOException
> 
>
> Key: OAK-8854
> URL: https://issues.apache.org/jira/browse/OAK-8854
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> When there is an IOException trying to index the node, there are cases where 
> the root cause (IOException message) is not logged.





[jira] [Created] (OAK-8854) Improved log message when failed to index an node due to IOException

2020-01-10 Thread Thomas Mueller (Jira)
Thomas Mueller created OAK-8854:
---

 Summary: Improved log message when failed to index an node due to 
IOException
 Key: OAK-8854
 URL: https://issues.apache.org/jira/browse/OAK-8854
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: indexing
Reporter: Thomas Mueller
Assignee: Thomas Mueller


When there is an IOException trying to index the node, there are cases where 
the root cause (IOException message) is not logged.





[jira] [Updated] (OAK-6254) DataStore: API to retrieve approximate storage size

2020-01-10 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-6254:

Priority: Minor  (was: Major)

> DataStore: API to retrieve approximate storage size
> ---
>
> Key: OAK-6254
> URL: https://issues.apache.org/jira/browse/OAK-6254
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: blob
>Reporter: Thomas Mueller
>Priority: Minor
>
> The estimated size of the datastore (on disk) is needed to:
> * monitor growth over time, or growth of certain operations
> * monitor if garbage collection is effective
> * avoid running out of disk space
> * estimate backup size
> * statistical purposes (for example, if there are many repositories, to group 
> them by size)
> Datastore size: we could use the following heuristic: We could read the file 
> sizes in ./datastore/00/00 (if it exists) and multiply by 65536; or 
> ./datastore/00 and multiply by 256. That would give a rough estimation 
> (within about 20% for repositories with datastore size > 50 GB).
> I think this is mainly important for the FileDataStore. For the S3 datastore, 
> if there is a simple and fast S3 API to read the size, that would be good 
> as well, but if there is none, then returning "unknown" is fine for me.
> As for the API, I would use something like this: {{long 
> getEstimatedStorageSize(int accuracyLevel)}} with accuracyLevel 1 for 
> inaccurate (fastest), 2 more accurate (slower),..., 9 precise (possibly very 
> slow). Similar to 
> [java.util.zip.Deflater.setLevel|https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#setLevel(int)].
>  I would expect it takes up to 1 second for accuracyLevel 0, up to 5 seconds 
> for accuracyLevel 1, and possibly hours for level 9. With level 1, I would 
> read files in 00/00, with level 2 - 8 I would read files in 00, and with 
> level 9 I would read all the files. For level 1, I wouldn't stop; for level 
> 2, if it takes more than 5 seconds, I would stop and return the current best 
> estimate.
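
The sampling heuristic from the description can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name DataStoreSizeEstimator, the method names, and the use of -1 for "unknown" are hypothetical and not part of any Oak API. It assumes the two-level hex directory layout mentioned above (00/00 is 1 of 65536 buckets, 00 is 1 of 256).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not an Oak class.
class DataStoreSizeEstimator {

    /** Sums the sizes of all regular files below dir; returns -1 if dir is not a directory. */
    static long sizeOf(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return -1;
        }
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }).sum();
        }
    }

    /**
     * Extrapolates the total size from one sample bucket:
     * 00/00 (times 65536) if it exists, otherwise 00 (times 256).
     * Returns -1 ("unknown") if neither directory exists.
     */
    static long estimate(Path datastoreRoot) throws IOException {
        long sample = sizeOf(datastoreRoot.resolve("00").resolve("00"));
        if (sample >= 0) {
            return sample * 65536;
        }
        sample = sizeOf(datastoreRoot.resolve("00"));
        if (sample >= 0) {
            return sample * 256;
        }
        return -1;
    }
}
```

The sketch leaves out the accuracy levels and the time limit from the proposal; those would decide which bucket to sample and when to stop.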





[jira] [Updated] (OAK-6254) DataStore: API to retrieve approximate storage size

2020-01-10 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-6254:

Fix Version/s: (was: 1.22.0)

> DataStore: API to retrieve approximate storage size
> ---
>
> Key: OAK-6254
> URL: https://issues.apache.org/jira/browse/OAK-6254
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: blob
>Reporter: Thomas Mueller
>Priority: Major
>
> The estimated size of the datastore (on disk) is needed to:
> * monitor growth over time, or growth of certain operations
> * monitor if garbage collection is effective
> * avoid running out of disk space
> * estimate backup size
> * statistical purposes (for example, if there are many repositories, to group 
> them by size)
> Datastore size: we could use the following heuristic: We could read the file 
> sizes in ./datastore/00/00 (if it exists) and multiply by 65536; or 
> ./datastore/00 and multiply by 256. That would give a rough estimation 
> (within about 20% for repositories with datastore size > 50 GB).
> I think this is mainly important for the FileDataStore. For the S3 datastore, 
> if there is a simple and fast S3 API to read the size, that would be good 
> as well, but if there is none, then returning "unknown" is fine for me.
> As for the API, I would use something like this: {{long 
> getEstimatedStorageSize(int accuracyLevel)}} with accuracyLevel 1 for 
> inaccurate (fastest), 2 more accurate (slower),..., 9 precise (possibly very 
> slow). Similar to 
> [java.util.zip.Deflater.setLevel|https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#setLevel(int)].
>  I would expect it takes up to 1 second for accuracyLevel 0, up to 5 seconds 
> for accuracyLevel 1, and possibly hours for level 9. With level 1, I would 
> read files in 00/00, with level 2 - 8 I would read files in 00, and with 
> level 9 I would read all the files. For level 1, I wouldn't stop; for level 
> 2, if it takes more than 5 seconds, I would stop and return the current best 
> estimate.





[jira] [Commented] (OAK-5787) BlobStore should be AutoCloseable

2019-12-12 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994394#comment-16994394
 ] 

Thomas Mueller commented on OAK-5787:
-

+1

> BlobStore should be AutoCloseable
> -
>
> Key: OAK-5787
> URL: https://issues.apache.org/jira/browse/OAK-5787
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.22.0
>
> Attachments: OAK-5787.diff
>
>
> {{DocumentNodeStore}} currently calls {{close()}} if the blob store instance 
> implements {{Closeable}}.
> This has led to problems where wrapper implementations did not implement it, 
> and thus the actual blob store instance wasn't properly shut down.
> Proposal: make {{BlobStore}} extend {{Closeable}} and get rid of all 
> {{instanceof}} checks.
> [~thomasm] [~amitjain] - feedback appreciated.
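
To illustrate the problem described above, here is a small self-contained sketch. All types in it (Store, RealStore, Wrapper, ClosingStore, ClosingWrapper) are hypothetical stand-ins, not the actual Oak interfaces. It shows how an instanceof check silently skips a wrapper that does not implement Closeable, and how making the store interface itself extend Closeable forces every wrapper to forward close().

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical types for illustration only; these are not Oak classes.
class CloseableStoreSketch {

    /** Store contract without close(), standing in for the current situation. */
    interface Store {
        String read(String id);
    }

    /** Proposed contract: every store is Closeable. */
    interface ClosingStore extends Store, Closeable {
    }

    /** The actual store, which holds resources that must be released. */
    static class RealStore implements ClosingStore {
        boolean closed;

        public String read(String id) {
            return "blob-" + id;
        }

        public void close() {
            closed = true;
        }
    }

    /** A wrapper that only implements Store and so never forwards close(). */
    static class Wrapper implements Store {
        final Store delegate;

        Wrapper(Store delegate) {
            this.delegate = delegate;
        }

        public String read(String id) {
            return delegate.read(id);
        }
    }

    /** A wrapper under the proposed contract: the compiler requires close(). */
    static class ClosingWrapper implements ClosingStore {
        final ClosingStore delegate;

        ClosingWrapper(ClosingStore delegate) {
            this.delegate = delegate;
        }

        public String read(String id) {
            return delegate.read(id);
        }

        public void close() throws IOException {
            delegate.close();
        }
    }

    /** The instanceof-based shutdown; returns false if the store was skipped. */
    static boolean closeIfCloseable(Store store) throws IOException {
        if (store instanceof Closeable) {
            ((Closeable) store).close();
            return true;
        }
        return false;
    }
}
```

With the plain Wrapper, closeIfCloseable returns false and the wrapped RealStore stays open. With ClosingWrapper, no instanceof check is needed and close() reaches the delegate.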





[jira] [Commented] (OAK-8783) Merge index definitions

2019-11-29 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984861#comment-16984861
 ] 

Thomas Mueller commented on OAK-8783:
-

http://svn.apache.org/r1870584 (trunk) - reviews are still welcome. I also had 
to change the version (from 1.0.1 to 1.1.0).

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Updated] (OAK-8783) Merge index definitions

2019-11-29 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8783:

Component/s: indexing

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Updated] (OAK-8783) Merge index definitions

2019-11-29 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8783:

Fix Version/s: (was: 1.22.0)

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Updated] (OAK-8783) Merge index definitions

2019-11-29 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8783:

Fix Version/s: 1.22.0

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.22.0
>
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Commented] (OAK-8783) Merge index definitions

2019-11-29 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984859#comment-16984859
 ] 

Thomas Mueller commented on OAK-8783:
-

Good point! I will change the newObjectNotRespectingOrder test, so that it 
doesn't expect any specific order.

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Commented] (OAK-8783) Merge index definitions

2019-11-29 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984815#comment-16984815
 ] 

Thomas Mueller commented on OAK-8783:
-

[~ngupta] [~tihom88] [~fabrizio.fort...@gmail.com] could you please review 
OAK-8783-json-1.patch ?

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Updated] (OAK-8783) Merge index definitions

2019-11-29 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8783:

Attachment: OAK-8783-json-1.patch

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-json-1.patch, OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Commented] (OAK-8783) Merge index definitions

2019-11-29 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984812#comment-16984812
 ] 

Thomas Mueller commented on OAK-8783:
-

One problem is that the Gson library doesn't support the child order:
https://stackoverflow.com/questions/6365851/how-to-keep-fields-sequence-in-gson-serialization

This is a problem because indexes in Oak do need to respect the order of child 
nodes for some features:
http://jackrabbit.apache.org/oak/docs/query/lucene.html
"The rules are looked up in the order of there entry under indexRules node 
(indexRule node itself is of type nt:unstructured which has orderable child 
nodes)" - "Order of property definition node is important as some properties 
are based on regular expressions"

Instead of Gson, we need to use a different serialization library, e.g. the Oak 
JsonObject. I will add the needed features and tests there first.
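
As a rough illustration of why child order matters for regex-based property definitions, consider the sketch below. The class OrderedRulesSketch, its methods, and the rule names are made up for this example and are not Oak or Gson APIs. Rules are looked up in insertion order, so the same set of rules gives a different result if a serializer reorders them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical example; rule names and regexes are invented for illustration.
class OrderedRulesSketch {

    /** Builds an insertion-ordered map from (name, regex) pairs. */
    static Map<String, String> rules(String... nameRegexPairs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameRegexPairs.length; i += 2) {
            map.put(nameRegexPairs[i], nameRegexPairs[i + 1]);
        }
        return map;
    }

    /** Returns the name of the first rule whose regex matches the property, or null. */
    static String firstMatch(Map<String, String> rules, String property) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (Pattern.matches(rule.getValue(), property)) {
                return rule.getKey();
            }
        }
        return null;
    }
}
```

For example, with the rules ("title", "jcr:title") followed by ("all", ".*"), the property jcr:title resolves to the specific "title" rule. If the two entries are swapped, the catch-all rule wins instead, which is why a merge tool has to keep the order intact.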

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Commented] (OAK-8794) oak-solr-osgi does not build for Java 8 if Jackson libraries upgraded to 2.10.0

2019-11-26 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982268#comment-16982268
 ] 

Thomas Mueller commented on OAK-8794:
-

Un-assigning from me right now.

> Would it be possible to update oak-parent/pom.xml to Jackson version 2.10.0 
> and then specify 2.9.10 in oak-solr-osgi?

[~teofili], do you know if this might work?

> oak-solr-osgi does not build for Java 8 if Jackson libraries upgraded to 
> 2.10.0
> ---
>
> Key: OAK-8794
> URL: https://issues.apache.org/jira/browse/OAK-8794
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: solr
>Affects Versions: 1.20.0
>Reporter: Matt Ryan
>Priority: Major
>
> If the Jackson version in {{oak-parent/pom.xml}} is updated from 2.9.10 to 
> 2.10.0, we get a build failure in {{oak-solr-osgi}} if we try to build with 
> Java 8.
> This is blocking OAK-8105 which in turn is blocking OAK-8607 and OAK-8104.  
> OAK-8105 is about updating {{AzureDataStore}} to the Azure version 12 SDK 
> which requires Jackson 2.10.0.
> Would it be possible to update {{oak-parent/pom.xml}} to Jackson version 
> 2.10.0 and then specify 2.9.10 in {{oak-solr-osgi}}?





[jira] [Assigned] (OAK-8794) oak-solr-osgi does not build for Java 8 if Jackson libraries upgraded to 2.10.0

2019-11-26 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller reassigned OAK-8794:
---

Assignee: (was: Thomas Mueller)

> oak-solr-osgi does not build for Java 8 if Jackson libraries upgraded to 
> 2.10.0
> ---
>
> Key: OAK-8794
> URL: https://issues.apache.org/jira/browse/OAK-8794
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: solr
>Affects Versions: 1.20.0
>Reporter: Matt Ryan
>Priority: Major
>
> If the Jackson version in {{oak-parent/pom.xml}} is updated from 2.9.10 to 
> 2.10.0, we get a build failure in {{oak-solr-osgi}} if we try to build with 
> Java 8.
> This is blocking OAK-8105 which in turn is blocking OAK-8607 and OAK-8104.  
> OAK-8105 is about updating {{AzureDataStore}} to the Azure version 12 SDK 
> which requires Jackson 2.10.0.
> Would it be possible to update {{oak-parent/pom.xml}} to Jackson version 
> 2.10.0 and then specify 2.9.10 in {{oak-solr-osgi}}?





[jira] [Commented] (OAK-8783) Merge index definitions

2019-11-22 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980259#comment-16980259
 ] 

Thomas Mueller commented on OAK-8783:
-

Attached a first patch (work in progress).

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Updated] (OAK-8783) Merge index definitions

2019-11-22 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8783:

Attachment: OAK-8783-v1.patch

> Merge index definitions
> ---
>
> Key: OAK-8783
> URL: https://issues.apache.org/jira/browse/OAK-8783
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Attachments: OAK-8783-v1.patch
>
>
> If there are multiple versions of an index, e.g. asset-2-custom-2 and 
> asset-3, then oak-run should be able to merge them to asset-3-custom-1.





[jira] [Created] (OAK-8783) Merge index definitions

2019-11-22 Thread Thomas Mueller (Jira)
Thomas Mueller created OAK-8783:
---

 Summary: Merge index definitions
 Key: OAK-8783
 URL: https://issues.apache.org/jira/browse/OAK-8783
 Project: Jackrabbit Oak
  Issue Type: Improvement
Reporter: Thomas Mueller
Assignee: Thomas Mueller


If there are multiple versions of an index, e.g. asset-2-custom-2 and asset-3, 
then oak-run should be able to merge them to asset-3-custom-1.






[jira] [Commented] (OAK-8779) QueryImpl: indexPlan used for logging always is null

2019-11-21 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979313#comment-16979313
 ] 

Thomas Mueller commented on OAK-8779:
-

You are right.

I saw this as well some time ago, but so far didn't log an issue.

I will add that to the technical debt list.

> QueryImpl: indexPlan used for logging always is null
> 
>
> Key: OAK-8779
> URL: https://issues.apache.org/jira/browse/OAK-8779
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Reporter: Julian Reschke
>Priority: Minor
>
>  
> {noformat}
> if (indexPlan != null && indexPlan.getPlanName() != null) {
>     indexName += "[" + indexPlan.getPlanName() + "]";
> }
> {noformat}
>  
> (indexPlan always is null, maybe caused by code being moved around)
>  
> cc: [~chetanm] [~thomasm]





[jira] [Updated] (OAK-6261) Log queries that sort by un-indexed properties

2019-11-19 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-6261:

Fix Version/s: (was: 1.22.0)

> Log queries that sort by un-indexed properties
> --
>
> Key: OAK-6261
> URL: https://issues.apache.org/jira/browse/OAK-6261
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Minor
>
> Queries that can read many nodes, and sort by properties that are not 
> indexed, can be very slow. This includes for example fulltext queries.
> As a start, it might make sense to log an "info" level message (but avoid 
> logging the same message each time a query is run). Per configuration, this 
> could be turned to "warning".
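
A minimal sketch of the "log each message only once" idea, assuming the query statement itself is used as the de-duplication key. The class LogOnceSketch and its method are hypothetical and not taken from the Oak code base.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not an Oak class.
class LogOnceSketch {

    // Statements that were already reported; thread-safe, but unbounded in this sketch.
    private final Set<String> reported = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time the given statement is seen. */
    boolean shouldLog(String statement) {
        return reported.add(statement);
    }
}
```

A caller would log at "info" (or "warning", depending on configuration) only when shouldLog returns true. A real implementation would probably need to bound the set, for example by size or by age.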





[jira] [Updated] (OAK-7300) Lucene Index: per-column selectivity to improve cost estimation

2019-11-19 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-7300:

Fix Version/s: (was: 1.22.0)

> Lucene Index: per-column selectivity to improve cost estimation
> ---
>
> Key: OAK-7300
> URL: https://issues.apache.org/jira/browse/OAK-7300
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene, query
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> In OAK-6735 we have improved cost estimation for Lucene indexes, however the 
> following case is still not working as expected: a very common property is 
> indexed (many nodes have that property), and each value of that property is 
> more or less unique. In this case, currently the cost estimation is the total 
> number of documents that contain that property. Assuming the condition 
> "property is not null" this is correct, however for the common case "property 
> = x" the estimated cost is far too high.
> A known workaround is to set the "costPerEntry" for the given index to a low 
> value, for example 0.2. However this isn't a good solution, as it affects all 
> properties and queries.
> It would be good to be able to set the selectivity per property, for example 
> by specifying the number of distinct values, or (better yet) the average 
> number of entries for a given key (1 for unique values, 2 meaning for each 
> distinct value there are two documents on average).
> That value can be set manually (cost override), and it can be set 
> automatically, e.g. when building the index, or updated from time to time 
> during the index update, using a cardinality
> estimation algorithm. That doesn't have to be accurate; we could use a rough 
> approximation such as HyperBitBit.
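
To illustrate the proposal, a small sketch of how a per-property "average number of entries per key" could enter the estimate. The class and method names are hypothetical; this is not the Oak IndexPlanner code, only the arithmetic described above.

```java
/** Hypothetical sketch (not Oak code) of a per-property selectivity estimate. */
final class SelectivityCost {

    /**
     * Estimated number of entries for the condition "property = x".
     *
     * @param docsWithProperty number of indexed documents that contain the property
     * @param avgEntriesPerKey average number of documents per distinct value
     *        (1 for unique values); a value <= 0 means "not configured"
     */
    static long estimateEquals(long docsWithProperty, double avgEntriesPerKey) {
        if (avgEntriesPerKey <= 0) {
            // no selectivity known: fall back to the number of documents
            // that contain the property (the behaviour described in the issue)
            return docsWithProperty;
        }
        long estimate = (long) Math.ceil(avgEntriesPerKey);
        return Math.min(docsWithProperty, estimate);
    }

    /** "property is not null" matches every document that contains the property. */
    static long estimateNotNull(long docsWithProperty) {
        return docsWithProperty;
    }
}
```

With one million documents and unique values, the "property = x" estimate drops from 1000000 to 1, while the "is not null" estimate is unchanged.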





[jira] [Updated] (OAK-7374) Investigate changing the UUID generation algorithm / format to reduce index size, improve speed

2019-11-19 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-7374:

Fix Version/s: (was: 1.22.0)

> Investigate changing the UUID generation algorithm / format to reduce index 
> size, improve speed
> ---
>
> Key: OAK-7374
> URL: https://issues.apache.org/jira/browse/OAK-7374
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: core
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> UUIDs are currently randomly generated, which is bad for indexing, especially 
> for read and write access, due to low locality.
> If we could add a time component, I think the index churn (amount of writes) 
> would shrink, and lookup would be faster.
> It should be fairly easy to verify if that's really true (create a 
> proof-of-concept, and measure).
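
One possible shape for such a proof-of-concept, as a sketch only: the bit layout below (48 bits of time, 80 random bits) is an assumption of this example, it does not set the RFC 4122 version and variant bits, and it is not what Oak currently does.

```java
import java.security.SecureRandom;
import java.util.UUID;

/** Hypothetical sketch: a UUID whose most significant bits are a timestamp. */
final class TimePrefixedUuid {
    private static final SecureRandom RANDOM = new SecureRandom();

    /**
     * Upper 48 bits: milliseconds since the epoch; remaining 80 bits: random.
     * IDs created at a later time sort after IDs created earlier, so index
     * entries written around the same time end up close to each other.
     */
    static UUID create(long millis) {
        long msb = (millis << 16) | (RANDOM.nextLong() & 0xFFFFL);
        long lsb = RANDOM.nextLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        System.out.println(create(System.currentTimeMillis()));
    }
}
```

Whether this really reduces index churn would still have to be measured, as the issue says.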





[jira] [Updated] (OAK-3219) Lucene IndexPlanner should also account for number of property constraints evaluated while giving cost estimation

2019-11-19 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-3219:

Fix Version/s: (was: 1.22.0)

> Lucene IndexPlanner should also account for number of property constraints 
> evaluated while giving cost estimation
> -
>
> Key: OAK-3219
> URL: https://issues.apache.org/jira/browse/OAK-3219
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Chetan Mehrotra
>Assignee: Thomas Mueller
>Priority: Minor
>  Labels: performance
>
> Currently the cost returned by the Lucene index is a function of the number of 
> indexed documents present in the index. If the number of indexed entries is 
> high, this might reduce the chances of this index getting selected if some 
> property index also supports the property constraint.
> {noformat}
> /jcr:root/content/freestyle-cms/customers//element(*, cq:Page)
> [(jcr:content/@title = 'm' or jcr:like(jcr:content/@title, 'm%')) 
> and jcr:content/@sling:resourceType = '/components/page/customer']
> {noformat}
> Consider the above query with the following index definitions:
> * A property index on resourceType
> * A Lucene index for cq:Page with properties {{jcr:content/title}}, 
> {{jcr:content/sling:resourceType}} indexed and also path restriction 
> evaluation enabled
> The two indexes can help with the following restrictions:
> # Property index
> ## Path restriction
> ## Property restriction on  {{sling:resourceType}}
> # Lucene index
> ## NodeType restriction
> ## Property restriction on  {{sling:resourceType}}
> ## Property restriction on  {{title}}
> ## Path restriction
> The cost estimate currently works like this:
> * Property index - {{f(indexedValueEstimate, estimateOfNodesUnderGivenPath)}}
> ** indexedValueEstimate - For 'sling:resourceType=foo' it is the approximate 
> count of nodes having that property set to 'foo'
> ** estimateOfNodesUnderGivenPath - It is derived from an approximate estimation 
> of the nodes present under the given path
> * Lucene Index - {{f(totalIndexedEntries)}}
> As the cost function of the Lucene index is too simple, it does not reflect 
> reality. The following 2 changes can be made to improve it:
> * Given that the Lucene index can handle more constraints (4) than the 
> property index (2), the cost estimate returned by it should also reflect this. 
> This can be done by setting costPerEntry to 1/(number of property 
> restrictions evaluated)
> * Get the count for the queried property value - This is similar to what 
> PropertyIndex does and assumes that Lucene can provide that information at 
> O(1) cost. In case of multiple supported property restrictions this can be 
> the minimum of all counts
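
The two proposed changes can be sketched as follows. The class LuceneCostSketch and its method names are hypothetical and only show the arithmetic; they are not the actual IndexPlanner implementation.

```java
/** Hypothetical sketch (not Oak code) of the two proposed cost changes. */
final class LuceneCostSketch {

    /** costPerEntry = 1 / (number of property restrictions evaluated), at most 1. */
    static double costPerEntry(int propertyRestrictionsEvaluated) {
        return 1.0 / Math.max(1, propertyRestrictionsEvaluated);
    }

    /**
     * Uses the smallest per-property count (if any is known) instead of the
     * total number of indexed entries, then applies costPerEntry.
     */
    static double estimate(long totalIndexedEntries,
                           int propertyRestrictionsEvaluated,
                           long... perPropertyCounts) {
        long entries = totalIndexedEntries;
        for (long count : perPropertyCounts) {
            entries = Math.min(entries, count);
        }
        return costPerEntry(propertyRestrictionsEvaluated) * entries;
    }
}
```

For example, with 1000 indexed entries, two evaluated property restrictions and per-property counts of 100 and 400, the estimate is 0.5 * 100 = 50 instead of 1000.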





[jira] [Updated] (OAK-6844) Consistency checker Directory value is always ":data"

2019-11-19 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-6844:

Fix Version/s: (was: 1.22.0)

> Consistency checker Directory value is always ":data"
> -
>
> Key: OAK-6844
> URL: https://issues.apache.org/jira/browse/OAK-6844
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.7.9
>Reporter: Paul Chibulcuteanu
>Assignee: Thomas Mueller
>Priority: Minor
>
> When running a _fullCheck_ consistency check from the Lucene Index statistics 
> MBean, the _Directory_ result is always _:data_.
> See below:
> {code}
> /oak:index/lucene => VALID
>   Size : 42.3 MB
> Directory : :data
>   Size : 42.3 MB
>   Num docs : 159132
>   CheckIndex status : true
> Time taken : 3.544 s
> {code}
> I'm not really sure what information should be put here, but the _:data_ 
> value is confusing.





[jira] [Updated] (OAK-6897) XPath query: option to _not_ convert "or" to "union"

2019-11-19 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-6897:

Fix Version/s: (was: 1.22.0)

> XPath query: option to _not_ convert "or" to "union"
> 
>
> Key: OAK-6897
> URL: https://issues.apache.org/jira/browse/OAK-6897
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Trivial
>
> Right now, all XPath queries that contain "or" of the form "@a=1 or @b=2" are 
> converted to SQL-2 "union". In some cases, this is a problem, especially in 
> combination with "order by @jcr:score desc".
> Now that SQL-2 "or" conditions can be converted to union (depending on whether 
> union has a lower cost), it is no longer strictly needed to do the union 
> conversion in the XPath conversion. Alternatively, we could at least emit 
> different SQL-2 queries and take the one with the lowest cost.





[jira] [Commented] (OAK-5787) BlobStore should be AutoCloseable

2019-11-15 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975071#comment-16975071
 ] 

Thomas Mueller commented on OAK-5787:
-

For DefaultSplitBlobStore, if both close() calls throw an exception, the first 
one is lost. I think a solution would be to use addSuppressed (available since 
Java 1.7):

{noformat}
+
+    @Override
+    public void close() throws Exception {
+        Exception thrown = null;
+        try {
+            oldBlobStore.close();
+        } catch (Exception ex) {
+            thrown = ex;
+        }
+        try {
+            newBlobStore.close();
+        } catch (Exception ex) {
+            if (thrown != null) {
+                thrown.addSuppressed(ex);
+            } else {
+                thrown = ex;
+            }
+        }
+        if (thrown != null) {
+            throw thrown;
+        }
+    }
{noformat}

> BlobStore should be AutoCloseable
> -
>
> Key: OAK-5787
> URL: https://issues.apache.org/jira/browse/OAK-5787
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: blob
>Reporter: Julian Reschke
>Assignee: Julian Reschke
>Priority: Minor
> Fix For: 1.22.0
>
> Attachments: OAK-5787.diff
>
>
> {{DocumentNodeStore}} currently calls {{close()}} if the blob store instance 
> implements {{Closeable}}.
> This has led to problems where wrapper implementations did not implement it, 
> and thus the actual blob store instance wasn't properly shut down.
> Proposal: make {{BlobStore}} extend {{Closeable}} and get rid of all 
> {{instanceof}} checks.
> [~thomasm] [~amitjain] - feedback appreciated.





[jira] [Commented] (OAK-8673) Determine and possibly adjust size of eagerCacheSize

2019-11-13 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973400#comment-16973400
 ] 

Thomas Mueller commented on OAK-8673:
-

[~angela] I'm sorry, I don't fully understand this... Is there some 
documentation where this is explained? It might help to have it, for cases where 
the cache sizes need to be adjusted (to avoid running out of memory). As far as 
I know (maybe wrong), there is:

* eager cache (per session? in number of entries and not memory usage. 
configurable as you configured it, but how?)
* lazy-evaluation cache (per session? how large? I assume in number of entries 
and not memory usage. configurable?)
* defaultpermissioncache (what is that exactly? is it lazy-evaluation cache or 
eager cache or something else?)

When opening a session, the eager cache is filled if the cache size is large 
enough(?) If too large, then not. But there is a lazy-evaluation. What I still 
don't get: if the eager cache is disabled, why are the benchmark results so 
slow? Is it just that for this test case, the hit rate on the lazy-evaluation 
cache is so bad?

> Determine and possibly adjust size of eagerCacheSize
> 
>
> Key: OAK-8673
> URL: https://issues.apache.org/jira/browse/OAK-8673
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, security
>Reporter: Angela Schreiber
>Assignee: Angela Schreiber
>Priority: Major
>
> The initial results of the {{EagerCacheSizeTest}} seem to indicate that we 
> almost never benefit from the lazy permission evaluation (compared to reading 
> all permission entries right away). From my understanding of the results the 
> only exception are those cases where only very few items are being accessed 
> (e.g. reading 100 items).
> However, I am not totally sure if this is not an artifact of the random-read. 
> I therefore started extending the benchmark with an option to re-read a 
> randomly picked item more than once, which according to some analysis done 
> quite some time ago is a common scenario, especially when using Oak in 
> combination with Apache Sling.
> Benchmarks with 10-times re-reading the same random item:
> As I would have expected it seems that the negative impact of lazy-loading is 
> somewhat reduced, as the re-reading will hit the cache populated while 
> reading.
> Results are attached to OAK-8662 (possibly more to come).





[jira] [Commented] (OAK-8673) Determine and possibly adjust size of eagerCacheSize

2019-11-13 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973185#comment-16973185
 ] 

Thomas Mueller commented on OAK-8673:
-

> beyond the task at hand to re-evaluate if the current value of 
> eager-cache-size is sufficient 

Well you don't want to expand the cache size if there is a risk of running out 
of memory... But given the next statement I'm not sure if there really is such 
a risk...

> even for the lazy-evaluation a cache is populated (in fact there are even 2 
> maps in that case), so depending on the distribution of permission entries 
> and the access pattern (read/writing), the lazy cache might even consume more 
> memory than the eager-cache...

But why are the benchmark results so bad when the eager cache is disabled (size 
set to 0)?

> Determine and possibly adjust size of eagerCacheSize
> 
>
> Key: OAK-8673
> URL: https://issues.apache.org/jira/browse/OAK-8673
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, security
>Reporter: Angela Schreiber
>Assignee: Angela Schreiber
>Priority: Major
>
> The initial results of the {{EagerCacheSizeTest}} seem to indicate that we 
> almost never benefit from the lazy permission evaluation (compared to reading 
> all permission entries right away). From my understanding of the results the 
> only exception are those cases where only very few items are being accessed 
> (e.g. reading 100 items).
> However, I am not totally sure if this is not an artifact of the random-read. 
> I therefore started extending the benchmark with an option to re-read a 
> randomly picked item more than once, which according to some analysis done 
> quite some time ago is a common scenario, especially when using Oak in 
> combination with Apache Sling.
> Benchmarks with 10-times re-reading the same random item:
> As I would have expected it seems that the negative impact of lazy-loading is 
> somewhat reduced, as the re-reading will hit the cache populated while 
> reading.
> Results are attached to OAK-8662 (possibly more to come).





[jira] [Commented] (OAK-8673) Determine and possibly adjust size of eagerCacheSize

2019-11-13 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973150#comment-16973150
 ] 

Thomas Mueller commented on OAK-8673:
-

So with cache size 0 (no cache), the system is very slow (basically unusable). 
So a cache is needed. I see two problems:

* A: Having one cache per session is problematic if there is no limit in the 
number of sessions: there is no way to guarantee the system will not run out of 
memory. Is there no way to use just one cache (for all sessions)?

* B: Having a cache size in number of entries is problematic, if memory usage 
of entries is very different: there is no way to guarantee the system will not 
run out of memory. To solve this, in various places in Oak we use "weighted" 
caches, and estimate memory usage of entries (e.g. for strings, 24 + number of 
characters). I can help with this. 

I think both A and B need to be addressed.
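
To make B more concrete, here is a minimal sketch of a weighted cache. It is not one of the existing Oak cache classes: the name WeightedStringCache, the use of LinkedHashMap, and the restriction to string keys and values are assumptions of this example. Only the "24 + number of characters" estimate is taken from the text above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: an LRU cache limited by estimated memory, not by entry count. */
final class WeightedStringCache {
    // access-order map: iteration starts with the least recently used entry
    private final LinkedHashMap<String, String> map = new LinkedHashMap<>(16, 0.75f, true);
    private final long maxWeight;
    private long currentWeight;

    WeightedStringCache(long maxWeight) {
        this.maxWeight = maxWeight;
    }

    /** Rough memory estimate for a string: 24 + number of characters. */
    static long weigh(String s) {
        return 24L + s.length();
    }

    synchronized void put(String key, String value) {
        String old = map.put(key, value);
        if (old != null) {
            currentWeight -= weigh(key) + weigh(old);
        }
        currentWeight += weigh(key) + weigh(value);
        // evict least recently used entries until the weight limit is met
        Iterator<Map.Entry<String, String>> it = map.entrySet().iterator();
        while (currentWeight > maxWeight && it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            currentWeight -= weigh(e.getKey()) + weigh(e.getValue());
            it.remove();
        }
    }

    synchronized String get(String key) {
        return map.get(key);
    }

    synchronized long weight() {
        return currentWeight;
    }
}
```

A single instance shared by all sessions would also address A, at the price of synchronizing on one lock.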




> Determine and possibly adjust size of eagerCacheSize
> 
>
> Key: OAK-8673
> URL: https://issues.apache.org/jira/browse/OAK-8673
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, security
>Reporter: Angela Schreiber
>Assignee: Angela Schreiber
>Priority: Major
>
> The initial results of the {{EagerCacheSizeTest}} seem to indicate that we 
> almost never benefit from the lazy permission evaluation (compared to reading 
> all permission entries right away). From my understanding of the results the 
> only exception are those cases where only very few items are being accessed 
> (e.g. reading 100 items).
> However, I am not totally sure if this is not an artifact of the random-read. 
> I therefore started extending the benchmark with an option to re-read a 
> randomly picked item more than once, which according to some analysis done 
> quite some time ago is a common scenario, especially when using Oak in 
> combination with Apache Sling.
> Benchmarks with 10-times re-reading the same random item:
> As I would have expected it seems that the negative impact of lazy-loading is 
> somewhat reduced, as the re-reading will hit the cache populated while 
> reading.
> Results are attached to OAK-8662 (possibly more to come).





[jira] [Resolved] (OAK-8729) Lucene Directory concurrency issue

2019-11-07 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-8729.
-
Resolution: Fixed

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.12.0, 1.14.0, 1.16.0, 1.18.0
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.20.0
>
> Attachments: OAK-8729.patch
>
>
> There is a concurrency issue in the DefaultDirectoryFactory. It is 
> reproducible sometimes using CopyOnWriteDirectoryTest.copyOnWrite(), if run 
> in a loop (1000 times). The problem is that the MemoryNodeBuilder is used 
> concurrently:
> * thread 1 is closing the directory (after writing to it)
> * thread 2 is trying to create a new file
> {noformat}
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setProperty(MemoryNodeBuilder.java:525)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.close(OakDirectory.java:264)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.close(BufferedOakDirectory.java:217)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory$2.run(CopyOnReadDirectory.java:305)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:362)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:356)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.child(MemoryNodeBuilder.java:342)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.createOutput(OakDirectory.java:214)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.createOutput(BufferedOakDirectory.java:178)
>   at org.apache.lucene.store.Directory.copy(Directory.java:184)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:322)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:1)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:105)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:1)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Commented] (OAK-8729) Lucene Directory concurrency issue

2019-11-07 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969230#comment-16969230
 ] 

Thomas Mueller commented on OAK-8729:
-

http://svn.apache.org/r1869505 (trunk)

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.12.0, 1.14.0, 1.16.0, 1.18.0
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.20.0
>
> Attachments: OAK-8729.patch
>
>
> There is a concurrency issue in the DefaultDirectoryFactory. It is 
> reproducible sometimes using CopyOnWriteDirectoryTest.copyOnWrite(), if run 
> in a loop (1000 times). The problem is that the MemoryNodeBuilder is used 
> concurrently:
> * thread 1 is closing the directory (after writing to it)
> * thread 2 is trying to create a new file
> {noformat}
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setProperty(MemoryNodeBuilder.java:525)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.close(OakDirectory.java:264)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.close(BufferedOakDirectory.java:217)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory$2.run(CopyOnReadDirectory.java:305)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:362)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:356)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.child(MemoryNodeBuilder.java:342)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.createOutput(OakDirectory.java:214)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.createOutput(BufferedOakDirectory.java:178)
>   at org.apache.lucene.store.Directory.copy(Directory.java:184)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:322)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:1)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:105)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:1)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Commented] (OAK-8729) Lucene Directory concurrency issue

2019-11-07 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969229#comment-16969229
 ] 

Thomas Mueller commented on OAK-8729:
-

I'm afraid I don't know currently how we could make this part more stable... 
verifying the directory is still open would be a good idea, but I'm afraid I 
don't know currently how to do that without changing a lot of code (basically, 
not use the Lucene interfaces).

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.12.0, 1.14.0, 1.16.0, 1.18.0
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.20.0
>
> Attachments: OAK-8729.patch
>
>
> There is a concurrency issue in the DefaultDirectoryFactory. It is 
> reproducible sometimes using CopyOnWriteDirectoryTest.copyOnWrite(), if run 
> in a loop (1000 times). The problem is that the MemoryNodeBuilder is used 
> concurrently:
> * thread 1 is closing the directory (after writing to it)
> * thread 2 is trying to create a new file
> {noformat}
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setProperty(MemoryNodeBuilder.java:525)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.close(OakDirectory.java:264)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.close(BufferedOakDirectory.java:217)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory$2.run(CopyOnReadDirectory.java:305)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:362)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:356)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.child(MemoryNodeBuilder.java:342)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.createOutput(OakDirectory.java:214)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.createOutput(BufferedOakDirectory.java:178)
>   at org.apache.lucene.store.Directory.copy(Directory.java:184)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:322)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:1)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:105)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:1)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Commented] (OAK-8729) Lucene Directory concurrency issue

2019-11-07 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969227#comment-16969227
 ] 

Thomas Mueller commented on OAK-8729:
-

> The close method for wrapForRead [1] calls remote.close and local.close [2], 
> and the same instance is being used by wrapForWrite [3].

Yes, that's true. I verified the remote is closed, but the tests don't fail due 
to that.

Unfortunately, it is hard to verify the directory is not closed: there is a 
verify method in the Directory interface, but it is not public (only protected).

> Can we perform operations even if close had been called on Directory instance?

It looks like none of the tests failed due to this. It seems like the 
operations we perform don't cause problems.

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.12.0, 1.14.0, 1.16.0, 1.18.0
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.20.0
>
> Attachments: OAK-8729.patch
>
>
> There is a concurrency issue in the DefaultDirectoryFactory. It is 
> reproducible sometimes using CopyOnWriteDirectoryTest.copyOnWrite(), if run 
> in a loop (1000 times). The problem is that the MemoryNodeBuilder is used 
> concurrently:
> * thread 1 is closing the directory (after writing to it)
> * thread 2 is trying to create a new file
> {noformat}
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setProperty(MemoryNodeBuilder.java:525)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.close(OakDirectory.java:264)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.close(BufferedOakDirectory.java:217)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory$2.run(CopyOnReadDirectory.java:305)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:362)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:356)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.child(MemoryNodeBuilder.java:342)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.createOutput(OakDirectory.java:214)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.createOutput(BufferedOakDirectory.java:178)
>   at org.apache.lucene.store.Directory.copy(Directory.java:184)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:322)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:1)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:105)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:1)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Updated] (OAK-5858) Lucene index may return the wrong result if path is excluded

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-5858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-5858:

Fix Version/s: (was: 1.20.0)

> Lucene index may return the wrong result if path is excluded
> 
>
> Key: OAK-5858
> URL: https://issues.apache.org/jira/browse/OAK-5858
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Thomas Mueller
>Priority: Major
>
> If a query uses a Lucene index that has "excludedPaths", the query result may 
> be wrong (not contain all matching nodes). This is the case even if there is a 
> property index available for the queried property. Example:
> {noformat}
> Indexes:
> /oak:index/resourceType/type = "property"
> /oak:index/lucene/type = "lucene"
> /oak:index/lucene/excludedPaths = ["/etc"]
> /oak:index/lucene/indexRules/nt:base/properties/resourceType
> Query:
> /jcr:root/etc//*[jcr:like(@resourceType, "x%y")]
> Index cost:
> cost for /oak:index/resourceType is 1602.0
> cost for /oak:index/lucene is 1001.0
> Result:
> (empty)
> Expected result:
> /etc/a
> /etc/b
> {noformat}
> Here, the lucene index is picked, even though the query explicitly queries
> for /etc, and the lucene index has this path excluded.
> I think the lucene index should not be picked in case the index does not 
> match the query path.
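
The rule proposed above can be sketched as follows. This is only an illustration, not Oak's index selection code: the class ExcludedPathCheck and its methods are hypothetical names, and paths are compared as plain strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Oak: decides whether an index that has
// "excludedPaths" may be considered for a query restricted to queryPath.
final class ExcludedPathCheck {

    // True if 'path' equals 'ancestor' or lies somewhere below it.
    static boolean isSameOrDescendant(String ancestor, String path) {
        if (ancestor.equals("/")) {
            return true;
        }
        return path.equals(ancestor) || path.startsWith(ancestor + "/");
    }

    // The index cannot answer the query if the query path lies inside
    // one of the excluded subtrees.
    static boolean indexCoversPath(String queryPath, List<String> excludedPaths) {
        for (String excluded : excludedPaths) {
            if (isSameOrDescendant(excluded, queryPath)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> excluded = Arrays.asList("/etc");
        System.out.println(indexCoversPath("/etc", excluded));     // false
        System.out.println(indexCoversPath("/content", excluded)); // true
    }
}
```

A query on an ancestor of an excluded path (for example "/") is only partially covered by such an index; the sketch does not handle that case.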





[jira] [Updated] (OAK-5980) Bad Join Query Plan Used

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-5980:

Fix Version/s: (was: 1.20.0)

> Bad Join Query Plan Used
> 
>
> Key: OAK-5980
> URL: https://issues.apache.org/jira/browse/OAK-5980
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> For a join query, where selectors are joined over ischildnode but also can 
> use an index,
> the selectors sometimes use the index instead of the much less
> expensive parent join. Example:
> {noformat}
> select [a].* from [nt:unstructured] as [a]
> inner join [nt:unstructured] as [b] on ischildnode([b], [a]) 
> inner join [nt:unstructured] as [c] on ischildnode([c], [b]) 
> inner join [nt:unstructured] as [d] on ischildnode([d], [c]) 
> inner join [nt:unstructured] as [e] on ischildnode([e], [d]) 
> where [a].[classname] = 'letter' 
> and isdescendantnode([a], '/content') 
> and [c].[classname] = 'chapter' 
> and localname([b]) = 'chapters' 
> and [e].[classname] = 'list' 
> and localname([d]) = 'lists' 
> and [e].[path] = cast('/content/abc' as path)
> {noformat}
> The order of selectors is sometimes wrong (not e, d, c, b, a), but
> more importantly, selectors c and a use the index on className.





[jira] [Updated] (OAK-5706) Function based indexes with "like" conditions

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-5706:

Fix Version/s: (was: 1.20.0)

> Function based indexes with "like" conditions
> -
>
> Key: OAK-5706
> URL: https://issues.apache.org/jira/browse/OAK-5706
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> Currently, a function-based index is not used when using "like" conditions, 
> as follows:
> {noformat}
> /jcr:root//*[jcr:like(fn:lower-case(fn:name()), 'abc%')]
> {noformat}
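
One way a "like 'abc%'" condition could be served by an ordered index on the function result is to rewrite the prefix pattern as a range condition. The following is a minimal sketch of that rewrite, assuming a pattern that consists of a literal prefix followed by a single trailing "%". LikePrefixRange and its methods are hypothetical names, not Oak API, and the issue does not say that this is the intended fix.

```java
// Hypothetical helper, not Oak API: rewrites "prefix%" into the range
// [prefix, upperBound) over the indexed (already lower-cased) value.
final class LikePrefixRange {

    // Returns the literal prefix of a pattern such as "abc%", or null if the
    // pattern is not a simple prefix pattern (other wildcards, escapes, or
    // no trailing "%").
    static String literalPrefix(String pattern) {
        if (pattern.length() < 2 || !pattern.endsWith("%")) {
            return null;
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        if (prefix.indexOf('%') >= 0 || prefix.indexOf('_') >= 0 || prefix.indexOf('\\') >= 0) {
            return null;
        }
        return prefix;
    }

    // Smallest string that is greater than every string starting with
    // 'prefix' (in String.compareTo order): increment the last character.
    static String exclusiveUpperBound(String prefix) {
        if (prefix.isEmpty() || prefix.charAt(prefix.length() - 1) == Character.MAX_VALUE) {
            throw new IllegalArgumentException("prefix cannot be turned into a range");
        }
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Range check equivalent to value.startsWith(prefix).
    static boolean inRange(String value, String prefix) {
        return value.compareTo(prefix) >= 0
                && value.compareTo(exclusiveUpperBound(prefix)) < 0;
    }

    public static void main(String[] args) {
        String prefix = literalPrefix("abc%");
        System.out.println(prefix + " <= value < " + exclusiveUpperBound(prefix));
    }
}
```

Patterns with wildcards in other positions cannot be reduced to a single range this way and would still need a different evaluation.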





[jira] [Updated] (OAK-5739) Misleading traversal warning for spellcheck queries without index

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-5739:

Fix Version/s: (was: 1.20.0)

> Misleading traversal warning for spellcheck queries without index
> -
>
> Key: OAK-5739
> URL: https://issues.apache.org/jira/browse/OAK-5739
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> In OAK-4313 we avoid traversal for native queries, but we see in some cases 
> traversal warnings as follows:
> {noformat}
> org.apache.jackrabbit.oak.query.QueryImpl query plan 
> [nt:base] as [a] /* traverse "" where (spellcheck([a], 'NothingToFind')) 
> and (issamenode([a], [/])) */
> org.apache.jackrabbit.oak.query.QueryImpl Traversal query (query without 
> index): 
> select [jcr:path], [jcr:score], [rep:spellcheck()] from [nt:base] as a where 
> spellcheck('NothingToFind') 
> and issamenode(a, '/') 
> /* xpath: /jcr:root
> [rep:spellcheck('NothingToFind')]/(rep:spellcheck()) */; 
> consider creating an index
> {noformat}
> This warning is misleading. If no index is available, then either the query 
> should fail, or the warning should say that the query result is not correct 
> because traversal is used.





[jira] [Resolved] (OAK-5369) Lucene Property Index: Syntax Error, cannot parse

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-5369.
-
Resolution: Won't Fix

> Lucene Property Index: Syntax Error, cannot parse
> -
>
> Key: OAK-5369
> URL: https://issues.apache.org/jira/browse/OAK-5369
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.20.0
>
>
> The following query throws an exception in Apache Lucene:
> {noformat}
> /jcr:root//*[jcr:contains(., 'hello -- world')]
> 22.12.2016 16:42:54.511 *WARN* [qtp1944702753-3846] 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex query via 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex@1c0006db 
> failed.
> java.lang.RuntimeException: INVALID_SYNTAX_CANNOT_PARSE: Syntax Error, cannot 
> parse hello -- world:  
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.tokenToQuery(LucenePropertyIndex.java:1450)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.tokenToQuery(LucenePropertyIndex.java:1418)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.access$900(LucenePropertyIndex.java:180)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$3.visitTerm(LucenePropertyIndex.java:1353)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$3.visit(LucenePropertyIndex.java:1307)
>   at 
> org.apache.jackrabbit.oak.query.fulltext.FullTextContains.accept(FullTextContains.java:63)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.getFullTextQuery(LucenePropertyIndex.java:1303)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.getLuceneRequest(LucenePropertyIndex.java:791)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.access$300(LucenePropertyIndex.java:180)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.loadDocs(LucenePropertyIndex.java:375)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.computeNext(LucenePropertyIndex.java:317)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.computeNext(LucenePropertyIndex.java:306)
>   at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$1.hasNext(LucenePropertyIndex.java:1571)
>   at com.google.common.collect.Iterators$7.computeNext(Iterators.java:645)
>   at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at 
> org.apache.jackrabbit.oak.spi.query.Cursors$PathCursor.hasNext(Cursors.java:205)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor.hasNext(LucenePropertyIndex.java:1595)
>   at 
> org.apache.jackrabbit.oak.query.ast.SelectorImpl.next(SelectorImpl.java:420)
>   at 
> org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:828)
>   at 
> org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:853)
>   at 
> org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.fetch(QueryResultImpl.java:98)
>   at 
> org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:94)
>   at 
> org.apache.jackrabbit.oak.jcr.query.QueryResultImpl.getRows(QueryResultImpl.java:78)
> Caused by: 
> org.apache.lucene.queryparser.flexible.standard.parser.ParseException: Syntax 
> Error, cannot parse hello -- world:  
>   at 
> org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.generateParseException(StandardSyntaxParser.java:1054)
>   at 
> org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.jj_consume_token(StandardSyntaxParser.java:936)
>   at 
> org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.Clause(StandardSyntaxParser.java:486)
>   at 
> org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.ModClause(StandardSyntaxParser.java:303)
>   at 
> org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.ConjQuery(StandardSyntaxParser.java:234)
>   at 
> org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.DisjQuery(StandardSyntaxParser.java:204)
>   at 
> org.apache.lucene.queryparser.flexible.st

[jira] [Updated] (OAK-3866) Sorting on relative properties doesn't work in Solr

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-3866:

Fix Version/s: (was: 1.20.0)

> Sorting on relative properties doesn't work in Solr
> ---
>
> Key: OAK-3866
> URL: https://issues.apache.org/jira/browse/OAK-3866
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: solr
>Affects Versions: 1.0.22, 1.2.9, 1.3.13
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
>Priority: Major
>
> Executing a query like 
> {noformat}
> /jcr:root/content/foo//*[(@sling:resourceType = 'x' or @sling:resourceType = 
> 'y') and jcr:contains(., 'bar*~')] order by jcr:content/@jcr:primaryType 
> descending
> {noformat}
> would assume sorting on the _jcr:primaryType_ property of resulting nodes' 
> _jcr:content_ children.
> That is currently not supported in Solr, while it is in Lucene as the latter 
> supports index time aggregation.
> We should inspect if it's possible to extend support for Solr too, most 
> probably via index time aggregation.
> The query should not fail but at least log a warning about that limitation 
> for the time being.





[jira] [Updated] (OAK-3437) Regression in org.apache.jackrabbit.core.query.JoinTest#testJoinWithOR5 when enabling OAK-1617

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-3437:

Fix Version/s: (was: 1.20.0)

> Regression in org.apache.jackrabbit.core.query.JoinTest#testJoinWithOR5 when 
> enabling OAK-1617
> --
>
> Key: OAK-3437
> URL: https://issues.apache.org/jira/browse/OAK-3437
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: solr
>Reporter: Davide Giannella
>Assignee: Tommaso Teofili
>Priority: Major
>
> When enabling OAK-1617 (still to be committed) there's a regression in the 
> {{oak-solr-core}} unit tests 
> - {{org.apache.jackrabbit.core.query.JoinTest#testJoinWithOR3}} 
> - {{org.apache.jackrabbit.core.query.JoinTest#testJoinWithOR4}} 
> - {{org.apache.jackrabbit.core.query.JoinTest#testJoinWithOR5}} 
> The WIP of the feature can be found in 
> https://github.com/davidegiannella/jackrabbit-oak/tree/OAK-1617 and a full 
> patch will be attached shortly for review in OAK-1617 itself.
> The feature is currently disabled. To enable it for unit testing, either take
> an approach like
> https://github.com/davidegiannella/jackrabbit-oak/blob/177df1a8073b1237857267e23d12a433e3d890a4/oak-core/src/test/java/org/apache/jackrabbit/oak/query/SQL2OptimiseQueryTest.java#L142
> or set the system property {{-Doak.query.sql2optimisation}}.
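
Switching a feature on through a JVM system property can be sketched like this. The property name is taken from the issue; how Oak actually reads it (presence only, or a boolean value) is not stated there, so the presence-based check below is an assumption, and FeatureToggle is a hypothetical class.

```java
// Hypothetical sketch, not Oak code.
final class FeatureToggle {

    static final String PROPERTY = "oak.query.sql2optimisation";

    // Assumption: the feature counts as enabled whenever the property is set
    // at all. Starting the JVM with -Doak.query.sql2optimisation (no value)
    // sets the property to the empty string, which is non-null.
    static boolean isEnabled() {
        return System.getProperty(PROPERTY) != null;
    }

    public static void main(String[] args) {
        System.out.println("SQL2 optimisation enabled: " + isEnabled());
    }
}
```

In a unit test the same switch can be set with System.setProperty before the query engine is created and cleared again afterwards.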





[jira] [Updated] (OAK-6387) Building an index (new index + reindex): temporarily store blob references

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-6387:

Fix Version/s: (was: 1.20.0)

> Building an index (new index + reindex): temporarily store blob references
> --
>
> Key: OAK-6387
> URL: https://issues.apache.org/jira/browse/OAK-6387
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene, query
>Reporter: Thomas Mueller
>Priority: Major
>
> If reindexing a Lucene index takes multiple days, and if datastore garbage 
> collection (DSGC) is run during that time, then DSGC may remove binaries of 
> that index because they are not referenced.
> It would be good if all binaries that are needed, and that are older than
> (for example) one hour, are referenced during reindexing (for example in a
> temporary location), so that DSGC will not remove them.
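
The selection step of this idea can be sketched as a simple age filter. Everything here is hypothetical (the class ReindexBlobReferences, the BlobInfo type, and the method name); the one-hour threshold is only the example value from the description, and where the references would be stored is left open.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Oak code.
final class ReindexBlobReferences {

    // A binary written by the running reindex job.
    static final class BlobInfo {
        final String blobId;
        final Instant created;

        BlobInfo(String blobId, Instant created) {
            this.blobId = blobId;
            this.created = created;
        }
    }

    // Returns the ids of all binaries older than minAge. These are the ones
    // that would be recorded in a temporary location so that datastore
    // garbage collection treats them as referenced.
    static List<String> blobsToReference(List<BlobInfo> blobs, Instant now, Duration minAge) {
        Instant cutoff = now.minus(minAge);
        List<String> result = new ArrayList<>();
        for (BlobInfo blob : blobs) {
            if (blob.created.isBefore(cutoff)) {
                result.add(blob.blobId);
            }
        }
        return result;
    }
}
```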





[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-6597:

Fix Version/s: (was: 1.20.0)

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6, 1.8.0
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>Priority: Major
>  Labels: excerpt
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not
> considered for excerpts (highlighting) as they are not indexed as stored
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*") that either exist in
> _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the
> former, the excerpt is properly provided; for the latter, it isn't.





[jira] [Updated] (OAK-7166) Union with different selector names

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-7166:

Fix Version/s: (was: 1.20.0)

> Union with different selector names
> ---
>
> Key: OAK-7166
> URL: https://issues.apache.org/jira/browse/OAK-7166
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: query
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>
> The following query returns the wrong nodes:
> {noformat}
> /jcr:root/libs/(* | */* | */*/* | */*/*/* | */*/*/*/*)/install
> select b.[jcr:path] as [jcr:path], b.[jcr:score] as [jcr:score], b.* from 
> [nt:base] as a
>  inner join [nt:base] as b on ischildnode(b, a)
>  where ischildnode(a, '/libs') and name(b) = 'install' 
>  union select c.[jcr:path] as [jcr:path], c.[jcr:score] as [jcr:score], c.* 
> from [nt:base] as a
>  inner join [nt:base] as b on ischildnode(b, a)
>  inner join [nt:base] as c on ischildnode(c, b)
>  where ischildnode(a, '/libs') and name(c) = 'install' 
>  union select d.[jcr:path] as [jcr:path], d.[jcr:score] as [jcr:score], d.* 
> from [nt:base] as a
>  inner join [nt:base] as b on ischildnode(b, a)
>  inner join [nt:base] as c on ischildnode(c, b)
>  inner join [nt:base] as d on ischildnode(d, c)
>  where ischildnode(a, '/libs') and name(d) = 'install' 
> {noformat}
> If I change the selector name to "x" in each subquery, then it works. There 
> is no XPath version of this workaround:
> {noformat}
> select x.[jcr:path] as [jcr:path], x.[jcr:score] as [jcr:score], x.* from 
> [nt:base] as a
>  inner join [nt:base] as x on ischildnode(x, a)
>  where ischildnode(a, '/libs') and name(x) = 'install' 
>  union select x.[jcr:path] as [jcr:path], x.[jcr:score] as [jcr:score], x.* 
> from [nt:base] as a
>  inner join [nt:base] as b on ischildnode(b, a)
>  inner join [nt:base] as x on ischildnode(x, b)
>  where ischildnode(a, '/libs') and name(x) = 'install' 
>  union select x.[jcr:path] as [jcr:path], x.[jcr:score] as [jcr:score], x.* 
> from [nt:base] as a
>  inner join [nt:base] as b on ischildnode(b, a)
>  inner join [nt:base] as c on ischildnode(c, b)
>  inner join [nt:base] as x on ischildnode(x, c)
>  where ischildnode(a, '/libs') and name(x) = 'install' 
> {noformat}
> Need to check if this is an Oak bug, or a bug in the query tool I use.





[jira] [Updated] (OAK-7263) oak-lucene should not depend on oak-store-document

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-7263:

Fix Version/s: (was: 1.20.0)

> oak-lucene should not depend on oak-store-document
> --
>
> Key: OAK-7263
> URL: https://issues.apache.org/jira/browse/OAK-7263
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Reporter: Robert Munteanu
>Priority: Major
>
> {{oak-lucene}} has a hard dependency on {{oak-store-document}} and that looks 
> wrong to me. 
> {noformat}[ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile 
> (default-compile) on project oak-lucene: Compilation failure: Compilation 
> failure: 
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneDocumentHolder.java:[31,54]
>  package org.apache.jackrabbit.oak.plugins.document.spi does not exist
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneDocumentHolder.java:[37,46]
>  cannot find symbol
> [ERROR]   symbol: class JournalProperty
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyBuilder.java:[33,54]
>  package org.apache.jackrabbit.oak.plugins.document.spi does not exist
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyBuilder.java:[34,54]
>  package org.apache.jackrabbit.oak.plugins.document.spi does not exist
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyBuilder.java:[38,47]
>  cannot find symbol
> [ERROR]   symbol: class JournalPropertyBuilder
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyBuilder.java:[106,12]
>  cannot find symbol
> [ERROR]   symbol:   class JournalProperty
> [ERROR]   location: class 
> org.apache.jackrabbit.oak.plugins.index.lucene.hybrid.LuceneJournalPropertyBuilder
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexProviderService.java:[55,54]
>  package org.apache.jackrabbit.oak.plugins.document.spi does not exist
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/IndexedPaths.java:[29,54]
>  package org.apache.jackrabbit.oak.plugins.document.spi does not exist
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/IndexedPaths.java:[33,31]
>  cannot find symbol
> [ERROR]   symbol: class JournalProperty
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyService.java:[22,54]
>  package org.apache.jackrabbit.oak.plugins.document.spi does not exist
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyService.java:[23,54]
>  package org.apache.jackrabbit.oak.plugins.document.spi does not exist
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyService.java:[25,54]
>  cannot find symbol
> [ERROR]   symbol: class JournalPropertyService
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyService.java:[33,12]
>  cannot find symbol
> [ERROR]   symbol:   class JournalPropertyBuilder
> [ERROR]   location: class 
> org.apache.jackrabbit.oak.plugins.index.lucene.hybrid.LuceneJournalPropertyService
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyBuilder.java:[50,5]
>  method does not override or implement a method from a supertype
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/hybrid/LuceneJournalPropertyBuilder.java:[61,5]
>  method does not override or implement a method from a supertype
> [ERROR] 
> /home/robert/Documents/sources/apache/jackrabbit-oak/oak-

[jira] [Commented] (OAK-7370) order by jcr:score desc doesn't work across union query created by optimizing OR clauses

2019-11-06 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968356#comment-16968356
 ] 

Thomas Mueller commented on OAK-7370:
-

Thanks [~catholicon]! I removed the fix version.

> order by jcr:score desc doesn't work across union query created by optimizing 
> OR clauses
> 
>
> Key: OAK-7370
> URL: https://issues.apache.org/jira/browse/OAK-7370
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Reporter: Vikas Saurabh
>Assignee: Thomas Mueller
>Priority: Major
>
> Merging of sub-queries created due to optimizing OR clauses doesn't work for 
> sorting on {{jcr:score}}





[jira] [Updated] (OAK-7370) order by jcr:score desc doesn't work across union query created by optimizing OR clauses

2019-11-06 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-7370:

Fix Version/s: (was: 1.20.0)

> order by jcr:score desc doesn't work across union query created by optimizing 
> OR clauses
> 
>
> Key: OAK-7370
> URL: https://issues.apache.org/jira/browse/OAK-7370
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Reporter: Vikas Saurabh
>Assignee: Thomas Mueller
>Priority: Major
>
> Merging of sub-queries created due to optimizing OR clauses doesn't work for 
> sorting on {{jcr:score}}





[jira] [Commented] (OAK-8673) Determine and possibly adjust size of eagerCacheSize

2019-11-06 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968163#comment-16968163
 ] 

Thomas Mueller commented on OAK-8673:
-

[~angela] Thanks! One more question: In the issue description, you write "we 
almost never benefit from the lazy permission evaluation (compared to reading 
all permission entries right away)". I assume you mean lazy permission 
evaluation isn't _faster_ than reading all permission entries right away, 
right? If so, is it a lot _slower_? There are two points I want to make:
* We should understand why it does / does not impact performance - this is 
important to be able to have a somewhat accurate mental model
* Maybe it has an impact on memory usage? So we could say let's keep lazy 
evaluation to save memory? How much?

If the answer is: lazy evaluation doesn't save any memory and doesn't have any 
memory impact, then we can probably simplify the code (to never or always do 
lazy evaluation, whatever is simpler).

> Determine and possibly adjust size of eagerCacheSize
> 
>
> Key: OAK-8673
> URL: https://issues.apache.org/jira/browse/OAK-8673
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, security
>Reporter: Angela Schreiber
>Assignee: Angela Schreiber
>Priority: Major
>
> The initial results of the {{EagerCacheSizeTest}} seem to indicate that we 
> almost never benefit from the lazy permission evaluation (compared to reading 
> all permission entries right away). From my understanding of the results the
> only exceptions are those cases where only very few items are being accessed
> (e.g. reading 100 items).
> However, I am not totally sure if this is not an artifact of the random-read.
> I therefore started extending the benchmark with an option to re-read a
> randomly picked item more than once, which according to some analysis done
> quite some time ago is a common scenario, especially when using Oak in
> combination with Apache Sling.
> Benchmarks with 10-times re-reading the same random item:
> As I would have expected it seems that the negative impact of lazy-loading is 
> somewhat reduced, as the re-reading will hit the cache populated while 
> reading.
> Results are attached to OAK-8662 (possibly more to come).
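
The effect of re-reading on a lazily populated cache can be illustrated with a small sketch. It is not the permission cache from Oak: LazyCache, its loader, and the load counter are hypothetical and only show that a repeated read of the same item triggers a single load.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Oak code.
final class LazyVsEager {

    // Lazy strategy: an entry is loaded on first access and cached afterwards.
    static final class LazyCache {
        private final Map<String, String> cache = new HashMap<>();
        private final Function<String, String> loader;
        int loads; // number of times the loader was actually called

        LazyCache(Function<String, String> loader) {
            this.loader = loader;
        }

        String get(String path) {
            return cache.computeIfAbsent(path, p -> {
                loads++;
                return loader.apply(p);
            });
        }
    }

    public static void main(String[] args) {
        LazyCache cache = new LazyCache(path -> "entries for " + path);
        for (int i = 0; i < 10; i++) {
            cache.get("/content/a"); // re-read the same item ten times
        }
        System.out.println("loads: " + cache.loads);
    }
}
```

An eager strategy would instead read all entries once up front; which of the two is cheaper depends on how many distinct items are accessed, which is what the benchmark tries to measure.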





[jira] [Commented] (OAK-8162) When query with OR is divided into union of queries, options (like index tag) are not passed into subqueries.

2019-11-01 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964823#comment-16964823
 ] 

Thomas Mueller commented on OAK-8162:
-

[~reschke] you are right, it would be good to backport this to Oak 1.10 and 
1.8. I don't think Oak 1.6 is needed, as it doesn't support index tags. Do you 
want me to do this?

> When query with OR is divided into union of queries, options (like index tag) 
> are not passed into subqueries. 
> --
>
> Key: OAK-8162
> URL: https://issues.apache.org/jira/browse/OAK-8162
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.2, 1.8.17
>Reporter: Piotr Tajduś
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.14.0
>
>
> When a query with OR is divided into a union of queries, options (like index tag)
> are not passed into the subqueries. As a result, the alternative query sometimes
> uses indexes it shouldn't use, for example.
>  {noformat}
> org.apache.jackrabbit.oak.query.QueryImpl.buildAlternativeQuery()
> org.apache.jackrabbit.oak.query.QueryImpl.copyOf()
>  
> 2019-03-21 16:32:25,600 DEBUG 
> [org.apache.jackrabbit.oak.query.QueryEngineImpl] (default task-1) Parsing 
> JCR-SQL2 statement: select distinct d.* from [crkid:document] as d where 
> ([d].[metadane/inneMetadane/*/wartosc] = 'AX' and 
> [d].[metadane/inneMetadane/*/klucz] = 'InnyKod') or 
> ([d].[metadane/inneMetadane/*/wartosc] = 'AB' and 
> [d].[metadane/inneMetadane/*/klucz] = 'InnyKod') option(index tag 
> crkid_dokument_month_2019_3)
> 2019-03-21 16:32:25,607 DEBUG [org.apache.jackrabbit.oak.query.QueryImpl] 
> (default task-1) cost using filter Filter(query=select distinct d.* from 
> [crkid:document] as d where ([d].[metadane/inneMetadane/*/wartosc] = 'AB') 
> and ([d].[metadane/inneMetadane/*/klucz] = 'InnyKod'), path=*, 
> property=[metadane/inneMetadane/*/klucz=[InnyKod], 
> metadane/inneMetadane/*/wartosc=[AB]])
>  {noformat}
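
The expected behaviour can be sketched as follows: every sub-query created for one OR alternative inherits the options of the original query. This is an illustration only. The Query class below is a hypothetical stand-in, not org.apache.jackrabbit.oak.query.QueryImpl, and splitOr does not reproduce buildAlternativeQuery() or copyOf().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Oak code.
final class UnionOptions {

    static final class Query {
        final String condition;
        final String indexTag; // query option; null if none was given

        Query(String condition, String indexTag) {
            this.condition = condition;
            this.indexTag = indexTag;
        }
    }

    // Creates one sub-query per OR alternative. Each sub-query carries the
    // index tag of the original query, so the union cannot pick an index
    // that the original query was not allowed to use.
    static List<Query> splitOr(Query original, List<String> alternatives) {
        List<Query> subQueries = new ArrayList<>();
        for (String alternative : alternatives) {
            subQueries.add(new Query(alternative, original.indexTag));
        }
        return subQueries;
    }
}
```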





[jira] [Updated] (OAK-8162) When query with OR is divided into union of queries, options (like index tag) are not passed into subqueries.

2019-11-01 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8162:

Labels: candidate_oak_1_10 candidate_oak_1_8  (was: )

> When query with OR is divided into union of queries, options (like index tag) 
> are not passed into subqueries. 
> --
>
> Key: OAK-8162
> URL: https://issues.apache.org/jira/browse/OAK-8162
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.2, 1.8.17
>Reporter: Piotr Tajduś
>Assignee: Thomas Mueller
>Priority: Major
>  Labels: candidate_oak_1_10, candidate_oak_1_8
> Fix For: 1.14.0
>
>
> When a query with OR is divided into a union of queries, options (like index tag)
> are not passed into the subqueries. As a result, the alternative query sometimes
> uses indexes it shouldn't use, for example.
>  {noformat}
> org.apache.jackrabbit.oak.query.QueryImpl.buildAlternativeQuery()
> org.apache.jackrabbit.oak.query.QueryImpl.copyOf()
>  
> 2019-03-21 16:32:25,600 DEBUG 
> [org.apache.jackrabbit.oak.query.QueryEngineImpl] (default task-1) Parsing 
> JCR-SQL2 statement: select distinct d.* from [crkid:document] as d where 
> ([d].[metadane/inneMetadane/*/wartosc] = 'AX' and 
> [d].[metadane/inneMetadane/*/klucz] = 'InnyKod') or 
> ([d].[metadane/inneMetadane/*/wartosc] = 'AB' and 
> [d].[metadane/inneMetadane/*/klucz] = 'InnyKod') option(index tag 
> crkid_dokument_month_2019_3)
> 2019-03-21 16:32:25,607 DEBUG [org.apache.jackrabbit.oak.query.QueryImpl] 
> (default task-1) cost using filter Filter(query=select distinct d.* from 
> [crkid:document] as d where ([d].[metadane/inneMetadane/*/wartosc] = 'AB') 
> and ([d].[metadane/inneMetadane/*/klucz] = 'InnyKod'), path=*, 
> property=[metadane/inneMetadane/*/klucz=[InnyKod], 
> metadane/inneMetadane/*/wartosc=[AB]])
>  {noformat}





[jira] [Updated] (OAK-8162) When query with OR is divided into union of queries, options (like index tag) are not passed into subqueries.

2019-11-01 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8162:

Affects Version/s: (was: 1.6.18)

> When query with OR is divided into union of queries, options (like index tag) 
> are not passed into subqueries. 
> --
>
> Key: OAK-8162
> URL: https://issues.apache.org/jira/browse/OAK-8162
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.2, 1.8.17
>Reporter: Piotr Tajduś
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.14.0
>
>
> When a query with OR is divided into a union of queries, options (like index tag)
> are not passed into the subqueries. As a result, the alternative query sometimes
> uses indexes it shouldn't use, for example.
>  {noformat}
> org.apache.jackrabbit.oak.query.QueryImpl.buildAlternativeQuery()
> org.apache.jackrabbit.oak.query.QueryImpl.copyOf()
>  
> 2019-03-21 16:32:25,600 DEBUG 
> [org.apache.jackrabbit.oak.query.QueryEngineImpl] (default task-1) Parsing 
> JCR-SQL2 statement: select distinct d.* from [crkid:document] as d where 
> ([d].[metadane/inneMetadane/*/wartosc] = 'AX' and 
> [d].[metadane/inneMetadane/*/klucz] = 'InnyKod') or 
> ([d].[metadane/inneMetadane/*/wartosc] = 'AB' and 
> [d].[metadane/inneMetadane/*/klucz] = 'InnyKod') option(index tag 
> crkid_dokument_month_2019_3)
> 2019-03-21 16:32:25,607 DEBUG [org.apache.jackrabbit.oak.query.QueryImpl] 
> (default task-1) cost using filter Filter(query=select distinct d.* from 
> [crkid:document] as d where ([d].[metadane/inneMetadane/*/wartosc] = 'AB') 
> and ([d].[metadane/inneMetadane/*/klucz] = 'InnyKod'), path=*, 
> property=[metadane/inneMetadane/*/klucz=[InnyKod], 
> metadane/inneMetadane/*/wartosc=[AB]])
>  {noformat}
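
The behaviour the reporter expects can be sketched as follows. This is a minimal illustration only, not Oak's QueryImpl: the class SketchQuery, its fields, and the naive split on " or " are all assumptions made for the example. The point it shows is that each subquery created for the union keeps the options (here, the index tag) of the original query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a parsed query; not Oak's QueryImpl. */
final class SketchQuery {
    final String condition;   // the WHERE clause, kept as plain text in this sketch
    final String indexTag;    // value of "option(index tag ...)", or null if absent

    SketchQuery(String condition, String indexTag) {
        this.condition = condition;
        this.indexTag = indexTag;
    }

    /** Copies the query with a new condition, carrying the options along. */
    SketchQuery copyWithCondition(String newCondition) {
        return new SketchQuery(newCondition, indexTag);
    }

    /**
     * Splits a top-level "a or b" condition into one subquery per branch.
     * The text split is a shortcut for the example; a real engine works on
     * the parsed condition tree.
     */
    List<SketchQuery> toUnion() {
        List<SketchQuery> parts = new ArrayList<>();
        for (String branch : condition.split("\\s+or\\s+")) {
            parts.add(copyWithCondition(branch.trim()));
        }
        return parts;
    }
}
```

With the query from the log above, both branches of the OR would then be costed with the index tag crkid_dokument_month_2019_3 still attached.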





[jira] [Updated] (OAK-8162) When query with OR is divided into union of queries, options (like index tag) are not passed into subqueries.

2019-11-01 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8162:

Affects Version/s: 1.6.18
   1.8.17

> When query with OR is divided into union of queries, options (like index tag) 
> are not passed into subqueries. 
> --
>
> Key: OAK-8162
> URL: https://issues.apache.org/jira/browse/OAK-8162
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.2, 1.6.18, 1.8.17
>Reporter: Piotr Tajduś
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.14.0
>
>





[jira] [Commented] (OAK-8673) Determine and possibly adjust size of eagerCacheSize

2019-11-01 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964764#comment-16964764
 ] 

Thomas Mueller commented on OAK-8673:
-

> 0 should be possible can run those in addition

I would probably do that, and check that it really works as expected (that is, 
that the cache is really empty). Or maybe hardcode some logic so that a value 
of 0 means the cache is not used at all (this might be a bit hard).

> the lazy-loading doesn't seem to have a beneficial effect (except for 
> reading really few items, which in AEM is rarely the case)

Do you assume that with a small EagerCacheSize, lazy loading isn't used at all? 
I don't know the code, but it sounds like it's better to somehow disable the 
lazy loading logic, in order to be sure it's not used by some unexpected code 
path.

> Determine and possibly adjust size of eagerCacheSize
> 
>
> Key: OAK-8673
> URL: https://issues.apache.org/jira/browse/OAK-8673
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, security
>Reporter: Angela Schreiber
>Assignee: Angela Schreiber
>Priority: Major
>
> The initial results of the {{EagerCacheSizeTest}} seem to indicate that we 
> almost never benefit from the lazy permission evaluation (compared to reading 
> all permission entries right away). From my understanding of the results, the 
> only exceptions are those cases where only very few items are being accessed 
> (e.g. reading 100 items).
> However, I am not totally sure if this is not an artifact of the random-read. 
> I therefore started extending the benchmark with an option to re-read a 
> randomly picked item more than once, which according to some analysis done 
> quite some time ago is a common scenario, especially when using Oak in 
> combination with Apache Sling.
> Results are attached to OAK-8662 (possibly more to come).





[jira] [Commented] (OAK-8729) Lucene Directory concurrency issue

2019-11-01 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964741#comment-16964741
 ] 

Thomas Mueller commented on OAK-8729:
-

I tried writing a special test case, but it is not easy... I could sometimes 
reproduce the issue, but only if the existing test is run many times, and only 
when instrumenting the MemoryNodeBuilder.
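
For reference, the "run it many times" approach can be sketched with a small harness like the one below. It is not the Oak test: the class RaceLoop and the method countFailures are hypothetical names, and the two Runnables stand in for "close the directory" and "create a new file". Each round releases both tasks from a latch at the same moment and counts the rounds in which either task threw.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: races two tasks repeatedly and counts failed rounds. */
final class RaceLoop {

    static int countFailures(int rounds, Runnable first, Runnable second) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        int failures = 0;
        try {
            for (int i = 0; i < rounds; i++) {
                // Both tasks wait on the latch so they start as close together as possible.
                CountDownLatch start = new CountDownLatch(1);
                Future<?> a = pool.submit(() -> { awaitQuietly(start); first.run(); });
                Future<?> b = pool.submit(() -> { awaitQuietly(start); second.run(); });
                start.countDown();
                // Always wait for both tasks before deciding whether the round failed.
                boolean firstFailed = failed(a);
                boolean secondFailed = failed(b);
                if (firstFailed || secondFailed) {
                    failures++;
                }
            }
        } finally {
            pool.shutdown();
        }
        return failures;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if the task threw; Future.get() reports that as ExecutionException. */
    private static boolean failed(Future<?> f) {
        try {
            f.get();
            return false;
        } catch (ExecutionException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

A harness like this only raises the chance of hitting the race. A round without failures does not prove the code is thread-safe, which matches the observation that the issue shows up only sometimes.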

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.12.0, 1.14.0, 1.16.0, 1.18.0
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.20.0
>
> Attachments: OAK-8729.patch
>
>
> There is a concurrency issue in the DefaultDirectoryFactory. It is 
> reproducible sometimes using CopyOnWriteDirectoryTest.copyOnWrite(), if run 
> in a loop (1000 times). The problem is that the MemoryNodeBuilder is used 
> concurrently:
> * thread 1 is closing the directory (after writing to it)
> * thread 2 is trying to create a new file
> {noformat}
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setProperty(MemoryNodeBuilder.java:525)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.close(OakDirectory.java:264)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.close(BufferedOakDirectory.java:217)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory$2.run(CopyOnReadDirectory.java:305)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:362)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:356)
>   at 
> org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.child(MemoryNodeBuilder.java:342)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.createOutput(OakDirectory.java:214)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.createOutput(BufferedOakDirectory.java:178)
>   at org.apache.lucene.store.Directory.copy(Directory.java:184)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:322)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:1)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:105)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:1)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}





[jira] [Commented] (OAK-8729) Lucene Directory concurrency issue

2019-11-01 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964739#comment-16964739
 ] 

Thomas Mueller commented on OAK-8729:
-

Attached a patch for review, [~catholicon] [~nitigupt] [~tihom88].

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.12.0, 1.14.0, 1.16.0, 1.18.0
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.20.0
>
> Attachments: OAK-8729.patch
>
>





[jira] [Updated] (OAK-8729) Lucene Directory concurrency issue

2019-11-01 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8729:

Attachment: OAK-8729.patch

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.12.0, 1.14.0, 1.16.0, 1.18.0
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.20.0
>
> Attachments: OAK-8729.patch
>
>





[jira] [Updated] (OAK-8729) Lucene Directory concurrency issue

2019-11-01 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8729:

Affects Version/s: 1.12.0
   1.14.0
   1.16.0
   1.18.0

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.12.0, 1.14.0, 1.16.0, 1.18.0
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>





[jira] [Updated] (OAK-8729) Lucene Directory concurrency issue

2019-11-01 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8729:

Fix Version/s: 1.20.0

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.12.0, 1.14.0, 1.16.0, 1.18.0
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
> Fix For: 1.20.0
>
>





[jira] [Assigned] (OAK-8729) Lucene Directory concurrency issue

2019-11-01 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller reassigned OAK-8729:
---

Assignee: Thomas Mueller

> Lucene Directory concurrency issue
> --
>
> Key: OAK-8729
> URL: https://issues.apache.org/jira/browse/OAK-8729
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Reporter: Thomas Mueller
>Assignee: Thomas Mueller
>Priority: Major
>





[jira] [Commented] (OAK-8673) Determine and possibly adjust size of eagerCacheSize

2019-11-01 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964729#comment-16964729
 ] 

Thomas Mueller commented on OAK-8673:
-

> the threshold to move from eagerly-loading all permission entries to lazy 
> loading is defined by the EagerCacheSize.

So, maybe test with EagerCacheSize = 0, or (if that's not possible) 1?
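
To make the suggestion concrete, here is a rough sketch of how such a threshold might behave. It assumes that entries are loaded eagerly while their number stays within EagerCacheSize and lazily above it. The exact comparison used in Oak is not confirmed here, and the class and method names are made up for the example.

```java
/** Hypothetical illustration of a size threshold between two loading strategies. */
final class LoadingStrategySketch {

    enum Strategy { EAGER, LAZY }

    /**
     * Assumption: entries are read eagerly while their count stays within
     * eagerCacheSize, and lazily once the count exceeds it.
     */
    static Strategy choose(long numEntries, long eagerCacheSize) {
        if (numEntries < 0 || eagerCacheSize < 0) {
            throw new IllegalArgumentException("sizes must not be negative");
        }
        return numEntries <= eagerCacheSize ? Strategy.EAGER : Strategy.LAZY;
    }
}
```

Under that assumption, a threshold of 0 selects lazy loading for every principal that has at least one permission entry, so a benchmark run with 0 would measure the lazy path almost exclusively.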

> Determine and possibly adjust size of eagerCacheSize
> 
>
> Key: OAK-8673
> URL: https://issues.apache.org/jira/browse/OAK-8673
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, security
>Reporter: Angela Schreiber
>Assignee: Angela Schreiber
>Priority: Major
>





[jira] [Commented] (OAK-8673) Determine and possibly adjust size of eagerCacheSize

2019-11-01 Thread Thomas Mueller (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964711#comment-16964711
 ] 

Thomas Mueller commented on OAK-8673:
-

> we almost never benefit from the lazy permission evaluation (compared to 
> reading all permission entries right away). 

Just to make sure: It sounds like "lazy permission evaluation disabled" means 
"reading all permission entries right away"... right? And then it sounds like 
you consider disabling lazy permission evaluation?

Which benchmark results are for "lazy permission evaluation disabled", and 
which are for "lazy permission evaluation enabled"? I only see different 
settings for 

* Items to Read
* Repeat Read
* Number of ACEs
* Number of Principals
* EagerCacheSize


> Determine and possibly adjust size of eagerCacheSize
> 
>
> Key: OAK-8673
> URL: https://issues.apache.org/jira/browse/OAK-8673
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: core, security
>Reporter: Angela Schreiber
>Assignee: Angela Schreiber
>Priority: Major
>





[jira] [Created] (OAK-8729) Lucene Directory concurrency issue

2019-10-31 Thread Thomas Mueller (Jira)
Thomas Mueller created OAK-8729:
---

 Summary: Lucene Directory concurrency issue
 Key: OAK-8729
 URL: https://issues.apache.org/jira/browse/OAK-8729
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: lucene
Reporter: Thomas Mueller


There is a concurrency issue in the DefaultDirectoryFactory. It is reproducible 
sometimes using CopyOnWriteDirectoryTest.copyOnWrite(), if run in a loop (1000 
times). The problem is that the MemoryNodeBuilder is used concurrently:

* thread 1 is closing the directory (after writing to it)
* thread 2 is trying to create a new file

{noformat}
at 
org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
at 
org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setProperty(MemoryNodeBuilder.java:525)
at 
org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.close(OakDirectory.java:264)
at 
org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.close(BufferedOakDirectory.java:217)
at 
org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnReadDirectory$2.run(CopyOnReadDirectory.java:305)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

at org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.exists(MemoryNodeBuilder.java:284)
at org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:362)
at org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.setChildNode(MemoryNodeBuilder.java:356)
at org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder.child(MemoryNodeBuilder.java:342)
at org.apache.jackrabbit.oak.plugins.index.lucene.directory.OakDirectory.createOutput(OakDirectory.java:214)
at org.apache.jackrabbit.oak.plugins.index.lucene.directory.BufferedOakDirectory.createOutput(BufferedOakDirectory.java:178)
at org.apache.lucene.store.Directory.copy(Directory.java:184)
at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:322)
at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$3.call(CopyOnWriteDirectory.java:1)
at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:105)
at org.apache.jackrabbit.oak.plugins.index.lucene.directory.CopyOnWriteDirectory$2$1.call(CopyOnWriteDirectory.java:1)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (OAK-8162) When query with OR is divided into union of queries, options (like index tag) are not passed into subqueries.

2019-10-31 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller resolved OAK-8162.
-
Resolution: Fixed

Yes, this is fixed. I also changed the fix version to 1.14.

> When query with OR is divided into union of queries, options (like index tag) 
> are not passed into subqueries. 
> --
>
> Key: OAK-8162
> URL: https://issues.apache.org/jira/browse/OAK-8162
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: core
> Affects Versions: 1.10.2
> Reporter: Piotr Tajduś
> Assignee: Thomas Mueller
> Priority: Major
> Fix For: 1.14.0
>
>
> When a query with OR is divided into a union of queries, options (like index tag) 
> are not passed into the subqueries. As a result, the alternative query can, for 
> example, use indexes it shouldn't use.
>  {noformat}
> org.apache.jackrabbit.oak.query.QueryImpl.buildAlternativeQuery()
> org.apache.jackrabbit.oak.query.QueryImpl.copyOf()
>  
> 2019-03-21 16:32:25,600 DEBUG 
> [org.apache.jackrabbit.oak.query.QueryEngineImpl] (default task-1) Parsing 
> JCR-SQL2 statement: select distinct d.* from [crkid:document] as d where 
> ([d].[metadane/inneMetadane/*/wartosc] = 'AX' and 
> [d].[metadane/inneMetadane/*/klucz] = 'InnyKod') or 
> ([d].[metadane/inneMetadane/*/wartosc] = 'AB' and 
> [d].[metadane/inneMetadane/*/klucz] = 'InnyKod') option(index tag 
> crkid_dokument_month_2019_3)
> 2019-03-21 16:32:25,607 DEBUG [org.apache.jackrabbit.oak.query.QueryImpl] 
> (default task-1) cost using filter Filter(query=select distinct d.* from 
> [crkid:document] as d where ([d].[metadane/inneMetadane/*/wartosc] = 'AB') 
> and ([d].[metadane/inneMetadane/*/klucz] = 'InnyKod'), path=*, 
> property=[metadane/inneMetadane/*/klucz=[InnyKod], 
> metadane/inneMetadane/*/wartosc=[AB]])
>  {noformat}
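
The problem described above can be illustrated with a minimal sketch. It is not Oak's code: SketchQuery, SketchOptions, and toUnion are hypothetical stand-ins, and the real logic lives in QueryImpl.buildAlternativeQuery() and QueryImpl.copyOf(). The sketch only shows the intended behaviour, assuming that each subquery created from an OR branch should inherit the options (such as the index tag) of the original query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query's options; NOT Oak's real options class.
final class SketchOptions {
    final String indexTag; // e.g. "crkid_dokument_month_2019_3", or null if unset

    SketchOptions(String indexTag) {
        this.indexTag = indexTag;
    }
}

// Hypothetical stand-in for a parsed query; NOT Oak's QueryImpl.
final class SketchQuery {
    final String condition;
    final SketchOptions options;

    SketchQuery(String condition, SketchOptions options) {
        this.condition = condition;
        this.options = options;
    }

    // Builds one subquery per OR branch. The point of the sketch: each
    // subquery is handed the parent's options instead of empty defaults,
    // so an "option(index tag ...)" still restricts index selection.
    List<SketchQuery> toUnion(List<String> orBranches) {
        List<SketchQuery> subqueries = new ArrayList<>();
        for (String branch : orBranches) {
            subqueries.add(new SketchQuery(branch, options));
        }
        return subqueries;
    }
}
```

In this sketch the options object is immutable, so sharing one instance across the subqueries is safe; a mutable options type would need to be copied per subquery instead.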





[jira] [Updated] (OAK-8162) When query with OR is divided into union of queries, options (like index tag) are not passed into subqueries.

2019-10-31 Thread Thomas Mueller (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated OAK-8162:

Fix Version/s: 1.14.0 (was: 1.20.0)





