[jira] [Commented] (OAK-9060) IllegalArgumentException when using facets in union queries

2020-05-13 Thread Dirk Rudolph (Jira)


[ 
https://issues.apache.org/jira/browse/OAK-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106564#comment-17106564
 ] 

Dirk Rudolph commented on OAK-9060:
---

I opened [#209|https://github.com/apache/jackrabbit-oak/pull/209], which resolves 
the issue. Unfortunately, I cannot easily provide a unit test for it, as 
oak-core itself does not contain any index that supports facets, as far as I 
know. Ideas are welcome.

Can this be backported to 1.10.x?


> IllegalArgumentException when using facets in union queries
> ---
>
> Key: OAK-9060
> URL: https://issues.apache.org/jira/browse/OAK-9060
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.10.3
>Reporter: Dirk Rudolph
>Priority: Major
>
> I get the following exception when trying to execute a JCR-SQL2 query with a 
> facet selector and 2 path constraints, which is optimised into a union of 2 
> queries:
> {code}
> select s.[jcr:path], [rep:facet(jcr:content/tags)] from [cq:Page] as s where 
> (isdescendantnode(s,'/content/pathA') or 
> isdescendantnode(s,'/content/pathB')) order by s.[jcr:content/date] desc
> {code}
> The same query works well with only one of the path constraints.
> {code}
> java.lang.IllegalArgumentException: Invalid path: rep:facet(jcr:content/tags
>     at org.apache.jackrabbit.oak.query.QueryImpl.getOakPath(QueryImpl.java:1249) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.ast.AstElement.normalizePropertyName(AstElement.java:94) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.ast.SelectorImpl.currentProperty(SelectorImpl.java:566) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.ast.ColumnImpl.currentProperty(ColumnImpl.java:59) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.QueryImpl.currentRow(QueryImpl.java:892) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:831) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:856) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.bothHaveRows(UnionQueryImpl.java:483) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.<init>(UnionQueryImpl.java:436) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.UnionQueryImpl.getRows(UnionQueryImpl.java:304) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.ResultImpl$1.iterator(ResultImpl.java:72) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:85) [org.apache.jackrabbit.oak-jcr:1.10.3]
>     at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl.getRows(QueryResultImpl.java:83) [org.apache.jackrabbit.oak-jcr:1.10.3]
> {code}
> Apparently, when the columns are copied in [1], the information that a column 
> is a FacetColumnImpl is lost, because FacetColumnImpl does not override 
> copyOf().
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.10.8/oak-core/src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java#L1420
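The failure mode described above can be reproduced in isolation. The sketch below uses hypothetical, simplified classes (Column, BuggyFacetColumn, FixedFacetColumn), not Oak's ColumnImpl/FacetColumnImpl. If a subclass does not override copyOf(), the copy falls back to the base type and is no longer recognised as a facet column. In the stack trace above, such a copy ends up in ColumnImpl.currentProperty(), which treats rep:facet(...) as a property path. Overriding copyOf() is one plausible fix; it is not necessarily what PR #209 does.

```java
public class CopyOfSketch {

    // Hypothetical, simplified stand-in for a query column (not Oak's ColumnImpl).
    static class Column {
        final String name;

        Column(String name) {
            this.name = name;
        }

        // Base copy: always produces a plain Column, whatever the runtime type is.
        Column copyOf() {
            return new Column(name);
        }

        boolean isFacet() {
            return false;
        }
    }

    // Reproduces the bug pattern: no copyOf() override, so a copy degrades to Column.
    static class BuggyFacetColumn extends Column {
        BuggyFacetColumn(String name) {
            super(name);
        }

        @Override
        boolean isFacet() {
            return true;
        }
    }

    // Possible fix: override copyOf() so that the copy keeps the facet type.
    static class FixedFacetColumn extends Column {
        FixedFacetColumn(String name) {
            super(name);
        }

        @Override
        Column copyOf() {
            return new FixedFacetColumn(name);
        }

        @Override
        boolean isFacet() {
            return true;
        }
    }

    public static void main(String[] args) {
        String facet = "rep:facet(jcr:content/tags)";
        // The buggy copy would be treated as an ordinary property column.
        System.out.println(new BuggyFacetColumn(facet).copyOf().isFacet()); // false
        System.out.println(new FixedFacetColumn(facet).copyOf().isFacet()); // true
    }
}
```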



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (OAK-9060) IllegalArgumentException when using facets in union queries

2020-05-13 Thread Dirk Rudolph (Jira)


 [ 
https://issues.apache.org/jira/browse/OAK-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-9060:
--
Description: 
I get the following exception when trying to execute a JCR-SQL2 query with a 
facet selector and 2 path constraints, which is optimised into a union of 2 queries:

{code}
select s.[jcr:path], [rep:facet(jcr:content/tags)] from [cq:Page] as s where 
(isdescendantnode(s,'/content/pathA') or isdescendantnode(s,'/content/pathB')) 
order by s.[jcr:content/date] desc
{code}

The same query works well with only one of the path constraints.

{code}
java.lang.IllegalArgumentException: Invalid path: rep:facet(jcr:content/tags
    at org.apache.jackrabbit.oak.query.QueryImpl.getOakPath(QueryImpl.java:1249) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.AstElement.normalizePropertyName(AstElement.java:94) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.SelectorImpl.currentProperty(SelectorImpl.java:566) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.ColumnImpl.currentProperty(ColumnImpl.java:59) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl.currentRow(QueryImpl.java:892) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:831) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:856) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.bothHaveRows(UnionQueryImpl.java:483) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.<init>(UnionQueryImpl.java:436) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl.getRows(UnionQueryImpl.java:304) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ResultImpl$1.iterator(ResultImpl.java:72) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:85) [org.apache.jackrabbit.oak-jcr:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl.getRows(QueryResultImpl.java:83) [org.apache.jackrabbit.oak-jcr:1.10.3]
{code}

Apparently, when the columns are copied in [1], the information that a column is a 
FacetColumnImpl is lost, because FacetColumnImpl does not override copyOf().

[1] 
https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.10.8/oak-core/src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java#L1420

  was:
I get the following exception when trying to execute a JCR-SQL2 query with a 
facet selector and 2 path constraints, which is optimised into a union of 2 queries:

{code}
select s.[jcr:path], [rep:facet(jcr:content/tags)] from [cq:Page] as s where 
(isdescendantnode(s,'/content/pathA') or isdescendantnode(s,'/content/pathB')) 
order by s.[jcr:content/date] desc
{code}

The same query works well with only one of the path constraints.

{code}
java.lang.IllegalArgumentException: Invalid path: rep:facet(jcr:content/genericComponent
    at org.apache.jackrabbit.oak.query.QueryImpl.getOakPath(QueryImpl.java:1249) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.AstElement.normalizePropertyName(AstElement.java:94) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.SelectorImpl.currentProperty(SelectorImpl.java:566) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.ColumnImpl.currentProperty(ColumnImpl.java:59) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl.currentRow(QueryImpl.java:892) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:831) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:856) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.bothHaveRows(UnionQueryImpl.java:483) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.<init>(UnionQueryImpl.java:436) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl.getRows(UnionQueryImpl.java:304) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ResultImpl$1.iterator(ResultImpl.java:72) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:85) [org.apache.jackrabbit.oak-jcr:1.10.3]
    at org.apache.jackrabbi

[jira] [Created] (OAK-9060) IllegalArgumentException when using facets in union queries

2020-05-13 Thread Dirk Rudolph (Jira)
Dirk Rudolph created OAK-9060:
-

 Summary: IllegalArgumentException when using facets in union queries
 Key: OAK-9060
 URL: https://issues.apache.org/jira/browse/OAK-9060
 Project: Jackrabbit Oak
  Issue Type: Bug
Affects Versions: 1.10.3
Reporter: Dirk Rudolph


I get the following exception when trying to execute a JCR-SQL2 query with a 
facet selector and 2 path constraints, which is optimised into a union of 2 queries:

{code}
select s.[jcr:path], [rep:facet(jcr:content/tags)] from [cq:Page] as s where 
(isdescendantnode(s,'/content/pathA') or isdescendantnode(s,'/content/pathB')) 
order by s.[jcr:content/date] desc
{code}

The same query works well with only one of the path constraints.

{code}
java.lang.IllegalArgumentException: Invalid path: rep:facet(jcr:content/genericComponent
    at org.apache.jackrabbit.oak.query.QueryImpl.getOakPath(QueryImpl.java:1249) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.AstElement.normalizePropertyName(AstElement.java:94) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.SelectorImpl.currentProperty(SelectorImpl.java:566) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.ColumnImpl.currentProperty(ColumnImpl.java:59) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl.currentRow(QueryImpl.java:892) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:831) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:856) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.bothHaveRows(UnionQueryImpl.java:483) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.<init>(UnionQueryImpl.java:436) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl.getRows(UnionQueryImpl.java:304) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ResultImpl$1.iterator(ResultImpl.java:72) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:85) [org.apache.jackrabbit.oak-jcr:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl.getRows(QueryResultImpl.java:83) [org.apache.jackrabbit.oak-jcr:1.10.3]
{code}

Apparently, when the columns are copied in [1], the information that a column is a 
FacetColumnImpl is lost, because FacetColumnImpl does not override copyOf().

[1] 
https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.10.8/oak-core/src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java#L1420





[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-19 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332289#comment-16332289
 ] 

Dirk Rudolph commented on OAK-7109:
---

Thanks for the response. Regarding 1), see 
https://issues.apache.org/jira/browse/OAK-7109?focusedCommentId=16309376&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16309376

The optimisation currently does the following:

A and (B or not(C and D)) => (A and B) or (A and not(C and D))

To obtain a DNF, which can then be split into a UNION of pure conjunctions, 
another step is needed before the current optimisation: conversion to NNF 
(pushing all negations down to the leaves of the expression tree):

A and (B or not(C and D)) => A and (B or not(C) or not(D)) => (A and B) or (A 
and not(C)) or (A and not(D)) 

I am not sure whether the index supports not(), but if it does, the UNION of 
the 3 queries above would give exact facets, which would then only need to be 
deduplicated. 
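A minimal sketch of the NNF step followed by DNF expansion, using a hypothetical four-type expression model (Var, Not, And, Or) rather than Oak's query AST. Negations are pushed to the leaves with De Morgan's laws, then AND is distributed over OR. Applied to A and (B or not(C and D)), it yields the three conjunctions listed above. The number of conjunctions can grow exponentially with the input, which is the concern raised in the issue description. Records and pattern matching for instanceof assume Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class DnfSketch {

    // Hypothetical minimal boolean expression model (not Oak's query AST).
    interface Expr {}
    record Var(String name) implements Expr {}
    record Not(Expr inner) implements Expr {}
    record And(Expr left, Expr right) implements Expr {}
    record Or(Expr left, Expr right) implements Expr {}

    // NNF: push every negation down to a variable using De Morgan's laws.
    static Expr nnf(Expr e) {
        if (e instanceof Not n) {
            Expr x = n.inner();
            if (x instanceof Not nn) return nnf(nn.inner()); // not(not a) => a
            if (x instanceof And a) return new Or(nnf(new Not(a.left())), nnf(new Not(a.right())));
            if (x instanceof Or o) return new And(nnf(new Not(o.left())), nnf(new Not(o.right())));
            return e; // negated variable: already a literal
        }
        if (e instanceof And a) return new And(nnf(a.left()), nnf(a.right()));
        if (e instanceof Or o) return new Or(nnf(o.left()), nnf(o.right()));
        return e;
    }

    // DNF of an NNF expression: a list of conjunctions, each a list of literals.
    static List<List<String>> dnf(Expr e) {
        if (e instanceof Or o) {
            List<List<String>> out = new ArrayList<>(dnf(o.left()));
            out.addAll(dnf(o.right()));
            return out;
        }
        if (e instanceof And a) {
            // Distribute AND over OR: cross product of both sides' conjunctions.
            List<List<String>> out = new ArrayList<>();
            for (List<String> l : dnf(a.left())) {
                for (List<String> r : dnf(a.right())) {
                    List<String> c = new ArrayList<>(l);
                    c.addAll(r);
                    out.add(c);
                }
            }
            return out;
        }
        if (e instanceof Var v) return List.of(List.of(v.name()));
        if (e instanceof Not n && n.inner() instanceof Var v) return List.of(List.of("not(" + v.name() + ")"));
        throw new IllegalArgumentException("input is not in NNF: " + e);
    }

    public static void main(String[] args) {
        Expr a = new Var("A"), b = new Var("B"), c = new Var("C"), d = new Var("D");
        // A and (B or not(C and D))
        Expr query = new And(a, new Or(b, new Not(new And(c, d))));
        System.out.println(dnf(nnf(query))); // [[A, B], [A, not(C)], [A, not(D)]]
    }
}
```

Each inner list corresponds to one conjunctive query of the UNION. The conjunctions are not mutually exclusive, so their results would still have to be deduplicated.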

 

> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>Priority: Major
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this case are queries that are passed to Lucene without 
> all of the original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In this particular case the index planner passes only ":fulltext:ipsum" to 
> Lucene, even though the index supports evaluating path constraints. 
> As facets are counted on the raw Lucene result, the returned facet counts are 
> incorrect. For example, given the following content:
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> {code}
> the expected counts for the simple/tags dimension and the query above are 
> - tag1: 2
> - tag2: 2
> as the result set contains 2 nodes and all of them carry the same tags. The 
> actual counts are 
> - tag1: 3
> - tag2: 3
> because the path constraint is not handled by Lucene.
> As a workaround, the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> expands exponentially, it is only a theoretical solution, not something for 
> production. 
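A minimal in-memory sketch of the miscount above. It assumes the three nodes from the example and flattens simple/tags onto each document. The Doc record and the helper methods are hypothetical, not Oak or Lucene API, and contains("ipsum") is only a crude stand-in for fulltext matching. Counting facets on the documents matched by the fulltext constraint alone gives 3 per tag. Applying the path constraint before counting gives 2.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class RawFacetCountSketch {

    // Hypothetical flattened node: path, fulltext value and the simple/tags values.
    record Doc(String path, String text, List<String> tags) {}

    static final List<Doc> DOCS = List.of(
            new Doc("/content1/test/foo", "lorem ipsum", List.of("tag1", "tag2")),
            new Doc("/content2/test/bar", "lorem ipsum", List.of("tag1", "tag2")),
            new Doc("/content3/test/bar", "lorem ipsum", List.of("tag1", "tag2")));

    // Counts how many matching documents carry each tag value.
    static Map<String, Integer> facets(Predicate<Doc> matches) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : DOCS) {
            if (!matches.test(d)) continue;
            for (String tag : d.tags()) counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    // Crude stand-in for contains(a.[*], 'ipsum').
    static boolean fulltext(Doc d) {
        return d.text().contains("ipsum");
    }

    // isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2')
    static boolean underRoots(Doc d) {
        return d.path().startsWith("/content1/") || d.path().startsWith("/content2/");
    }

    public static void main(String[] args) {
        // Only ":fulltext:ipsum" reaches the index, so all three nodes are counted.
        System.out.println(facets(RawFacetCountSketch::fulltext)); // {tag1=3, tag2=3}
        // Fulltext and path constraints applied before counting.
        System.out.println(facets(d -> fulltext(d) && underRoots(d))); // {tag1=2, tag2=2}
    }
}
```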





[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/5/18 10:38 AM:


[~tmueller] adding the feature to aggregate the current rep:facet extraction 
from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to Lucene, so the query has 
to be in DNF, which is not the case at the moment
2) even then, the conjunctions in the DNF are not mutually exclusive, which 
also leads to inaccurate results

1) can easily be fixed by converting the restrictions to NNF before doing the 
optimisation. 2) would additionally require deduplication across the Lucene 
result sets returned by each branch of the union. 
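Drawback 2) can be illustrated with a small sketch. It assumes that each union branch exposes its individual hits (path plus tag values) rather than only pre-aggregated facet counts. Hit and both merge methods are hypothetical. A node that satisfies two overlapping conjunctions is counted twice when the per-branch facet counts are simply added. Deduplicating by path before counting gives the exact result.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UnionFacetMergeSketch {

    // Hypothetical hit returned by one union branch: node path plus its tag values.
    record Hit(String path, List<String> tags) {}

    static Map<String, Integer> count(Collection<Hit> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Hit h : hits) {
            for (String tag : h.tags()) counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    // Naive merge: add up the facet counts computed per branch.
    static Map<String, Integer> sumPerBranch(List<List<Hit>> branches) {
        Map<String, Integer> total = new TreeMap<>();
        for (List<Hit> branch : branches) {
            count(branch).forEach((tag, n) -> total.merge(tag, n, Integer::sum));
        }
        return total;
    }

    // Deduplicating merge: keep each path once across all branches, then count.
    static Map<String, Integer> dedupeThenCount(List<List<Hit>> branches) {
        Map<String, Hit> byPath = new LinkedHashMap<>();
        for (List<Hit> branch : branches) {
            for (Hit h : branch) byPath.putIfAbsent(h.path(), h);
        }
        return count(byPath.values());
    }

    public static void main(String[] args) {
        Hit x = new Hit("/content/x", List.of("tag1")); // satisfies both conjunctions
        Hit y = new Hit("/content/y", List.of("tag1", "tag2"));
        Hit z = new Hit("/content/z", List.of("tag2"));
        List<List<Hit>> branches = List.of(List.of(x, y), List.of(x, z));
        System.out.println(sumPerBranch(branches));    // {tag1=3, tag2=2}
        System.out.println(dedupeThenCount(branches)); // {tag1=2, tag2=2}
    }
}
```

The trade-off is that the merge needs per-node data from every branch, not just the facet counts that the index already computed.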






[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/5/18 10:37 AM:


[~tmueller] adding the feature to aggregate the current rep:facet extraction 
from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to Lucene, so the query has 
to be in DNF, which is not the case at the moment
2) even then, the conjunctions in the DNF are not mutually exclusive, which 
also leads to inaccurate results

It would also require deduplication across the Lucene results returned by 
each branch of the union. 






[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900
 ] 

Dirk Rudolph commented on OAK-7109:
---

[~tmueller] adding the feature to aggregate the current rep:facet extraction 
from the UNION alternatives has 2 drawbacks:

1) as said above, all constraints have to be passed to Lucene, so the query has 
to be in DNF, which is not the case at the moment
2) even then, the conjunctions in the DNF are not mutually exclusive, which 
also leads to inaccurate results





[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-05 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312691#comment-16312691
 ] 

Dirk Rudolph commented on OAK-7109:
---

{quote}
I have a very pessimistic view that we should fail such queries: I mean, it's 
better to fail and allow for the right index definition than to give incorrect results.
{quote}
+1



[jira] [Updated] (OAK-7071) PostingsHighlighter, Highlighter and SimpleExcerptProvider return all different formats for excerpts

2018-01-03 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7071:
--
Description: 
*PostingsHighlighter* returns, for example, 
{quote} 
[my text with any highlighting followed by more text]
{quote}
because the PostingsHighlighter itself returns, for each field, a {{String[]}} of 
phrases, limited by the previously given maximum number of phrases. This String[] 
is then transformed to a String using {{Arrays.toString()}} at 
[LucenePropertyIndex.java#L688|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L688], 
which causes the value to be wrapped in square brackets.

*Highlighter* returns 
{quote}
my text with any highlighting followed by more text 
{quote}

*SimpleExcerptProvider* returns
{quote}
my text with any highlighting followed by more 
text
{quote}

As the PostingsHighlighter cannot be given a custom prefix or suffix, I would 
suggest setting  as the default for the others as well, to avoid any further 
text transformation after the excerpts have been extracted.
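The bracket wrapping follows directly from how Arrays.toString() formats an array: it adds "[" and "]" around the elements and separates them with ", ". The sketch below contrasts this with String.join(). Joining with a single space is an assumption made for illustration, not a behaviour Oak defines.

```java
import java.util.Arrays;

public class ExcerptJoinSketch {

    // What the description reports: the String[] of phrases is rendered with Arrays.toString().
    static String viaArraysToString(String[] passages) {
        return Arrays.toString(passages); // wraps in "[...]" and separates elements with ", "
    }

    // Hypothetical alternative: join the phrases without brackets (separator chosen for illustration).
    static String viaJoin(String[] passages) {
        return String.join(" ", passages);
    }

    public static void main(String[] args) {
        String[] passages = {"my text with any highlighting followed by more text"};
        System.out.println(viaArraysToString(passages)); // [my text with any highlighting followed by more text]
        System.out.println(viaJoin(passages));           // my text with any highlighting followed by more text
    }
}
```

With more than one phrase, Arrays.toString() would also insert ", " between them, which is a second formatting artefact beyond the brackets.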




> PostingsHighlighter, Highlighter and SimpleExcerptProvider return all 
> different formats for excerpts
> 
>
> Key: OAK-7071
> URL: https://issues.apache.org/jira/browse/OAK-7071
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>  Labels: excerpt
>





[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:47 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting (XOR) along the lines of "If a is 
set to true, b has to be in a configured set; if not, b must not be in the 
configured set." It evaluates to the following plan: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

This plan contains no constraint on propb at all, so facet counting will be 
wrong in this case as well.

As you can see, the query is already in DNF, and querying each of its 
disjunctive statements individually works well. I attached a unit test showing 
this for this specific example 
([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch]).

Edit: I think there are 2 issues here: 
1) the OR in the query containing both statements 
2) the not() in the query containing only the second disjunctive statement. 
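The effect of the lost propb constraint on the candidate set can be sketched in memory. Node, the sample values, and the two predicates are hypothetical. The sketch assumes that every node has both properties set, so the semantics of not() on a missing property do not come into play. The index-level filter from the plan ([propa] is not null) admits all four nodes, while the full condition keeps only two. Facets counted on the index result would therefore be based on twice as many nodes.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class LostConstraintSketch {

    // Hypothetical node; both properties are always set in this sketch.
    record Node(String path, String propa, String propb) {}

    static final Set<String> CONFIGURED = Set.of("foo", "bar");

    static final List<Node> NODES = List.of(
            new Node("/n1", "true", "foo"),   // first disjunct matches
            new Node("/n2", "true", "baz"),   // propa true, propb outside the set: no match
            new Node("/n3", "false", "baz"),  // second disjunct matches
            new Node("/n4", "false", "bar")); // propa false, propb inside the set: no match

    // Full WHERE clause of the query above.
    static boolean fullCondition(Node n) {
        boolean inSet = CONFIGURED.contains(n.propb());
        return ("true".equals(n.propa()) && inSet) || ("false".equals(n.propa()) && !inSet);
    }

    // Only what the plan passes to the index: [propa] is not null.
    static boolean indexFilter(Node n) {
        return n.propa() != null;
    }

    static long count(Predicate<Node> p) {
        return NODES.stream().filter(p).count();
    }

    public static void main(String[] args) {
        System.out.println(count(LostConstraintSketch::indexFilter));   // 4
        System.out.println(count(LostConstraintSketch::fullCondition)); // 2
    }
}
```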


> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
> Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this case are queries which are passed to Lucene without 
> all of their original constraints, for example queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to Lucene, 
> even though the index supports evaluating path constraints. 
> As the facets are counted on the raw Lucene result, the returned facets are 
> incorrect. For example, given the following content:
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> {code}
> The expected result for the dimension simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set has 2 results and all documents are equal. The actual 
> result is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by Lucene.
> To work around that, the only solution that came to my mind is building the 
> [disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] 
> of my complex query and executing a query for each of the disjunctive 
> statements. As this expands exponentially, it is only a theoretical solution, 
> nothing for production. 
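A minimal Java sketch of the miscount described above, assuming that facet counting simply iterates over whatever documents the index returned. The `Doc` record and `countFacets` helper are hypothetical and are neither Oak nor Lucene APIs; the path predicate stands in for the two isdescendantnode() constraints that were not passed to Lucene.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class FacetCountSketch {
    // Hypothetical stand-in for an indexed document: its path and tag values.
    record Doc(String path, List<String> tags) {}

    // Counts tag occurrences over the documents accepted by the filter.
    static Map<String, Integer> countFacets(List<Doc> docs, Predicate<Doc> filter) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : docs) {
            if (!filter.test(d)) {
                continue;
            }
            for (String tag : d.tags()) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three documents from the example; all match contains(a.[*], 'ipsum').
        List<Doc> fulltextHits = List.of(
            new Doc("/content1/test/foo", List.of("tag1", "tag2")),
            new Doc("/content2/test/bar", List.of("tag1", "tag2")),
            new Doc("/content3/test/bar", List.of("tag1", "tag2")));

        // Counting on the raw fulltext result ignores the path constraint.
        System.out.println(countFacets(fulltextHits, d -> true)); // {tag1=3, tag2=3}

        // Applying the dropped path constraint first gives the expected counts.
        Predicate<Doc> paths = d -> d.path().startsWith("/content1/")
            || d.path().startsWith("/content2/");
        System.out.println(countFacets(fulltextHits, paths)); // {tag1=2, tag2=2}
    }
}
```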



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:14 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting/XOR along the lines of "If a is set 
to true, b has to be in a configured set; if not, b must not be in the 
configured set." It evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The plan doesn't contain any constraint on propb, so facet counting will be 
wrong in this case as well.

As you can see, the query is in DNF, and querying each of its disjunctive 
statements individually works well. I attached a unit test showing this for 
this specific example 
([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch])


[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:13 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting along the lines of "If a is set to 
true, b has to be in a configured set; if not, b must not be in the configured 
set." It evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The plan doesn't contain any constraint on propb, so facet counting will be 
wrong in this case as well.

As you can see, the query is in DNF, and querying each of its disjunctive 
statements individually works well. I attached a unit test showing this for 
this specific example 
([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch])


[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:13 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting along the lines of "If a is set to 
true, b has to be in a configured set; if not, b must not be in the configured 
set." It evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The plan doesn't contain any constraint on propb, so facet counting will be 
wrong in this case as well.

As you can see, the query is in DNF, and querying each of its disjunctive 
statements individually works well. I attached a unit test showing this for 
this specific example (restrictionPropagationTest.patch)


[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559
 ] 

Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:12 PM:


Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting along the lines of "If a is set to 
true, b has to be in a configured set; if not, b must not be in the configured 
set." It evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The plan doesn't contain any constraint on propb, so facet counting will be 
wrong in this case as well.

As you can see, the query is in DNF, and querying each of its disjunctive 
statements individually works well. I attached a unit test showing this for 
this specific example.


[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7109:
--
Attachment: restrictionPropagationTest.patch


[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559
 ] 

Dirk Rudolph commented on OAK-7109:
---

Here is an example where constraints get lost in the filter:

{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or 
([propa] = 'false' and not([propb] in('foo','bar')))
{code}

It implements a kind of white-/blacklisting along the lines of "If a is set to 
true, b has to be in a configured set; if not, b must not be in the configured 
set." It evaluates to: 

{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where 
[nt:base].[propa] is not null */
{code}

The plan doesn't contain any constraint on propb, so facet counting will be 
wrong in this case as well.



[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2018-01-03 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309376#comment-16309376
 ] 

Dirk Rudolph commented on OAK-7109:
---

Hi [~catholicon], somehow the mail agent doesn't accept my mails to oak-dev 
(I'm subscribed and receiving mail, but sending doesn't work... anyway).

I checked the implementation of the optimisation and it's not in DNF, as the 
optimisation is not done on the negation normal form of the query (so not(a or 
b) is not properly expanded to not(a) and not(b)). For example (based on 
org.apache.jackrabbit.oak.query.SQL2OptimiseQueryTest#optimiseAndOrAnd()):

{code}
given ([a]=1 or [b]=2 or ([c]=3 and not([d]=4 or [e]=5))) and [x]=6 <=> ([a]=1 
or [b]=2 or ([c]=3 and [d]<>4 and [e]<>5)) and [x]=6
expected ([a]=1 and [x]=6), ([b]=2 and [x]=6), ([c]=3 and [d]<>4 and [e]<>5 and 
[x]=6)
actual ((c = 3) and (not ((d = 4) or (e = 5)))) and (x = 6), (b = 2) and (x = 
6), (a = 1) and (x = 6)
{code}
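For comparison, the expected line above can be derived in two steps: push negations down with De Morgan's laws (negation normal form), then distribute AND over OR. The Java sketch below uses a hypothetical toy AST (`Atom`, `Not`, `And`, `Or`), not Oak's org.apache.jackrabbit.oak.query.ast classes, and prints negated literals as not(...) rather than <>.

```java
import java.util.ArrayList;
import java.util.List;

public class DnfSketch {
    // Hypothetical minimal boolean expression AST (not Oak's query AST).
    sealed interface Expr permits Atom, Not, And, Or {}
    record Atom(String text) implements Expr {}
    record Not(Expr e) implements Expr {}
    record And(Expr l, Expr r) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}

    // Step 1: negation normal form, pushing NOT down via De Morgan's laws.
    static Expr nnf(Expr e) {
        if (e instanceof Not n) {
            Expr inner = n.e();
            if (inner instanceof Not nn) return nnf(nn.e());
            if (inner instanceof And a) return new Or(nnf(new Not(a.l())), nnf(new Not(a.r())));
            if (inner instanceof Or o) return new And(nnf(new Not(o.l())), nnf(new Not(o.r())));
            return e; // negated atom stays as it is
        }
        if (e instanceof And a) return new And(nnf(a.l()), nnf(a.r()));
        if (e instanceof Or o) return new Or(nnf(o.l()), nnf(o.r()));
        return e;
    }

    // Step 2: DNF as a list of conjunctions, distributing AND over OR.
    static List<List<Expr>> dnf(Expr e) {
        if (e instanceof Or o) {
            List<List<Expr>> out = new ArrayList<>(dnf(o.l()));
            out.addAll(dnf(o.r()));
            return out;
        }
        if (e instanceof And a) {
            List<List<Expr>> out = new ArrayList<>();
            for (List<Expr> left : dnf(a.l())) {
                for (List<Expr> right : dnf(a.r())) {
                    List<Expr> conj = new ArrayList<>(left);
                    conj.addAll(right);
                    out.add(conj);
                }
            }
            return out;
        }
        return List.of(List.of(e)); // a literal is a one-element conjunction
    }

    static String show(Expr e) {
        if (e instanceof Atom a) return a.text();
        if (e instanceof Not n) return "not(" + show(n.e()) + ")";
        throw new IllegalArgumentException("not a literal: " + e);
    }

    // Renders each disjunctive statement as "lit and lit and ...".
    static List<String> toDnfStrings(Expr e) {
        List<String> result = new ArrayList<>();
        for (List<Expr> conj : dnf(nnf(e))) {
            List<String> parts = new ArrayList<>();
            for (Expr lit : conj) parts.add(show(lit));
            result.add(String.join(" and ", parts));
        }
        return result;
    }

    public static void main(String[] args) {
        // ([a]=1 or [b]=2 or ([c]=3 and not([d]=4 or [e]=5))) and [x]=6
        Expr given = new And(
            new Or(new Atom("[a]=1"), new Or(new Atom("[b]=2"),
                new And(new Atom("[c]=3"),
                    new Not(new Or(new Atom("[d]=4"), new Atom("[e]=5")))))),
            new Atom("[x]=6"));
        toDnfStrings(given).forEach(System.out::println);
    }
}
```

Running main prints the three disjunctive statements `[a]=1 and [x]=6`, `[b]=2 and [x]=6` and `[c]=3 and not([d]=4) and not([e]=5) and [x]=6`, matching the "expected" line. The distribution step is where the exponential growth mentioned in the issue description comes from.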

And even assuming the alternatives were in DNF and facet counting across unions 
were supported by merging the results of each of the queries given to Lucene, 
the result would still be wrong, as the disjunctive statements are, in general, 
not mutually exclusive (as they would be with XOR). So from my perspective there 
is no way to get proper facet counts in that case from the consumer side, and 
only the options

b) filtering the documents based on the filter 
c) passing all constraints to lucene

would work. 
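The overlap problem can be illustrated with a small Java sketch, assuming each disjunctive query returns a map from node path to that node's tag values. `sumPerQueryCounts` and `countDistinctRows` are hypothetical helpers: adding up per-query counts counts a node matched by two disjuncts twice, while merging by path first counts it once. Note that the second variant needs the full rows of every query, which is exactly what index-side facet counting is meant to avoid.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UnionFacetSketch {
    // Adds up the tag counts of every query result (path -> tag values).
    static Map<String, Integer> sumPerQueryCounts(List<Map<String, List<String>>> results) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<String>> result : results) {
            for (List<String> tags : result.values()) {
                for (String tag : tags) {
                    counts.merge(tag, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Merges the results by path first, so each node is counted only once.
    static Map<String, Integer> countDistinctRows(List<Map<String, List<String>>> results) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> result : results) {
            merged.putAll(result);
        }
        return sumPerQueryCounts(List.of(merged));
    }

    public static void main(String[] args) {
        // Two disjunctive queries whose results overlap on /b.
        Map<String, List<String>> q1 = Map.of("/a", List.of("t1"), "/b", List.of("t1"));
        Map<String, List<String>> q2 = Map.of("/b", List.of("t1"), "/c", List.of("t1"));
        System.out.println(sumPerQueryCounts(List.of(q1, q2))); // {t1=4}
        System.out.println(countDistinctRows(List.of(q1, q2))); // {t1=3}
    }
}
```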

Regarding b): from what I can see in the code base, the nodes are not actually 
read; only the permissions on their paths are checked in 
[FilteredSortedSetDocValuesFacetCounts.java#L91|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L91].

I will check further why our specific query doesn't get passed to Lucene in its 
entirety (or rather, which constraints besides the path constraints are not 
taken into account). Anyway, as a user of the JCR API I would expect a 
RepositoryException (or some other exception) when I try to run a query with 
facet extraction that no index can provide, similar to the exception I get when 
the field I extract facets on is not stored. 


[jira] [Commented] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation

2017-12-22 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301390#comment-16301390
 ] 

Dirk Rudolph commented on OAK-7078:
---

[~catholicon], any thoughts on this one? You can apply only the provided unit 
test to see the exception being thrown without the null check. 

> NullPointerException in FilteredSortedSetDocValuesFacetCounts during query 
> evaluation
> -
>
> Key: OAK-7078
> URL: https://issues.apache.org/jira/browse/OAK-7078
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
>
> Running the following query {{select \[rep:facet(simple/tags)] from 
> \[nt:base] where contains(\[text], 'ipsum')}} with the following content 
> {code}
> /content/foo
>  - text = "lorem lorem"
>  + simple/
>    - tags = ["tag1", "tag2"]
> /content/bar
>  - text = "lorem ipsum"
> {code}
> results in the following NPE:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
>   at 
> org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
>   ... 38 more
> {code}
> This is because the result set for the query only contains {{/content/bar}} 
> and with that the count of the dimension {{simple/tags}} is 0. For that case 
> [SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108]
>  returns {{null}} and so does {{getTopChildren}}.
> This expected behaviour is properly handled in 
> [LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647]
> but not in 
> [FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63]
> where {{topChildren}} is dereferenced without a null check.
> As a workaround, secure facets can be set to false, though the default 
> value is true.
>  
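The null-check pattern described above can be sketched in Java as follows. The `TopChildren` record and `FacetSource` interface are hypothetical stand-ins rather than the Lucene or Oak classes; they only mirror the described behaviour that the wrapped counts may return null for a dimension without values in the current result set. The actual secure (ACL-based) filtering is deliberately left out.

```java
import java.util.Map;

public class NullSafeTopChildrenSketch {
    // Hypothetical stand-in for a facet result: label -> count.
    record TopChildren(Map<String, Integer> labelValues) {}

    // Hypothetical stand-in for the wrapped facet counts; like the behaviour
    // described above, it may return null when the dimension has no hits.
    interface FacetSource {
        TopChildren getTopChildren(int topN, String dim);
    }

    // Wrapper that propagates null instead of dereferencing it.
    static TopChildren filteredTopChildren(FacetSource source, int topN, String dim) {
        TopChildren topChildren = source.getTopChildren(topN, dim);
        if (topChildren == null) {
            return null; // nothing to filter: the dimension has no values
        }
        // The access-control filtering of the labels is omitted in this sketch.
        return new TopChildren(Map.copyOf(topChildren.labelValues()));
    }

    public static void main(String[] args) {
        FacetSource empty = (topN, dim) -> null;
        FacetSource tagged = (topN, dim) -> new TopChildren(Map.of("tag1", 1, "tag2", 1));
        System.out.println(filteredTopChildren(empty, 10, "simple/tags") == null); // true
        System.out.println(filteredTopChildren(tagged, 10, "simple/tags").labelValues().size()); // 2
    }
}
```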



[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2017-12-22 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301325#comment-16301325
 ] 

Dirk Rudolph edited comment on OAK-7109 at 12/22/17 12:58 PM:
--

Yeah, support for unions with facets doesn't work well, as facets are extracted 
on each row, though they relate to the result, not the rows. I will open an 
improvement for that as well, as this has some cost: basically getTopChildren() 
is called for each row while iterating the result set. 

By splitting the result I didn't mean running the query as a union, but running 
individual queries, merging their RowIterators manually and extracting facets 
only from the first hit of each, merging them together as well. That basically 
works, but as I said I would have to rewrite the query in DNF as in the example:

{code:title=distribute and over or}
contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or 
isdescendantnode(a,'/content2'))
<=>
(contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')) or 
(contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2'))
{code}
{code:title=split and run query for each disjunctive statement}
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{code}

That basically works, but only if both queries hit the same index, as only 
then are the TF/IDF scores comparable (also across multiple queries). So the 
solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure 
if the alternative currently created is in DNF) and supporting proper counting 
over union queries
b) filtering the results using the query plan's filter while counting facets, 
similar to the way it's done for ACLs
c) implementing a mode which translates any query as-is to its Lucene 
equivalent
Both a) and b) probably come with a performance drawback. c) might not even 
be feasible.
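For option a), a rough sketch of what "proper counting over union queries" could mean, assuming each sub-query's hits are available as a map from path to facet values (a hypothetical shape, not Oak's RowIterator API). Rows are de-duplicated by path before counting, so a node returned by more than one sub-query is counted only once.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: facet counting over the union of several sub-queries. */
final class UnionFacetSketch {

    /**
     * @param subQueryHits one map per sub-query of the union: path -> facet values
     * @return facet value -> number of distinct rows carrying it
     */
    static Map<String, Integer> countFacets(List<Map<String, List<String>>> subQueryHits) {
        // union of all hits keyed by path, so duplicates across sub-queries collapse
        Map<String, List<String>> union = new LinkedHashMap<>();
        for (Map<String, List<String>> hits : subQueryHits) {
            hits.forEach(union::putIfAbsent);
        }
        // count each distinct facet value once per distinct row
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> values : union.values()) {
            for (String value : new HashSet<>(values)) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // /content1/test/foo is returned by both sub-queries but must count once
        Map<String, List<String>> q1 = Map.of("/content1/test/foo", List.of("tag1", "tag2"));
        Map<String, List<String>> q2 = Map.of(
                "/content1/test/foo", List.of("tag1", "tag2"),
                "/content2/test/bar", List.of("tag1", "tag2"));
        System.out.println(countFacets(List.of(q1, q2)));
    }
}
```

Summing the per-query facet counts instead would only be correct if the sub-queries were guaranteed to return disjoint rows.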

In our real-world case the complexity does not come only from the path 
restriction; there are more restrictions in conjunction with it. We already 
tried running one query per path, but even then the individual queries are too 
complex to be passed to Lucene with all constraints (not entirely sure why, 
though ...).

Edit: opened OAK-7110 for counting facets only once per result, not once per 
row.




[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2017-12-22 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301325#comment-16301325
 ] 

Dirk Rudolph edited comment on OAK-7109 at 12/22/17 12:55 PM:
--

Yeah, support of unions with facets doesn't work well, as facets are extracted 
on each row even though they relate to the result, not to the rows. I will open 
an improvement for that as well, as this has some cost: basically calling 
getTopChildren() for each row while iterating the result set.

By splitting the result I didn't mean running the query as a union, but 
running individual queries, merging their RowIterators manually, and 
extracting the facets only from the first hit of each query, merging those 
together as well. That basically works, but as I said I would have to rewrite 
the query in DNF, as in the example:

{code:title=distribute and over or}
contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or 
isdescendantnode(a,'/content2'))
<=>
(contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')) or 
(contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2'))
{code}
{code:title=split and run query for each disjunctive statement}
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{code}

That basically works, but only if both queries hit the same index, as only 
then are the TF/IDF scores comparable (also across multiple queries). So the 
solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure 
if the alternative currently created is DNF) and supporting proper counting 
over union queries
b) filtering the results using the query plan's filter while counting facets, 
similar to the way it's done for ACLs
c) implementing a mode which translates any query as-is to its Lucene 
equivalent

Both a) and b) probably come with a performance drawback. c) might not even 
be feasible.

Edit: opened OAK-7110 for counting facets only once per result, not once per 
row.




[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries

2017-12-22 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301325#comment-16301325
 ] 

Dirk Rudolph edited comment on OAK-7109 at 12/22/17 12:43 PM:
--

Yeah, support of unions with facets doesn't work well, as facets are extracted 
on each row even though they relate to the result, not to the rows. I will open 
an improvement for that as well, as this has some cost: basically calling 
getTopChildren() for each row while iterating the result set.

By splitting the result I didn't mean running the query as a union, but 
running individual queries, merging their RowIterators manually, and 
extracting the facets only from the first hit of each query, merging those 
together as well. That basically works, but as I said I would have to rewrite 
the query in DNF, as in the example:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
'ipsum') and isdescendantnode(a,'/content1')
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
'ipsum') and isdescendantnode(a,'/content2')
{code}
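Merging the rows of the individually executed queries amounts to a k-way merge by score. The sketch below assumes each query's rows are already sorted by descending score and uses a hypothetical Row record rather than Oak's actual row type; the merged order only makes sense if the scores are comparable, i.e. if all queries hit the same index.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of per-query results by descending score. */
final class ScoreMergeSketch {

    record Row(String path, double score) {}

    /** Current head row of one source result, kept in the priority queue. */
    private record Head(Row row, Iterator<Row> source) {}

    static List<Row> mergeByScore(List<List<Row>> sortedResults) {
        // highest score first
        PriorityQueue<Head> heads =
                new PriorityQueue<>((a, b) -> Double.compare(b.row().score(), a.row().score()));
        for (List<Row> result : sortedResults) {
            Iterator<Row> it = result.iterator();
            if (it.hasNext()) {
                heads.add(new Head(it.next(), it));
            }
        }
        List<Row> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            Head top = heads.poll();
            merged.add(top.row());
            // refill the queue from the source the emitted row came from
            if (top.source().hasNext()) {
                heads.add(new Head(top.source().next(), top.source()));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Row> q1 = List.of(new Row("/content1/a", 0.9), new Row("/content1/b", 0.2));
        List<Row> q2 = List.of(new Row("/content2/c", 0.5));
        mergeByScore(List.of(q1, q2)).forEach(r -> System.out.println(r.path()));
    }
}
```

Only one row per source is held in the queue at a time, so the merge can stay lazy if the sources are iterators over large results.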

That basically works, but only if both queries hit the same index, as only 
then are the TF/IDF scores comparable (also across multiple queries). So the 
solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure 
if the alternative currently created is DNF) and supporting proper counting 
over union queries
b) filtering the results using the query plan's filter while counting facets, 
similar to the way it's done for ACLs
c) implementing a mode which translates any query as-is to its Lucene 
equivalent

Both a) and b) probably come with a performance drawback. c) might not even 
be feasible.

Edit: opened OAK-7110 for counting facets only once per result, not once per 
row.





[jira] [Created] (OAK-7110) Run rep:facet counting only once per lucene result

2017-12-22 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-7110:
-

 Summary: Run rep:facet counting only once per lucene result
 Key: OAK-7110
 URL: https://issues.apache.org/jira/browse/OAK-7110
 Project: Jackrabbit Oak
  Issue Type: Improvement
  Components: lucene
Affects Versions: 1.6.7
Reporter: Dirk Rudolph
Priority: Minor


Currently facet counting [(calling 
Facets#getTopChildren)|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1752]
 is done for each facet field for each row. This is because 
[QueryImpl|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java#L876]
 reads all columns of each row when constructing it, and so it reads the facets 
as well.

This might have a negative impact on the performance of facet extraction (not 
proven) and can be optimised by caching the counted top children for each field 
in the scope of the result and returning the cached result for subsequent calls.
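A minimal sketch of the proposed caching, assuming facet results can be keyed by field name within the scope of one result. PerResultFacetCache and its counter function are hypothetical stand-ins: in Oak the expensive call would be Facets#getTopChildren in LucenePropertyIndex, which this sketch does not model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: one instance per query result, shared by all its rows. */
final class PerResultFacetCache {

    // stands in for the expensive per-field facet counting
    private final Function<String, List<String>> counter;
    private final Map<String, List<String>> cache = new HashMap<>();

    PerResultFacetCache(Function<String, List<String>> counter) {
        this.counter = counter;
    }

    /** Called for every row; only the first call per field does real work. */
    List<String> facetsFor(String field) {
        return cache.computeIfAbsent(field, counter);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        PerResultFacetCache facets = new PerResultFacetCache(field -> {
            calls[0]++;
            return List.of("tag1 (2)", "tag2 (2)");
        });
        // simulate reading the facet column of 1000 rows
        for (int row = 0; row < 1000; row++) {
            facets.facetsFor("simple/tags");
        }
        System.out.println("counted " + calls[0] + " time(s)");
    }
}
```

Running main prints "counted 1 time(s)". Scoping the cache to the result (rather than making it global) keeps it from serving counts computed for a different query.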




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries

2017-12-22 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7109:
--
Description: 
Complex queries, in this case, are queries that are passed to Lucene without 
all of their original constraints. For example, queries with multiple path 
restrictions like:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
'ipsum') and (isdescendantnode(a,'/content1') or 
isdescendantnode(a,'/content2'))
{code}

In that particular case the index planner passes only ":fulltext:ipsum" to 
Lucene, even though the index supports evaluating path constraints.

As the facets are counted on the raw Lucene result, the returned facets are 
incorrect. For example, given the following content:

{code}
/content1/test/foo
 + text = lorem ipsum
 - simple/
  + tags = tag1, tag2
/content2/test/bar
 + text = lorem ipsum
 - simple/
  + tags = tag1, tag2
/content3/test/bar
 + text = lorem ipsum
 - simple/
   + tags = tag1, tag2
{code}

the expected result for the dimensions of simple/tags and the query above is 
- tag1: 2
- tag2: 2

as the result set contains 2 rows and all documents are equal. The actual 
result is
- tag1: 3
- tag2: 3

as the path constraint is not handled by lucene.
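The miscount can be reproduced without Oak, assuming (as described above) that Lucene only receives ":fulltext:ipsum" and therefore returns all three documents. FacetMiscountDemo, Hit, and the simplified isDescendant check (strict descendant of a non-root path) are hypothetical. Counting on the raw hits mirrors the current behaviour, while counting only the rows that pass the path constraint corresponds to filtering while counting.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical reproduction of the facet miscount, independent of Oak. */
final class FacetMiscountDemo {

    record Hit(String path, List<String> tags) {}

    /** Simplified isdescendantnode: strict descendant, non-root ancestor assumed. */
    static boolean isDescendant(String path, String ancestor) {
        return path.startsWith(ancestor + "/");
    }

    /** Counts tags only for hits accepted by the filter. */
    static Map<String, Integer> countTags(List<Hit> hits, Predicate<Hit> filter) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Hit hit : hits) {
            if (filter.test(hit)) {
                hit.tags().forEach(t -> counts.merge(t, 1, Integer::sum));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // what Lucene returns for ":fulltext:ipsum" alone
        List<Hit> luceneHits = List.of(
                new Hit("/content1/test/foo", List.of("tag1", "tag2")),
                new Hit("/content2/test/bar", List.of("tag1", "tag2")),
                new Hit("/content3/test/bar", List.of("tag1", "tag2")));
        Predicate<Hit> pathConstraint = h -> isDescendant(h.path(), "/content1")
                || isDescendant(h.path(), "/content2");

        System.out.println("raw:      " + countTags(luceneHits, h -> true));
        System.out.println("filtered: " + countTags(luceneHits, pathConstraint));
    }
}
```

Running main prints raw: {tag1=3, tag2=3} and filtered: {tag1=2, tag2=2}, matching the actual and expected results above.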

To work around that, the only solution that came to my mind is building the 
[disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] 
of my complex query and executing a query for each of the disjunctive 
statements. As the DNF can grow exponentially, this is only a theoretical 
solution, nothing for production.




[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

2017-12-22 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301325#comment-16301325
 ] 

Dirk Rudolph commented on OAK-7109:
---

Yeah, support of unions with facets doesn't work well, as facets are extracted 
on each row even though they relate to the result, not to the rows. I will open 
an improvement for that as well, as this has some cost: basically calling 
getTopChildren() for each row while iterating the result set.

By splitting the result I didn't mean running the query as a union, but 
running individual queries, merging their RowIterators manually, and 
extracting the facets only from the first hit of each query, merging those 
together as well. That basically works, but as I said I would have to rewrite 
the query in DNF, as in the example:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
'ipsum') and isdescendantnode(a,'/content1')
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
'ipsum') and isdescendantnode(a,'/content2')
{code}

That basically works, but only if both queries hit the same index, as only 
then are the TF/IDF scores comparable (also across multiple queries). So the 
solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure 
if the alternative currently created is DNF) and supporting proper counting 
over union queries
b) filtering the results using the query plan's filter while counting facets, 
similar to the way it's done for ACLs
c) implementing a mode which translates any query as-is to its Lucene 
equivalent

Both a) and b) probably come with a performance drawback. c) might not even 
be feasible.







[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries

2017-12-22 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7109:
--
Labels: facet  (was: )






[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries

2017-12-22 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7109:
--
Description: 
Complex queries, in this case, are queries that are passed to Lucene without 
all of their original constraints. For example, queries with multiple path 
restrictions like:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
'ipsum') and (isdescendantnode(a,'/content1') or 
isdescendantnode(a,'/content2'))
{code}

In that particular case the index planner passes only ":fulltext:ipsum" to 
Lucene, even though the index supports evaluating path constraints.

As the facets are counted on the raw Lucene result, the returned facets are 
incorrect. For example, given the following content:

{code}
/content1/test/foo
 + text = lorem ipsum
 - simple/
  + tags = tag1, tag2
/content2/test/bar
 + text = lorem ipsum
 - simple/
  + tags = tag1, tag2
/content3/test/bar
 + text = lorem ipsum
 - simple/
   + tags = tag1, tag2
{code}

the expected result for the dimensions of simple/tags and the query above is 
- tag1: 2
- tag2: 2

as the result set contains 2 rows and all documents are equal. The actual 
result is
- tag1: 3
- tag2: 3

as the path constraint is not handled by lucene.

To work around that, the only solution that came to my mind is building the DNF 
of my complex query and executing a query for each of the disjunctive 
statements. As the DNF can grow exponentially, this is only a theoretical 
solution, nothing for production.








[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries

2017-12-22 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7109:
--
Attachment: facetsInMultipleRoots.patch






[jira] [Created] (OAK-7109) rep:facet returns wrong results for complex queries

2017-12-22 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-7109:
-

 Summary: rep:facet returns wrong results for complex queries
 Key: OAK-7109
 URL: https://issues.apache.org/jira/browse/OAK-7109
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: lucene
Affects Versions: 1.6.7
Reporter: Dirk Rudolph


Complex queries in this case are queries that are passed to Lucene without all of 
the original constraints, for example queries with multiple path restrictions 
like:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
'ipsum') and (isdescendantnode(a,'/content1') or 
isdescendantnode(a,'/content2'))
{code}

In that particular case the index planner passes only ":fulltext:ipsum" to Lucene, 
even though the index supports evaluating path constraints. 

Because the facets are counted on the raw result from Lucene, the returned facets 
are incorrect. For example, given the following content

{code}
/content1/test/foo
 + text = lorem ipsum
 - simple/
  + tags = tag1, tag2
/content2/test/bar
 + text = lorem ipsum
 - simple/
  + tags = tag1, tag2
/content1/test/bar
 + text = lorem ipsum
 - simple/
  + tags = tag1, tag2
{code}

the expected facet counts for the simple/tags dimension and the query above are
- tag1: 2
- tag2: 2

because the result set contains 2 results and all documents carry the same tags. 
The actual counts are
- tag1: 3
- tag2: 3

because the path constraint is not handled by Lucene.
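To make the miscount concrete, here is a minimal Java sketch (not Oak's code) that counts facet labels once over all fulltext hits, as happens when Lucene only sees ":fulltext:ipsum", and once after applying the path constraint Lucene did not evaluate. The Hit type, the helper names and the sample data are hypothetical; in particular, the sample places one node under a hypothetical /content3 so that it falls outside both roots.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetCountDemo {

    // Hypothetical stand-in for a fulltext hit: node path plus its simple/tags values.
    static final class Hit {
        final String path;
        final List<String> tags;

        Hit(String path, List<String> tags) {
            this.path = path;
            this.tags = tags;
        }
    }

    // Counts how often each tag occurs across the given hits.
    static Map<String, Integer> countFacets(List<Hit> hits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Hit hit : hits) {
            for (String tag : hit.tags) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Emulates isdescendantnode(root1) OR isdescendantnode(root2) ...:
    // keeps hits that lie strictly below at least one of the roots.
    static List<Hit> descendantsOf(List<Hit> hits, List<String> roots) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            for (String root : roots) {
                if (hit.path.startsWith(root + "/")) {
                    kept.add(hit);
                    break;
                }
            }
        }
        return kept;
    }

    // Hypothetical sample: all nodes match 'ipsum', one lies outside both roots.
    static List<Hit> sampleHits() {
        List<String> tags = List.of("tag1", "tag2");
        return List.of(
                new Hit("/content1/test/foo", tags),
                new Hit("/content2/test/bar", tags),
                new Hit("/content3/test/bar", tags));
    }

    public static void main(String[] args) {
        List<Hit> raw = sampleHits();
        List<String> roots = List.of("/content1", "/content2");
        System.out.println("raw:      " + countFacets(raw));
        System.out.println("filtered: " + countFacets(descendantsOf(raw, roots)));
    }
}
```

Running main prints {tag1=3, tag2=3} for the raw hits and {tag1=2, tag2=2} once the path constraint is applied, which is the gap described above.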

The only workaround I could think of is to build the DNF of the complex query and 
execute one query for each of its disjuncts. As the DNF grows exponentially, this 
is only a theoretical solution, nothing for production.
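The DNF workaround can be sketched with a hypothetical helper (not part of Oak) that expands a conjunction of OR-groups into its disjuncts by taking their cartesian product; each resulting disjunct would become one separate query. It also shows why the approach does not scale: k groups of two alternatives yield 2^k queries. Merging the per-query facet counts afterwards is not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class DnfExpansion {

    // Expands (a1 OR a2 ...) AND (b1 OR b2 ...) AND ... into DNF.
    // Each inner list of the result is one conjunction, i.e. one query to execute.
    static List<List<String>> toDnf(List<List<String>> andOfOrs) {
        List<List<String>> disjuncts = new ArrayList<>();
        disjuncts.add(new ArrayList<>()); // start with a single empty conjunction
        for (List<String> orGroup : andOfOrs) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : disjuncts) {
                for (String literal : orGroup) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(literal);
                    next.add(extended);
                }
            }
            disjuncts = next;
        }
        return disjuncts;
    }

    public static void main(String[] args) {
        // The query from the description: one fulltext constraint AND two alternative paths.
        List<List<String>> query = List.of(
                List.of("contains(a.[*], 'ipsum')"),
                List.of("isdescendantnode(a,'/content1')", "isdescendantnode(a,'/content2')"));
        for (List<String> disjunct : toDnf(query)) {
            System.out.println(String.join(" and ", disjunct));
        }
    }
}
```

For the query above this yields two queries, one per path restriction; three OR-groups of two alternatives each would already yield eight.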





[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-19 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296560#comment-16296560
 ] 

Dirk Rudolph edited comment on OAK-7070 at 12/19/17 9:55 AM:
-

Thanks [~catholicon], your understanding is correct. And yes, I will move the 
comment to OAK-6597.

To give a bit more context: I'm currently working on an AEM 6.3 project 
implementing fulltext search, and we recently upgraded to 1.6.7 to make use of 
the changes in OAK-6750. Our requirements also ask for excerpts, which is why I 
looked into OAK-6597 as well and asked there for a backport to 1.6. As this issue 
blocks OAK-6597, I would still like to backport it if we agree on making OAK-6597 
available in 1.6. The risk should be minimal, and afaik I applied those changes to 
my fork of 1.6 without any problems. Not doing so means we risk having to use an 
unofficial port of Oak for our project, or having to reject some of the 
customer's requirements. This also ties in with OAK-7078 and OAK-7071.


was (Author: diru):
Thanks [~catholicon], your understanding is correct. And yes, I will move the 
comment to OAK-6597.

To give a bit more context: I'm currently working on an AEM 6.3 project 
implementing fulltext search, and we recently upgraded to 1.6.7 to make use of 
the changes in OAK-6750. Our requirements also ask for excerpts, which is why I 
looked into OAK-6597 as well and am asking there for a backport to 1.6 too. As 
this issue blocks OAK-6597, I would still like to backport it if we agree on 
making OAK-6597 available in 1.6. The risk should be minimal, and afaik I applied 
those changes to my fork of 1.6 without any problems. Not doing so means we risk 
having to use an unofficial port of Oak for our project, or having to reject some 
of the customer's requirements. This also ties in with OAK-7078 and OAK-7071.

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>  Labels: excerpt
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}
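Why the quoted condition never fires for these queries can be shown in isolation. In the sketch below, REP_EXCERPT is assumed to hold "rep:excerpt" (the value of QueryConstants.REP_EXCERPT is not shown in this thread), and the column name is assumed to reach the check unchanged. The prefix-based check is only an illustration, not the fix that was applied in Oak.

```java
public class ExcerptColumnCheck {

    // Assumption: mirrors QueryConstants.REP_EXCERPT.
    static final String REP_EXCERPT = "rep:excerpt";

    // The condition quoted in the issue: true only for the literal string "rep:excerpt(".
    static boolean quotedCheck(String oakPropertyName) {
        return oakPropertyName.equals(REP_EXCERPT + "(");
    }

    // Illustrative alternative: accept the bare selector and any rep:excerpt(...) form.
    static boolean prefixCheck(String oakPropertyName) {
        return oakPropertyName.equals(REP_EXCERPT)
                || (oakPropertyName.startsWith(REP_EXCERPT + "(") && oakPropertyName.endsWith(")"));
    }

    public static void main(String[] args) {
        for (String column : new String[] {"rep:excerpt", "rep:excerpt(text)", "jcr:title"}) {
            System.out.println(column + " -> quoted=" + quotedCheck(column)
                    + ", prefix=" + prefixCheck(column));
        }
    }
}
```

Neither rep:excerpt nor rep:excerpt(text) equals "rep:excerpt(", so the quoted branch is skipped for both example queries.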





[jira] [Issue Comment Deleted] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-19 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7070:
--
Comment: was deleted

(was: There is still a risk that duplicates appear in the excerpt, because there 
is a highlighting hit in {{:fulltext}} and another one in, for example, 
{{full:bar}}. To prevent that, it probably makes sense to first do the 
highlighting on {{:fulltext}} fields when analyzeFulltext is enabled, and to fall 
back to highlighting the {{full:}} fields only if that hasn't been successful. 
wdyt?)

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>  Labels: excerpt
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-12-19 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296561#comment-16296561
 ] 

Dirk Rudolph commented on OAK-6597:
---

There is still a risk that duplicates appear in the excerpt, because there is a 
highlighting hit in :fulltext and another one in, for example, full:bar. To 
prevent that, it probably makes sense to first do the highlighting on :fulltext 
fields when analyzeFulltext is enabled, and to fall back to highlighting the 
full: fields only if that hasn't been successful. wdyt?
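The proposed ordering could look roughly like the sketch below. It is hypothetical and does not use Oak's or Lucene's highlighter APIs: it assumes the per-field highlighting results are already available as a map from field name to snippet, with fields that produced no hit left out, and it simply returns the first full: hit as the fallback.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class ExcerptFallback {

    static final String FULLTEXT_FIELD = ":fulltext";
    static final String FULL_PREFIX = "full:";

    // Prefers the :fulltext highlight when analyzeFulltext is enabled, so the same hit
    // is not reported twice; otherwise (or if :fulltext had no hit) falls back to the
    // first full:<property> field that produced a highlight.
    static Optional<String> excerpt(Map<String, String> highlightsByField, boolean analyzeFulltext) {
        if (analyzeFulltext) {
            String fulltext = highlightsByField.get(FULLTEXT_FIELD);
            if (fulltext != null) {
                return Optional.of(fulltext);
            }
        }
        for (Map.Entry<String, String> entry : highlightsByField.entrySet()) {
            if (entry.getKey().startsWith(FULL_PREFIX) && entry.getValue() != null) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> highlights = new LinkedHashMap<>();
        highlights.put(FULLTEXT_FIELD, "hit from :fulltext");
        highlights.put("full:bar", "hit from full:bar");
        System.out.println(excerpt(highlights, true));
        System.out.println(excerpt(highlights, false));
    }
}
```

Returning a single source per row is the design choice that avoids the duplicated passages; a real implementation would still have to decide how to combine several full: fields.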

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6, 1.8
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>  Labels: excerpt
> Fix For: 1.10
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not
> considered for excerpts (highlighting), as they are not indexed as stored
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in
> either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For
> the former the excerpt is provided properly, for the latter it isn't.





[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-12-19 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296561#comment-16296561
 ] 

Dirk Rudolph edited comment on OAK-6597 at 12/19/17 9:51 AM:
-

There is still a risk that duplicates appear in the excerpt, because there is a 
highlighting hit in :fulltext and another one in, for example, full:bar. To 
prevent that, it probably makes sense to first do the highlighting on :fulltext 
fields when analyzeFulltext is enabled, and to fall back to highlighting the 
full: fields only if that hasn't been successful. wdyt?


was (Author: diru):
There is still a risk that duplicates appear in the excerpt, because there is a 
highlighting hit in :fulltext and another one in, for example, full:bar. To 
prevent that, it probably makes sense to first do the highlighting on :fulltext 
fields when analyzeFulltext is enabled, and to fall back to highlighting the 
full: fields only if that hasn't been successful. wdyt?

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6, 1.8
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>  Labels: excerpt
> Fix For: 1.10
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not
> considered for excerpts (highlighting), as they are not indexed as stored
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in
> either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For
> the former the excerpt is provided properly, for the latter it isn't.





[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-12-19 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296561#comment-16296561
 ] 

Dirk Rudolph edited comment on OAK-6597 at 12/19/17 9:51 AM:
-

There is still a risk that duplicates appear in the excerpt, because there is a 
highlighting hit in :fulltext and another one in, for example, full:bar. To 
prevent that, it probably makes sense to first do the highlighting on :fulltext 
fields when analyzeFulltext is enabled, and to fall back to highlighting the 
full: fields only if that hasn't been successful. wdyt?


was (Author: diru):
There is still a risk that duplicates appear in the excerpt, because there is a 
highlighting hit in :fulltext and another one in, for example, full:bar. To 
prevent that, it probably makes sense to first do the highlighting on :fulltext 
fields when analyzeFulltext is enabled, and to fall back to highlighting the 
full: fields only if that hasn't been successful. wdyt?

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6, 1.8
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>  Labels: excerpt
> Fix For: 1.10
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not
> considered for excerpts (highlighting), as they are not indexed as stored
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in
> either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For
> the former the excerpt is provided properly, for the latter it isn't.





[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-19 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296560#comment-16296560
 ] 

Dirk Rudolph commented on OAK-7070:
---

Thanks [~catholicon], your understanding is correct. And yes, I will move the 
comment to OAK-6597.

To give a bit more context: I'm currently working on an AEM 6.3 project 
implementing fulltext search, and we recently upgraded to 1.6.7 to make use of 
the changes in OAK-6750. Our requirements also ask for excerpts, which is why I 
looked into OAK-6597 as well and am asking there for a backport to 1.6 too. As 
this issue blocks OAK-6597, I would still like to backport it if we agree on 
making OAK-6597 available in 1.6. The risk should be minimal, and afaik I applied 
those changes to my fork of 1.6 without any problems. Not doing so means we risk 
having to use an unofficial port of Oak for our project, or having to reject some 
of the customer's requirements. This also ties in with OAK-7078 and OAK-7071.

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>  Labels: excerpt
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Updated] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation

2017-12-18 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7078:
--
Description: 
Running the following query {{select \[rep:facet(simple/tags)] from \[nt:base] 
where contains(\[text], 'ipsum')}} with the following content 

{code}
/content/foo
 - text = "lorem lorem"
 + simple/
   - tags = ["tag1", "tag2"]
/content/bar
 - text = "lorem ipsum"
{code}

runs into the following NPE

{code}
java.lang.NullPointerException
	at org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
	at org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
	... 38 more
{code}

This is because the result set of the query only contains {{/content/bar}}, so 
the count for the dimension {{simple/tags}} is 0. In that case 
[SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108]
 returns {{null}} and so does {{getTopChildren}}.

This expected behaviour is properly handled in 
[LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647]
 but not in 
[FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63]
 where {{topChildren}} is dereferenced without a null check.

As a workaround, secure facets can be set to false, though the default value 
is true.
 

  was:
Running the following query {{select \[rep:facet(simple/tags)] from \[nt:base] 
where contains(\[text], 'ipsum')}} with the following content 

{code}
/content/foo
 - text = "lorem lorem"
 + simple/
   - tags = ["tag1", "tag2"]
/content/bar
 - text = "lorem 
{code}

runs into the following NPE

{code}
java.lang.NullPointerException
	at org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
	at org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
	... 38 more
{code}

This is because the result set of the query only contains {{/content/bar}}, so 
the count for the dimension {{simple/tags}} is 0. In that case 
[SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108]
 returns {{null}} and so does {{getTopChildren}}.

This expected behaviour is properly handled in 
[LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647]
 but not in 
[FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63]
 where {{topChildren}} is dereferenced without a null check.

As a workaround, secure facets can be set to false, though the default value 
is true.
 


> NullPointerException in FilteredSortedSetDocValuesFacetCounts during query 
> evaluation
> -
>
> Key: OAK-7078
> URL: https://issues.apache.org/jira/browse/OAK-7078
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
>
> Running the following query {{select \[rep:facet(simple/tags)] from 
> \[nt:base] where contains(\[text], 'ipsum')}} with the following content 
> {code}
> /content/foo
>  - text = "lorem lorem"
>  + simple/
>- tags = ["tag1", "tag2"]
> /content/bar
>  - text = "lorem ipsum"
> {code}
> runs into the following NPE
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
>   at 
> org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.Luc

[jira] [Updated] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation

2017-12-18 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7078:
--
Labels: facet  (was: )

> NullPointerException in FilteredSortedSetDocValuesFacetCounts during query 
> evaluation
> -
>
> Key: OAK-7078
> URL: https://issues.apache.org/jira/browse/OAK-7078
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>  Labels: facet
>
> Running the following query {{select \[rep:facet(simple/tags)] from 
> \[nt:base] where contains(\[text], 'ipsum')}} with the following content 
> {code}
> /content/foo
>  - text = "lorem lorem"
>  + simple/
>- tags = ["tag1", "tag2"]
> /content/bar
>  - text = "lorem 
> {code}
> runs into the following NPE
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
>   at 
> org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
>   ... 38 more
> {code}
> This is because the result set of the query only contains {{/content/bar}},
> so the count for the dimension {{simple/tags}} is 0. In that case
> [SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108]
>  returns {{null}} and so does {{getTopChildren}}.
> This expected behaviour is properly handled in 
> [LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647]
>  but not in 
> [FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63]
>  where {{topChildren}} is dereferenced without a null check.
> As a workaround, secure facets can be set to false, though the default value
> is true.
>  





[jira] [Comment Edited] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation

2017-12-18 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295793#comment-16295793
 ] 

Dirk Rudolph edited comment on OAK-7078 at 12/18/17 10:52 PM:
--

I created [#77|https://github.com/apache/jackrabbit-oak/pull/77], which contains 
a unit test and the necessary null check before dereferencing topChildren in 
FilteredSortedSetDocValuesFacetCounts. 

If that's OK, I would like to ask for a backport to at least the 1.6 branch 
(for backwards compatibility with AEM 6.3).


was (Author: diru):
I created [#77|https://github.com/apache/jackrabbit-oak/pull/77], which contains 
a unit test and the necessary null check before dereferencing topChildren in 
FilteredSortedSetDocValuesFacetCounts.

> NullPointerException in FilteredSortedSetDocValuesFacetCounts during query 
> evaluation
> -
>
> Key: OAK-7078
> URL: https://issues.apache.org/jira/browse/OAK-7078
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>
> Running the following query {{select \[rep:facet(simple/tags)] from 
> \[nt:base] where contains(\[text], 'ipsum')}} with the following content 
> {code}
> /content/foo
>  - text = "lorem lorem"
>  + simple/
>- tags = ["tag1", "tag2"]
> /content/bar
>  - text = "lorem 
> {code}
> runs into the following NPE
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
>   at 
> org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
>   ... 38 more
> {code}
> This is because the result set of the query only contains {{/content/bar}},
> so the count for the dimension {{simple/tags}} is 0. In that case
> [SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108]
>  returns {{null}} and so does {{getTopChildren}}.
> This expected behaviour is properly handled in 
> [LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647]
>  but not in 
> [FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63]
>  where {{topChildren}} is dereferenced without a null check.
> As a workaround, secure facets can be set to false, though the default value
> is true.
>  





[jira] [Commented] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation

2017-12-18 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295793#comment-16295793
 ] 

Dirk Rudolph commented on OAK-7078:
---

I created [#77|https://github.com/apache/jackrabbit-oak/pull/77], which contains 
a unit test and the necessary null check before dereferencing topChildren in 
FilteredSortedSetDocValuesFacetCounts.

> NullPointerException in FilteredSortedSetDocValuesFacetCounts during query 
> evaluation
> -
>
> Key: OAK-7078
> URL: https://issues.apache.org/jira/browse/OAK-7078
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7
>Reporter: Dirk Rudolph
>
> Running the following query {{select \[rep:facet(simple/tags)] from 
> \[nt:base] where contains(\[text], 'ipsum')}} with the following content 
> {code}
> /content/foo
>  - text = "lorem lorem"
>  + simple/
>- tags = ["tag1", "tag2"]
> /content/bar
>  - text = "lorem 
> {code}
> runs into the following NPE
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
>   at 
> org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
>   at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
>   ... 38 more
> {code}
> This is because the result set of the query only contains {{/content/bar}},
> so the count for the dimension {{simple/tags}} is 0. In that case
> [SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108]
>  returns {{null}} and so does {{getTopChildren}}.
> This expected behaviour is properly handled in 
> [LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647]
>  but not in 
> [FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63]
>  where {{topChildren}} is dereferenced without a null check.
> As a workaround, secure facets can be set to false, though the default value
> is true.
>  





[jira] [Created] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation

2017-12-18 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-7078:
-

 Summary: NullPointerException in 
FilteredSortedSetDocValuesFacetCounts during query evaluation
 Key: OAK-7078
 URL: https://issues.apache.org/jira/browse/OAK-7078
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: lucene
Affects Versions: 1.6.7
Reporter: Dirk Rudolph


Running the following query {{select \[rep:facet(simple/tags)] from \[nt:base] 
where contains(\[text], 'ipsum')}} with the following content 

{code}
/content/foo
 - text = "lorem lorem"
 + simple/
   - tags = ["tag1", "tag2"]
/content/bar
 - text = "lorem 
{code}

runs into the following NPE

{code}
java.lang.NullPointerException
	at org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
	at org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
	... 38 more
{code}

This is because the result set of the query only contains {{/content/bar}}, so 
the count for the dimension {{simple/tags}} is 0. In that case 
[SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108]
 returns {{null}} and so does {{getTopChildren}}.

This expected behaviour is properly handled in 
[LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647]
 but not in 
[FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63]
 where {{topChildren}} is dereferenced without a null check.

As a workaround, secure facets can be set to false, though the default value 
is true.
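The missing guard can be illustrated without Lucene on the classpath. The types and helpers below are hypothetical stand-ins, not Lucene's facet classes and not the actual fix: a delegate returns null for a dimension without counts, as described above for SortedSetDocValuesFacetCounts#getDim(), and the filtering wrapper checks for null before dereferencing the result. The label-visibility predicate is a simplification; the real secure-facets filtering is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class NullSafeFacetFilter {

    // Hypothetical delegate: returns null when the dimension has no counts
    // in the current result set, mirroring the behaviour described in the issue.
    static Map<String, Integer> delegateTopChildren(Map<String, Integer> countsForDim) {
        return countsForDim.isEmpty() ? null : countsForDim;
    }

    // Filtering wrapper: checks for null before dereferencing the delegate's result,
    // then keeps only the labels accepted by the (hypothetical) visibility predicate.
    static Map<String, Integer> filteredTopChildren(Map<String, Integer> countsForDim,
                                                    Predicate<String> labelVisible) {
        Map<String, Integer> topChildren = delegateTopChildren(countsForDim);
        if (topChildren == null) {
            return null; // no counts for this dimension; the caller treats this as "no facets"
        }
        Map<String, Integer> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : topChildren.entrySet()) {
            if (labelVisible.test(entry.getKey())) {
                filtered.put(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        // Only /content/bar matches 'ipsum', so simple/tags has no counts at all.
        System.out.println(filteredTopChildren(Map.of(), label -> true));

        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("tag1", 1);
        counts.put("tag2", 1);
        System.out.println(filteredTopChildren(counts, label -> !label.equals("tag2")));
    }
}
```

Propagating null keeps the wrapper consistent with the delegate, so a caller that already handles a null result from the unfiltered counts works unchanged.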
 





[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-18 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294646#comment-16294646
 ] 

Dirk Rudolph commented on OAK-7070:
---

There is still a risk that duplicates appear in the excerpt, because there is a 
highlighting hit in {{:fulltext}} and another one in, for example, {{full:bar}}. 
To prevent that, it probably makes sense to first do the highlighting on 
{{:fulltext}} fields when analyzeFulltext is enabled, and to fall back to 
highlighting the {{full:}} fields only if that hasn't been successful. 
wdyt?

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>  Labels: excerpt
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Created] (OAK-7071) PostingsHighlighter, Highlighter and SimpleExcerptProvider return all different formats for excerpts

2017-12-17 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-7071:
-

 Summary: PostingsHighlighter, Highlighter and 
SimpleExcerptProvider return all different formats for excerpts
 Key: OAK-7071
 URL: https://issues.apache.org/jira/browse/OAK-7071
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: lucene
Affects Versions: 1.6.7, 1.8
Reporter: Dirk Rudolph


*PostingsHighlighter* returns, for example,
{quote} 
[my text with any highlighting followed by more text]
{quote}
because the PostingsHighlighter itself returns, for each field, a {{String[]}} of 
phrases limited by the maximum number of phrases given beforehand. This String[] is then 
transformed to a String using {{Arrays.toString()}} at 
[LucenePropertyIndex.java#L688|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L688]
 which causes the value to be wrapped in square brackets.

*Highlighter* returns 
{quote}
my text with any highlighting followed by more text 
{quote}

*SimpleExcerptProvider* returns
{quote}
my text with any highlighting followed by more text 
{quote}

As the PostingsHighlighter cannot be given a custom prefix or suffix, I would 
suggest making its markup the default for the others as well, to prevent any 
further text transformation after extracting the excerpts.
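The square brackets are simply the documented output format of {{java.util.Arrays.toString(Object[])}}. A minimal, self-contained illustration contrasting it with {{String.join}}. The class and method names, the sample passage and the separator are hypothetical; this does not reproduce LucenePropertyIndex.

{code:java}
import java.util.Arrays;

// Illustrates only the bracket wrapping. The passages array stands in for the
// String[] a highlighter returns per field (hypothetical sample data).
final class ExcerptJoinDemo {

    // Arrays.toString() formats as "[p1, p2, ...]".
    static String viaArraysToString(String[] passages) {
        return Arrays.toString(passages);
    }

    // One bracket-free alternative: join the passages with a separator.
    static String viaJoin(String[] passages, String separator) {
        return String.join(separator, passages);
    }

    public static void main(String[] args) {
        String[] passages = {"my text with any highlighting followed by more text"};
        System.out.println(viaArraysToString(passages)); // wrapped in [ ]
        System.out.println(viaJoin(passages, " ... "));   // no brackets
    }
}
{code}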






[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294092#comment-16294092
 ] 

Dirk Rudolph edited comment on OAK-7070 at 12/17/17 1:18 PM:
-

Following up on OAK-4401, it looks like rep:excerpt(propertyName) and 
rep:excerpt(.) are not meant to be used as columns in JCR-SQL2, but rather to get an 
excerpt for a result row. 

This is at least consistent with the behaviour implemented in 
[XPathToSQL2Converter.java|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/xpath/XPathToSQL2Converter.java]


was (Author: diru):
Following up on OAK-4401, it looks like rep:excerpt(propertyName) and 
rep:excerpt(.) are not meant to be used as columns in JCR-SQL2, but rather to get an 
excerpt for a result row. 

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>  Labels: excerpt
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294133#comment-16294133
 ] 

Dirk Rudolph edited comment on OAK-6597 at 12/17/17 1:10 PM:
-

I opened #76 for that, though that requires 
https://github.com/apache/jackrabbit-oak/pull/75 to be merged first. It would 
be great if that could be backported to 1.6 for compatibility with AEM 6.3.


was (Author: diru):
I opened #76 for that, though that requires 
https://github.com/apache/jackrabbit-oak/pull/75 to be merged first.

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6, 1.8
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>  Labels: excerpt
> Fix For: 1.10
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not 
> considered for excerpts (highlighting) as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*") that either exist in 
> _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the 
> former, the excerpt is provided properly; for the latter, it isn't.





[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294133#comment-16294133
 ] 

Dirk Rudolph edited comment on OAK-6597 at 12/17/17 1:06 PM:
-

I opened #76 for that, though that requires 
https://github.com/apache/jackrabbit-oak/pull/75 to be merged first.


was (Author: diru):
I opened #76 for that, though that requires 
https://github.com/apache/jackrabbit-oak/pull/75 to be merged. 

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6, 1.8
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>  Labels: excerpt
> Fix For: 1.10
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not 
> considered for excerpts (highlighting) as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*") that either exist in 
> _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the 
> former, the excerpt is provided properly; for the latter, it isn't.





[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294066#comment-16294066
 ] 

Dirk Rudolph edited comment on OAK-7070 at 12/17/17 12:52 PM:
--

To answer my own question: there is [longRepExcerpt 
test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394]
 which queries for [rep:excerpt] but doesn't assert its value in the 
result. 

The issue this causes is that rep:excerpt in the result is null, not that 
the query fails.

... this is because executeQuery(java.lang.String, java.lang.String, boolean, 
boolean) is called with pathsOnly.


was (Author: diru):
To answer my own question: there is [longRepExcerpt 
test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394]
 which queries for [rep:excerpt] but doesn't assert its value in the 
result. 

The issue this causes is that rep:excerpt in the result is null, not that 
the query fails.

... this is because executeQuery(java.lang.String, java.lang.String, boolean, 
boolean) is called with pathsOnly.

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>  Labels: excerpt
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Updated] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7070:
--
Labels: excerpt  (was: )

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>  Labels: excerpt
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Updated] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7070:
--
Component/s: lucene

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>  Labels: excerpt
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-12-17 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6597:
--
Affects Version/s: 1.8

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6, 1.8
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>  Labels: excerpt
> Fix For: 1.10
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not 
> considered for excerpts (highlighting) as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*") that either exist in 
> _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the 
> former, the excerpt is provided properly; for the latter, it isn't.





[jira] [Updated] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-7070:
--
Affects Version/s: 1.6.7

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.6.7, 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294106#comment-16294106
 ] 

Dirk Rudolph commented on OAK-6597:
---

[~chetanm] reconsidering what you said above, it looks like you are completely 
right with your proposed approach. My assumption missed that fulltext fields 
are stored as multi-valued fields rather than as one concatenated value. So I 
don't expect any weird behaviour as described above. I will try to come up with 
a PR implementing your proposed solution. 

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>  Labels: excerpt
> Fix For: 1.10
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not 
> considered for excerpts (highlighting) as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*") that either exist in 
> _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the 
> former, the excerpt is provided properly; for the latter, it isn't.





[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294104#comment-16294104
 ] 

Dirk Rudolph commented on OAK-7070:
---

I opened a PR here: https://github.com/apache/jackrabbit-oak/pull/75. It would 
be great to get that applied to the 1.6 and 1.7 branches as well (for backward 
compatibility with AEM 6.3).

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294092#comment-16294092
 ] 

Dirk Rudolph commented on OAK-7070:
---

Following up on OAK-4401, it looks like rep:excerpt(propertyName) and 
rep:excerpt(.) are not meant to be used as columns in JCR-SQL2, but rather to get an 
excerpt for a result row. 

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294082#comment-16294082
 ] 

Dirk Rudolph edited comment on OAK-7070 at 12/17/17 11:23 AM:
--

OK, there is a different issue in the query parsing: 

The SQL2Parser doesn't parse rep:excerpt(.) and rep:excerpt(propertyName) 
properly, but expects EXCERPT as a keyword, see 
[SQL2Parser.java#L902|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/SQL2Parser.java#L902].
 So, for example, rep:excerpt(.) is interpreted as a normal property name and 
ends up in 
[SelectorImpl.java#L392|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java#L392]
 where it is not taken into account because of the equality check. 

I cannot find anything about EXCERPT in the JCR-SQL2 specification, but since 
there is nothing special for facets in the SQL2Parser either, I assume we keep 
it as it is and adapt the SelectorImpl instead. 


was (Author: diru):
OK, there is a different issue in the query parsing: 

The SQL2Parser doesn't parse rep:excerpt(.) and rep:excerpt(propertyName) 
properly, but expects EXCERPT as a keyword, see 
[SQL2Parser.java#L902|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/SQL2Parser.java#L902].
 So, for example, rep:excerpt(.) is interpreted as a normal property name and 
ends up in 
[SelectorImpl.java#L392|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java#L392]
 where it is not taken into account because of the equality check. 

I cannot find anything about EXCERPT in the JCR specification, but since there 
is nothing special for facets in the SQL2Parser either, I assume we keep it as 
it is and adapt the SelectorImpl instead. 

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294082#comment-16294082
 ] 

Dirk Rudolph commented on OAK-7070:
---

OK, there is a different issue in the query parsing: 

The SQL2Parser doesn't parse rep:excerpt(.) and rep:excerpt(propertyName) 
properly, but expects EXCERPT as a keyword, see 
[SQL2Parser.java#L902|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/SQL2Parser.java#L902].
 So, for example, rep:excerpt(.) is interpreted as a normal property name and 
ends up in 
[SelectorImpl.java#L392|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java#L392]
 where it is not taken into account because of the equality check. 

I cannot find anything about EXCERPT in the JCR specification, but since there 
is nothing special for facets in the SQL2Parser either, I assume we keep it as 
it is and adapt the SelectorImpl instead. 

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294069#comment-16294069
 ] 

Dirk Rudolph commented on OAK-7070:
---

I have a PR almost ready (fixing the test and the issue), if you want it.

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8
>Reporter: Dirk Rudolph
>Assignee: Vikas Saurabh
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294066#comment-16294066
 ] 

Dirk Rudolph edited comment on OAK-7070 at 12/17/17 10:13 AM:
--

To answer my own question: there is [longRepExcerpt 
test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394]
 which queries for [rep:excerpt] but doesn't assert its value in the 
result. 

The issue this causes is that rep:excerpt in the result is null, not that 
the query fails.

... this is because executeQuery(java.lang.String, java.lang.String, boolean, 
boolean) is called with pathsOnly.


was (Author: diru):
To answer my own question: there is [longRepExcerpt 
test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394]
 which queries for [rep:excerpt] but doesn't assert its value in the 
result. 

The issue this causes is that rep:excerpt in the result is null, not that 
the query fails.

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8
>Reporter: Dirk Rudolph
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294066#comment-16294066
 ] 

Dirk Rudolph commented on OAK-7070:
---

To answer my own question: there is [longRepExcerpt 
test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394]
 which queries for [rep:excerpt] but doesn't assert its value in the 
result. 

The issue this causes is that rep:excerpt in the result is null, not that 
the query fails.

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8
>Reporter: Dirk Rudolph
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294063#comment-16294063
 ] 

Dirk Rudolph commented on OAK-7070:
---

[~catholicon] see the case in OAK-6597 (which I'm currently working on): it links 
to a ticket where a (currently disabled) test was added that passed for one of 
the cases but failed for the second one. Now it's failing for both. That's not 
100% related, though, so let me propose one.

Which test case are you referring to?

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8
>Reporter: Dirk Rudolph
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or 
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term\*')}}





[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294060#comment-16294060
 ] 

Dirk Rudolph commented on OAK-7070:
---

As far as I can see, the following excerpt selectors are supported in SQL2 
queries:

1) [rep:excerpt]
2) [rep:excerpt(.)]
3) [rep:excerpt(propertyName)]

The expression before OAK-6750 matches only 1), whereas changing it to 
{{startsWith}} with the trailing opening bracket (similar to the facets) would 
match only 2) and 3). Without the trailing opening bracket, however, a bit too 
much might get matched, so my suggestion is to use a disjunction of both. 
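A minimal sketch of the proposed disjunction next to the check quoted in the issue. {{REP_EXCERPT}} is assumed to equal {{QueryConstants.REP_EXCERPT}} ("rep:excerpt"), and the class and method names are hypothetical. The sample name {{rep:excerpts}} shows what a bare {{startsWith("rep:excerpt")}} would match by mistake.

{code:java}
// Hypothetical class; REP_EXCERPT is assumed to mirror QueryConstants.REP_EXCERPT.
final class ExcerptColumnCheck {

    static final String REP_EXCERPT = "rep:excerpt";

    // The condition quoted in the issue: it only matches the literal
    // "rep:excerpt(", which none of the supported selectors is.
    static boolean currentCheck(String oakPropertyName) {
        return oakPropertyName.equals(REP_EXCERPT + "(");
    }

    // Proposed: 1) exactly "rep:excerpt", or 2)/3) "rep:excerpt(" followed by
    // "." or a property name. The "(" keeps e.g. "rep:excerpts" from matching.
    static boolean proposedCheck(String oakPropertyName) {
        return oakPropertyName.equals(REP_EXCERPT)
                || oakPropertyName.startsWith(REP_EXCERPT + "(");
    }

    public static void main(String[] args) {
        String[] names = {"rep:excerpt", "rep:excerpt(.)", "rep:excerpt(text)", "rep:excerpts"};
        for (String n : names) {
            System.out.println(n + " current=" + currentCheck(n) + " proposed=" + proposedCheck(n));
        }
    }
}
{code}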

> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Affects Versions: 1.8
>Reporter: Dirk Rudolph
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select 
> \[rep:excerpt] from \[test:Page] as page where contains(*, 'term*')}} or even 
> {{select \[rep:excerpt(text)] from \[test:Page] as page where 
> contains(page.\[text], 'term*')}}





[jira] [Created] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750

2017-12-17 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-7070:
-

 Summary: rep:excerpt selector broken as regression of OAK-6750
 Key: OAK-7070
 URL: https://issues.apache.org/jira/browse/OAK-7070
 Project: Jackrabbit Oak
  Issue Type: Bug
Affects Versions: 1.8
Reporter: Dirk Rudolph


The change made here:
https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114

breaks the logic in line 676:

{{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}

This statement doesn't make much sense considering a query like {{select 
\[rep:excerpt] from \[test:Page] as page where contains(*, 'term*')}} or even 
{{select \[rep:excerpt(text)] from \[test:Page] as page where 
contains(page.\[text], 'term*')}}





[jira] [Created] (OAK-6676) rep:facet doesn't work in combination with aliases in JCR-SQL2

2017-09-15 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-6676:
-

 Summary: rep:facet doesn't work in combination with aliases in 
JCR-SQL2
 Key: OAK-6676
 URL: https://issues.apache.org/jira/browse/OAK-6676
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: core
Affects Versions: 1.6.1
Reporter: Dirk Rudolph
Priority: Minor


Within 
[SelectorImpl#createFilter()|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java?view=markup#l389]
 the columnName is used to determine whether to add a restriction indicating 
facets or not. 

So a query like

{code}
select [rep:facet(tags)] as facets from ...
{code}

will not return facets.

The same applies to {{rep:excerpt}}.
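The following Java sketch illustrates the problem. It uses a hypothetical {{Column}} type that holds both the (possibly aliased) column name and the selected property name; it is not Oak's {{ColumnImpl}} or {{SelectorImpl}}. The {{startsWith("rep:facet(")}} test is an assumption for illustration. A check on the column name misses the aliased column, while a check on the property name still detects the facet request.

```java
// Hypothetical model of a selected column; not Oak's ColumnImpl or SelectorImpl.
final class FacetColumnCheck {
    static final class Column {
        final String columnName;   // the alias if "as ..." is used, otherwise the property name
        final String propertyName; // the selected property, e.g. "rep:facet(tags)"

        Column(String columnName, String propertyName) {
            this.columnName = columnName;
            this.propertyName = propertyName;
        }
    }

    // Mirrors the behaviour described above: an alias hides the facet request.
    static boolean facetRequestedByColumnName(Column column) {
        return column.columnName.startsWith("rep:facet(");
    }

    // Alternative: inspect the selected property, which an alias does not change.
    static boolean facetRequestedByPropertyName(Column column) {
        return column.propertyName.startsWith("rep:facet(");
    }

    public static void main(String[] args) {
        Column plain = new Column("rep:facet(tags)", "rep:facet(tags)");
        Column aliased = new Column("facets", "rep:facet(tags)");
        System.out.println(facetRequestedByColumnName(plain) + " " + facetRequestedByColumnName(aliased));     // true false
        System.out.println(facetRequestedByPropertyName(plain) + " " + facetRequestedByPropertyName(aliased)); // true true
    }
}
```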






[jira] [Commented] (OAK-6676) rep:facet doesn't work in combination with aliases in JCR-SQL2

2017-09-15 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167678#comment-16167678
 ] 

Dirk Rudolph commented on OAK-6676:
---

I'm going to provide a patch with a unit test soon.

> rep:facet doesn't work in combination with aliases in JCR-SQL2
> --
>
> Key: OAK-6676
> URL: https://issues.apache.org/jira/browse/OAK-6676
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.6.1
>Reporter: Dirk Rudolph
>Priority: Minor
>
> Within 
> [SelectorImpl#createFilter()|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java?view=markup#l389]
>  the columnName is used to determine whether to add a restriction indicating 
> facets or not. 
> So a query like
> {code}
> select [rep:facet(tags)] as facets from ...
> {code}
> will not return facets.
> The same applies to {{rep:excerpt}}.





[jira] [Updated] (OAK-6643) Return a common format of excerpts independent of the highlighter used

2017-09-11 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6643:
--
Labels: excerpt  (was: )

> Return a common format of excerpts independent of the highlighter used
> --
>
> Key: OAK-6643
> URL: https://issues.apache.org/jira/browse/OAK-6643
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.6.1
>Reporter: Dirk Rudolph
>Priority: Minor
>  Labels: excerpt
>
> While using the {{rep:excerpt}} functionality we noticed that the format of the 
> {{PostingsHighlighter}} output differs from that of the {{Highlighter}}. See the 
> example below:
> {{PostingsHighlighter}} 
> {quote}
> [In Central & Eastern Europe and Asia Pacific Allianz is one of the 
> leading international insurance companies. ]
> {quote}
> {{Highlighter}}
> {quote}
> "Life Risk Insurance"
> {quote}
> It would be great to have a single format, so that applications don't have 
> to handle those differences. 
> Additionally, the {{Arrays.toString(...)}} call used to generate an excerpt 
> string from the {{PostingsHighlighter}} output causes the excerpt text to be 
> wrapped in "[...]"; I guess that's not intended.





[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-09-11 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6597:
--
Labels: excerpt  (was: )

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
>  Labels: excerpt
> Fix For: 1.8
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties indexed due to an aggregation are not considered 
> for excerpts (highlighting), as they are not indexed as stored fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists 
> in either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. 
> For the former the excerpt is provided properly; for the latter it isn't.





[jira] [Updated] (OAK-6643) Return a common format of excerpts independent of the highlighter used

2017-09-11 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6643:
--
Component/s: lucene

> Return a common format of excerpts independent of the highlighter used
> --
>
> Key: OAK-6643
> URL: https://issues.apache.org/jira/browse/OAK-6643
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: lucene
>Affects Versions: 1.6.1
>Reporter: Dirk Rudolph
>Priority: Minor
>  Labels: excerpt
>
> While using the {{rep:excerpt}} functionality we noticed that the format of the 
> {{PostingsHighlighter}} output differs from that of the {{Highlighter}}. See the 
> example below:
> {{PostingsHighlighter}} 
> {quote}
> [In Central & Eastern Europe and Asia Pacific Allianz is one of the 
> leading international insurance companies. ]
> {quote}
> {{Highlighter}}
> {quote}
> "Life Risk Insurance"
> {quote}
> It would be great to have a single format, so that applications don't have 
> to handle those differences. 
> Additionally, the {{Arrays.toString(...)}} call used to generate an excerpt 
> string from the {{PostingsHighlighter}} output causes the excerpt text to be 
> wrapped in "[...]"; I guess that's not intended.





[jira] [Created] (OAK-6643) Return a common format of excerpts independent of the highlighter used

2017-09-11 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-6643:
-

 Summary: Return a common format of excerpts independent of the 
highlighter used
 Key: OAK-6643
 URL: https://issues.apache.org/jira/browse/OAK-6643
 Project: Jackrabbit Oak
  Issue Type: Improvement
Reporter: Dirk Rudolph
Priority: Minor


While using the {{rep:excerpt}} functionality we noticed that the format of the 
{{PostingsHighlighter}} output differs from that of the {{Highlighter}}. See the 
example below:

{{PostingsHighlighter}} 
{quote}
[In Central & Eastern Europe and Asia Pacific Allianz is one of the leading 
international insurance companies. ]
{quote}

{{Highlighter}}
{quote}
"Life Risk Insurance"
{quote}

It would be great to have a single format, so that applications don't have 
to handle those differences. 

Additionally, the {{Arrays.toString(...)}} call used to generate an excerpt 
string from the {{PostingsHighlighter}} output causes the excerpt text to be 
wrapped in "[...]"; I guess that's not intended.

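The bracket wrapping comes from how {{java.util.Arrays.toString}} renders any array: it puts "[" and "]" around the elements and ", " between them. The following minimal Java sketch assumes the highlighter yields its passages as a {{String[]}}, as the use of {{Arrays.toString(...)}} implies. {{joinPassages}} is a hypothetical helper and is not part of Oak or Lucene.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class ExcerptFormatting {
    // Joins highlighted passages with single spaces instead of using Arrays.toString,
    // which adds "[", "]" around the array and ", " between elements.
    static String joinPassages(String[] passages) {
        return Arrays.stream(passages)
                .map(String::trim)
                .filter(p -> !p.isEmpty())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String[] passages = {
            "In Central & Eastern Europe and Asia Pacific Allianz is one of the leading international insurance companies. "
        };
        System.out.println(Arrays.toString(passages)); // wrapped in brackets, as reported above
        System.out.println(joinPassages(passages));    // plain passage text
    }
}
```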




[jira] [Updated] (OAK-6643) Return a common format of excerpts independent of the highlighter used

2017-09-11 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6643:
--
Affects Version/s: 1.6.1

> Return a common format of excerpts independent of the highlighter used
> --
>
> Key: OAK-6643
> URL: https://issues.apache.org/jira/browse/OAK-6643
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>Affects Versions: 1.6.1
>Reporter: Dirk Rudolph
>Priority: Minor
>
> While using the {{rep:excerpt}} functionality we noticed that the format of the 
> {{PostingsHighlighter}} output differs from that of the {{Highlighter}}. See the 
> example below:
> {{PostingsHighlighter}} 
> {quote}
> [In Central & Eastern Europe and Asia Pacific Allianz is one of the 
> leading international insurance companies. ]
> {quote}
> {{Highlighter}}
> {quote}
> "Life Risk Insurance"
> {quote}
> It would be great to have a single format, so that applications don't have 
> to handle those differences. 
> Additionally, the {{Arrays.toString(...)}} call used to generate an excerpt 
> string from the {{PostingsHighlighter}} output causes the excerpt text to be 
> wrapped in "[...]"; I guess that's not intended.





[jira] [Comment Edited] (OAK-6600) queries on Date type results empty search results

2017-08-31 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149122#comment-16149122
 ] 

Dirk Rudolph edited comment on OAK-6600 at 8/31/17 4:17 PM:


What you are showing here simply means that the format used to render your node 
doesn't expose its timezone. 

Still, the timezone is stored as part of the calendar/date object within Oak, and 
date queries obviously don't handle dates as strings. You can try adding a 
timezone (your server's current timezone) to the date string in your query. 
Alternatively, try subtracting one day from the time in your >= query and check 
whether the node is found then; if so, you most likely have a timezone issue.
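Plain {{java.time}} shows why the zone matters: a date string without a zone does not identify a single instant. The following sketch does not use Oak or JCR APIs and only compares instants. It hypothetically assumes the server runs in Europe/Berlin (UTC+02:00 in August).

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class TimezoneCheck {
    // Hypothetical stored value: 2017-08-29 16:36:39 in an assumed server zone of
    // Europe/Berlin. A renderer that drops the zone shows only the wall-clock part.
    static Instant storedInstant() {
        return ZonedDateTime.of(2017, 8, 29, 16, 36, 39, 0, ZoneId.of("Europe/Berlin")).toInstant();
    }

    // True if the ISO 8601 literal (with offset) denotes the same instant as the stored value.
    static boolean matchesStored(String isoLiteralWithOffset) {
        return OffsetDateTime.parse(isoLiteralWithOffset).toInstant().equals(storedInstant());
    }

    public static void main(String[] args) {
        System.out.println(matchesStored("2017-08-29T16:36:39Z"));      // false: read as UTC, two hours off
        System.out.println(matchesStored("2017-08-29T16:36:39+02:00")); // true: server offset added
    }
}
```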


was (Author: diru):
What you are showing here simply means that the format used to render your node 
doesn't expose its timezone. 

Still, the timezone is stored as part of the calendar/date object within Oak, and 
date queries obviously don't handle dates as strings. You can try adding a 
timezone (your server's current timezone) to the date string in your query. 
Alternatively, try subtracting one day from the time in your >= query and check 
whether the node is found then; if so, you most likely have a timezone issue.

>  queries on Date type results empty search results
> --
>
> Key: OAK-6600
> URL: https://issues.apache.org/jira/browse/OAK-6600
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.6.0
>Reporter: Mouli 
>Priority: Blocker
>  Labels: jcr, oak
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> There are two issues here:
> 1) By default, when we store a date in JCR, it is saved in the format below: 
> 2017-08-21 21:35:33. When I query on this date, the search results are empty.
> select [body/dataNum] from [cas:article] where [jcr:lastModified] = 
> '2017-08-29 16:36:39' order by [jcr:created] DESC
> 2) When I try to use 
> select [body/dataNum] from [cas:article] where [jcr:lastModified] 
> cast('2017-08-29 16:36:39' as date)  order by [jcr:created] DESC 
> it throws an error saying this is not a date string. After some investigation 
> I found that it only accepts the ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSSZ). 
> When I store a date in that format, it is automatically converted to 
> yyyy.MM.dd HH:mm:ss.
> Now my questions are:
> 1) How can I change the default date format of JCR to yyyy-MM-dd'T'HH:mm:ss.SSSZ?
> 2) How can I perform queries on dates?





[jira] [Commented] (OAK-6607) Oak facet indexes seems to only work for nt:base

2017-08-31 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149133#comment-16149133
 ] 

Dirk Rudolph commented on OAK-6607:
---

Facets don't work in combination with aggregation (see OAK-6597), though for me 
they work quite well on {{cq:PageContent}} with the {{cq:tags}} property.

> Oak facet indexes seems to only work for nt:base
> 
>
> Key: OAK-6607
> URL: https://issues.apache.org/jira/browse/OAK-6607
> Project: Jackrabbit Oak
>  Issue Type: Bug
>Reporter: Van MOHAMED
>
> We are working in AEM and want to implement a Lucene facet index based on the 
> definition found here: 
> https://jackrabbit.apache.org/oak/docs/query/lucene.html. However, it only 
> works if you limit the node type to nt:base. Here's a snippet of a working 
> facet index definition.
> {code:xml}
>  jcr:primaryType="oak:QueryIndexDefinition"
> compatVersion="{Long}2"
> reindex="{Boolean}false"
> reindexCount="{Long}1"
> type="lucene"
> evaluatePathRestrictions="{Boolean}true"
> async="async" >
> 
> 
> 
>  jcr:primaryType="nt:unstructured"
> propertyIndex="{Boolean}true"
> facets="{Boolean}true"
> analyzed="{Boolean}true"
> nodeScopeIndex="{Boolean}true"
> name="contentType" />
> 
> 
> 
> 
> {code}
> If we were to replace "nt:base" by "dam:Asset" for instance, and update the 
> contentType name property accordingly (in our case, updated in 
> jcr:content/metadata/contentType), then the facet wouldn't work anymore. In 
> the logs, we would get the message "facets for {} not yet indexed".





[jira] [Commented] (OAK-6600) queries on Date type results empty search results

2017-08-31 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149122#comment-16149122
 ] 

Dirk Rudolph commented on OAK-6600:
---

What you are showing here simply means that the format used to render your node 
doesn't expose its timezone. 

Still, the timezone is stored as part of the calendar/date object within Oak, and 
date queries obviously don't handle dates as strings. You can try adding a 
timezone (your server's current timezone) to the date string in your query. 
Alternatively, try subtracting one day from the time in your >= query and check 
whether the node is found then; if so, you most likely have a timezone issue.

>  queries on Date type results empty search results
> --
>
> Key: OAK-6600
> URL: https://issues.apache.org/jira/browse/OAK-6600
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.6.0
>Reporter: Mouli 
>Priority: Blocker
>  Labels: jcr, oak
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> There are two issues here:
> 1) By default, when we store a date in JCR, it is saved in the format below: 
> 2017-08-21 21:35:33. When I query on this date, the search results are empty.
> select [body/dataNum] from [cas:article] where [jcr:lastModified] = 
> '2017-08-29 16:36:39' order by [jcr:created] DESC
> 2) When I try to use 
> select [body/dataNum] from [cas:article] where [jcr:lastModified] 
> cast('2017-08-29 16:36:39' as date)  order by [jcr:created] DESC 
> it throws an error saying this is not a date string. After some investigation 
> I found that it only accepts the ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSSZ). 
> When I store a date in that format, it is automatically converted to 
> yyyy.MM.dd HH:mm:ss.
> Now my questions are:
> 1) How can I change the default date format of JCR to yyyy-MM-dd'T'HH:mm:ss.SSSZ?
> 2) How can I perform queries on dates?





[jira] [Commented] (OAK-6600) queries on Date type results empty search results

2017-08-31 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148600#comment-16148600
 ] 

Dirk Rudolph commented on OAK-6600:
---

It's stored as a {{Date}} field, so the format shouldn't matter.

>  queries on Date type results empty search results
> --
>
> Key: OAK-6600
> URL: https://issues.apache.org/jira/browse/OAK-6600
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.6.0
>Reporter: Mouli 
>Priority: Blocker
>  Labels: jcr, oak
>
> There are two issues here:
> 1) By default, when we store a date in JCR, it is saved in the format below: 
> 2017-08-21 21:35:33. When I query on this date, the search results are empty.
> select [body/dataNum] from [cas:article] where [jcr:lastModified] = 
> '2017-08-29 16:36:39' order by [jcr:created] DESC
> 2) When I try to use 
> select [body/dataNum] from [cas:article] where [jcr:lastModified] 
> cast('2017-08-29 16:36:39' as date)  order by [jcr:created] DESC 
> it throws an error saying this is not a date string. After some investigation 
> I found that it only accepts the ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSSZ). 
> When I store a date in that format, it is automatically converted to 
> yyyy.MM.dd HH:mm:ss.
> Now my questions are:
> 1) How can I change the default date format of JCR to yyyy-MM-dd'T'HH:mm:ss.SSSZ?
> 2) How can I perform queries on dates?





[jira] [Commented] (OAK-6600) queries on Date type results empty search results

2017-08-31 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148571#comment-16148571
 ] 

Dirk Rudolph commented on OAK-6600:
---

The following works quite well for me:

{code}
select [jcr:path], [jcr:score], * from [nt:unstructured] as a where 
[jcr:lastModified] = cast('2017-08-30T21:18:42.917Z' as date) and 
isdescendantnode(a, '/content') /* xpath: 
/jcr:root/content//element(*,nt:unstructured)[@jcr:lastModified = 
xs:dateTime('2017-08-30T21:18:42.917Z')] */
{code}

{code}
select [jcr:path], [jcr:score], * from [nt:unstructured] as a where 
[jcr:lastModified] = cast('2017-08-30T21:18:42.917+02:00' as date) and 
isdescendantnode(a, '/content') /* xpath: 
/jcr:root/content//element(*,nt:unstructured)[@jcr:lastModified = 
xs:dateTime('2017-08-30T21:18:42.917+02:00')] */
{code}
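When the date comes from application code, the literal can be generated in the accepted form instead of being typed by hand. The following is a sketch using {{java.time}}. The class and method names are hypothetical, and the query string simply mirrors the statements above. In the pattern, {{XXX}} prints "Z" for a zero offset and "+02:00" otherwise, which matches both literals used above.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class DateQueryBuilder {
    // Millisecond precision plus an explicit offset ("Z" for UTC), as in the literals above.
    private static final DateTimeFormatter ISO_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    static String isoLiteral(OffsetDateTime value) {
        return value.format(ISO_MILLIS);
    }

    // Builds a JCR-SQL2 statement like the ones above. Assumes rootPath contains no
    // single quotes; real code should validate or escape it.
    static String lastModifiedEquals(OffsetDateTime value, String rootPath) {
        return "select [jcr:path], [jcr:score], * from [nt:unstructured] as a where "
                + "[jcr:lastModified] = cast('" + isoLiteral(value) + "' as date) and "
                + "isdescendantnode(a, '" + rootPath + "')";
    }

    public static void main(String[] args) {
        OffsetDateTime t = OffsetDateTime.of(2017, 8, 30, 21, 18, 42, 917_000_000, ZoneOffset.ofHours(2));
        System.out.println(lastModifiedEquals(t, "/content"));
    }
}
```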


>  queries on Date type results empty search results
> --
>
> Key: OAK-6600
> URL: https://issues.apache.org/jira/browse/OAK-6600
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.6.0
>Reporter: Mouli 
>Priority: Blocker
>  Labels: jcr, oak
>
> There are two issues here:
> 1) By default, when we store a date in JCR, it is saved in the format below: 
> 2017-08-21 21:35:33. When I query on this date, the search results are empty.
> select [body/dataNum] from [cas:article] where [jcr:lastModified] = 
> '2017-08-29 16:36:39' order by [jcr:created] DESC
> 2) When I try to use 
> select [body/dataNum] from [cas:article] where [jcr:lastModified] 
> cast('2017-08-29 16:36:39' as date)  order by [jcr:created] DESC 
> it throws an error saying this is not a date string. After some investigation 
> I found that it only accepts the ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSSZ). 
> When I store a date in that format, it is automatically converted to 
> yyyy.MM.dd HH:mm:ss.
> Now my questions are:
> 1) How can I change the default date format of JCR to yyyy-MM-dd'T'HH:mm:ss.SSSZ?
> 2) How can I perform queries on dates?





[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-30 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146942#comment-16146942
 ] 

Dirk Rudolph commented on OAK-6597:
---

We should also double-check spellcheck, suggestions and facets. From what I can 
see, those don't take aggregated nodes into account either.

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
> Fix For: 1.8
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties indexed due to an aggregation are not considered 
> for excerpts (highlighting), as they are not indexed as stored fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists 
> in either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. 
> For the former the excerpt is provided properly; for the latter it isn't.





[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-30 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146835#comment-16146835
 ] 

Dirk Rudolph edited comment on OAK-6597 at 8/30/17 7:59 AM:


{quote}
which if enabled would enable storage of the ":fulltext" field created in any of 
the above ways
{quote}

That would mean that the excerpt is created from a stored field containing all 
indexed properties of all nested nodes, right? If so, there could be a corner 
case where the excerpt contains odd text at the boundaries between single 
property values, no?

Example:

{code}
/content/foo
 + jcr:content 
  - text1 = "My fancy text"
  - text2 = "This isn't so fancy"
{code}

If I'm right, that would produce an excerpt like "My fancy text This isn't 
so fancy", or even worse, without the space: "My fancy textThis isn't so 
fancy". Wouldn't it make sense to store each nested property in its own 
analyzed field (full:_jcr_content/text1 or similar)?
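The boundary problem can be reproduced with plain string handling. The following Java sketch does not use Oak or Lucene. It only contrasts concatenating the example values into one combined text with keeping one entry per nested property. The "full:" key prefix is a hypothetical naming scheme for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AggregatedExcerptText {
    // One combined text for all nested properties, as a single stored field would hold it.
    static String combined(List<String> values, String separator) {
        return String.join(separator, values);
    }

    // One entry per nested property; the "full:" prefix is a hypothetical field naming scheme.
    static Map<String, String> perPropertyFields(Map<String, String> valuesByRelativePath) {
        Map<String, String> fields = new LinkedHashMap<>();
        valuesByRelativePath.forEach((path, value) -> fields.put("full:" + path, value));
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("jcr:content/text1", "My fancy text");
        props.put("jcr:content/text2", "This isn't so fancy");
        List<String> values = new ArrayList<>(props.values());

        System.out.println(combined(values, " ")); // My fancy text This isn't so fancy
        System.out.println(combined(values, ""));  // My fancy textThis isn't so fancy
        System.out.println(perPropertyFields(props));
    }
}
```

With per-property entries, a highlighter could build an excerpt from the single value that contains the match, so text from a neighbouring property would not be included.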

Do we have any insight into the impact on the index size, and with that the 
impact on query performance, for an index that has this feature 
enabled? 


was (Author: diru):
Do we have any insight into the impact on the index size, and with that the 
impact on query performance, for an index that has this feature 
enabled? 

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
> Fix For: 1.8
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties indexed due to an aggregation are not considered 
> for excerpts (highlighting), as they are not indexed as stored fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists 
> in either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. 
> For the former the excerpt is provided properly; for the latter it isn't.





[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-30 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146835#comment-16146835
 ] 

Dirk Rudolph commented on OAK-6597:
---

Do we have any insight into the impact on the index size, and with that the 
impact on query performance, for an index that has this feature 
enabled? 

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
> Fix For: 1.8
>
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties indexed due to an aggregation are not considered 
> for excerpts (highlighting), as they are not indexed as stored fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists 
> in either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. 
> For the former the excerpt is provided properly; for the latter it isn't.





[jira] [Commented] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by mvn test

2017-08-30 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146816#comment-16146816
 ] 

Dirk Rudolph commented on OAK-6598:
---

Thanks, [~chetanm]. It might be worth checking 
{{org.apache.jackrabbit.oak.plugins.document.ClusterTest2}}, which has the same 
naming pattern.
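The following sketch shows the naming issue quickly. It checks class file names against patterns modelled on the default includes documented for Maven Surefire ({{Test*}}, {{*Test}}, {{*Tests}}, {{*TestCase}}), using {{java.nio.file.PathMatcher}} globs on the file name only. Oak's build may configure its own includes, so the pattern list is an assumption.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class TestNamePatterns {
    // Assumed include patterns, modelled on Surefire's documented defaults and applied to the file name only.
    static final String[] INCLUDES = {"Test*.java", "*Test.java", "*Tests.java", "*TestCase.java"};

    static boolean isIncluded(String fileName) {
        for (String pattern : INCLUDES) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
            if (matcher.matches(Paths.get(fileName))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] names = {"LuceneIndexAggregationTest.java", "LuceneIndexAggregationTest2.java", "ClusterTest2.java"};
        for (String name : names) {
            System.out.println(name + " included: " + isIncluded(name));
        }
    }
}
```

Under these assumed patterns, a trailing digit after "Test" takes the class out of every include, which fits what was observed for both classes.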

> LuceneIndexAggregationTest2 doesn't get executed by mvn test
> 
>
> Key: OAK-6598
> URL: https://issues.apache.org/jira/browse/OAK-6598
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
>Priority: Minor
> Fix For: 1.8, 1.7.7
>
>
> I cannot find the results of 
> [LuceneIndexAggregationTest2|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java]
>  on 
> [Jenkins|https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/]
>  nor am I able to execute them using {{mvn clean test}}. 
> It looks like this is related to {{...Test2.java}} not matching any 
> pattern, and it might affect other tests as well.





[jira] [Updated] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by mvn test

2017-08-29 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6598:
--
Description: 
I cannot find the results of 
[LuceneIndexAggregationTest2|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java]
 on 
[Jenkins|https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/]
 nor am I able to execute them using {{mvn clean test}}. 

It looks like this is related to {{...Test2.java}} not matching any pattern, 
and it might affect other tests as well.

  was:
I cannot find the results of 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java
 here 
https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/ nor 
am I able to execute them using {{mvn clean test}}. 

It looks like this is related to {{...Test2.java}} not matching any pattern, 
and it might affect other tests as well.


> LuceneIndexAggregationTest2 doesn't get executed by mvn test
> 
>
> Key: OAK-6598
> URL: https://issues.apache.org/jira/browse/OAK-6598
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
>
> I cannot find the results of 
> [LuceneIndexAggregationTest2|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java]
>  on 
> [Jenkins|https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/]
>  nor am I able to execute them using {{mvn clean test}}. 
> It looks like this is related to {{...Test2.java}} not matching any 
> pattern, and it might affect other tests as well.





[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-29 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146266#comment-16146266
 ] 

Dirk Rudolph edited comment on OAK-6597 at 8/29/17 10:38 PM:
-

This is blocked by OAK-6598 as long as {{LuceneIndexAggregationTest2}} is not 
running and/or failing.


was (Author: diru):
This is blocked as long as {{LuceneIndexAggregationTest2}} (OAK-6598) is not 
running and/or failing.

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties indexed due to an aggregation are not considered 
> for excerpts (highlighting), as they are not indexed as stored fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists 
> in either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. 
> For the former the excerpt is provided properly; for the latter it isn't.





[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-29 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146266#comment-16146266
 ] 

Dirk Rudolph commented on OAK-6597:
---

This is blocked as long as {{LuceneIndexAggregationTest2}} (OAK-6598) is not 
running and/or failing.



[jira] [Updated] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by mvn test

2017-08-29 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6598:
--
Component/s: lucene

> LuceneIndexAggregationTest2 doesn't get executed by mvn test
> 
>
> Key: OAK-6598
> URL: https://issues.apache.org/jira/browse/OAK-6598
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1, 1.7.6
>Reporter: Dirk Rudolph
>
> I cannot find the results of 
> https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java
>  here 
> https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/ 
> nor am I able to execute them using {{mvn clean test}}. 
> It looks like this is related to {{...Test2.java}} not matching any include 
> pattern, and it might affect other tests as well.





[jira] [Created] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by maven

2017-08-29 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-6598:
-

 Summary: LuceneIndexAggregationTest2 doesn't get executed by maven
 Key: OAK-6598
 URL: https://issues.apache.org/jira/browse/OAK-6598
 Project: Jackrabbit Oak
  Issue Type: Bug
Affects Versions: 1.7.6, 1.6.1
Reporter: Dirk Rudolph


I cannot find the results of 
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java
 here 
https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/ nor 
am I able to execute them using {{mvn clean test}}. 

It looks like this is related to {{...Test2.java}} not matching any include pattern, 
and it might affect other tests as well.
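For reference, the Maven Surefire plugin's documented default includes are {{**/Test*.java}}, {{**/*Test.java}}, {{**/*Tests.java}} (added in newer Surefire releases) and {{**/*TestCase.java}}, and a file named {{...Test2.java}} matches none of them. Whether Oak's pom overrides these defaults is not checked here. The following is a minimal sketch, assuming those defaults, that approximates Surefire's matching with {{java.nio}} glob matchers rather than Surefire's own matcher; the class name {{SurefireIncludeCheck}} is hypothetical.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: models Maven Surefire's default <includes> with java.nio globs.
public class SurefireIncludeCheck {

    // Assumed Surefire defaults; older Surefire versions lack "**/*Tests.java",
    // and a project pom may replace the list entirely.
    static final List<String> DEFAULT_INCLUDES = Arrays.asList(
            "**/Test*.java", "**/*Test.java", "**/*Tests.java", "**/*TestCase.java");

    /** True if a test source path (with at least one directory) matches a default include. */
    static boolean isIncludedByDefault(String relativePath) {
        FileSystem fs = FileSystems.getDefault();
        Path path = fs.getPath(relativePath);
        for (String pattern : DEFAULT_INCLUDES) {
            PathMatcher matcher = fs.getPathMatcher("glob:" + pattern);
            if (matcher.matches(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String pkg = "org/apache/jackrabbit/oak/plugins/index/lucene/";
        System.out.println(isIncludedByDefault(pkg + "LuceneIndexAggregationTest.java"));  // true
        System.out.println(isIncludedByDefault(pkg + "LuceneIndexAggregationTest2.java")); // false
    }
}
```

Renaming the class so that it ends in {{Test}}, or listing it in an explicit {{<include>}} of the Surefire configuration, are the usual ways to get such a class executed; which route fits Oak best is left open.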





[jira] [Updated] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by mvn test

2017-08-29 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6598:
--
Summary: LuceneIndexAggregationTest2 doesn't get executed by mvn test  
(was: LuceneIndexAggregationTest2 doesn't get executed by maven)



[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-29 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6597:
--
Affects Version/s: 1.7.6



[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-29 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146126#comment-16146126
 ] 

Dirk Rudolph commented on OAK-6597:
---

This is because the properties of node _/content/foo_, which is of the node type 
the index definition defines rules for, are added as stored fields using 
[LuceneDocumentMaker#indexProperty()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java#L247]
 (see [LuceneDocumentMaker.java lines 
112-129|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java#L112]),
 whereas the properties of _/content/foo/jcr:content_ are added non-stored in 
[LuceneDocumentMaker#indexAggregatedNode()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java#L599]
 (see [LuceneDocumentMaker.java lines 
652-658|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java#L652]).

Is there any particular reason not to use {{indexProperty()}} for properties of 
the aggregated node?
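To illustrate the consequence described above, here is a toy model in plain Java. It is deliberately not Lucene's or Oak's API, and the class name and sample texts are hypothetical: both properties stay searchable, but only the stored one keeps the original text from which an excerpt can be built. In Lucene itself this corresponds to the {{Field.Store.YES}} / {{Field.Store.NO}} choice made when a field is created.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Toy model, not Lucene's or Oak's API: an index-only field keeps just its tokens,
// while a stored field also keeps the original text an excerpt can be cut from.
public class StoredFieldExcerptModel {

    static final class ToyDocument {
        final Map<String, List<String>> tokensByField = new LinkedHashMap<>();
        final Map<String, String> storedTextByField = new LinkedHashMap<>();

        void add(String field, String text, boolean stored) {
            List<String> tokens = tokensByField.computeIfAbsent(field, f -> new ArrayList<>());
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
            if (stored) {
                storedTextByField.put(field, text);
            }
        }

        /** Prefix query such as "tinc*": searches the tokens of every field. */
        boolean matches(String prefix) {
            String p = prefix.toLowerCase(Locale.ROOT);
            return tokensByField.values().stream()
                    .flatMap(List::stream)
                    .anyMatch(token -> token.startsWith(p));
        }

        /** Excerpt: built only from stored text, since tokens do not preserve the original wording. */
        Optional<String> excerpt(String prefix) {
            String p = prefix.toLowerCase(Locale.ROOT);
            for (String text : storedTextByField.values()) {
                for (String word : text.split("\\s+")) {
                    if (word.toLowerCase(Locale.ROOT).startsWith(p)) {
                        return Optional.of(text.replace(word, "<strong>" + word + "</strong>"));
                    }
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        ToyDocument foo = new ToyDocument();
        // Hypothetical texts. /content/foo/bar: stored, like indexProperty().
        foo.add("bar", "Lorem tincidunt ipsum", true);
        // /content/foo/jcr:content/bar: not stored, like indexAggregatedNode().
        foo.add("jcr:content/bar", "Dolor aliquam sit", false);

        System.out.println(foo.matches("tinc") + " " + foo.excerpt("tinc").isPresent()); // true true
        System.out.println(foo.matches("aliq") + " " + foo.excerpt("aliq").isPresent()); // true false
    }
}
```

Both terms find the document, but only the term in the stored property yields an excerpt, mirroring the behaviour reported in this issue.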

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1
>Reporter: Dirk Rudolph
> Attachments: excerpt-with-aggregation-test.patch
>
>
> I noticed that properties that got indexed due to an aggregation are not 
> considered for excerpts (highlighting), as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards, it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in 
> either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For 
> the former, the excerpt is provided properly; for the latter, it isn't.





[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-29 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6597:
--
Attachment: excerpt-with-aggregation-test.patch



[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-29 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-6597:
--
Component/s: lucene

> rep:excerpt not working for content indexed by aggregation in lucene
> 
>
> Key: OAK-6597
> URL: https://issues.apache.org/jira/browse/OAK-6597
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.6.1
>Reporter: Dirk Rudolph
>
> I noticed that properties that got indexed due to an aggregation are not 
> considered for excerpts (highlighting), as they are not indexed as stored 
> fields.
> See the attached patch that implements a test for excerpts in 
> {{LuceneIndexAggregationTest2}}.
> It creates the following structure:
> {code}
> /content/foo [test:Page]
>  + bar (String)
>  - jcr:content [test:PageContent]
>   + bar (String)
> {code}
> where both strings (the _bar_ property at _foo_ and the _bar_ property at 
> _jcr:content_) contain different text. 
> Afterwards, it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in 
> either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For 
> the former, the excerpt is provided properly; for the latter, it isn't.





[jira] [Created] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene

2017-08-29 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-6597:
-

 Summary: rep:excerpt not working for content indexed by 
aggregation in lucene
 Key: OAK-6597
 URL: https://issues.apache.org/jira/browse/OAK-6597
 Project: Jackrabbit Oak
  Issue Type: Bug
Affects Versions: 1.6.1
Reporter: Dirk Rudolph


I noticed that properties that got indexed due to an aggregation are not 
considered for excerpts (highlighting), as they are not indexed as stored fields.

See the attached patch that implements a test for excerpts in 
{{LuceneIndexAggregationTest2}}.

It creates the following structure:

{code}
/content/foo [test:Page]
 + bar (String)
 - jcr:content [test:PageContent]
  + bar (String)
{code}

where both strings (the _bar_ property at _foo_ and the _bar_ property at 
_jcr:content_) contain different text. 

Afterwards, it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in 
either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the 
former, the excerpt is provided properly; for the latter, it isn't.





[jira] [Commented] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open

2017-05-02 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992541#comment-15992541
 ] 

Dirk Rudolph commented on OAK-5995:
---

Thanks for your support [~chetanm]. It looks like the customer set up the 
pre-production instance without clearing the local FS copy of the index, so 
Lucene was working on the wrong files and had issues with that. As far as I 
know, clearing the local copy of the indexes and letting them be recreated from 
the repository resolved the issue.
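A minimal sketch of the cleanup described above, assuming the instance is stopped first and that the directory configured as {{localIndexDir}} (here {{tmp/index}}) holds nothing but the local index copies; the class name {{ClearLocalIndexCopy}} is hypothetical. With copy-on-read enabled, the local files are then expected to be recreated from the repository, as reported in this comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical maintenance helper: empties the configured localIndexDir
// (e.g. "tmp/index") so the copy-on-read/write files get recreated from the repository.
// Run it only while the repository instance is stopped.
public class ClearLocalIndexCopy {

    /** Deletes everything below localIndexDir, keeps the directory itself, returns the count. */
    static long clear(Path localIndexDir) throws IOException {
        if (!Files.isDirectory(localIndexDir)) {
            return 0;
        }
        Path[] toDelete;
        try (Stream<Path> walk = Files.walk(localIndexDir)) {
            // Reverse lexicographic order lists children before their parent directories.
            toDelete = walk.filter(p -> !p.equals(localIndexDir))
                    .sorted(Comparator.reverseOrder())
                    .toArray(Path[]::new);
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return toDelete.length;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClearLocalIndexCopy <localIndexDir>");
            return;
        }
        Path dir = Paths.get(args[0]);
        System.out.println("Deleted " + clear(dir) + " entries below " + dir.toAbsolutePath());
    }
}
```

Usage would be e.g. {{java ClearLocalIndexCopy /appl/aem/tmp/index}}, the directory seen in the log excerpts of this issue; the directory itself is kept and only its contents are deleted.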

> Lucene indexing with copyonread/write holding unexpectedly much files open
> --
>
> Key: OAK-5995
> URL: https://issues.apache.org/jira/browse/OAK-5995
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: lucene
>Affects Versions: 1.4.1
>Reporter: Dirk Rudolph
>Assignee: Chetan Mehrotra
> Attachments: lsofout2.txt
>
>
> We recently faced the issue that our Oak-based enterprise content management 
> system ran into failures due to too many open files. Monitoring the lsof 
> output, we found out that most of the open files of the process are the 
> files within the configured localIndexDir of the LuceneIndexProviderService. 
> {code}
> enableCopyOnReadSupport="true"
> localIndexDir="tmp/index"
> enableCopyOnWriteSupport="true"
> {code}
> See the attached lsof output:
> {code}
> ~ wc -l lsofout2.txt
>20388 lsofout2.txt
> ~ grep "tmp/index" lsofout2.txt | wc -l
>13499
> {code}
> where more than 60% of the open files are in "tmp/index", the configured 
> {{localIndexDir}}, shortly after a restart of the process.
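The share quoted above checks out: 13499 / 20388 is roughly 0.662, i.e. about 66% of all lsof lines. Counting lines only approximates open files, since lsof also prints a header line and entries such as sockets. A minimal sketch of the same {{wc -l}} / {{grep | wc -l}} counting in Java follows; the class name {{LsofShare}} is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;

// Hypothetical helper reproducing `wc -l` and `grep "tmp/index" | wc -l` on an lsof dump.
public class LsofShare {

    /** Returns {total lines, lines containing marker}. */
    static long[] count(List<String> lines, String marker) {
        long matching = lines.stream().filter(line -> line.contains(marker)).count();
        return new long[] {lines.size(), matching};
    }

    static double percent(long part, long total) {
        return total == 0 ? 0.0 : 100.0 * part / total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // ISO-8859-1 never fails to decode, whatever bytes lsof printed.
            List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
            long[] c = count(lines, "tmp/index");
            System.out.printf(Locale.ROOT, "%d of %d lines (%.1f%%)%n", c[1], c[0], percent(c[1], c[0]));
        }
        // Figures reported in this issue: 13499 of 20388 lines.
        System.out.printf(Locale.ROOT, "%.1f%%%n", percent(13499, 20388)); // 66.2%
    }
}
```

Passing the path of an lsof dump such as {{lsofout2.txt}} as the only argument additionally prints the counts for that file.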



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open

2017-03-29 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948506#comment-15948506
 ] 

Dirk Rudolph edited comment on OAK-5995 at 3/30/17 6:51 AM:


We will try to do so, thanks.

In our specific AEM 6.2 setup, the following Lucene indexes are not configured 
with {{indexPath}}: 

* /content/oak:index/enablementResourceName
* /oak:index/socialLucene and
* /oak:index/damAssetLucene

It may be worth reporting that to Daycare as well, as even with 6.2 SP1, Oak is 
shipped in version 1.4.6.


was (Author: diru):
We will try to do so, thanks.

In our specific AEM 6.2 setup, these are: 

* /content/oak:index/enablementResourceName
* /oak:index/socialLucene and
* /oak:index/damAssetLucene

It may be worth reporting that to Daycare as well, as even with 6.2 SP1, Oak is 
shipped in version 1.4.6.



[jira] [Commented] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open

2017-03-29 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948506#comment-15948506
 ] 

Dirk Rudolph commented on OAK-5995:
---

We will try to do so, thanks.

In our specific AEM 6.2 setup, these are: 

* /content/oak:index/enablementResourceName
* /oak:index/socialLucene and
* /oak:index/damAssetLucene

It may be worth reporting that to Daycare as well, as even with 6.2 SP1, Oak is 
shipped in version 1.4.6.



[jira] [Comment Edited] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open

2017-03-29 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947030#comment-15947030
 ] 

Dirk Rudolph edited comment on OAK-5995 at 3/29/17 12:34 PM:
-

Thanks. I'm not sure about the frequency with which the system is writing to 
the index. Anyway, I got feedback from the operations team. We have quite a 
few exceptions, including ones from the IndexCopier:
{code}
grep -Hirn "IndexCopier" logs/ | wc -l
grep: logs/._funionfs_control~: Permission denied
825370
{code}
 
 {code}
28.03.2017 13:20:22.381 *WARN* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier [/oak:index/ntBaseLucene] Found local copy for _2.si in MMapDirectory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 lockFactory=NativeFSLockFactory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 but size of local 237 differs from remote 0. Content would be read from remote file only
28.03.2017 13:20:22.383 *WARN* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier [/oak:index/ntBaseLucene] Found local copy for _2.cfe in MMapDirectory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 lockFactory=NativeFSLockFactory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 but size of local 258 differs from remote 0. Content would be read from remote file only
28.03.2017 13:20:22.386 *ERROR* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker Could not access the Lucene index at /oak:index/ntBaseLucene
java.io.FileNotFoundException: [tags(/oak:index/ntBaseLucene)] _2.si
 at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory.openInput(OakDirectory.java:180)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier$CopyOnReadDirectory.openInput(IndexCopier.java:355)
 at org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49)
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:340)
 at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
 at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
 at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
 at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNode.<init>(IndexNode.java:105)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNode.open(IndexNode.java:69)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.findIndexNode(IndexTracker.java:162)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.acquireIndexNode(IndexTracker.java:137)
 at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.getPlans(LucenePropertyIndex.java:250)
 at org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan(QueryImpl.java:1016)
 at org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan(QueryImpl.java:949)
 at org.apache.jackrabbit.oak.query.ast.SelectorImpl.prepare(SelectorImpl.java:288)
 at org.apache.jackrabbit.oak.query.QueryImpl.prepare(QueryImpl.java:631)
 at org.apache.jackrabbit.oak.query.QueryEngineImpl.prepareAndSelect(QueryEngineImpl.java:298)
 at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:273)
 at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:233)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:314)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:308)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:304)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.getTree(IdentifierManager.java:133)
 at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByContentID(AuthorizableBaseProvider.java:56)
 at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByID(AuthorizableBaseProvider.java:51)
 at org.apache.jackrabbit.oak.security.user.UserProvider.getAuthorizable(UserProvider.java:211)
 at org.apache.jackrabbit.oak.security.user.UserPrincipalProvider.getPrincip

[jira] [Commented] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open

2017-03-29 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947030#comment-15947030
 ] 

Dirk Rudolph commented on OAK-5995:
---

Thanks. I'm not sure about the frequency with which the system is writing to 
the index. Anyway, I got feedback from the operations team. We have quite a 
few exceptions, including ones from the IndexCopier:
{code}
grep -Hirn "IndexCopier" logs/ | wc -l
grep: logs/._funionfs_control~: Permission denied
825370
{code}
 
 {code}
28.03.2017 13:20:22.381 *WARN* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier [/oak:index/ntBaseLucene] Found local copy for _2.si in MMapDirectory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 lockFactory=NativeFSLockFactory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 but size of local 237 differs from remote 0. Content would be read from remote file only
28.03.2017 13:20:22.383 *WARN* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier [/oak:index/ntBaseLucene] Found local copy for _2.cfe in MMapDirectory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 lockFactory=NativeFSLockFactory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 but size of local 258 differs from remote 0. Content would be read from remote file only
28.03.2017 13:20:22.386 *ERROR* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker Could not access the Lucene index at /oak:index/ntBaseLucene
java.io.FileNotFoundException: [tags(/oak:index/ntBaseLucene)] _2.si
 at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory.openInput(OakDirectory.java:180)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier$CopyOnReadDirectory.openInput(IndexCopier.java:355)
 at org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49)
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:340)
 at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
 at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
 at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
 at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNode.<init>(IndexNode.java:105)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNode.open(IndexNode.java:69)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.findIndexNode(IndexTracker.java:162)
 at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.acquireIndexNode(IndexTracker.java:137)
 at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.getPlans(LucenePropertyIndex.java:250)
 at org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan(QueryImpl.java:1016)
 at org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan(QueryImpl.java:949)
 at org.apache.jackrabbit.oak.query.ast.SelectorImpl.prepare(SelectorImpl.java:288)
 at org.apache.jackrabbit.oak.query.QueryImpl.prepare(QueryImpl.java:631)
 at org.apache.jackrabbit.oak.query.QueryEngineImpl.prepareAndSelect(QueryEngineImpl.java:298)
 at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:273)
 at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:233)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:314)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:308)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:304)
 at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.getTree(IdentifierManager.java:133)
 at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByContentID(AuthorizableBaseProvider.java:56)
 at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByID(AuthorizableBaseProvider.java:51)
 at org.apache.jackrabbit.oak.security.user.UserProvider.getAuthorizable(UserProvider.java:211)
 at org.apache.jackrabbit.oak.security.user.UserPrincipalProvider.getPrincipals(UserPrincipalProvider.java:134)
 at or

[jira] [Commented] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open

2017-03-29 Thread Dirk Rudolph (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946729#comment-15946729
 ] 

Dirk Rudolph commented on OAK-5995:
---

Thanks Chetan, I will investigate with the operations team and let you know. 
In the meantime, I checked the index definitions, and there are indeed some 
which don't have {{indexPath}} set, though those are only small ones compared 
to others. Does the size matter? If this is about leaking file handles, which 
circumstances impact that behaviour (index size, # of queries against the 
index, # of reads/writes)?



[jira] [Updated] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open

2017-03-28 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-5995:
--
Attachment: lsofout2.txt

> Lucene indexing with copyonread/write holding unexpectedly much files open
> --
>
> Key: OAK-5995
> URL: https://issues.apache.org/jira/browse/OAK-5995
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Affects Versions: 1.4.1
>Reporter: Dirk Rudolph
> Attachments: lsofout2.txt
>
>
> We recently faced the issue that our Oak-based enterprise content management 
> system ran into failures due to too many open files. Monitoring the lsof 
> output, we found out that most of the open files of the process are the 
> files within the configured localIndexDir of the LuceneIndexProviderService. 
> {code}
> enableCopyOnReadSupport="true"
> localIndexDir="tmp/index"
> enableCopyOnWriteSupport="true"
> {code}
> See the attached lsof output:
> {code}
> ~ wc -l lsofout2.txt
>20388 lsofout2.txt
> ~ grep "tmp/index" lsofout2.txt | wc -l
>13499
> {code}
> where more than 60% of the open files are in "tmp/index", the configured 
> {{localIndexDir}}. 





[jira] [Updated] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open

2017-03-28 Thread Dirk Rudolph (JIRA)

 [ 
https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dirk Rudolph updated OAK-5995:
--
Description: 
We recently faced the issue that our Oak-based enterprise content management 
system ran into failures due to too many open files. Monitoring the lsof output, 
we found out that most of the open files of the process are the files within 
the configured localIndexDir of the LuceneIndexProviderService. 

{code}
enableCopyOnReadSupport="true"
localIndexDir="tmp/index"
enableCopyOnWriteSupport="true"
{code}

See the attached lsof output:

{code}
~ wc -l lsofout2.txt
   20388 lsofout2.txt
~ grep "tmp/index" lsofout2.txt | wc -l
   13499
{code}

where more than 60% of the open files are in "tmp/index", the configured 
{{localIndexDir}}, shortly after a restart of the process.

  was:
We recently faced the issue that our Oak-based enterprise content management 
system ran into failures due to too many open files. Monitoring the lsof output, 
we found out that most of the open files of the process are the files within 
the configured localIndexDir of the LuceneIndexProviderService. 

{code}
enableCopyOnReadSupport="true"
localIndexDir="tmp/index"
enableCopyOnWriteSupport="true"
{code}

See the attached lsof output:

{code}
~ wc -l lsofout2.txt
   20388 lsofout2.txt
~ grep "tmp/index" lsofout2.txt | wc -l
   13499
{code}

where more than 60% of the open files are under "tmp/index", the directory 
configured as {{localIndexDir}}. 


> Lucene indexing with copyonread/write holding unexpectedly much files open
> --
>
> Key: OAK-5995
> URL: https://issues.apache.org/jira/browse/OAK-5995
> Project: Jackrabbit Oak
>  Issue Type: Bug
>  Components: indexing
>Affects Versions: 1.4.1
>Reporter: Dirk Rudolph
> Attachments: lsofout2.txt
>
>
> We recently faced the issue that our Oak-based enterprise content management 
> system ran into failures due to too many open files. Monitoring the lsof 
> output, we found that most of the files opened by the process are the 
> files within the configured localIndexDir of the LuceneIndexProviderService. 
> {code}
> enableCopyOnReadSupport="true"
> localIndexDir="tmp/index"
> enableCopyOnWriteSupport="true"
> {code}
> See the attached lsof output:
> {code}
> ~ wc -l lsofout2.txt
>20388 lsofout2.txt
> ~ grep "tmp/index" lsofout2.txt | wc -l
>13499
> {code}
> where more than 60% of the open files are under "tmp/index", the directory 
> configured as {{localIndexDir}}, shortly after a restart of the process.





[jira] [Created] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open

2017-03-28 Thread Dirk Rudolph (JIRA)
Dirk Rudolph created OAK-5995:
-

 Summary: Lucene indexing with copyonread/write holding 
unexpectedly much files open
 Key: OAK-5995
 URL: https://issues.apache.org/jira/browse/OAK-5995
 Project: Jackrabbit Oak
  Issue Type: Bug
  Components: indexing
Affects Versions: 1.4.1
Reporter: Dirk Rudolph
 Attachments: lsofout2.txt

We recently faced the issue that our Oak-based enterprise content management 
system ran into failures due to too many open files. Monitoring the lsof output, 
we found that most of the files opened by the process are the files within 
the configured localIndexDir of the LuceneIndexProviderService. 

{code}
enableCopyOnReadSupport="true"
localIndexDir="tmp/index"
enableCopyOnWriteSupport="true"
{code}

See the attached lsof output:

{code}
~ wc -l lsofout2.txt
   20388 lsofout2.txt
~ grep "tmp/index" lsofout2.txt | wc -l
   13499
{code}

where more than 60% of the open files are under "tmp/index", the directory 
configured as {{localIndexDir}}. 
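The lsof counting above can be scripted for a single process. Below is a minimal POSIX shell sketch, assuming `lsof` and `awk` are installed. The helper names `count_open_files` and `share_percent` are hypothetical and not part of Oak. Unlike the `wc -l` count above, the sketch drops lsof's header line. Applied to the reported numbers (13499 of 20388 lines), `share_percent` gives about 66.2%, which is consistent with "more than 60%".

```shell
#!/bin/sh
# Hypothetical helper: report how many open files of one process live under
# the Lucene localIndexDir. Assumes lsof and awk are available.

# share_percent TOTAL PART -> prints PART/TOTAL as a percentage, one decimal
share_percent() {
  awk -v t="$1" -v p="$2" 'BEGIN {
    if (t == 0) { print "0.0" } else { printf "%.1f\n", 100 * p / t }
  }'
}

# count_open_files PID INDEX_DIR -> prints "total in_index percent"
count_open_files() {
  pid="$1"
  index_dir="$2"
  # -n/-P skip host and port name lookups; -p limits output to one process.
  # tail drops lsof's header line.
  listing=$(lsof -n -P -p "$pid" 2>/dev/null | tail -n +2)
  # Count non-empty lines, then lines whose text contains INDEX_DIR literally.
  total=$(printf '%s\n' "$listing" | grep -c .)
  in_index=$(printf '%s\n' "$listing" | grep -cF -- "$index_dir")
  echo "$total $in_index $(share_percent "$total" "$in_index")"
}

# Only run when called with exactly two arguments: PID and INDEX_DIR.
if [ "$#" -eq 2 ]; then
  count_open_files "$1" "$2"
fi
```

For example, save it as a script and call it with the Oak process ID and the index directory (e.g. `tmp/index`). Because `grep -F` matches substrings, a relative directory name also matches lsof's absolute paths. Running it repeatedly after a restart would show whether the count keeps growing.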


