[jira] [Commented] (OAK-9060) IllegalArgumentException when using facets in union queries
[ https://issues.apache.org/jira/browse/OAK-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106564#comment-17106564 ]

Dirk Rudolph commented on OAK-9060:
---

I opened [#209|https://github.com/apache/jackrabbit-oak/pull/209], which resolves the issue. Unfortunately, I cannot easily provide a unit test for it, because as far as I know oak-core itself does not contain any index that supports facets. Ideas are welcome. Can the fix be backported to 1.10.x?

> IllegalArgumentException when using facets in union queries
> ---
>
> Key: OAK-9060
> URL: https://issues.apache.org/jira/browse/OAK-9060
> Project: Jackrabbit Oak
> Issue Type: Bug
> Affects Versions: 1.10.3
> Reporter: Dirk Rudolph
> Priority: Major
>
> I get the following exception when trying to execute a JCR-SQL2 query with a rep:facet column and 2 path constraints, which is optimised into a union of 2 queries:
> {code}
> select s.[jcr:path], [rep:facet(jcr:content/tags)] from [cq:Page] as s where (isdescendantnode(s,'/content/pathA') or isdescendantnode(s,'/content/pathB')) order by s.[jcr:content/date] desc
> {code}
> The same query works well with only one of the path constraints.
> {code}
> java.lang.IllegalArgumentException: Invalid path: rep:facet(jcr:content/tags
>     at org.apache.jackrabbit.oak.query.QueryImpl.getOakPath(QueryImpl.java:1249) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.ast.AstElement.normalizePropertyName(AstElement.java:94) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.ast.SelectorImpl.currentProperty(SelectorImpl.java:566) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.ast.ColumnImpl.currentProperty(ColumnImpl.java:59) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.QueryImpl.currentRow(QueryImpl.java:892) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:831) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:856) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.bothHaveRows(UnionQueryImpl.java:483) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.<init>(UnionQueryImpl.java:436) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.UnionQueryImpl.getRows(UnionQueryImpl.java:304) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.query.ResultImpl$1.iterator(ResultImpl.java:72) [org.apache.jackrabbit.oak-core:1.10.3]
>     at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:85) [org.apache.jackrabbit.oak-jcr:1.10.3]
>     at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl.getRows(QueryResultImpl.java:83) [org.apache.jackrabbit.oak-jcr:1.10.3]
> {code}
> Apparently, when the columns are copied in [1], the information that a column is a FacetColumnImpl is lost, because FacetColumnImpl does not override copyOf().
>
> [1] https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.10.8/oak-core/src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java#L1420

-- This message was sent by Atlassian Jira (v8.3.4#803005)
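The diagnosis above (a subclass that does not override copyOf() silently turns back into its base class when the query is copied for each union branch) can be illustrated with a small, self-contained Java sketch. Column and FacetColumn are hypothetical, simplified stand-ins for Oak's ColumnImpl and FacetColumnImpl; their fields and methods are assumptions and do not mirror the real classes or the change in PR #209.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Oak's ColumnImpl / FacetColumnImpl.
public final class CopyOfOverrideDemo {

    static class Column {
        final String propertyName;

        Column(String propertyName) {
            this.propertyName = propertyName;
        }

        // Copies the column, e.g. when a query is cloned for each branch of a union.
        Column copyOf() {
            return new Column(propertyName);
        }

        // Simplified: a plain column resolves its value as a property path,
        // which rejects names such as "rep:facet(jcr:content/tags)".
        String currentValue() {
            if (propertyName.startsWith("rep:facet(")) {
                throw new IllegalArgumentException("Invalid path: " + propertyName);
            }
            return "value of " + propertyName;
        }
    }

    static class FacetColumn extends Column {
        FacetColumn(String propertyName) {
            super(propertyName);
        }

        // Without this override, copyOf() returns a plain Column and the
        // facet-specific currentValue() below is lost in the copy.
        @Override
        FacetColumn copyOf() {
            return new FacetColumn(propertyName);
        }

        @Override
        String currentValue() {
            return "facets for " + propertyName;
        }
    }

    // Copies all columns, as done for each sub-query of a union.
    static List<Column> copyAll(List<Column> columns) {
        List<Column> copies = new ArrayList<>();
        for (Column c : columns) {
            copies.add(c.copyOf());
        }
        return copies;
    }

    public static void main(String[] args) {
        List<Column> columns = List.of(new Column("jcr:path"), new FacetColumn("rep:facet(jcr:content/tags)"));
        for (Column c : copyAll(columns)) {
            System.out.println(c.getClass().getSimpleName() + ": " + c.currentValue());
        }
    }
}
```

Deleting the copyOf() override from FacetColumn makes the copied facet column a plain Column again, and currentValue() then fails with an IllegalArgumentException much like the one in the stack trace. One possible idea for a test that does not need a facet-capable index is to assert that copying a facet column preserves its type.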
[jira] [Updated] (OAK-9060) IllegalArgumentException when using facets in union queries
[ https://issues.apache.org/jira/browse/OAK-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dirk Rudolph updated OAK-9060:
--
Description:
I get the following exception when trying to execute a JCR-SQL2 query with a rep:facet column and 2 path constraints, which is optimised into a union of 2 queries:
{code}
select s.[jcr:path], [rep:facet(jcr:content/tags)] from [cq:Page] as s where (isdescendantnode(s,'/content/pathA') or isdescendantnode(s,'/content/pathB')) order by s.[jcr:content/date] desc
{code}
The same query works well with only one of the path constraints.
{code}
java.lang.IllegalArgumentException: Invalid path: rep:facet(jcr:content/tags
    at org.apache.jackrabbit.oak.query.QueryImpl.getOakPath(QueryImpl.java:1249) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.AstElement.normalizePropertyName(AstElement.java:94) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.SelectorImpl.currentProperty(SelectorImpl.java:566) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.ColumnImpl.currentProperty(ColumnImpl.java:59) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl.currentRow(QueryImpl.java:892) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:831) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:856) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.bothHaveRows(UnionQueryImpl.java:483) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.<init>(UnionQueryImpl.java:436) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl.getRows(UnionQueryImpl.java:304) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ResultImpl$1.iterator(ResultImpl.java:72) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:85) [org.apache.jackrabbit.oak-jcr:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl.getRows(QueryResultImpl.java:83) [org.apache.jackrabbit.oak-jcr:1.10.3]
{code}
Apparently, when the columns are copied in [1], the information that a column is a FacetColumnImpl is lost, because FacetColumnImpl does not override copyOf().

[1] https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.10.8/oak-core/src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java#L1420

was:
I get the following exception when trying to execute a JCR-SQL2 query with a rep:facet column and 2 path constraints, which is optimised into a union of 2 queries:
{code}
select s.[jcr:path], [rep:facet(jcr:content/tags)] from [cq:Page] as s where (isdescendantnode(s,'/content/pathA') or isdescendantnode(s,'/content/pathB')) order by s.[jcr:content/date] desc
{code}
The same query works well with only one of the path constraints.
{code}
java.lang.IllegalArgumentException: Invalid path: rep:facet(jcr:content/genericComponent
    at org.apache.jackrabbit.oak.query.QueryImpl.getOakPath(QueryImpl.java:1249) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.AstElement.normalizePropertyName(AstElement.java:94) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.SelectorImpl.currentProperty(SelectorImpl.java:566) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.ColumnImpl.currentProperty(ColumnImpl.java:59) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl.currentRow(QueryImpl.java:892) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:831) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:856) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.bothHaveRows(UnionQueryImpl.java:483) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.<init>(UnionQueryImpl.java:436) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl.getRows(UnionQueryImpl.java:304) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ResultImpl$1.iterator(ResultImpl.java:72) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:85) [org.apache.jackrabbit.oak-jcr:1.10.3]
    at org.apache.jackrabbi
[jira] [Created] (OAK-9060) IllegalArgumentException when using facets in union queries
Dirk Rudolph created OAK-9060:
-

Summary: IllegalArgumentException when using facets in union queries
Key: OAK-9060
URL: https://issues.apache.org/jira/browse/OAK-9060
Project: Jackrabbit Oak
Issue Type: Bug
Affects Versions: 1.10.3
Reporter: Dirk Rudolph

I get the following exception when trying to execute a JCR-SQL2 query with a rep:facet column and 2 path constraints, which is optimised into a union of 2 queries:
{code}
select s.[jcr:path], [rep:facet(jcr:content/tags)] from [cq:Page] as s where (isdescendantnode(s,'/content/pathA') or isdescendantnode(s,'/content/pathB')) order by s.[jcr:content/date] desc
{code}
The same query works well with only one of the path constraints.
{code}
java.lang.IllegalArgumentException: Invalid path: rep:facet(jcr:content/genericComponent
    at org.apache.jackrabbit.oak.query.QueryImpl.getOakPath(QueryImpl.java:1249) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.AstElement.normalizePropertyName(AstElement.java:94) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.SelectorImpl.currentProperty(SelectorImpl.java:566) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ast.ColumnImpl.currentProperty(ColumnImpl.java:59) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl.currentRow(QueryImpl.java:892) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:831) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:856) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.bothHaveRows(UnionQueryImpl.java:483) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl$FacetMerger.<init>(UnionQueryImpl.java:436) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.UnionQueryImpl.getRows(UnionQueryImpl.java:304) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.query.ResultImpl$1.iterator(ResultImpl.java:72) [org.apache.jackrabbit.oak-core:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:85) [org.apache.jackrabbit.oak-jcr:1.10.3]
    at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl.getRows(QueryResultImpl.java:83) [org.apache.jackrabbit.oak-jcr:1.10.3]
{code}
Apparently, when the columns are copied in [1], the information that a column is a FacetColumnImpl is lost, because FacetColumnImpl does not override copyOf().

[1] https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.10.8/oak-core/src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java#L1420
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332289#comment-16332289 ]

Dirk Rudolph commented on OAK-7109:
---

Thanks for the response. Regarding 1), see https://issues.apache.org/jira/browse/OAK-7109?focusedCommentId=16309376&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16309376

At the moment, the optimisation does the following:
A and (B or not(C and D)) => (A and B) or (A and not(C and D))

To get a DNF, which can then be split into a UNION of pure conjunctions, another step has to happen before the current optimisation: conversion to NNF (pushing all negations down to the leaves of the expression tree):
A and (B or not(C and D)) => A and (B or not(C) or not(D)) => (A and B) or (A and not(C)) or (A and not(D))

I am not sure whether the index supports not(), but if it does, the UNION of the 3 queries above would return exact facets, which then only need to be deduplicated.

> rep:facet returns wrong results for complex queries
> ---
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Affects Versions: 1.6.7
> Reporter: Dirk Rudolph
> Priority: Major
> Labels: facet
> Attachments: facetsInMultipleRoots.patch, restrictionPropagationTest.patch
>
> Complex queries in this case are queries that are passed to Lucene without all of their original constraints. For example, queries with multiple path restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
> {code}
> In that particular case, the index planner passes only ":fulltext:ipsum" to Lucene, even though the index supports evaluating path constraints.
> As the facets are counted on the raw Lucene result, the returned facets are incorrect. For example, given the following content
> {code}
> /content1/test/foo
>   + text = lorem ipsum
>   - simple/
>     + tags = tag1, tag2
> /content2/test/bar
>   + text = lorem ipsum
>   - simple/
>     + tags = tag1, tag2
> /content3/test/bar
>   + text = lorem ipsum
>   - simple/
>     + tags = tag1, tag2
> {code}
> the expected facets for the simple/tags dimension and the query above are
> - tag1: 2
> - tag2: 2
> because the result set contains 2 results and all documents are equal. The actual facets are
> - tag1: 3
> - tag2: 3
> because the path constraint is not handled by Lucene.
> The only workaround that came to my mind is building the [disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex query and executing one query for each of its disjuncts. As this expands exponentially, it is only a theoretical solution, nothing for production.
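The NNF-then-DNF rewrite described in the comment can be sketched in Java. Expr, Var, Not, And and Or are a hypothetical minimal boolean AST, not Oak's query AST; the sketch only shows how De Morgan's laws push negations down to the variables and how distributing "and" over "or" yields one conjunction per UNION branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal boolean AST; not Oak's query classes.
public final class DnfSketch {

    interface Expr {}

    record Var(String name) implements Expr {
        @Override public String toString() { return name; }
    }

    record Not(Expr e) implements Expr {
        @Override public String toString() { return "not(" + e + ")"; }
    }

    record And(Expr l, Expr r) implements Expr {
        @Override public String toString() { return "(" + l + " and " + r + ")"; }
    }

    record Or(Expr l, Expr r) implements Expr {
        @Override public String toString() { return "(" + l + " or " + r + ")"; }
    }

    // Negation normal form: push not() down to the variables (De Morgan's laws).
    static Expr nnf(Expr e) {
        if (e instanceof Not n) {
            Expr inner = n.e();
            if (inner instanceof Not nn) return nnf(nn.e());  // not(not(x)) = x
            if (inner instanceof And a) return new Or(nnf(new Not(a.l())), nnf(new Not(a.r())));
            if (inner instanceof Or o) return new And(nnf(new Not(o.l())), nnf(new Not(o.r())));
            return e;                                          // not(variable) is already a literal
        }
        if (e instanceof And a) return new And(nnf(a.l()), nnf(a.r()));
        if (e instanceof Or o) return new Or(nnf(o.l()), nnf(o.r()));
        return e;
    }

    // DNF of an NNF expression: a list of conjunctions, each a list of literals.
    // Distributing "and" over "or" is where the exponential growth comes from.
    static List<List<Expr>> dnf(Expr e) {
        if (e instanceof Or o) {
            List<List<Expr>> result = new ArrayList<>(dnf(o.l()));
            result.addAll(dnf(o.r()));
            return result;
        }
        if (e instanceof And a) {
            List<List<Expr>> result = new ArrayList<>();
            for (List<Expr> left : dnf(a.l())) {
                for (List<Expr> right : dnf(a.r())) {
                    List<Expr> conjunction = new ArrayList<>(left);
                    conjunction.addAll(right);
                    result.add(conjunction);
                }
            }
            return result;
        }
        List<List<Expr>> single = new ArrayList<>();
        single.add(List.of(e));  // a literal: variable or not(variable)
        return single;
    }

    public static void main(String[] args) {
        Expr a = new Var("A"), b = new Var("B"), c = new Var("C"), d = new Var("D");
        // A and (B or not(C and D))
        Expr query = new And(a, new Or(b, new Not(new And(c, d))));
        for (List<Expr> conjunction : dnf(nnf(query))) {
            System.out.println(conjunction);  // one UNION branch per conjunction
        }
    }
}
```

Running main prints [A, B], [A, not(C)] and [A, not(D)], matching the three branches in the comment. The nested loops in dnf() also show the exponential growth mentioned in the description: a conjunction of k two-way disjunctions yields 2^k branches.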
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900 ]

Dirk Rudolph edited comment on OAK-7109 at 1/5/18 10:38 AM:

[~tmueller] so adding the feature to aggregate the current rep:facet extraction over the UNION alternatives has 2 drawbacks:
1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment;
2) even if it is, the conjunctive branches of the disjunction are not mutually exclusive, which leads to inaccurate results as well.
1) can easily be fixed by converting the restrictions to NNF before doing the optimisation. 2) would additionally require deduplication between the Lucene result sets returned by each of the union branches.

was (Author: diru):
[~tmueller] so adding the feature to aggregate the current rep:facet extraction over the UNION alternatives has 2 drawbacks:
1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment;
2) even if it is, the conjunctive branches of the disjunction are not mutually exclusive, which leads to inaccurate results as well.
It would additionally require deduplication between the Lucene results returned by each of the union branches.
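Drawback 2) can be illustrated with a small Java sketch: when UNION branches overlap, adding up per-branch facet counts counts a shared document once per branch, whereas deduplicating documents (here by path) before counting does not. The Doc record, the paths, tags and branch contents are hypothetical, and the sketch makes no claim about how Oak itself merges facets.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each UNION branch yields its matching documents.
public final class UnionFacetMerge {

    record Doc(String path, List<String> tags) {}

    // Naive merge: add up the facet counts of every branch.
    // A document matching several branches is counted once per branch.
    static Map<String, Integer> sumBranchCounts(List<List<Doc>> branches) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<Doc> branch : branches) {
            for (Doc doc : branch) {
                for (String tag : doc.tags()) {
                    counts.merge(tag, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Deduplicating merge: keep each document (by path) once, then count.
    static Map<String, Integer> dedupedCounts(List<List<Doc>> branches) {
        Map<String, Doc> distinct = new LinkedHashMap<>();
        for (List<Doc> branch : branches) {
            for (Doc doc : branch) {
                distinct.putIfAbsent(doc.path(), doc);
            }
        }
        return sumBranchCounts(List.of(List.copyOf(distinct.values())));
    }

    public static void main(String[] args) {
        Doc foo = new Doc("/content1/test/foo", List.of("tag1", "tag2"));
        Doc bar = new Doc("/content2/test/bar", List.of("tag1"));
        // Two overlapping branches: foo matches both of them.
        List<List<Doc>> branches = List.of(List.of(foo), List.of(foo, bar));
        System.out.println(sumBranchCounts(branches)); // {tag1=3, tag2=2}
        System.out.println(dedupedCounts(branches));   // {tag1=2, tag2=1}
    }
}
```

The deduplicating variant needs the matching documents of every branch, not only each branch's facet counts; with counts alone, the overlap between branches cannot be recovered.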
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900 ]

Dirk Rudolph edited comment on OAK-7109 at 1/5/18 10:37 AM:

[~tmueller] so adding the feature to aggregate the current rep:facet extraction over the UNION alternatives has 2 drawbacks:
1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment;
2) even if it is, the conjunctive branches of the disjunction are not mutually exclusive, which leads to inaccurate results as well.
It would additionally require deduplication between the Lucene results returned by each of the union branches.

was (Author: diru):
[~tmueller] so adding the feature to aggregate the current rep:facet extraction over the UNION alternatives has 2 drawbacks:
1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment;
2) even if it is, the conjunctive branches of the disjunction are not mutually exclusive, which leads to inaccurate results as well.
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312900#comment-16312900 ]

Dirk Rudolph commented on OAK-7109:
---

[~tmueller] so adding the feature to aggregate the current rep:facet extraction over the UNION alternatives has 2 drawbacks:
1) as said above, all constraints have to be passed to Lucene, so the query has to be in DNF, which is not the case at the moment;
2) even if it is, the conjunctive branches of the disjunction are not mutually exclusive, which leads to inaccurate results as well.
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312691#comment-16312691 ]

Dirk Rudolph commented on OAK-7109:
---

{quote}
I have a very pessimistic view that we should fail such queries - I mean it's better to fail and allow for right index def than giving incorrect results.
{quote}
+1
[jira] [Updated] (OAK-7071) PostingsHighlighter, Highlighter and SimpleExcerptProvider return all different formats for excerpts
[ https://issues.apache.org/jira/browse/OAK-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dirk Rudolph updated OAK-7071:
--
Description:
*PostingsHighlighter* returns, for example,
{quote}
[my text with any highlighting followed by more text]
{quote}
because the PostingsHighlighter itself returns, for each field, a {{String[]}} of passages, limited to the maximum number of passages passed in beforehand. This String[] is then converted to a String using {{Arrays.toString()}} at [LucenePropertyIndex.java#L688|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L688], which wraps the value in square brackets.
*Highlighter* returns
{quote}
my text with any highlighting followed by more text
{quote}
*SimpleExcerptProvider* returns
{quote}
my text with any highlighting followed by more text
{quote}
As the PostingsHighlighter cannot be given a custom prefix or suffix, I would suggest making its format the default for the others as well, to avoid any further text transformation after the excerpts have been extracted.

was:
*PostingsHighlighter* returns, for example,
{quote}
[my text with any highlighting followed by more text]
{quote}
because the PostingsHighlighter itself returns, for each field, a {{String[]}} of passages, limited to the maximum number of passages passed in beforehand. This String[] is then converted to a String using {{Arrays.toString()}} at [LucenePropertyIndex.java#L688|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L688], which wraps the value in square brackets.
*Highlighter* returns
{quote}
my text with any highlighting followed by more text
{quote}
*SimpleExcerptProvider* returns
{quote}
my text with any highlighting followed by more text
{quote}
As the PostingsHighlighter cannot be given a custom prefix or suffix, I would suggest making its format the default for the others as well, to avoid any further text transformation after the excerpts have been extracted.

> PostingsHighlighter, Highlighter and SimpleExcerptProvider return all different formats for excerpts
> ---
>
> Key: OAK-7071
> URL: https://issues.apache.org/jira/browse/OAK-7071
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Affects Versions: 1.6.7, 1.8
> Reporter: Dirk Rudolph
> Labels: excerpt
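The square brackets come from Arrays.toString(), which formats an array as "[a, b, ...]". A minimal Java sketch contrasting it with String.join(); the passages array stands in for the per-field String[] returned by the PostingsHighlighter, and the " ... " separator is an arbitrary assumption, not Oak's behaviour.

```java
import java.util.Arrays;

// Sketch: why excerpts built from a String[] end up in square brackets.
public final class ExcerptJoin {

    // Arrays.toString() produces "[passage1, passage2]".
    static String viaArraysToString(String[] passages) {
        return Arrays.toString(passages);
    }

    // Joining the passages directly avoids the brackets and ", " separators.
    // The " ... " separator is an arbitrary choice for illustration.
    static String viaJoin(String[] passages) {
        return String.join(" ... ", passages);
    }

    public static void main(String[] args) {
        String[] passages = {"my text with any highlighting followed by more text"};
        System.out.println(viaArraysToString(passages)); // [my text with any highlighting followed by more text]
        System.out.println(viaJoin(passages));           // my text with any highlighting followed by more text
    }
}
```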
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559 ] Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:47 PM: Here is an example where constraints get lost in the filter: {code} select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or ([propa] = 'false' and not([propb] in('foo','bar'))) {code} It implements kind of white-/blacklisting/xor ala "If a is set to true, b has to be in a configured set, if not, b has not to be in the configured set." It evaluates to: {code} [nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where [nt:base].[propa] is not null */ {code} Which doesn't contain anything of propb, so in that case facet counting will be wrong as well. As you can see the query is in DNF, and querying with its disjunctive statements individually works, well. I attached a unit test showing it for this specific example ([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch]) Edit: I think there are 2 issues here: 1) the OR of the query with both statements 2) the not with the query containing only the second disjunctive statement. was (Author: diru): Here is an example where constraints get lost in the filter: {code} select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or ([propa] = 'false' and not([propb] in('foo','bar'))) {code} It implements kind of white-/blacklisting/xor ala "If a is set to true, b has to be in a configured set, if not, b has not to be in the configured set." It evaluates to: {code} [nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where [nt:base].[propa] is not null */ {code} Which doesn't contain anything of propb, so in that case facet counting will be wrong as well. 
As you can see the query is in DNF, and querying with its disjunctive statements individually works, well. I attached a unit test showing it for this specific example ([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch]) > rep:facet returns wrong results for complex queries > --- > > Key: OAK-7109 > URL: https://issues.apache.org/jira/browse/OAK-7109 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.7 >Reporter: Dirk Rudolph > Labels: facet > Attachments: facetsInMultipleRoots.patch, > restrictionPropagationTest.patch > > > eComplex queries in that case are queries, which are passed to lucene not > containing all original constraints. For example queries with multiple path > restrictions like: > {code} > select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], > 'ipsum') and (isdescendantnode(a,'/content1') or > isdescendantnode(a,'/content2')) > {code} > In that particular case the index planer gives ":fulltext:ipsum" to lucene > even though the index supports evaluating path constraints. > As counting the facets happens on the raw result of lucene, the returned > facets are incorrect. For example having the following content > {code} > /content1/test/foo > + text = lorem ipsum > - simple/ > + tags = tag1, tag2 > /content2/test/bar > + text = lorem ipsum > - simple/ > + tags = tag1, tag2 > /content3/test/bar > + text = lorem ipsum > - simple/ >+ tags = tag1, tag2 > {code} > the expected result for the dimensions of simple/tags and the query above is > - tag1: 2 > - tag2: 2 > as the result set is 2 results long and all documents are equal. The actual > result set is > - tag1: 3 > - tag2: 3 > as the path constraint is not handled by lucene. 
> To work around that, the only solution that came to my mind is building the [disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex query and executing a query for each of the disjunctive statements. As this expands exponentially, it's only a theoretical solution, nothing for production.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
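The miscount described above can be reproduced without Oak at all. The following Java sketch is not Oak code: the `Doc` record, the two predicates and `countTags` are hypothetical names. It counts `simple/tags` over the three sample documents twice: once on what the index returns for `:fulltext:ipsum` alone, and once after also applying the OR of the two path restrictions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FacetCountSketch {

    // Hypothetical stand-in for an indexed node: its path, fulltext and simple/tags values.
    record Doc(String path, String text, List<String> tags) {}

    static final List<Doc> DOCS = List.of(
            new Doc("/content1/test/foo", "lorem ipsum", List.of("tag1", "tag2")),
            new Doc("/content2/test/bar", "lorem ipsum", List.of("tag1", "tag2")),
            new Doc("/content3/test/bar", "lorem ipsum", List.of("tag1", "tag2")));

    // The only condition the index receives in this scenario: ":fulltext:ipsum".
    static final Predicate<Doc> INDEX_FILTER = d -> d.text().contains("ipsum");

    // The full query condition, including the OR of both path restrictions.
    static final Predicate<Doc> FULL_FILTER = INDEX_FILTER.and(
            d -> isDescendant(d, "/content1") || isDescendant(d, "/content2"));

    static boolean isDescendant(Doc d, String root) {
        return d.path().startsWith(root + "/");
    }

    // Counts every tag value over the documents accepted by the predicate.
    static Map<String, Long> countTags(List<Doc> docs, Predicate<Doc> accepted) {
        return docs.stream()
                .filter(accepted)
                .flatMap(d -> d.tags().stream())
                .collect(Collectors.groupingBy(t -> t, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println("raw index hits: " + countTags(DOCS, INDEX_FILTER)); // {tag1=3, tag2=3}
        System.out.println("full condition: " + countTags(DOCS, FULL_FILTER));  // {tag1=2, tag2=2}
    }
}
```

Counting on the raw hits gives the reported 3/3, while applying the full condition before counting gives the expected 2/2.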
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559 ] Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:14 PM:

Here is an example where constraints get lost in the filter:
{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or ([propa] = 'false' and not([propb] in('foo','bar')))
{code}
It implements a kind of white-/blacklisting/XOR à la "If a is set to true, b has to be in a configured set; if not, b must not be in the configured set." It evaluates to:
{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where [nt:base].[propa] is not null */
{code}
This plan doesn't contain anything about propb, so in that case the facet counts will be wrong as well. As you can see, the query is in DNF, and querying each of its disjunctive statements individually works well. I attached a unit test showing it for this specific example ([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch])
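The allow/deny query above can be modelled in a few lines of Java. This is a hypothetical sketch, not Oak code: `Node`, the three predicates and `facetPropb` are made-up names, and SQL NULL semantics are only approximated. It compares facet counts for `propb` computed on what the plan's filter (`[propa] is not null`) lets through, on the full condition, and on each disjunctive statement separately with the counts added up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

class AllowDenyFacetSketch {

    // Hypothetical node carrying the two properties used in the query; propa may be absent (null).
    record Node(String propa, String propb) {}

    static final List<String> CONFIGURED = List.of("foo", "bar");

    // Null-safe membership test (List.of(...).contains(null) would throw).
    static boolean inConfiguredSet(String value) {
        return value != null && CONFIGURED.contains(value);
    }

    // First disjunct: [propa] = 'true' and [propb] in('foo','bar')
    static final Predicate<Node> ALLOW = n -> "true".equals(n.propa()) && inConfiguredSet(n.propb());
    // Second disjunct: [propa] = 'false' and not([propb] in('foo','bar'))
    static final Predicate<Node> DENY = n -> "false".equals(n.propa()) && !inConfiguredSet(n.propb());
    // The only condition left in the index plan: [propa] is not null
    static final Predicate<Node> PLAN = n -> n.propa() != null;

    // Counts the values of propb (the facet dimension) over all nodes matching the predicate.
    static Map<String, Integer> facetPropb(List<Node> nodes, Predicate<Node> p) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Node n : nodes) {
            if (p.test(n)) {
                counts.merge(n.propb(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Adds two facet maps; only correct when the underlying result sets are disjoint.
    static Map<String, Integer> sum(Map<String, Integer> x, Map<String, Integer> y) {
        Map<String, Integer> out = new TreeMap<>(x);
        y.forEach((k, v) -> out.merge(k, v, Integer::sum));
        return out;
    }

    static final List<Node> NODES = List.of(
            new Node("true", "foo"), new Node("true", "baz"),
            new Node("false", "foo"), new Node("false", "baz"));

    public static void main(String[] args) {
        System.out.println("plan filter : " + facetPropb(NODES, PLAN));           // {baz=2, foo=2}
        System.out.println("full query  : " + facetPropb(NODES, ALLOW.or(DENY))); // {baz=1, foo=1}
        System.out.println("per disjunct: " + sum(facetPropb(NODES, ALLOW), facetPropb(NODES, DENY)));
    }
}
```

Adding up the per-disjunct counts matches the full query here only because the two disjuncts cannot match the same node (`[propa]` cannot be both `'true'` and `'false'`). That assumption does not hold for disjunctive statements in general.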
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559 ] Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:13 PM:

Here is an example where constraints get lost in the filter:
{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or ([propa] = 'false' and not([propb] in('foo','bar')))
{code}
It implements a kind of white-/blacklisting à la "If a is set to true, b has to be in a configured set; if not, b must not be in the configured set." It evaluates to:
{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where [nt:base].[propa] is not null */
{code}
This plan doesn't contain anything about propb, so in that case the facet counts will be wrong as well. As you can see, the query is in DNF, and querying each of its disjunctive statements individually works well. I attached a unit test showing it for this specific example ([restrictionPropagationTest.patch|https://issues.apache.org/jira/secure/attachment/12904386/restrictionPropagationTest.patch])
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559 ] Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:13 PM:

Here is an example where constraints get lost in the filter:
{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or ([propa] = 'false' and not([propb] in('foo','bar')))
{code}
It implements a kind of white-/blacklisting à la "If a is set to true, b has to be in a configured set; if not, b must not be in the configured set." It evaluates to:
{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where [nt:base].[propa] is not null */
{code}
This plan doesn't contain anything about propb, so in that case the facet counts will be wrong as well. As you can see, the query is in DNF, and querying each of its disjunctive statements individually works well. I attached a unit test showing it for this specific example (restrictionPropagationTest.patch)
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559 ] Dirk Rudolph edited comment on OAK-7109 at 1/3/18 12:12 PM:

Here is an example where constraints get lost in the filter:
{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or ([propa] = 'false' and not([propb] in('foo','bar')))
{code}
It implements a kind of white-/blacklisting à la "If a is set to true, b has to be in a configured set; if not, b must not be in the configured set." It evaluates to:
{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where [nt:base].[propa] is not null */
{code}
This plan doesn't contain anything about propb, so in that case the facet counts will be wrong as well. As you can see, the query is in DNF, and querying each of its disjunctive statements individually works well. I attached a unit test showing it for this specific example.
[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7109:
--
Attachment: restrictionPropagationTest.patch
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309559#comment-16309559 ] Dirk Rudolph commented on OAK-7109:
---
Here is an example where constraints get lost in the filter:
{code}
select * from [nt:base] where ([propa] = 'true' and [propb] in('foo','bar')) or ([propa] = 'false' and not([propb] in('foo','bar')))
{code}
It implements a kind of white-/blacklisting à la "If a is set to true, b has to be in a configured set; if not, b must not be in the configured set." It evaluates to:
{code}
[nt:base] as [nt:base] /* lucene:test2(/oak:index/test2) propa:[* TO *] where [nt:base].[propa] is not null */
{code}
This plan doesn't contain anything about propb, so in that case the facet counts will be wrong as well.
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309376#comment-16309376 ] Dirk Rudolph commented on OAK-7109:
---
Hi [~catholicon], somehow the mail agent doesn't accept my mails to oak-dev (I'm subscribed and receiving mail, but sending doesn't work ... anyway).

I checked the implementation of the optimisation, and it does not produce DNF, as the optimisation is not done on the negation normal form of the query (so not(a or b) is not expanded to not(a) and not(b)). For example (based on org.apache.jackrabbit.oak.query.SQL2OptimiseQueryTest#optimiseAndOrAnd()):
{code}
given
([a]=1 or [b]=2 or ([c]=3 and not([d]=4 or [e]=5))) and [x]=6
<=>
([a]=1 or [b]=2 or ([c]=3 and [d]<>4 and [e]<>5)) and [x]=6

expected
([a]=1 and [x]=6), ([b]=2 and [x]=6), ([c]=3 and [d]<>4 and [e]<>5 and [x]=6)

actual
((c = 3) and (not ((d = 4) or (e = 5)))) and (x = 6), (b = 2) and (x = 6), (a = 1) and (x = 6)
{code}
And even assuming the alternatives were in DNF and facet counting across unions were supported by merging the results of each of the queries given to Lucene, the result would still be wrong, as the disjunctive statements are not mutually exclusive (as they would be with XOR). So from my perspective there is no way to get correct facet counts in that case from the consumer side, and only option b) (filtering the documents based on the filter) or c) (passing all constraints to Lucene) would work.

Regarding b): from what I can see in the code base, the nodes are not actually read; only the permissions on their paths are checked in [FilteredSortedSetDocValuesFacetCounts.java#L91|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L91]

I will check further why our specific query doesn't get passed to Lucene in its entirety (or rather, which constraints besides the path constraints are not taken into account). Anyway, as a user of the JCR API I would expect a RepositoryException (or some other exception) when I try to run a query with facet extraction that no index can provide, similar to the exception I get when the field I extract facets on is not stored.
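The NNF point can be made concrete with a small Java sketch. It is not Oak's optimiser: the `Expr` records and the `nnf`/`dnf` helpers are hypothetical. It first pushes every negation down to the atoms using De Morgan's laws and double negation, then distributes AND over OR. Applied to the `given` expression above, it produces the three `expected` disjuncts, with `not [d]=4` standing for `[d]<>4`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class DnfSketch {

    // Minimal boolean expression tree; atoms are opaque comparison strings such as "[a]=1".
    sealed interface Expr permits Atom, Not, And, Or {}
    record Atom(String text) implements Expr {}
    record Not(Expr e) implements Expr {}
    record And(Expr l, Expr r) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}

    // Negation normal form: push every Not down to the atoms.
    static Expr nnf(Expr e) {
        if (e instanceof Not n) {
            Expr inner = n.e();
            if (inner instanceof Not nn) return nnf(nn.e());                                      // not not x = x
            if (inner instanceof And a) return new Or(nnf(new Not(a.l())), nnf(new Not(a.r())));  // De Morgan
            if (inner instanceof Or o) return new And(nnf(new Not(o.l())), nnf(new Not(o.r())));  // De Morgan
            return n; // Not(Atom) is already a literal
        }
        if (e instanceof And a) return new And(nnf(a.l()), nnf(a.r()));
        if (e instanceof Or o) return new Or(nnf(o.l()), nnf(o.r()));
        return e;
    }

    // DNF of an expression already in NNF: a list of conjunctions, each a list of literals.
    static List<List<Expr>> dnf(Expr e) {
        if (e instanceof Or o) {
            List<List<Expr>> out = new ArrayList<>(dnf(o.l()));
            out.addAll(dnf(o.r()));
            return out;
        }
        if (e instanceof And a) {
            // Distribute AND over OR: cross product of both operands' conjunctions.
            List<List<Expr>> rights = dnf(a.r());
            List<List<Expr>> out = new ArrayList<>();
            for (List<Expr> left : dnf(a.l())) {
                for (List<Expr> right : rights) {
                    List<Expr> conj = new ArrayList<>(left);
                    conj.addAll(right);
                    out.add(conj);
                }
            }
            return out;
        }
        return List.of(List.of(e)); // a literal: Atom or Not(Atom)
    }

    static String show(Expr literal) {
        if (literal instanceof Not n) return "not " + show(n.e());
        return ((Atom) literal).text();
    }

    static List<String> toStrings(List<List<Expr>> dnf) {
        return dnf.stream()
                .map(c -> c.stream().map(DnfSketch::show).collect(Collectors.joining(" and ", "(", ")")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // ([a]=1 or [b]=2 or ([c]=3 and not([d]=4 or [e]=5))) and [x]=6
        Expr query = new And(
                new Or(new Atom("[a]=1"), new Or(new Atom("[b]=2"),
                        new And(new Atom("[c]=3"), new Not(new Or(new Atom("[d]=4"), new Atom("[e]=5")))))),
                new Atom("[x]=6"));
        toStrings(dnf(nnf(query))).forEach(System.out::println);
    }
}
```

Because `dnf` builds the cross product of both operands of every AND, the number of disjuncts can grow exponentially with the size of the query. This is the same concern the issue description raises about using DNF in production.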
[jira] [Commented] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation
[ https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301390#comment-16301390 ] Dirk Rudolph commented on OAK-7078:
---
[~catholicon] any thoughts on this one? You can apply only the provided unit test, without the null check, to see the exception happen.

> NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation
> ---
>
> Key: OAK-7078
> URL: https://issues.apache.org/jira/browse/OAK-7078
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Affects Versions: 1.6.7
> Reporter: Dirk Rudolph
> Labels: facet
>
> Running the query {{select \[rep:facet(simple/tags)] from \[nt:base] where contains(\[text], 'ipsum')}} with the following content
> {code}
> /content/foo
>   - text = "lorem lorem"
>   + simple/
>     - tags = ["tag1", "tag2"]
> /content/bar
>   - text = "lorem ipsum"
> {code}
> runs into the following NPE:
> {code}
> java.lang.NullPointerException
>     at org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
>     at org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
>     at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
>     ... 38 more
> {code}
> This is because the result set for the query only contains {{/content/bar}}, and therefore the count of the dimension {{simple/tags}} is 0. In that case [SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108] returns {{null}}, and so does {{getTopChildren}}.
> This expected behaviour is properly handled in [LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647] but not in [FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63], where {{topChildren}} is dereferenced without a null check.
> To work around that, secure facets can be set to false, although the default value is true.
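The missing null check described above follows a simple pattern. In this Java sketch, `DimensionCounts`, `TopChildren` and `filteredTopChildren` are hypothetical stand-ins, not the Lucene or Oak classes. `recount` stands for whatever secure re-counting the real `FilteredSortedSetDocValuesFacetCounts` performs, which is not reproduced here. The only point is that a `null` from the delegate is passed on instead of being dereferenced, leaving it to the caller to handle it, as the description says `LucenePropertyIndex` already does.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

class NullSafeFacetsSketch {

    // Hypothetical result for one dimension: label -> count.
    record TopChildren(String dim, Map<String, Integer> labelValues) {}

    // Hypothetical delegate; like the unfiltered counts, it returns null when the
    // dimension has no hits in the current result set.
    interface DimensionCounts {
        TopChildren getTopChildren(int topN, String dim);
    }

    // Wrapper that post-processes the delegate's counts, guarding against null first.
    static TopChildren filteredTopChildren(DimensionCounts delegate, int topN, String dim,
                                           UnaryOperator<Map<String, Integer>> recount) {
        TopChildren topChildren = delegate.getTopChildren(topN, dim);
        if (topChildren == null) {
            // No document in the result carries this dimension: propagate null, don't dereference it.
            return null;
        }
        return new TopChildren(dim, recount.apply(topChildren.labelValues()));
    }

    public static void main(String[] args) {
        DimensionCounts noHits = (topN, dim) -> null; // e.g. only /content/bar matched
        System.out.println(filteredTopChildren(noHits, 10, "simple/tags", m -> m)); // null, no NPE
    }
}
```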
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301325#comment-16301325 ] Dirk Rudolph edited comment on OAK-7109 at 12/22/17 12:58 PM:
--
Yeah, support of unions with facets doesn't work well, as facets are extracted on each row, although they relate to the result, not to the rows. I will open an improvement for that as well, as this has some cost: getTopChildren() is basically called for each row while iterating the result set.

By splitting the result I didn't mean running the query as a union, but running individual queries, merging their RowIterators manually, and extracting the facets only from the first hit of each, merging them together as well. That basically works, but as I said I would have to rewrite the query in DNF, like in the example:
{code:title=distribute and over or}
contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
<=>
(contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')) or (contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2'))
{code}
{code:title=split and run query for each disjunctive statement}
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{code}
That basically works, but only if both queries hit the same index, as only then are the TF/IDF scores comparable (also across multiple queries).

So the solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure if the alternative currently created is in DNF) and supporting proper counting over union queries
b) filtering the results using the query plan's filter while counting facets, similar to the way it's done for ACLs
c) implementing a mode which translates any query as it is into its Lucene equivalent

Both a) and b) probably come with a performance drawback. c) might not even be feasible.

For our real-world case the complexity is not only given by the path restriction; there are more restrictions conjoined with it. We already tried running one query per path, but even then the individual queries are too complex to be passed to Lucene with all constraints (not entirely sure why, though ...).

Edit: opened OAK-7110 for counting facets only once per result, not once per row.
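Option a) needs "proper counting over union queries", and the earlier comment notes that the disjunctive statements are not mutually exclusive in general. The following hypothetical Java sketch (made-up paths, tags and helper names, not Oak's union code) shows why adding up facet counts computed per branch over-counts a node that matches both branches, while counting once over the distinct paths of the union does not. For the `/content1` vs `/content2` split above the branches cannot overlap, which is why the manual split works there.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class UnionFacetMergeSketch {

    // Hypothetical simple/tags values per node path.
    static final Map<String, List<String>> TAGS = Map.of(
            "/content1/a", List.of("tag1"),
            "/content1/b", List.of("tag1", "tag2"),
            "/content2/c", List.of("tag2"));

    // Counts tag occurrences over the given paths.
    static Map<String, Integer> count(Iterable<String> paths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String p : paths) {
            for (String tag : TAGS.get(p)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Naive merge: add up the facet counts each branch computed on its own.
    static Map<String, Integer> sumPerBranch(List<String> left, List<String> right) {
        Map<String, Integer> out = new TreeMap<>(count(left));
        count(right).forEach((tag, n) -> out.merge(tag, n, Integer::sum));
        return out;
    }

    // Dedup merge: count once over the distinct paths of the union.
    static Map<String, Integer> countDistinct(List<String> left, List<String> right) {
        Set<String> union = new LinkedHashSet<>(left);
        union.addAll(right);
        return count(union);
    }

    public static void main(String[] args) {
        // Two overlapping branches: /content1/b matches both.
        List<String> left = List.of("/content1/a", "/content1/b");
        List<String> right = List.of("/content1/b", "/content2/c");
        System.out.println("summed  : " + sumPerBranch(left, right));  // {tag1=3, tag2=3}
        System.out.println("distinct: " + countDistinct(left, right)); // {tag1=2, tag2=2}
    }
}
```

Deduplicating needs the rows (paths) of each branch, not just their precomputed facet results, so merging already-counted facets alone cannot fix overlapping branches.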
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301325#comment-16301325 ] Dirk Rudolph edited comment on OAK-7109 at 12/22/17 12:55 PM: --

Yeah, support of unions with facets doesn't work well, as facets are extracted on each row, though they relate to the result, not the rows. I will open an improvement for that as well, as this has some costs: basically calling getTopChildren() for each row while iterating the result set.

By splitting the result I didn't mean running the query as a union, but running individual queries, merging their RowIterators manually, and extracting facets only from the first hit of each, merging them together as well. That basically works, but as I said, I would have to rewrite the query in DNF as in the example:

{code:title=distribute and over or}
contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
<=>
(contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')) or (contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2'))
{code}

{code:title=split and run query for each disjunctive statement}
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{code}

That basically works, but only if both queries hit the same index, as only then are the TF/IDF scores comparable (also across multiple queries).

So the solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure if the alternative currently created is in DNF) and supporting proper facet counting over union queries
b) filtering the results using the query plan's filter while counting facets, similar to the way it's done for ACLs
c) implementing a mode which translates any query as-is into its Lucene equivalent

Both a) and b) probably come with a performance drawback; c) might not even be feasible.
Edit: opened OAK-7110 for counting facets only once per result, not once per row.

was (Author: diru):
Yeah, support of unions with facets doesn't work well, as facets are extracted on each row, though they relate to the result, not the rows. I will open an improvement for that as well, as this has some costs: basically calling getTopChildren() for each row while iterating the result set.

By splitting the result I didn't mean running the query as a union, but running individual queries, merging their RowIterators manually, and extracting facets only from the first hit of each, merging them together as well. That basically works, but as I said, I would have to rewrite the query in DNF as in the example:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{code}

That basically works, but only if both queries hit the same index, as only then are the TF/IDF scores comparable (also across multiple queries).

So the solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure if the alternative currently created is in DNF) and supporting proper facet counting over union queries
b) filtering the results using the query plan's filter while counting facets, similar to the way it's done for ACLs
c) implementing a mode which translates any query as-is into its Lucene equivalent

Both a) and b) probably come with a performance drawback; c) might not even be feasible.

Edit: opened OAK-7110 for counting facets only once per result, not once per row.
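The DNF rewrite discussed in the comment above (distributing AND over OR, then running one query per disjunctive statement) can be sketched in Java. This is a minimal illustration under simplifying assumptions: constraints are plain strings, and a query is modelled as an AND of clauses, each clause holding OR-ed alternatives. `DnfExpander` is a hypothetical helper, not part of Oak's query engine. The sketch also makes the exponential growth mentioned in the issue visible: n clauses with k alternatives each produce k^n statements.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands an AND of OR-clauses into its DNF terms. */
public class DnfExpander {

    /**
     * Each inner list is one clause whose alternatives are OR-ed;
     * the clauses themselves are AND-ed. Returns one AND-ed term per
     * combination of alternatives, i.e. the disjunctive normal form.
     */
    public static List<String> expand(List<List<String>> clauses) {
        List<String> terms = new ArrayList<>();
        terms.add(""); // start from the empty conjunction
        for (List<String> clause : clauses) {
            List<String> next = new ArrayList<>();
            for (String prefix : terms) {
                for (String alternative : clause) {
                    // distribute AND over OR: (prefix) and (alternative)
                    next.add(prefix.isEmpty() ? alternative : prefix + " and " + alternative);
                }
            }
            terms = next;
        }
        return terms;
    }

    public static void main(String[] args) {
        List<List<String>> query = List.of(
                List.of("contains(a.[*], 'ipsum')"),
                List.of("isdescendantnode(a,'/content1')", "isdescendantnode(a,'/content2')"));
        // each printed term could be run as its own query
        for (String term : expand(query)) {
            System.out.println(term);
        }
    }
}
```

Running `main` prints the two disjunctive statements from the example. Three clauses with two alternatives each would already yield eight queries, which is why this approach does not scale to production queries.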
[jira] [Comment Edited] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301325#comment-16301325 ] Dirk Rudolph edited comment on OAK-7109 at 12/22/17 12:43 PM: --

Yeah, support of unions with facets doesn't work well, as facets are extracted on each row, though they relate to the result, not the rows. I will open an improvement for that as well, as this has some costs: basically calling getTopChildren() for each row while iterating the result set.

By splitting the result I didn't mean running the query as a union, but running individual queries, merging their RowIterators manually, and extracting facets only from the first hit of each, merging them together as well. That basically works, but as I said, I would have to rewrite the query in DNF as in the example:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{code}

That basically works, but only if both queries hit the same index, as only then are the TF/IDF scores comparable (also across multiple queries).

So the solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure if the alternative currently created is in DNF) and supporting proper facet counting over union queries
b) filtering the results using the query plan's filter while counting facets, similar to the way it's done for ACLs
c) implementing a mode which translates any query as-is into its Lucene equivalent

Both a) and b) probably come with a performance drawback; c) might not even be feasible.

Edit: opened OAK-7110 for counting facets only once per result, not once per row.

was (Author: diru):
Yeah, support of unions with facets doesn't work well, as facets are extracted on each row, though they relate to the result, not the rows.
I will open an improvement for that as well, as this has some costs: basically calling getTopChildren() for each row while iterating the result set.

By splitting the result I didn't mean running the query as a union, but running individual queries, merging their RowIterators manually, and extracting facets only from the first hit of each, merging them together as well. That basically works, but as I said, I would have to rewrite the query in DNF as in the example:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{code}

That basically works, but only if both queries hit the same index, as only then are the TF/IDF scores comparable (also across multiple queries).

So the solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure if the alternative currently created is in DNF) and supporting proper facet counting over union queries
b) filtering the results using the query plan's filter while counting facets, similar to the way it's done for ACLs
c) implementing a mode which translates any query as-is into its Lucene equivalent

Both a) and b) probably come with a performance drawback; c) might not even be feasible.
[jira] [Created] (OAK-7110) Run rep:facet counting only once per lucene result
Dirk Rudolph created OAK-7110:

Summary: Run rep:facet counting only once per lucene result
Key: OAK-7110
URL: https://issues.apache.org/jira/browse/OAK-7110
Project: Jackrabbit Oak
Issue Type: Improvement
Components: lucene
Affects Versions: 1.6.7
Reporter: Dirk Rudolph
Priority: Minor

Currently, facet counting ([calling Facets#getTopChildren|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1752]) is done for each facet field for each row. This is because, when constructing each row, [QueryImpl|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java#L876] reads all of its columns, so it reads the facets as well.

This might have a negative impact on the performance of extracting facets (not proven) and could be optimised by caching the counted top children for each field in the scope of the result, returning the cached result for subsequent calls.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
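The proposed optimisation, counting facets once per result and serving later rows from a cache, might look roughly like the following sketch. `FacetCountCache` and the `counter` function are hypothetical stand-ins: in Oak the expensive call would be Lucene's `Facets#getTopChildren`, which this sketch does not reproduce, and the facet values are modelled as plain strings for simplicity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical result-scoped cache: facet counts for a field are computed
 * on first access and reused for every subsequent row of the same result.
 */
public class FacetCountCache {

    private final Function<String, List<String>> counter; // stands in for getTopChildren
    private final Map<String, List<String>> cache = new HashMap<>();
    private int computations = 0;

    public FacetCountCache(Function<String, List<String>> counter) {
        this.counter = counter;
    }

    /** Called once per row and facet column; counts only on the first call per field. */
    public List<String> facetsFor(String field) {
        return cache.computeIfAbsent(field, f -> {
            computations++;
            return counter.apply(f);
        });
    }

    /** Number of times the underlying (expensive) counting actually ran. */
    public int computations() {
        return computations;
    }

    public static void main(String[] args) {
        FacetCountCache cache = new FacetCountCache(field -> List.of("tag1: 2", "tag2: 2"));
        for (int row = 0; row < 3; row++) { // three rows, same facet column
            cache.facetsFor("simple/tags");
        }
        System.out.println(cache.computations()); // prints 1
    }
}
```

With three rows and one facet column, the counting runs once instead of three times. Where such a cache would live in Oak (the result, the cursor, or the index response) is left open here.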
[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7109: -- Description:

Complex queries in this case are queries that are passed to Lucene without all of their original constraints. For example, queries with multiple path restrictions like:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
{code}

In that particular case the index planner passes ":fulltext:ipsum" to Lucene, even though the index supports evaluating path constraints. As the facets are counted on the raw Lucene result, the returned facets are incorrect. For example, given the following content:

{code}
/content1/test/foo
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content2/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content3/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
{code}

the expected result for the dimensions of simple/tags and the query above is
- tag1: 2
- tag2: 2

as the result set contains 2 results and all documents are equal. The actual result is
- tag1: 3
- tag2: 3

as the path constraint is not handled by Lucene.

To work around that, the only solution that came to my mind is building the [disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex query and executing a query for each of the disjunctive statements. As this expands exponentially, it's only a theoretical solution, nothing for production.

was:
Complex queries in this case are queries that are passed to Lucene without all of their original constraints.
For example, queries with multiple path restrictions like:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
{code}

In that particular case the index planner passes ":fulltext:ipsum" to Lucene, even though the index supports evaluating path constraints. As the facets are counted on the raw Lucene result, the returned facets are incorrect. For example, given the following content:

{code}
/content1/test/foo
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content2/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content3/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
{code}

the expected result for the dimensions of simple/tags and the query above is
- tag1: 2
- tag2: 2

as the result set contains 2 results and all documents are equal. The actual result is
- tag1: 3
- tag2: 3

as the path constraint is not handled by Lucene.

To work around that, the only solution that came to my mind is building the DNF of my complex query and executing a query for each of the disjunctive statements. As this expands exponentially, it's only a theoretical solution, nothing for production.
[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301325#comment-16301325 ] Dirk Rudolph commented on OAK-7109: --

Yeah, support of unions with facets doesn't work well, as facets are extracted on each row, though they relate to the result, not the rows. I will open an improvement for that as well, as this has some costs: basically calling getTopChildren() for each row while iterating the result set.

By splitting the result I didn't mean running the query as a union, but running individual queries, merging their RowIterators manually, and extracting facets only from the first hit of each, merging them together as well. That basically works, but as I said, I would have to rewrite the query in DNF as in the example:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{code}

That basically works, but only if both queries hit the same index, as only then are the TF/IDF scores comparable (also across multiple queries).

So the solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure if the alternative currently created is in DNF) and supporting proper facet counting over union queries
b) filtering the results using the query plan's filter while counting facets, similar to the way it's done for ACLs
c) implementing a mode which translates any query as-is into its Lucene equivalent

Both a) and b) probably come with a performance drawback; c) might not even be feasible.
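Merging the facets of individually executed queries, as described in the comment above, amounts to summing the per-label counts of each query's facet result. The sketch below assumes that each query's facets are already available as a label-to-count map. This is a simplification; Oak exposes facets differently, and `FacetMerger` is hypothetical. Summation only yields correct totals if the split queries return disjoint result sets: a node matching several disjunctive statements would be counted more than once.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: merges facet counts of several split queries by summation. */
public class FacetMerger {

    public static Map<String, Integer> merge(List<Map<String, Integer>> perQueryFacets) {
        Map<String, Integer> merged = new TreeMap<>(); // sorted for stable output
        for (Map<String, Integer> facets : perQueryFacets) {
            // assumes disjoint result sets; overlapping hits would be double-counted
            facets.forEach((label, count) -> merged.merge(label, count, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> content1 = Map.of("tag1", 1, "tag2", 1); // query restricted to /content1
        Map<String, Integer> content2 = Map.of("tag1", 1, "tag2", 1); // query restricted to /content2
        System.out.println(merge(List.of(content1, content2))); // prints {tag1=2, tag2=2}
    }
}
```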
[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7109: -- Labels: facet (was: )
[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7109: -- Description:

Complex queries in this case are queries that are passed to Lucene without all of their original constraints. For example, queries with multiple path restrictions like:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
{code}

In that particular case the index planner passes ":fulltext:ipsum" to Lucene, even though the index supports evaluating path constraints. As the facets are counted on the raw Lucene result, the returned facets are incorrect. For example, given the following content:

{code}
/content1/test/foo
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content2/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content3/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
{code}

the expected result for the dimensions of simple/tags and the query above is
- tag1: 2
- tag2: 2

as the result set contains 2 results and all documents are equal. The actual result is
- tag1: 3
- tag2: 3

as the path constraint is not handled by Lucene.

To work around that, the only solution that came to my mind is building the DNF of my complex query and executing a query for each of the disjunctive statements. As this expands exponentially, it's only a theoretical solution, nothing for production.

was:
Complex queries in this case are queries that are passed to Lucene without all of their original constraints. For example, queries with multiple path restrictions like:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
{code}

In that particular case the index planner passes ":fulltext:ipsum" to Lucene, even though the index supports evaluating path constraints.
As the facets are counted on the raw Lucene result, the returned facets are incorrect. For example, given the following content:

{code}
/content1/test/foo
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content2/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content1/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
{code}

the expected result for the dimensions of simple/tags and the query above is
- tag1: 2
- tag2: 2

as the result set contains 2 results and all documents are equal. The actual result is
- tag1: 3
- tag2: 3

as the path constraint is not handled by Lucene.

To work around that, the only solution that came to my mind is building the DNF of my complex query and executing a query for each of the disjunctive statements. As this expands exponentially, it's only a theoretical solution, nothing for production.
[jira] [Updated] (OAK-7109) rep:facet returns wrong results for complex queries
[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7109: -- Attachment: facetsInMultipleRoots.patch
[jira] [Created] (OAK-7109) rep:facet returns wrong results for complex queries
Dirk Rudolph created OAK-7109:

Summary: rep:facet returns wrong results for complex queries
Key: OAK-7109
URL: https://issues.apache.org/jira/browse/OAK-7109
Project: Jackrabbit Oak
Issue Type: Bug
Components: lucene
Affects Versions: 1.6.7
Reporter: Dirk Rudolph

Complex queries in this case are queries that are passed to Lucene without all of their original constraints. For example, queries with multiple path restrictions like:

{code}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2'))
{code}

In that particular case the index planner passes ":fulltext:ipsum" to Lucene, even though the index supports evaluating path constraints. As the facets are counted on the raw Lucene result, the returned facets are incorrect. For example, given the following content:

{code}
/content1/test/foo
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content2/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
/content1/test/bar
  + text = lorem ipsum
  - simple/
    + tags = tag1, tag2
{code}

the expected result for the dimensions of simple/tags and the query above is
- tag1: 2
- tag2: 2

as the result set contains 2 results and all documents are equal. The actual result is
- tag1: 3
- tag2: 3

as the path constraint is not handled by Lucene.

To work around that, the only solution that came to my mind is building the DNF of my complex query and executing a query for each of the disjunctive statements. As this expands exponentially, it's only a theoretical solution, nothing for production.
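The gap between the expected and the actual counts can be reproduced with a small, self-contained model. It counts tags over all full-text hits, which is what happens when only ":fulltext:ipsum" reaches Lucene, and then over only the hits that also satisfy the path constraint. That second count corresponds to filtering while counting facets. This is an illustration of the reported behaviour, not Oak code; `Doc` and `countTags` are hypothetical. Note that the description in this creation notice lists the third node as /content1/test/bar, which would satisfy the path constraint and make the expected count 3. A later revision of the description changes it to /content3/test/bar, and the sketch uses that path.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical model of the OAK-7109 example: facet counts with and without path filtering. */
public class FacetCountDemo {

    /** Stand-in for an indexed node: its path and its simple/tags values. */
    public static final class Doc {
        public final String path;
        public final List<String> tags;

        public Doc(String path, List<String> tags) {
            this.path = path;
            this.tags = tags;
        }
    }

    /** Counts tag occurrences over the hits accepted by the filter. */
    public static Map<String, Integer> countTags(List<Doc> hits, Predicate<Doc> filter) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc doc : hits) {
            if (filter.test(doc)) {
                for (String tag : doc.tags) {
                    counts.merge(tag, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("tag1", "tag2");
        // every node contains "lorem ipsum", so all three match ":fulltext:ipsum"
        List<Doc> hits = List.of(
                new Doc("/content1/test/foo", tags),
                new Doc("/content2/test/bar", tags),
                new Doc("/content3/test/bar", tags));

        // isdescendantnode(a,'/content1') or isdescendantnode(a,'/content2')
        Predicate<Doc> pathConstraint =
                d -> d.path.startsWith("/content1/") || d.path.startsWith("/content2/");

        System.out.println(countTags(hits, d -> true));      // prints {tag1=3, tag2=3}
        System.out.println(countTags(hits, pathConstraint)); // prints {tag1=2, tag2=2}
    }
}
```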
[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296560#comment-16296560 ] Dirk Rudolph edited comment on OAK-7070 at 12/19/17 9:55 AM: -

Thanks [~catholicon], your understanding is correct. And yes, I will move the comment to OAK-6597.

To give a bit more context: I'm currently working on an AEM 6.3 project implementing full-text search, and we recently upgraded to 1.6.7 to make use of the changes in OAK-6750. However, our requirements also ask for excerpts, which is why I looked into OAK-6597 as well and asked there for a backport to 1.6. As this issue blocks OAK-6597, if we agree on making OAK-6597 available in 1.6, I would still like to backport it. The risk should be minimal; afaik I applied those changes to my fork of 1.6 without any problems. Not doing so means we risk having to use an unofficial port of Oak for our project, or rejecting some of the customer's requirements. This also goes together with OAK-7078 and OAK-7071.

was (Author: diru):
Thanks [~catholicon], your understanding is correct. And yes, I will move the comment to OAK-6597.

To give a bit more context: I'm currently working on an AEM 6.3 project implementing full-text search, and we recently upgraded to 1.6.7 to make use of the changes in OAK-6750. However, our requirements also ask for excerpts, which is why I looked into OAK-6597 as well and asked there for a backport to 1.6 too. As this issue blocks OAK-6597, if we agree on making OAK-6597 available in 1.6, I would still like to backport it. The risk should be minimal; afaik I applied those changes to my fork of 1.6 without any problems. Not doing so means we risk having to use an unofficial port of Oak for our project, or rejecting some of the customer's requirements. This also goes together with OAK-7078 and OAK-7071.
> rep:excerpt selector broken as regression of OAK-6750
> -
>
> Key: OAK-7070
> URL: https://issues.apache.org/jira/browse/OAK-7070
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Affects Versions: 1.6.7, 1.8
> Reporter: Dirk Rudolph
> Assignee: Vikas Saurabh
> Labels: excerpt
>
> The change made here:
> https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114
> breaks the logic in line 676:
> {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}}
> This statement doesn't make much sense considering a query like {{select
> \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or
> even {{select \[rep:excerpt(text)] from \[test:Page] as page where
> contains(page.\[text], 'term\*')}}
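Why the quoted condition cannot match the example queries can be shown in isolation: neither `rep:excerpt` nor `rep:excerpt(text)` equals the bare prefix `rep:excerpt(`. The sketch compares that exact-match check with a hypothetical alternative that accepts both forms. It assumes `QueryConstants.REP_EXCERPT` is the string `"rep:excerpt"` and that column names reach this point unchanged, which may not hold exactly for Oak's normalized property names. `ExcerptColumnCheck` is illustrative and is not the fix that was applied in Oak.

```java
/** Hypothetical illustration of the OAK-7070 condition; not Oak's actual code. */
public class ExcerptColumnCheck {

    static final String REP_EXCERPT = "rep:excerpt"; // assumed value of QueryConstants.REP_EXCERPT

    /** The check quoted in the issue: only the exact string "rep:excerpt(" matches. */
    public static boolean exactCheck(String name) {
        return name.equals(REP_EXCERPT + "(");
    }

    /** Hypothetical alternative accepting both rep:excerpt and rep:excerpt(property). */
    public static boolean prefixCheck(String name) {
        return name.equals(REP_EXCERPT)
                || (name.startsWith(REP_EXCERPT + "(") && name.endsWith(")"));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"rep:excerpt", "rep:excerpt(text)"}) {
            System.out.println(name + ": equals=" + exactCheck(name) + ", prefix=" + prefixCheck(name));
        }
    }
}
```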
[jira] [Issue Comment Deleted] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7070: -- Comment: was deleted (was: There is still the risk, that duplication appear in the excerpt because there is a highlighting hit in {{:fulltext}} and one for example in {{full:bar}}. To prevent that, it probably makes sense to first do the highlighting on {{:fulltext}} fields when analyzeFulltext is enabled and only if that hasn't been success full we fallback to the logic of highlighting {{full:}} fields. wdyt?)
[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296561#comment-16296561 ] Dirk Rudolph commented on OAK-6597: --- There is still a risk that duplicates appear in the excerpt, because there can be a highlighting hit in :fulltext and another one, for example, in full:bar. To prevent that, it probably makes sense to first do the highlighting on :fulltext fields when analyzeFulltext is enabled, and only if that hasn't been successful fall back to the logic of highlighting full: fields. wdyt? > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6, 1.8 >Reporter: Dirk Rudolph >Assignee: Chetan Mehrotra > Labels: excerpt > Fix For: 1.10 > > Attachments: excerpt-with-aggregation-test.patch > > > I noticed that properties that got indexed due to an aggregation are not > considered for excerpts (highlighting), as they are not indexed as stored > fields. > See the attached patch that implements a test for excerpts in > {{LuceneIndexAggregationTest2}}. > It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where both strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*") that exist either in > _/content/foo/bar_ or in _/content/foo/jcr:content/bar_, but not in both. For the > former the excerpt is properly provided; for the latter it isn't.
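The proposed ordering (:fulltext first, full: fields only as a fallback) could look roughly like the sketch below. The `Highlighter` interface, the field list and the convention that `null` means "no hit" are assumptions made for illustration; this is not the LucenePropertyIndex code.

```java
import java.util.List;
import java.util.Map;

final class ExcerptFallback {

    // Hypothetical stand-in for a highlighter: returns the highlighted
    // fragment for the given field, or null if the field had no hit.
    interface Highlighter {
        String highlight(String field);
    }

    // Try the aggregated :fulltext field first (only when analyzeFulltext is on);
    // use the per-property full:<name> fields only if that produced nothing,
    // so the same hit is not reported twice.
    static String excerpt(Highlighter h, boolean analyzeFulltext, List<String> fullFields) {
        if (analyzeFulltext) {
            String fromFulltext = h.highlight(":fulltext");
            if (fromFulltext != null && !fromFulltext.isEmpty()) {
                return fromFulltext;
            }
        }
        StringBuilder sb = new StringBuilder();
        for (String field : fullFields) {
            String fragment = h.highlight(field);
            if (fragment != null && !fragment.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(fragment);
            }
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    public static void main(String[] args) {
        // Simulated hits: the same term matched in :fulltext and in full:bar.
        Map<String, String> hits = Map.of(
                ":fulltext", "<strong>tincidunt</strong> ...",
                "full:bar", "<strong>tincidunt</strong> ...");
        Highlighter h = hits::get;
        System.out.println(excerpt(h, true, List.of("full:bar")));
        System.out.println(excerpt(h, false, List.of("full:bar")));
    }
}
```

Returning early from the :fulltext branch is what keeps a hit that appears in both fields from showing up twice in the excerpt.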
[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296561#comment-16296561 ] Dirk Rudolph edited comment on OAK-6597 at 12/19/17 9:51 AM: - There is still a risk that duplicates appear in the excerpt, because there can be a highlighting hit in :fulltext and another one, for example, in full:bar. To prevent that, it probably makes sense to first do the highlighting on :fulltext fields when analyzeFulltext is enabled, and only if that hasn't been successful fall back to the logic of highlighting full: fields. wdyt? was (Author: diru): There is still the risk, that duplications appear in the excerpt because there is a highlighting hit in :fulltext and one for example in full:bar. To prevent that, it probably makes sense to first do the highlighting on :fulltext fields when analyzeFulltext is enabled and only if that hasn't been success full we fallback to the logic of highlighting full: fields. wdyt?
[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296561#comment-16296561 ] Dirk Rudolph edited comment on OAK-6597 at 12/19/17 9:51 AM: - There is still a risk that duplicates appear in the excerpt, because there can be a highlighting hit in :fulltext and another one, for example, in full:bar. To prevent that, it probably makes sense to first do the highlighting on :fulltext fields when analyzeFulltext is enabled, and only if that hasn't been successful fall back to the logic of highlighting full: fields. wdyt? was (Author: diru): There is still the risk, that duplication appear in the excerpt because there is a highlighting hit in :fulltext and one for example in full:bar. To prevent that, it probably makes sense to first do the highlighting on :fulltext fields when analyzeFulltext is enabled and only if that hasn't been success full we fallback to the logic of highlighting full: fields. wdyt?
[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296560#comment-16296560 ] Dirk Rudolph commented on OAK-7070: --- Thanks [~catholicon]. Your understanding is correct, and yes, I will move the comment to OAK-6597. To give a bit more context: I'm currently working on an AEM 6.3 project implementing fulltext search, and we recently upgraded to 1.6.7 to make use of the changes in OAK-6750. However, our requirements also ask for excerpts, which is why I looked into OAK-6597 as well and asked there for a backport to 1.6 too. As this issue blocks OAK-6597, if we agree on making OAK-6597 available in 1.6 I would still like to backport it. The risk should be minimal; afaik I applied those changes to my fork of 1.6 without any problems. Not doing so means we either risk using an unofficial port of Oak for our project or have to reject some of the customer's requirements. This also goes together with OAK-7078 and OAK-7071.
[jira] [Updated] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation
[ https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7078: -- Description: Running the query {{select \[rep:facet(simple/tags)] from \[nt:base] where contains(\[text], 'ipsum')}} against the following content
{code}
/content/foo
  - text = "lorem lorem"
  + simple/
    - tags = ["tag1", "tag2"]
/content/bar
  - text = "lorem ipsum"
{code}
results in the following NPE:
{code}
java.lang.NullPointerException
    at org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
    at org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
    at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
    ... 38 more
{code}
This is because the result set of the query only contains {{/content/bar}}, so the count for the dimension {{simple/tags}} is 0. In that case [SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108] returns {{null}}, and so does {{getTopChildren}}. This expected behaviour is properly handled in [LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647] but not in [FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63], where {{topChildren}} is dereferenced without a null check. As a workaround, secure facets can be set to false, though the default value is true.
was: Running the following query {{select \[rep:facet(simple/tags)] from \[nt:base] where contains(\[text], 'ipsum')}} with the following content {code} /content/foo - text = "lorem lorem" + simple/ - tags = ["tag1", "tag2"] /content/bar - text = "lorem {code} runs in the following NPE {code} java.lang.NullPointerException at org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63) at org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646) ... 38 more {code} This is because the result set for the query only contains {{/content/bar}} and with that the count of the dimension {{simple/tag}} is 0. For that case [SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108] returns {{null}} and so does {{getTopChildren}}. This expected behaviour is properly handled in [LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647] but not in [FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63] where {{topChildren}} is dereferenced without null check. To workaround that secure facets can be set to false, though the default value is true. 
[jira] [Updated] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation
[ https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7078: -- Labels: facet (was: )
[jira] [Comment Edited] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation
[ https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295793#comment-16295793 ] Dirk Rudolph edited comment on OAK-7078 at 12/18/17 10:52 PM: -- I created [#77|https://github.com/apache/jackrabbit-oak/pull/77], which contains a unit test and the necessary null check before dereferencing topChildren in FilteredSortedSetDocValuesFacetCounts. If that's OK, I would like to ask for a backport to the 1.6 branch at least (for backwards compatibility with AEM 6.3). was (Author: diru): I created [#77|https://github.com/apache/jackrabbit-oak/pull/77] which contains a unit test and the necessary null check for derlerencing topChildren in FilteredSortedSetDocValuesFacetCounts.
[jira] [Commented] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation
[ https://issues.apache.org/jira/browse/OAK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295793#comment-16295793 ] Dirk Rudolph commented on OAK-7078: --- I created [#77|https://github.com/apache/jackrabbit-oak/pull/77], which contains a unit test and the necessary null check before dereferencing topChildren in FilteredSortedSetDocValuesFacetCounts.
[jira] [Created] (OAK-7078) NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation
Dirk Rudolph created OAK-7078: - Summary: NullPointerException in FilteredSortedSetDocValuesFacetCounts during query evaluation Key: OAK-7078 URL: https://issues.apache.org/jira/browse/OAK-7078 Project: Jackrabbit Oak Issue Type: Bug Components: lucene Affects Versions: 1.6.7 Reporter: Dirk Rudolph Running the query {{select \[rep:facet(simple/tags)] from \[nt:base] where contains(\[text], 'ipsum')}} against the following content
{code}
/content/foo
  - text = "lorem lorem"
  + simple/
    - tags = ["tag1", "tag2"]
/content/bar
  - text = "lorem
{code}
results in the following NPE:
{code}
java.lang.NullPointerException
    at org.apache.jackrabbit.oak.plugins.index.lucene.util.FilteredSortedSetDocValuesFacetCounts.getTopChildren(FilteredSortedSetDocValuesFacetCounts.java:63)
    at org.apache.lucene.facet.MultiFacets.getTopChildren(MultiFacets.java:52)
    at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$2.getValue(LucenePropertyIndex.java:1646)
    ... 38 more
{code}
This is because the result set of the query only contains {{/content/bar}}, so the count for the dimension {{simple/tags}} is 0. In that case [SortedSetDocValuesFacetCounts#getDim()|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.7.1/lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java#L108] returns {{null}}, and so does {{getTopChildren}}. This expected behaviour is properly handled in [LucenePropertyIndex.java#L1647|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L1647] but not in [FilteredSortedSetDocValuesFacetCounts.java#L63|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.7/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/util/FilteredSortedSetDocValuesFacetCounts.java#L63], where {{topChildren}} is dereferenced without a null check.
As a workaround, secure facets can be set to false, though the default value is true.
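The missing guard can be sketched with simplified, hypothetical stand-ins for Lucene's `Facets` and `FacetResult` (so the snippet runs without Lucene on the classpath). It only illustrates passing a `null` result for a dimension without hits through instead of dereferencing it; it is not the code from the pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NullSafeFacetCounts {

    // Simplified, hypothetical stand-in for Lucene's FacetResult.
    static final class FacetResult {
        final String dim;
        final List<String> labels;

        FacetResult(String dim, List<String> labels) {
            this.dim = dim;
            this.labels = labels;
        }
    }

    // Simplified, hypothetical stand-in for the wrapped Facets instance. Like
    // SortedSetDocValuesFacetCounts, it returns null for a dimension that
    // has no hits in the current result set.
    interface Facets {
        FacetResult getTopChildren(int topN, String dim);
    }

    // Sketch of the filtering wrapper's getTopChildren with the null check added.
    static FacetResult filteredTopChildren(Facets delegate, int topN, String dim) {
        FacetResult topChildren = delegate.getTopChildren(topN, dim);
        if (topChildren == null) {
            // Count for this dimension is 0: pass null through, as the caller
            // already expects, instead of dereferencing it.
            return null;
        }
        // The access-control filtering done for secure facets is omitted here;
        // the labels are simply copied.
        return new FacetResult(dim, new ArrayList<>(topChildren.labels));
    }

    public static void main(String[] args) {
        // Only /content/bar matches 'ipsum' and it has no simple/tags values,
        // so the delegate has no counts at all.
        Map<String, FacetResult> counts = Map.of();
        Facets delegate = (topN, dim) -> counts.get(dim);
        System.out.println(filteredTopChildren(delegate, 10, "simple/tags")); // prints null
    }
}
```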
[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294646#comment-16294646 ] Dirk Rudolph commented on OAK-7070: --- There is still a risk that duplicates appear in the excerpt, because there can be a highlighting hit in {{:fulltext}} and another one, for example, in {{full:bar}}. To prevent that, it probably makes sense to first do the highlighting on {{:fulltext}} fields when analyzeFulltext is enabled, and only if that hasn't been successful fall back to the logic of highlighting {{full:}} fields. wdyt?
[jira] [Created] (OAK-7071) PostingsHighlighter, Highlighter and SimpleExcerptProvider return all different formats for excerpts
Dirk Rudolph created OAK-7071: - Summary: PostingsHighlighter, Highlighter and SimpleExcerptProvider return all different formats for excerpts Key: OAK-7071 URL: https://issues.apache.org/jira/browse/OAK-7071 Project: Jackrabbit Oak Issue Type: Bug Components: lucene Affects Versions: 1.6.7, 1.8 Reporter: Dirk Rudolph *PostingsHighlighter* returns, for example, {quote} [my text with any highlighting followed by more text] {quote} because the PostingsHighlighter itself returns, for each field, a {{String[]}} of phrases limited by the previously given maximum number of phrases. This String[] is then transformed into a String using {{Arrays.toString()}} at [LucenePropertyIndex.java#L688|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java#L688], causing the value to be wrapped in square brackets. *Highlighter* returns {quote} my text with any highlighting followed by more text {quote} *SimpleExcerptProvider* returns {quote} my text with any highlighting followed by more text {quote} As the PostingsHighlighter cannot be given a custom prefix or suffix, I would suggest using its format as the default for the others as well, to avoid any further text transformation after extracting the excerpts.
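The brackets come from `java.util.Arrays.toString`, which always formats an array as `[a, b, ...]`. The sketch below contrasts it with a hypothetical `joinFragments` helper that joins the phrases with a space; the helper only illustrates one bracket-free option and is not what Oak implements.

```java
import java.util.Arrays;

final class ExcerptJoin {

    // What the current code effectively does with the String[] of phrases.
    static String viaArraysToString(String[] phrases) {
        return Arrays.toString(phrases);
    }

    // Hypothetical alternative: join the phrases with a space, skipping nulls,
    // so the result has the same shape as the other excerpt providers' output.
    static String joinFragments(String[] phrases) {
        StringBuilder sb = new StringBuilder();
        for (String phrase : phrases) {
            if (phrase == null) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(phrase);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] phrases = {"my text with any highlighting followed by more text"};
        System.out.println(viaArraysToString(phrases)); // [my text with any highlighting followed by more text]
        System.out.println(joinFragments(phrases));     // my text with any highlighting followed by more text
    }
}
```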
[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294092#comment-16294092 ] Dirk Rudolph edited comment on OAK-7070 at 12/17/17 1:18 PM: - Following up on OAK-4401, it looks like rep:excerpt(propertyName) and rep:excerpt(.) are not meant to be used as columns in JCR-SQL2, but rather to get an excerpt based on a result row. This is at least consistent with the behaviour implemented in [XPathToSQL2Converter.java|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/xpath/XPathToSQL2Converter.java] was (Author: diru): Following up with OAK-4401 it looks like rep:excerpt(propertyName) and rep:excerpt(.) are not meant to be used as columns in jcr-sql2 but to get an excerpt based on a result row.
[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294133#comment-16294133 ] Dirk Rudolph edited comment on OAK-6597 at 12/17/17 1:10 PM: - I opened #76 for that, though that requires https://github.com/apache/jackrabbit-oak/pull/75 to be merged first. It would be great if that could be backported to 1.6 for compatibility with AEM 6.3 was (Author: diru): I opened #76 for that, though that requires https://github.com/apache/jackrabbit-oak/pull/75 to be merged first.
[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294133#comment-16294133 ] Dirk Rudolph edited comment on OAK-6597 at 12/17/17 1:06 PM: - I opened #76 for that, though that requires https://github.com/apache/jackrabbit-oak/pull/75 to be merged first. was (Author: diru): I opened #76 for that, though that requires https://github.com/apache/jackrabbit-oak/pull/75 to be merged.
[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294066#comment-16294066 ] Dirk Rudolph edited comment on OAK-7070 at 12/17/17 12:52 PM: -- To answer my own question: there is a [longRepExcerpt test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394] which queries for [rep:excerpt] but doesn't assert its value in the result. The issue caused by the regression is that rep:excerpt in the result is null, not that the query fails. ... This is because executeQuery(java.lang.String, java.lang.String, boolean, boolean) is called with pathsOnly. was (Author: diru): To answer my own question: there is [longRepExcerpt test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394] which queries for for [rep:excerpt] but doesn't assets its value from the result. The issue caused by that is that the rep:except in the result is null, not that the query is failing. ... this is because executeQuery(java.lang.String, java.lang.String, boolean, boolean) is called with pathsOnly.
> rep:excerpt selector broken as regression of OAK-6750 > - > > Key: OAK-7070 > URL: https://issues.apache.org/jira/browse/OAK-7070 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.7, 1.8 >Reporter: Dirk Rudolph >Assignee: Vikas Saurabh > Labels: excerpt > > The change made here: > https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114 > breaks the logic in line 676: > {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}} > This statement doesn't make much sense considering a query like {{select > \[rep:excerpt] from \[test:Page] as page where contains(\*, 'term\*')}} or > even {{select \[rep:excerpt(text)] from \[test:Page] as page where > contains(page.\[text], 'term\*')}}
[jira] [Updated] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7070: -- Labels: excerpt (was: )
[jira] [Updated] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7070: -- Component/s: lucene
[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6597: -- Affects Version/s: 1.8
[jira] [Updated] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-7070: -- Affects Version/s: 1.6.7
[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294106#comment-16294106 ] Dirk Rudolph commented on OAK-6597: --- [~chetanm] reconsidering what you said above, it looks like you are completely right with your proposed approach. My assumption overlooked that fulltext fields are stored as multivalue fields rather than as a single concatenated value. So I don't expect any of the weird behaviour described above. I will try to come up with a PR implementing your proposed solution.
[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294104#comment-16294104 ] Dirk Rudolph commented on OAK-7070: --- I opened a PR here: https://github.com/apache/jackrabbit-oak/pull/75. It would be great to get that applied to the 1.6 and 1.7 branches as well (for backward compatibility with AEM 6.3).
[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294092#comment-16294092 ] Dirk Rudolph commented on OAK-7070: --- Following up on OAK-4401, it looks like rep:excerpt(propertyName) and rep:excerpt(.) are not meant to be used as columns in JCR-SQL2, but rather to get an excerpt based on a result row.
[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294082#comment-16294082 ] Dirk Rudolph edited comment on OAK-7070 at 12/17/17 11:23 AM: -- Ok, in the query parsing there is a different issue: the SQL2Parser doesn't properly parse rep:excerpt(.) and rep:excerpt(propertyName), but expects EXCERPT as a keyword, see [SQL2Parser.java#L902|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/SQL2Parser.java#L902]. So, for example, rep:excerpt(.) is interpreted as a normal property name and goes to [SelectorImpl.java#L392|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java#L392], where it's not taken into account because the check there tests for equality. I cannot find anything about EXCERPT in the jcr-sql2 specs, but for facets there is nothing special in the SQL2Parser, so I assume we keep it as it is and adapt the SelectorImpl further instead. was (Author: diru): Ok in the query parsing there is a different issue: The SQL2Parser doesn't properly parse rep:excerpt(.) and rep:excerpt(propertyName), but expects EXCERPT as keyword, see [SQL2Parser.java#L902|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/SQL2Parser.java#L902]. So for example rep:excerpt(.) is interpreted as normal propertyName and goes to [SelectorImpl.java#L392|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java#L392] where its not taken into account due to checking for equality. I cannot find anything about the EXCERPT in the jcr specs, but for facets there is nothing special in the SQL2Parser, so I assume we keep it like that as it is and adapt the SelectorImpl further instead.
[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294082#comment-16294082 ] Dirk Rudolph commented on OAK-7070: --- Ok, in the query parsing there is a different issue: the SQL2Parser doesn't properly parse rep:excerpt(.) and rep:excerpt(propertyName), but expects EXCERPT as a keyword, see [SQL2Parser.java#L902|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/SQL2Parser.java#L902]. So, for example, rep:excerpt(.) is interpreted as a normal property name and goes to [SelectorImpl.java#L392|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java#L392], where it's not taken into account because the check there tests for equality. I cannot find anything about EXCERPT in the jcr specs, but for facets there is nothing special in the SQL2Parser, so I assume we keep it as it is and adapt the SelectorImpl further instead.
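If the parser is left as it is and SelectorImpl is adapted instead, the selector has to recover the excerpt argument from the plain column name it receives. The sketch below is a hypothetical helper (excerptArgument is not an Oak method) that assumes the column name arrives unmodified, e.g. without a selector prefix, and that QueryConstants.REP_EXCERPT is "rep:excerpt". It returns "." for rep:excerpt(.), the property name for rep:excerpt(propertyName), an empty string for a bare rep:excerpt, and null for anything else.

```java
public class ExcerptArgument {

    // Local stand-in for QueryConstants.REP_EXCERPT (assumed to be "rep:excerpt").
    static final String REP_EXCERPT = "rep:excerpt";

    /**
     * Hypothetical helper: returns the text between the brackets of an excerpt
     * column ("." or a property name), "" for a bare rep:excerpt, or null if the
     * name is not an excerpt column at all.
     */
    static String excerptArgument(String name) {
        if (name.equals(REP_EXCERPT)) {
            return "";
        }
        String prefix = REP_EXCERPT + "(";
        if (name.startsWith(prefix) && name.endsWith(")")) {
            // Strip "rep:excerpt(" and the closing ")".
            return name.substring(prefix.length(), name.length() - 1);
        }
        return null;
    }

    public static void main(String[] args) {
        String[] columns = {"rep:excerpt", "rep:excerpt(.)", "rep:excerpt(text)", "jcr:title"};
        for (String c : columns) {
            System.out.println(c + " -> " + excerptArgument(c));
        }
    }
}
```

An equality check against "rep:excerpt" alone would only ever see the first of these names, which is why a plain property-name path through SelectorImpl drops the other two.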
[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294069#comment-16294069 ] Dirk Rudolph commented on OAK-7070: --- I have a PR almost ready (fixing both the test and the issue), if you want me to open it.
[jira] [Comment Edited] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294066#comment-16294066 ] Dirk Rudolph edited comment on OAK-7070 at 12/17/17 10:13 AM: -- To answer my own question: there is a [longRepExcerpt test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394] which queries for [rep:excerpt] but doesn't assert its value in the result. The issue caused by the regression is that rep:excerpt in the result is null, not that the query fails. ... This is because executeQuery(java.lang.String, java.lang.String, boolean, boolean) is called with pathsOnly. was (Author: diru): To answer my own question: there is [longRepExcerpt test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394] which queries for for [rep:excerpt] but doesn't assets its value from the result. The issue caused by that is that the rep:except in the result is null, not that the query is failing.
[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294066#comment-16294066 ] Dirk Rudolph commented on OAK-7070: --- To answer my own question: there is a [longRepExcerpt test|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java#L2394] which queries for [rep:excerpt] but doesn't assert its value in the result. The issue caused by the regression is that rep:excerpt in the result is null, not that the query fails.
[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294063#comment-16294063 ] Dirk Rudolph commented on OAK-7070: --- [~catholicon] see the case in OAK-6597 (which I'm currently working on) - it links to a ticket where a (currently disabled) test was added, which passed for one of the cases but failed for the other. Now it's failing for both. That's not 100% related though, so let me propose a test. Which test case are you referring to?
[jira] [Commented] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
[ https://issues.apache.org/jira/browse/OAK-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294060#comment-16294060 ] Dirk Rudolph commented on OAK-7070: --- As far as I can see, the following excerpt selectors are supported in SQL2 queries: 1) [rep:excerpt] 2) [rep:excerpt(.)] 3) [rep:excerpt(propertyName)] The expression prior to OAK-6750 covers only 1), whereas a change to {{startsWith}} with a trailing opening bracket (similar to the facets) would only cover 2) and 3). Without the trailing opening bracket, however, a bit too much might get matched, so my suggestion is to use a disjunction of both checks above.
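A minimal sketch of the proposed disjunction, run against the three selector forms listed above plus one look-alike name. It assumes the check receives the column name as a plain string and that QueryConstants.REP_EXCERPT is "rep:excerpt"; the class and method names are hypothetical, not Oak's API. brokenCheck reproduces the condition quoted from line 676, exactCheck is an exact match that covers only form 1), and isExcerptColumn is the suggested combination.

```java
import java.util.Arrays;
import java.util.List;

public class ExcerptColumnCheck {

    // Local stand-in for QueryConstants.REP_EXCERPT (assumed to be "rep:excerpt").
    static final String REP_EXCERPT = "rep:excerpt";

    // Condition quoted in OAK-7070: only the literal string "rep:excerpt(" matches.
    static boolean brokenCheck(String name) {
        return name.equals(REP_EXCERPT + "(");
    }

    // Exact match: covers form 1) [rep:excerpt] only.
    static boolean exactCheck(String name) {
        return name.equals(REP_EXCERPT);
    }

    // Proposed disjunction: form 1) exactly, or forms 2) and 3) via the "(" prefix.
    // Keeping the bracket in the prefix avoids matching names such as "rep:excerptFoo".
    static boolean isExcerptColumn(String name) {
        return name.equals(REP_EXCERPT) || name.startsWith(REP_EXCERPT + "(");
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList(
                "rep:excerpt", "rep:excerpt(.)", "rep:excerpt(text)", "rep:excerptFoo");
        for (String c : columns) {
            System.out.printf("%-18s broken=%-5b exact=%-5b proposed=%b%n",
                    c, brokenCheck(c), exactCheck(c), isExcerptColumn(c));
        }
    }
}
```

The broken condition reports false for every realistic column, which matches the observation that rep:excerpt comes back null rather than the query failing.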
[jira] [Created] (OAK-7070) rep:excerpt selector broken as regression of OAK-6750
Dirk Rudolph created OAK-7070: - Summary: rep:excerpt selector broken as regression of OAK-6750 Key: OAK-7070 URL: https://issues.apache.org/jira/browse/OAK-7070 Project: Jackrabbit Oak Issue Type: Bug Affects Versions: 1.8 Reporter: Dirk Rudolph The change made here: https://github.com/apache/jackrabbit-oak/commit/00c94b71293abcae6d76bb162c3f55c7d09b702e#diff-d4bdf443c61f24b634f33aab607e2114 breaks the logic in line 676: {{else if (oakPropertyName.equals(QueryConstants.REP_EXCERPT + "("))}} This condition never matches the excerpt column of queries such as {{select \[rep:excerpt] from \[test:Page] as page where contains(*, 'term*')}} or {{select \[rep:excerpt(text)] from \[test:Page] as page where contains(page.\[text], 'term*')}}, because neither rep:excerpt nor rep:excerpt(text) is equal to rep:excerpt(.
[jira] [Created] (OAK-6676) rep:facet doesn't work in combination with aliases in JCR-SQL2
Dirk Rudolph created OAK-6676: - Summary: rep:facet doesn't work in combination with aliases in JCR-SQL2 Key: OAK-6676 URL: https://issues.apache.org/jira/browse/OAK-6676 Project: Jackrabbit Oak Issue Type: Bug Components: core Affects Versions: 1.6.1 Reporter: Dirk Rudolph Priority: Minor Within [SelectorImpl#createFilter()|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/query/ast/SelectorImpl.java?view=markup#l389] the columnName is used to determine whether to add a restriction indicating facets or not. So a query like {code} select [rep:facet(tags)] as facets from ... {code} will not return facets, because the column name is then the alias "facets". The same applies to {{rep:excerpt}}.
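A simplified model of the alias problem, assuming a column carries both its name as seen by the query (the alias, if one is given) and the selected expression, and that QueryConstants.REP_FACET is "rep:facet". Column, columnName and propertyName here are hypothetical stand-ins rather than Oak's ColumnImpl API. Deciding on the alias misses the facet as soon as "as facets" is added; deciding on the selected expression does not depend on the alias.

```java
import java.util.Arrays;
import java.util.List;

public class FacetAliasCheck {

    // Local stand-in for QueryConstants.REP_FACET (assumed to be "rep:facet").
    static final String REP_FACET = "rep:facet";

    // Hypothetical column model: the visible name (alias) and the selected expression.
    static final class Column {
        final String columnName;
        final String propertyName;

        Column(String columnName, String propertyName) {
            this.columnName = columnName;
            this.propertyName = propertyName;
        }
    }

    // Behaviour described in OAK-6676: the alias is inspected, so "facets" is not recognised.
    static boolean needsFacetByColumnName(Column c) {
        return c.columnName.startsWith(REP_FACET + "(");
    }

    // Alternative: inspect the selected expression, which is the same with or without an alias.
    static boolean needsFacetByPropertyName(Column c) {
        return c.propertyName.startsWith(REP_FACET + "(");
    }

    public static void main(String[] args) {
        List<Column> columns = Arrays.asList(
                new Column("rep:facet(tags)", "rep:facet(tags)"), // no alias
                new Column("facets", "rep:facet(tags)"));        // "... as facets"
        for (Column c : columns) {
            System.out.println(c.columnName + ": byColumnName=" + needsFacetByColumnName(c)
                    + " byPropertyName=" + needsFacetByPropertyName(c));
        }
    }
}
```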
[jira] [Commented] (OAK-6676) rep:facet doesn't work in combination with aliases in JCR-SQL2
[ https://issues.apache.org/jira/browse/OAK-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167678#comment-16167678 ] Dirk Rudolph commented on OAK-6676: --- I'm going to provide a patch with a unit test soon.
[jira] [Updated] (OAK-6643) Return a common format of excerpts independent of the highlighter used
[ https://issues.apache.org/jira/browse/OAK-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6643: -- Labels: excerpt (was: ) > Return a common format of excerpts independent of the highlighter used > -- > > Key: OAK-6643 > URL: https://issues.apache.org/jira/browse/OAK-6643 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: lucene >Affects Versions: 1.6.1 >Reporter: Dirk Rudolph >Priority: Minor > Labels: excerpt > > While using the {{rep:excerpt}} functionality we noticed that the format of the > {{PostingsHighlighter}} output differs from that of {{Highlighter}}. See the > examples below: > {{PostingsHighlighter}} > {quote} > [In Central & Eastern Europe and Asia Pacific Allianz is one of the > leading international insurance companies. ] > {quote} > {{Highlighter}} > {quote} > "Life Risk Insurance" > {quote} > It would be great to have one single format, so that applications don't have > to handle those differences. > Additionally, the {{Arrays.toString(...)}} used to generate an excerpt string > from the {{PostingsHighlighter}} output causes the excerpt text to be wrapped in > "[...]"; I guess that's not intended.
[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6597: -- Labels: excerpt (was: )
[jira] [Updated] (OAK-6643) Return a common format of excerpts independent of the highlighter used
[ https://issues.apache.org/jira/browse/OAK-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6643: -- Component/s: lucene
[jira] [Created] (OAK-6643) Return a common format of excerpts independent of the highlighter used
Dirk Rudolph created OAK-6643: - Summary: Return a common format of excerpts independent of the highlighter used Key: OAK-6643 URL: https://issues.apache.org/jira/browse/OAK-6643 Project: Jackrabbit Oak Issue Type: Improvement Reporter: Dirk Rudolph Priority: Minor While using the {{rep:excerpt}} functionality we noticed that the format of the {{PostingsHighlighter}} output differs from that of {{Highlighter}}. See the examples below: {{PostingsHighlighter}} {quote} [In Central & Eastern Europe and Asia Pacific Allianz is one of the leading international insurance companies. ] {quote} {{Highlighter}} {quote} "Life Risk Insurance" {quote} It would be great to have one single format, so that applications don't have to handle those differences. Additionally, the {{Arrays.toString(...)}} used to generate an excerpt string from the {{PostingsHighlighter}} output causes the excerpt text to be wrapped in "[...]"; I guess that's not intended.
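The "[...]" wrapping follows from how java.util.Arrays.toString renders an array: it encloses the elements in square brackets and separates them with ", ". The sketch below assumes the passages reach that call as a String[] (which is what passing them to Arrays.toString implies) and compares it with joining them directly via String.join. The passage strings are placeholders, and the single-space separator is an arbitrary choice for illustration, not the format Oak settled on.

```java
import java.util.Arrays;

public class ExcerptJoin {

    // Rendering described in OAK-6643: brackets around the passages, ", " between them.
    static String viaArraysToString(String[] passages) {
        return Arrays.toString(passages);
    }

    // Alternative: concatenate the passages without brackets.
    // The single-space separator is an assumption for this sketch.
    static String viaJoin(String[] passages) {
        return String.join(" ", passages);
    }

    public static void main(String[] args) {
        // Placeholder passages standing in for highlighter output.
        String[] passages = {"first passage.", "second passage."};
        System.out.println(viaArraysToString(passages));
        System.out.println(viaJoin(passages));
    }
}
```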
[jira] [Updated] (OAK-6643) Return a common format of excerpts independent of the highlighter used
[ https://issues.apache.org/jira/browse/OAK-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6643: -- Affects Version/s: 1.6.1 > Return a common format of excerpts independent of the highlighter used > -- > > Key: OAK-6643 > URL: https://issues.apache.org/jira/browse/OAK-6643 > Project: Jackrabbit Oak > Issue Type: Improvement >Affects Versions: 1.6.1 >Reporter: Dirk Rudolph >Priority: Minor > > While using the {{rep:excerpt}} functionality we noticed that the format of the > {{PostingsHighlighter}} differs from that of the {{Highlighter}}. See the > example below: > {{PostingsHighlighter}} > {quote} > [In Central & Eastern Europe and Asia Pacific Allianz is one of the > leading international insurance companies. ] > {quote} > {{Highlighter}} > {quote} > "Life Risk Insurance" > {quote} > It would be great to have one single format, so that applications don't have > to handle those differences. > Additionally, the {{Arrays.toString(...)}} call used to generate an excerpt string > from the {{PostingsHighlighter}} causes the excerpt text to be wrapped in > "[...]"; I guess that's not intended.
[jira] [Comment Edited] (OAK-6600) queries on Date type results empty search results
[ https://issues.apache.org/jira/browse/OAK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149122#comment-16149122 ] Dirk Rudolph edited comment on OAK-6600 at 8/31/17 4:17 PM: What you are showing here simply means that the format used to render your node doesn't expose its timezone. Still, the timezone is stored as part of the calendar/date object within Oak, and date queries obviously don't handle dates as strings. What you can try is adding a timezone (your current server's timezone) to the date string within your query. Alternatively, try subtracting 1d from the time in your >= query to check whether the node is found afterwards; if so, you most likely have a timezone issue. was (Author: diru): What you are showing here simply means that the format used to render your node doesn't expose its timezone. Still the timezone is stored as part of the calendar/date object within oak and ate queries obviously don't handle dates as strings. What you can try is adding a timezone (your current servers timezone) to the date string within your query. Alternatively try to subtract 1d from the time in your >= query to check if its found afterwards - if so you most likely have a timezone issue. > queries on Date type results empty search results > -- > > Key: OAK-6600 > URL: https://issues.apache.org/jira/browse/OAK-6600 > Project: Jackrabbit Oak > Issue Type: Bug > Components: core >Affects Versions: 1.6.0 >Reporter: Mouli >Priority: Blocker > Labels: jcr, oak > Attachments: screenshot-1.png, screenshot-2.png > > > There are two issues here: > 1) By default, when we store a date in JCR it is saved in the format below: > 2017-08-21 21:35:33. When I query on this date, the search results are empty:
> select [body/dataNum] from [cas:article] where [jcr:lastModified] = > '2017-08-29 16:36:39' order by [jcr:created] DESC > 2) When I try to use > select [body/dataNum] from [cas:article] where [jcr:lastModified] > cast('2017-08-29 16:36:39' as date) order by [jcr:created] DESC > it throws the error "not a date string". After some investigation I found that it > only accepts the ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSSZ). When I try to > store a date in that format, it is automatically converted to yyyy.MM.dd HH:mm:ss. > Now my questions are: > 1) How do I change the default date format of JCR to yyyy-MM-dd'T'HH:mm:ss.SSSZ? > 2) How do I perform queries on dates?
[jira] [Commented] (OAK-6607) Oak facet indexes seems to only work for nt:base
[ https://issues.apache.org/jira/browse/OAK-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149133#comment-16149133 ] Dirk Rudolph commented on OAK-6607: --- Facets don't work in combination with aggregation (see OAK-6597), though for me they work quite well on {{cq:PageContent}} with the property {{cq:tags}}. > Oak facet indexes seems to only work for nt:base > > > Key: OAK-6607 > URL: https://issues.apache.org/jira/browse/OAK-6607 > Project: Jackrabbit Oak > Issue Type: Bug >Reporter: Van MOHAMED > > We are working in AEM and want to implement a Lucene facet index based on the > definition found here: > https://jackrabbit.apache.org/oak/docs/query/lucene.html. However, it only > works if you limit the node type to nt:base. Here's a snippet of a working > facet index definition. > {code:xml} > jcr:primaryType="oak:QueryIndexDefinition" > compatVersion="{Long}2" > reindex="{Boolean}false" > reindexCount="{Long}1" > type="lucene" > evaluatePathRestrictions="{Boolean}true" > async="async" > > > > > jcr:primaryType="nt:unstructured" > propertyIndex="{Boolean}true" > facets="{Boolean}true" > analyzed="{Boolean}true" > nodeScopeIndex="{Boolean}true" > name="contentType" /> > > > > > {code} > If we were to replace "nt:base" with "dam:Asset", for instance, and update the > contentType name property accordingly (in our case, updated in > jcr:content/metadata/contentType), then the facet wouldn't work anymore. In > the logs, we would get the message "facets for {} not yet indexed".
[jira] [Commented] (OAK-6600) queries on Date type results empty search results
[ https://issues.apache.org/jira/browse/OAK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149122#comment-16149122 ] Dirk Rudolph commented on OAK-6600: --- What you are showing here simply means that the format used to render your node doesn't expose its timezone. Still, the timezone is stored as part of the calendar/date object within Oak, and date queries obviously don't handle dates as strings. What you can try is adding a timezone (your current server's timezone) to the date string within your query. Alternatively, try subtracting 1d from the time in your >= query to check whether the node is found afterwards; if so, you most likely have a timezone issue. > queries on Date type results empty search results > -- > > Key: OAK-6600 > URL: https://issues.apache.org/jira/browse/OAK-6600 > Project: Jackrabbit Oak > Issue Type: Bug > Components: core >Affects Versions: 1.6.0 >Reporter: Mouli >Priority: Blocker > Labels: jcr, oak > Attachments: screenshot-1.png, screenshot-2.png > > > There are two issues here: > 1) By default, when we store a date in JCR it is saved in the format below: > 2017-08-21 21:35:33. When I query on this date, the search results are empty: > select [body/dataNum] from [cas:article] where [jcr:lastModified] = > '2017-08-29 16:36:39' order by [jcr:created] DESC > 2) When I try to use > select [body/dataNum] from [cas:article] where [jcr:lastModified] > cast('2017-08-29 16:36:39' as date) order by [jcr:created] DESC > it throws the error "not a date string". After some investigation I found that it > only accepts the ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSSZ). When I try to > store a date in that format, it is automatically converted to yyyy.MM.dd HH:mm:ss. > Now my questions are: > 1) How do I change the default date format of JCR to yyyy-MM-dd'T'HH:mm:ss.SSSZ? > 2) How do I perform queries on dates?
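Following the suggestion above, a sketch with {{java.time}} that attaches an explicit zone to the wall-clock value from the issue and builds the widened {{>=}} query. {{Europe/Berlin}} is only an assumed stand-in for the server's time zone, and {{DateQueryBuilder}} is a hypothetical helper, not an Oak API:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Sketch: turn a displayed "yyyy-MM-dd HH:mm:ss" value into an ISO 8601 string
// with an explicit UTC offset, for use in cast('...' as date) in JCR-SQL2.
final class DateQueryBuilder {

    // Milliseconds plus offset, e.g. 2017-08-29T16:36:39.000+02:00 ("Z" for UTC).
    static final DateTimeFormatter ISO_WITH_OFFSET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    // Interpret the displayed wall-clock value in the given (assumed) zone.
    static ZonedDateTime inZone(String displayed, ZoneId zone) {
        return ZonedDateTime.of(LocalDateTime.parse(displayed.replace(' ', 'T')), zone);
    }

    static String toIso(String displayed, ZoneId zone) {
        return inZone(displayed, zone).format(ISO_WITH_OFFSET);
    }

    // Diagnostic from the comment: move the lower bound back by one day. If the node
    // shows up now but not with the exact value, suspect a time zone mismatch.
    static String widenedQuery(String displayed, ZoneId zone) {
        String lower = inZone(displayed, zone).minusDays(1).format(ISO_WITH_OFFSET);
        return "select [body/dataNum] from [cas:article] where [jcr:lastModified] >= cast('"
                + lower + "' as date) order by [jcr:created] DESC";
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Berlin"); // assumption: use the server's zone instead
        System.out.println(toIso("2017-08-29 16:36:39", zone)); // 2017-08-29T16:36:39.000+02:00
        System.out.println(widenedQuery("2017-08-29 16:36:39", zone));
    }
}
```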
[jira] [Commented] (OAK-6600) queries on Date type results empty search results
[ https://issues.apache.org/jira/browse/OAK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148600#comment-16148600 ] Dirk Rudolph commented on OAK-6600: --- It's stored as a {{Date}} field, so the format shouldn't matter. > queries on Date type results empty search results > -- > > Key: OAK-6600 > URL: https://issues.apache.org/jira/browse/OAK-6600 > Project: Jackrabbit Oak > Issue Type: Bug > Components: core >Affects Versions: 1.6.0 >Reporter: Mouli >Priority: Blocker > Labels: jcr, oak > > There are two issues here: > 1) By default, when we store a date in JCR it is saved in the format below: > 2017-08-21 21:35:33. When I query on this date, the search results are empty: > select [body/dataNum] from [cas:article] where [jcr:lastModified] = > '2017-08-29 16:36:39' order by [jcr:created] DESC > 2) When I try to use > select [body/dataNum] from [cas:article] where [jcr:lastModified] > cast('2017-08-29 16:36:39' as date) order by [jcr:created] DESC > it throws the error "not a date string". After some investigation I found that it > only accepts the ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSSZ). When I try to > store a date in that format, it is automatically converted to yyyy.MM.dd HH:mm:ss. > Now my questions are: > 1) How do I change the default date format of JCR to yyyy-MM-dd'T'HH:mm:ss.SSSZ? > 2) How do I perform queries on dates?
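A small {{java.time}} illustration of why the rendering shouldn't matter: two ISO 8601 strings with different offsets can denote the same instant, while identical digits with different offsets do not. This assumes date comparisons are made on the underlying point in time; it models that comparison and is not Oak's internal date handling:

```java
import java.time.Instant;
import java.time.OffsetDateTime;

final class SameInstant {

    // Parse an ISO 8601 date-time with an offset (or "Z") into an absolute instant.
    static Instant instantOf(String iso) {
        return OffsetDateTime.parse(iso).toInstant();
    }

    public static void main(String[] args) {
        Instant utc = instantOf("2017-08-30T21:18:42.917Z");
        Instant cest = instantOf("2017-08-30T23:18:42.917+02:00");
        Instant sameDigits = instantOf("2017-08-30T21:18:42.917+02:00");
        System.out.println(utc.equals(cest));       // true: same instant, different rendering
        System.out.println(utc.equals(sameDigits)); // false: same digits, two hours apart
    }
}
```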
[jira] [Commented] (OAK-6600) queries on Date type results empty search results
[ https://issues.apache.org/jira/browse/OAK-6600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148571#comment-16148571 ] Dirk Rudolph commented on OAK-6600: --- The following works quite well for me: {code} select [jcr:path], [jcr:score], * from [nt:unstructured] as a where [jcr:lastModified] = cast('2017-08-30T21:18:42.917Z' as date) and isdescendantnode(a, '/content') /* xpath: /jcr:root/content//element(*,nt:unstructured)[@jcr:lastModified = xs:dateTime('2017-08-30T21:18:42.917Z')] */ {code} {code} select [jcr:path], [jcr:score], * from [nt:unstructured] as a where [jcr:lastModified] = cast('2017-08-30T21:18:42.917+02:00' as date) and isdescendantnode(a, '/content') /* xpath: /jcr:root/content//element(*,nt:unstructured)[@jcr:lastModified = xs:dateTime('2017-08-30T21:18:42.917+02:00')] */ {code} > queries on Date type results empty search results > -- > > Key: OAK-6600 > URL: https://issues.apache.org/jira/browse/OAK-6600 > Project: Jackrabbit Oak > Issue Type: Bug > Components: core >Affects Versions: 1.6.0 >Reporter: Mouli >Priority: Blocker > Labels: jcr, oak > > There are two issues here: > 1) By default, when we store a date in JCR it is saved in the format below: > 2017-08-21 21:35:33. When I query on this date, the search results are empty: > select [body/dataNum] from [cas:article] where [jcr:lastModified] = > '2017-08-29 16:36:39' order by [jcr:created] DESC > 2) When I try to use > select [body/dataNum] from [cas:article] where [jcr:lastModified] > cast('2017-08-29 16:36:39' as date) order by [jcr:created] DESC > it throws the error "not a date string". After some investigation I found that it > only accepts the ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSSZ). When I try to > store a date in that format, it is automatically converted to yyyy.MM.dd HH:mm:ss.
> Now my questions are: > 1) How do I change the default date format of JCR to yyyy-MM-dd'T'HH:mm:ss.SSSZ? > 2) How do I perform queries on dates?
[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146942#comment-16146942 ] Dirk Rudolph commented on OAK-6597: --- We should also double-check spellcheck, suggestions and facets. From what I can see, those don't take aggregated nodes into account either. > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph > Fix For: 1.8 > > Attachments: excerpt-with-aggregation-test.patch > > > I noticed that properties that got indexed due to an aggregation are not > considered for excerpts (highlighting), as they are not indexed as stored > fields. > See the attached patch that implements a test for excerpts in > {{LuceneIndexAggregationTest2}}. > It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where both strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either > _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the > former the excerpt is provided properly; for the latter it isn't.
[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146835#comment-16146835 ] Dirk Rudolph edited comment on OAK-6597 at 8/30/17 7:59 AM: {quote} which if enabled would enable storage of ":fulltext" field created in any of the above ways {quote} That would mean that the excerpt is created from a stored field containing all indexed properties of all nested nodes, right? If so, there could be a corner case where the excerpt contains weird text at the boundaries of a single property value, no? Example: {code} /content/foo + jcr:content - text1 = "My fancy text" - text2 = "This isn't so fancy" {code} If I'm right, that would produce an excerpt like "My fancy text This isn't so fancy" or, even worse, without the space: "My fancy textThis isn't so fancy". Wouldn't it make sense to store each nested property in its own analyzed field (full:_jcr_content/text1) or similar? Do we have any insights into the impact on the index size, and with that the impact on query performance, for an index that has that feature enabled? was (Author: diru): Do we have any insights what will be the impact on the index size and with that the impact on query performance against one index that has that feature enabled? > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph > Fix For: 1.8 > > Attachments: excerpt-with-aggregation-test.patch > > > I noticed that properties that got indexed due to an aggregation are not > considered for excerpts (highlighting), as they are not indexed as stored > fields. > See the attached patch that implements a test for excerpts in > {{LuceneIndexAggregationTest2}}.
> It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where both strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either > _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the > former the excerpt is provided properly; for the latter it isn't.
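The boundary concern can be reproduced without Lucene. A minimal sketch, assuming the aggregated value is a plain concatenation of the nested property values (which is exactly what the comment asks about, not confirmed Oak behaviour); the {{full:<relPath>/<name>}} field naming is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FulltextBoundaries {

    // No separator: words at the property boundary run together.
    static String concatNoSeparator(Map<String, String> props) {
        return String.join("", props.values());
    }

    // Space separator: readable, but an excerpt can still span two unrelated values.
    static String concatWithSpace(Map<String, String> props) {
        return String.join(" ", props.values());
    }

    // Alternative raised in the comment: one field per nested property, so an excerpt
    // never crosses a value boundary. The field naming is purely illustrative.
    static Map<String, String> perPropertyFields(String relPath, Map<String, String> props) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            fields.put("full:" + relPath + "/" + e.getKey(), e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("text1", "My fancy text");
        props.put("text2", "This isn't so fancy");
        System.out.println(concatNoSeparator(props)); // My fancy textThis isn't so fancy
        System.out.println(concatWithSpace(props));   // My fancy text This isn't so fancy
        System.out.println(perPropertyFields("jcr:content", props));
    }
}
```

Per-property fields trade the boundary problem for more stored fields per document, which is where the index-size question in the comment comes in.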
[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146835#comment-16146835 ] Dirk Rudolph commented on OAK-6597: --- Do we have any insights into the impact on the index size, and with that the impact on query performance, for an index that has that feature enabled? > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph > Fix For: 1.8 > > Attachments: excerpt-with-aggregation-test.patch > > > I noticed that properties that got indexed due to an aggregation are not > considered for excerpts (highlighting), as they are not indexed as stored > fields. > See the attached patch that implements a test for excerpts in > {{LuceneIndexAggregationTest2}}. > It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where both strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either > _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the > former the excerpt is provided properly; for the latter it isn't.
[jira] [Commented] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by mvn test
[ https://issues.apache.org/jira/browse/OAK-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146816#comment-16146816 ] Dirk Rudolph commented on OAK-6598: --- Thanks, [~chetanm]. It might be worth checking {{org.apache.jackrabbit.oak.plugins.document.ClusterTest2}}, which has the same naming pattern. > LuceneIndexAggregationTest2 doesn't get executed by mvn test > > > Key: OAK-6598 > URL: https://issues.apache.org/jira/browse/OAK-6598 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph >Assignee: Chetan Mehrotra >Priority: Minor > Fix For: 1.8, 1.7.7 > > > I cannot find the results of > [LuceneIndexAggregationTest2|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java] > on > [Jenkins|https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/] > nor am I able to execute them using {{mvn clean test}}. > It looks like this is related to {{...Test2.java}} not matching any include > pattern, and it might affect other tests as well.
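A quick way to see why {{...Test2.java}} classes are skipped is to test the file names against the include globs. The sketch uses {{java.nio.file.PathMatcher}} and assumes the Maven Surefire default includes ({{**/Test*.java}}, {{**/*Test.java}}, {{**/*Tests.java}}, {{**/*TestCase.java}}); the Oak build may configure its own patterns, so the effective POM is the real reference:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class SurefireIncludes {

    // Assumed Surefire default includes; the real build may override them.
    static final String[] DEFAULT_INCLUDES = {
        "**/Test*.java", "**/*Test.java", "**/*Tests.java", "**/*TestCase.java"
    };

    // True if the path matches at least one include glob. Java's "**/" needs at
    // least one directory in the path, so pass package-relative source paths.
    static boolean isIncluded(String relativePath) {
        Path path = Paths.get(relativePath);
        for (String include : DEFAULT_INCLUDES) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + include);
            if (matcher.matches(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java",
            "org/apache/jackrabbit/oak/plugins/document/ClusterTest2.java",
            "org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest.java"
        };
        for (String c : candidates) {
            System.out.println(isIncluded(c) + "  " + c); // false, false, true
        }
    }
}
```

Renaming such classes to end in {{Test}}, or adding an explicit include pattern, would both bring them back into the run.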
[jira] [Updated] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by mvn test
[ https://issues.apache.org/jira/browse/OAK-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6598: -- Description: I cannot find the results of [LuceneIndexAggregationTest2|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java] on [Jenkins|https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/] nor am I able to execute them using {{mvn clean test}}. It looks like this is related to {{...Test2.java}} not matching any include pattern, and it might affect other tests as well. was: I cannot find the results of https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java here https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/ nor am I able to execute them using {{mvn clean test}}. It looks like this being related to {{...Test2.java}} not matching any pattern and might effect other tests as well. > LuceneIndexAggregationTest2 doesn't get executed by mvn test > > > Key: OAK-6598 > URL: https://issues.apache.org/jira/browse/OAK-6598 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph > > I cannot find the results of > [LuceneIndexAggregationTest2|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java] > on > [Jenkins|https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/] > nor am I able to execute them using {{mvn clean test}}. > It looks like this is related to {{...Test2.java}} not matching any include > pattern, and it might affect other tests as well.
[jira] [Comment Edited] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146266#comment-16146266 ] Dirk Rudolph edited comment on OAK-6597 at 8/29/17 10:38 PM: - This is blocked by OAK-6598 as long as {{LuceneIndexAggregationTest2}} is not running and/or failing. was (Author: diru): This is blocked as long as {{LuceneIndexAggregationTest2}} OAK-6598 is not running and/or failing. > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph > Attachments: excerpt-with-aggregation-test.patch > > > I noticed that properties that got indexed due to an aggregation are not > considered for excerpts (highlighting), as they are not indexed as stored > fields. > See the attached patch that implements a test for excerpts in > {{LuceneIndexAggregationTest2}}. > It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where both strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either > _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the > former the excerpt is provided properly; for the latter it isn't.
[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146266#comment-16146266 ] Dirk Rudolph commented on OAK-6597: --- This is blocked by OAK-6598 as long as {{LuceneIndexAggregationTest2}} is not running and/or failing. > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph > Attachments: excerpt-with-aggregation-test.patch > > > I noticed that properties that got indexed due to an aggregation are not > considered for excerpts (highlighting), as they are not indexed as stored > fields. > See the attached patch that implements a test for excerpts in > {{LuceneIndexAggregationTest2}}. > It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where both strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either > _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the > former the excerpt is provided properly; for the latter it isn't.
[jira] [Updated] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by mvn test
[ https://issues.apache.org/jira/browse/OAK-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6598: -- Component/s: lucene > LuceneIndexAggregationTest2 doesn't get executed by mvn test > > > Key: OAK-6598 > URL: https://issues.apache.org/jira/browse/OAK-6598 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph > > I cannot find the results of > https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java > here > https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/ > nor am I able to execute them using {{mvn clean test}}. > It looks like this is related to {{...Test2.java}} not matching any include > pattern, and it might affect other tests as well.
[jira] [Created] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by maven
Dirk Rudolph created OAK-6598: - Summary: LuceneIndexAggregationTest2 doesn't get executed by maven Key: OAK-6598 URL: https://issues.apache.org/jira/browse/OAK-6598 Project: Jackrabbit Oak Issue Type: Bug Affects Versions: 1.7.6, 1.6.1 Reporter: Dirk Rudolph I cannot find the results of https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java here https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/ nor am I able to execute them using {{mvn clean test}}. It looks like this is related to {{...Test2.java}} not matching any include pattern, and it might affect other tests as well.
[jira] [Updated] (OAK-6598) LuceneIndexAggregationTest2 doesn't get executed by mvn test
[ https://issues.apache.org/jira/browse/OAK-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6598: -- Summary: LuceneIndexAggregationTest2 doesn't get executed by mvn test (was: LuceneIndexAggregationTest2 doesn't get executed by maven) > LuceneIndexAggregationTest2 doesn't get executed by mvn test > > > Key: OAK-6598 > URL: https://issues.apache.org/jira/browse/OAK-6598 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph > > I cannot find the results of > https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexAggregationTest2.java > here > https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/ > nor am I able to execute them using {{mvn clean test}}. > It looks like this is related to {{...Test2.java}} not matching any include > pattern, and it might affect other tests as well.
[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6597: -- Affects Version/s: 1.7.6 > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1, 1.7.6 >Reporter: Dirk Rudolph > Attachments: excerpt-with-aggregation-test.patch > > > I noticed that properties that got indexed due to an aggregation are not > considered for excerpts (highlighting), as they are not indexed as stored > fields. > See the attached patch that implements a test for excerpts in > {{LuceneIndexAggregationTest2}}. > It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where both strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either > _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the > former the excerpt is provided properly; for the latter it isn't.
[jira] [Commented] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146126#comment-16146126 ] Dirk Rudolph commented on OAK-6597: --- This is because the properties of node _/content/foo_, which is of the node type the index definition defines rules for, are added as stored fields using [LuceneDocumentMaker#indexProperty()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java#L247] (see [LuceneDocumentMaker.java lines 112-129|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java#L112]), whereas the properties of _/content/foo/jcr:content_ are added as non-stored fields in [LuceneDocumentMaker#indexAggregatedNode()|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java#L599] (see [LuceneDocumentMaker.java lines 652-658|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java#L652]). Is there any particular reason not to use {{indexProperty()}} for properties of the aggregated node? > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1 >Reporter: Dirk Rudolph > Attachments: excerpt-with-aggregation-test.patch > > > I noticed that properties that got indexed due to an aggregation are not > considered for excerpts (highlighting), as they are not indexed as stored > fields. > See the attached patch that implements a test for excerpts in > {{LuceneIndexAggregationTest2}}.
> It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where both strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either > _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the > former the excerpt is provided properly; for the latter it isn't.
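A toy model of the stored vs. non-stored distinction described above: a term can match through an indexed-only field, but building an excerpt needs the original text, which only stored fields keep (in Lucene this is the {{Field.Store.YES}} flag). This is plain Java, not Oak's or Lucene's API; the field names and {{<strong>}} markup are illustrative:

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class StoredFieldExcerpts {

    // Toy stand-in for an index field: searchable either way, but its original
    // text can only be read back (e.g. for highlighting) when it is stored.
    static final class ToyField {
        final String name;
        final String text;
        final boolean stored;

        ToyField(String name, String text, boolean stored) {
            this.name = name;
            this.text = text;
            this.stored = stored;
        }

        // Crude tokenization: lower-case and split on non-word characters.
        boolean hasTermWithPrefix(String prefix) {
            String p = prefix.toLowerCase(Locale.ROOT);
            for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.startsWith(p)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Matching only needs the indexed terms, so non-stored fields count too.
    static boolean matches(List<ToyField> doc, String prefix) {
        for (ToyField f : doc) {
            if (f.hasTermWithPrefix(prefix)) {
                return true;
            }
        }
        return false;
    }

    // An excerpt needs the original text, so only stored fields can contribute;
    // returns null when the prefix only matches non-stored fields.
    static String excerpt(List<ToyField> doc, String prefix) {
        Pattern word = Pattern.compile("(?i)\\b(" + Pattern.quote(prefix) + "\\w*)");
        for (ToyField f : doc) {
            if (f.stored && f.hasTermWithPrefix(prefix)) {
                return word.matcher(f.text).replaceAll("<strong>$1</strong>");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<ToyField> doc = List.of(
                new ToyField("bar", "Tincidunt lorem", true),              // like /content/foo/bar
                new ToyField("jcr:content/bar", "Aliquam ipsum", false));  // aggregated property
        System.out.println(matches(doc, "aliq") + " " + excerpt(doc, "aliq")); // true null
        System.out.println(excerpt(doc, "tinc")); // <strong>Tincidunt</strong> lorem
    }
}
```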
[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6597: -- Attachment: excerpt-with-aggregation-test.patch > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1 >Reporter: Dirk Rudolph > Attachments: excerpt-with-aggregation-test.patch > > > I noticed that properties that got indexed due to an aggregation are not > considered for excerpts (highlighting), as they are not indexed as stored > fields. > See the attached patch that implements a test for excerpts in > {{LuceneIndexAggregationTest2}}. > It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where both strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either > _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the > former the excerpt is provided properly; for the latter it isn't.
[jira] [Updated] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
[ https://issues.apache.org/jira/browse/OAK-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-6597: -- Component/s: lucene > rep:excerpt not working for content indexed by aggregation in lucene > > > Key: OAK-6597 > URL: https://issues.apache.org/jira/browse/OAK-6597 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.6.1 >Reporter: Dirk Rudolph > > I noticed that properties indexed through an aggregation are not > considered for excerpts (highlighting) because they are not indexed as stored > fields. > See the attached patch, which implements a test for excerpts in > {{LuceneIndexAggregationTest2}}. > It creates the following structure: > {code} > /content/foo [test:Page] > + bar (String) > - jcr:content [test:PageContent] > + bar (String) > {code} > where the two strings (the _bar_ property at _foo_ and the _bar_ property at > _jcr:content_) contain different text. > Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either > _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the term in the > former, the excerpt is provided correctly; for the term in the latter, it isn't. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OAK-6597) rep:excerpt not working for content indexed by aggregation in lucene
Dirk Rudolph created OAK-6597: - Summary: rep:excerpt not working for content indexed by aggregation in lucene Key: OAK-6597 URL: https://issues.apache.org/jira/browse/OAK-6597 Project: Jackrabbit Oak Issue Type: Bug Affects Versions: 1.6.1 Reporter: Dirk Rudolph I noticed that properties indexed through an aggregation are not considered for excerpts (highlighting) because they are not indexed as stored fields. See the attached patch, which implements a test for excerpts in {{LuceneIndexAggregationTest2}}. It creates the following structure: {code} /content/foo [test:Page] + bar (String) - jcr:content [test:PageContent] + bar (String) {code} where the two strings (the _bar_ property at _foo_ and the _bar_ property at _jcr:content_) contain different text. Afterwards it queries for 2 terms ("tinc*" and "aliq*"), each of which exists in either _/content/foo/bar_ or _/content/foo/jcr:content/bar_ but not in both. For the term in the former, the excerpt is provided correctly; for the term in the latter, it isn't. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
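The mechanism described in OAK-6597 can be illustrated with a deliberately simplified, hypothetical model. `ExcerptModel` and its methods are invented for this sketch and do not reflect Oak's LuceneIndex or its highlighter. The idea it shows is this: a term that occurs only in an aggregated, non-stored field still matches the query, but there is no stored text from which to build an excerpt.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of one index document; NOT Oak code.
final class ExcerptModel {
    // Field name -> original text; only stored text can be highlighted.
    private final Map<String, String> storedFields = new LinkedHashMap<>();
    // All searchable terms, whether or not their source text was stored.
    private final Set<String> indexedTerms = new HashSet<>();

    // A property of the node itself (e.g. /content/foo/bar): indexed AND stored.
    void addStored(String field, String text) {
        storedFields.put(field, text);
        index(text);
    }

    // A property pulled in via aggregation (e.g. /content/foo/jcr:content/bar):
    // searchable, but its text is not kept in the index (the assumption under test).
    void addIndexedOnly(String text) {
        index(text);
    }

    private void index(String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                indexedTerms.add(token);
            }
        }
    }

    // Prefix query such as "tinc*" (pass the prefix without the '*').
    boolean matches(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        return indexedTerms.stream().anyMatch(t -> t.startsWith(p));
    }

    // Returns stored text with the first matching token wrapped in <strong>
    // (markup chosen for illustration), or null if no stored field contains it.
    String excerpt(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String text : storedFields.values()) {
            for (String token : text.split("\\W+")) {
                if (!token.isEmpty() && token.toLowerCase(Locale.ROOT).startsWith(p)) {
                    return text.replace(token, "<strong>" + token + "</strong>");
                }
            }
        }
        return null;
    }
}
```

Consider the call `addStored("bar", "Lorem ipsum tincidunt")` together with `addIndexedOnly("aliquam erat volutpat")`. Both `matches("tinc")` and `matches("aliq")` are true, but only `excerpt("tinc")` returns text. This mirrors the behaviour reported for the two terms in the test patch.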
[jira] [Commented] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open
[ https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992541#comment-15992541 ] Dirk Rudolph commented on OAK-5995: --- Thanks for your support [~chetanm]. It looks like the customer set up the pre-production instance without clearing the local FS copy of the index, so Lucene was working on the wrong files and had issues with them. As far as I know, clearing the local copy of the indexes and letting them be recreated from the repository resolved the issue. > Lucene indexing with copyonread/write holding unexpectedly much files open > -- > > Key: OAK-5995 > URL: https://issues.apache.org/jira/browse/OAK-5995 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.4.1 >Reporter: Dirk Rudolph >Assignee: Chetan Mehrotra > Attachments: lsofout2.txt > > > We recently faced the issue that our Oak-based enterprise content management > system ran into failures due to too many open files. Monitoring the lsof > output, we found that most of the files opened by the process are the > files within the configured localIndexDir of the LuceneIndexProviderService. > {code} > enableCopyOnReadSupport="true" > localIndexDir="tmp/index" > enableCopyOnWriteSupport="true" > {code} > See the attached lsof output: > {code} > ~ wc -l lsofout2.txt >20388 lsofout2.txt > ~ grep "tmp/index" lsofout2.txt | wc -l >13499 > {code} > i.e. more than 60% of the open files are below "tmp/index", the configured > {{localIndexDir}}, shortly after a restart of the process. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
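The resolution above was to clear the local copy of the indexes and let them be recreated from the repository. A minimal java.nio sketch of that clean-up step follows. `LocalIndexCleaner` is a hypothetical helper and not part of Oak. It assumes the instance is stopped while it runs, and that the path passed in is the configured localIndexDir (the IndexCopier log lines in this issue show it resolving to /appl/aem/tmp/index).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Oak. Assumption: the instance is stopped
// while this runs, so no Lucene reader still holds files in the directory.
final class LocalIndexCleaner {

    // Deletes everything below localIndexDir but keeps the directory itself.
    // Returns the number of deleted files and directories.
    static int clear(Path localIndexDir) throws IOException {
        if (!Files.isDirectory(localIndexDir)) {
            return 0;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(localIndexDir)) {
            entries = walk.filter(p -> !p.equals(localIndexDir))
                    // Reverse order visits children before their parents,
                    // so each directory is already empty when it is deleted.
                    .sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
        return entries.size();
    }
}
```

The sketch keeps the top-level directory so the configured path stays valid. Deleting children before parents avoids `DirectoryNotEmptyException` without a custom `FileVisitor`.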
[jira] [Comment Edited] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open
[ https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948506#comment-15948506 ] Dirk Rudolph edited comment on OAK-5995 at 3/30/17 6:51 AM: We will try to do so, thanks. In our specific AEM 6.2 setup, the following Lucene indexes are not configured with {{indexPath}}: * /content/oak:index/enablementResourceName * /oak:index/socialLucene and * /oak:index/damAssetLucene It may be worth reporting that to daycare as well, as even 6.2 SP1 ships with Oak 1.4.6. was (Author: diru): We will try to do so, thanks. In our specific AEM 6.2 setup these are: * /content/oak:index/enablementResourceName * /oak:index/socialLucene and * /oak:index/damAssetLucene It may be worth reporting that to daycare as well, as even 6.2 SP1 ships with Oak 1.4.6. > Lucene indexing with copyonread/write holding unexpectedly much files open > -- > > Key: OAK-5995 > URL: https://issues.apache.org/jira/browse/OAK-5995 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.4.1 >Reporter: Dirk Rudolph >Assignee: Chetan Mehrotra > Attachments: lsofout2.txt > > > We recently faced the issue that our Oak-based enterprise content management > system ran into failures due to too many open files. Monitoring the lsof > output, we found that most of the files opened by the process are the > files within the configured localIndexDir of the LuceneIndexProviderService. > {code} > enableCopyOnReadSupport="true" > localIndexDir="tmp/index" > enableCopyOnWriteSupport="true" > {code} > See the attached lsof output: > {code} > ~ wc -l lsofout2.txt >20388 lsofout2.txt > ~ grep "tmp/index" lsofout2.txt | wc -l >13499 > {code} > i.e. more than 60% of the open files are below "tmp/index", the configured > {{localIndexDir}}, shortly after a restart of the process. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open
[ https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948506#comment-15948506 ] Dirk Rudolph commented on OAK-5995: --- We will try to do so, thanks. In our specific AEM 6.2 setup these are: * /content/oak:index/enablementResourceName * /oak:index/socialLucene and * /oak:index/damAssetLucene It may be worth reporting that to daycare as well, as even 6.2 SP1 ships with Oak 1.4.6. > Lucene indexing with copyonread/write holding unexpectedly much files open > -- > > Key: OAK-5995 > URL: https://issues.apache.org/jira/browse/OAK-5995 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.4.1 >Reporter: Dirk Rudolph >Assignee: Chetan Mehrotra > Attachments: lsofout2.txt > > > We recently faced the issue that our Oak-based enterprise content management > system ran into failures due to too many open files. Monitoring the lsof > output, we found that most of the files opened by the process are the > files within the configured localIndexDir of the LuceneIndexProviderService. > {code} > enableCopyOnReadSupport="true" > localIndexDir="tmp/index" > enableCopyOnWriteSupport="true" > {code} > See the attached lsof output: > {code} > ~ wc -l lsofout2.txt >20388 lsofout2.txt > ~ grep "tmp/index" lsofout2.txt | wc -l >13499 > {code} > i.e. more than 60% of the open files are below "tmp/index", the configured > {{localIndexDir}}, shortly after a restart of the process. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open
[ https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947030#comment-15947030 ] Dirk Rudolph edited comment on OAK-5995 at 3/29/17 12:34 PM: - Thanks. I'm not sure how frequently the system writes to the index. Anyway, I got feedback from the operations team. We see quite a few exceptions involving the IndexCopier: {code} grep -Hirn "IndexCopier" logs/ | wc -l grep: logs/._funionfs_control~: Permission denied 825370 {code} {code} 28.03.2017 13:20:22.381 *WARN* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier [/oak:index/ntBaseLucene] Found local copy for _2.si in MMapDirectory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 lockFactory=NativeFSLockFactory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 but size of local 237 differs from remote 0. Content would be read from remote file only 28.03.2017 13:20:22.383 *WARN* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier [/oak:index/ntBaseLucene] Found local copy for _2.cfe in MMapDirectory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 lockFactory=NativeFSLockFactory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 but size of local 258 differs from remote 0. 
Content would be read from remote file only 28.03.2017 13:20:22.386 *ERROR* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker Could not access the Lucene index at /oak:index/ntBaseLucene java.io.FileNotFoundException: [tags(/oak:index/ntBaseLucene)] _2.si at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory.openInput(OakDirectory.java:180) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier$CopyOnReadDirectory.openInput(IndexCopier.java:355) at org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:340) at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNode.<init>(IndexNode.java:105) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNode.open(IndexNode.java:69) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.findIndexNode(IndexTracker.java:162) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.acquireIndexNode(IndexTracker.java:137) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.getPlans(LucenePropertyIndex.java:250) at org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan(QueryImpl.java:1016) at org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan(QueryImpl.java:949) at org.apache.jackrabbit.oak.query.ast.SelectorImpl.prepare(SelectorImpl.java:288) at org.apache.jackrabbit.oak.query.QueryImpl.prepare(QueryImpl.java:631) at 
org.apache.jackrabbit.oak.query.QueryEngineImpl.prepareAndSelect(QueryEngineImpl.java:298) at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:273) at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:233) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:314) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:308) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:304) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.getTree(IdentifierManager.java:133) at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByContentID(AuthorizableBaseProvider.java:56) at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByID(AuthorizableBaseProvider.java:51) at org.apache.jackrabbit.oak.security.user.UserProvider.getAuthorizable(UserProvider.java:211) at org.apache.jackrabbit.oak.security.user.UserPrincipalProvider.getPrincip
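The WARN lines above point to a size comparison between the local copy and the file in the repository. In both logged cases the remote size is 0, and the ERROR that follows is a FileNotFoundException for the same _2.si. A minimal sketch of that decision follows. It assumes the check is a plain length comparison. `ReadSource` and `CopyOnReadCheck` are hypothetical names, not IndexCopier's API, and what happens when no local copy exists is a simplification of this sketch only.

```java
// Hypothetical names; a sketch of the decision the WARN message describes,
// not the actual IndexCopier implementation.
enum ReadSource { LOCAL, REMOTE }

final class CopyOnReadCheck {

    /**
     * @param localSize  length of the local copy, or null if there is none
     * @param remoteSize length of the file as stored in the repository
     */
    static ReadSource choose(String indexPath, String fileName, Long localSize, long remoteSize) {
        if (localSize == null) {
            return ReadSource.REMOTE; // nothing local to use (simplification of this sketch)
        }
        if (localSize.longValue() != remoteSize) {
            // Stale or foreign local file: do not trust it, fall back to the repository.
            System.err.printf("[%s] Found local copy for %s but size of local %d differs from remote %d."
                    + " Content would be read from remote file only%n",
                    indexPath, fileName, localSize, remoteSize);
            return ReadSource.REMOTE;
        }
        return ReadSource.LOCAL;
    }
}
```

With the values from the log, `choose("/oak:index/ntBaseLucene", "_2.si", 237L, 0L)` falls back to `REMOTE`. A stale local file left over from another setup, as in the resolution of this issue, would take exactly this path.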
[jira] [Commented] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open
[ https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947030#comment-15947030 ] Dirk Rudolph commented on OAK-5995: --- Thanks. I'm not sure how frequently the system writes to the index. Anyway, I got feedback from the operations team. We see quite a few exceptions involving the IndexCopier: {code} grep -Hirn "IndexCopier" logs/ | wc -l grep: logs/._funionfs_control~: Permission denied 825370 {code} {code} 28.03.2017 13:20:22.381 *WARN* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier [/oak:index/ntBaseLucene] Found local copy for _2.si in MMapDirectory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 lockFactory=NativeFSLockFactory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 but size of local 237 differs from remote 0. Content would be read from remote file only 28.03.2017 13:20:22.383 *WARN* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier [/oak:index/ntBaseLucene] Found local copy for _2.cfe in MMapDirectory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 lockFactory=NativeFSLockFactory@/appl/aem/tmp/index/9322909280ba43419b97546267900f301b5258987a41f4d535a3489a5ee602a7/2 but size of local 258 differs from remote 0. 
Content would be read from remote file only 28.03.2017 13:20:22.386 *ERROR* [172.19.48.185 [1490700022254] GET /libs/granite/ui/references/clientlibs/coral/references.css HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker Could not access the Lucene index at /oak:index/ntBaseLucene java.io.FileNotFoundException: [tags(/oak:index/ntBaseLucene)] _2.si at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory.openInput(OakDirectory.java:180) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier$CopyOnReadDirectory.openInput(IndexCopier.java:355) at org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:340) at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNode.<init>(IndexNode.java:105) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexNode.open(IndexNode.java:69) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.findIndexNode(IndexTracker.java:162) at org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.acquireIndexNode(IndexTracker.java:137) at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.getPlans(LucenePropertyIndex.java:250) at org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan(QueryImpl.java:1016) at org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan(QueryImpl.java:949) at org.apache.jackrabbit.oak.query.ast.SelectorImpl.prepare(SelectorImpl.java:288) at org.apache.jackrabbit.oak.query.QueryImpl.prepare(QueryImpl.java:631) at 
org.apache.jackrabbit.oak.query.QueryEngineImpl.prepareAndSelect(QueryEngineImpl.java:298) at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:273) at org.apache.jackrabbit.oak.query.QueryEngineImpl.executeQuery(QueryEngineImpl.java:233) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:314) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:308) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.resolveUUID(IdentifierManager.java:304) at org.apache.jackrabbit.oak.plugins.identifier.IdentifierManager.getTree(IdentifierManager.java:133) at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByContentID(AuthorizableBaseProvider.java:56) at org.apache.jackrabbit.oak.security.user.AuthorizableBaseProvider.getByID(AuthorizableBaseProvider.java:51) at org.apache.jackrabbit.oak.security.user.UserProvider.getAuthorizable(UserProvider.java:211) at org.apache.jackrabbit.oak.security.user.UserPrincipalProvider.getPrincipals(UserPrincipalProvider.java:134) at or
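The grep above counts 825370 lines matching "IndexCopier". Stack-trace frames like `IndexCopier$CopyOnReadDirectory` are included in that count. To see how many of these lines are size mismatches, and for which index and file, a small parser can extract the fields from the WARN message. This is a sketch under the assumption that the message always has the shape quoted here. `SizeMismatchLine` is a hypothetical class, and the pattern is derived only from the two WARN lines in this comment.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-analysis helper; the pattern is derived only from the
// WARN lines quoted in this issue and may not cover other IndexCopier messages.
final class SizeMismatchLine {
    private static final Pattern WARN = Pattern.compile(
            "IndexCopier \\[([^\\]]+)\\] Found local copy for (\\S+) in .*? "
                    + "but size of local (\\d+) differs from remote (\\d+)");

    final String indexPath;
    final String fileName;
    final long localSize;
    final long remoteSize;

    private SizeMismatchLine(String indexPath, String fileName, long localSize, long remoteSize) {
        this.indexPath = indexPath;
        this.fileName = fileName;
        this.localSize = localSize;
        this.remoteSize = remoteSize;
    }

    // Returns the parsed fields, or empty for any other line (e.g. stack-trace frames).
    static Optional<SizeMismatchLine> parse(String logLine) {
        Matcher m = WARN.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new SizeMismatchLine(m.group(1), m.group(2),
                Long.parseLong(m.group(3)), Long.parseLong(m.group(4))));
    }
}
```

Grouping the parsed results by `indexPath` would show whether the mismatches are confined to /oak:index/ntBaseLucene or spread across indexes.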
[jira] [Commented] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open
[ https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946729#comment-15946729 ] Dirk Rudolph commented on OAK-5995: --- Thanks Chetan, I will investigate with the operations team and let you know. In the meantime, I checked the index definitions, and there are indeed some that don't have {{indexPath}} set, though they are small compared to the others. Does the size matter? If this is about leaking file handles, which circumstances affect that behaviour (index size, number of queries against the index, number of reads/writes)? > Lucene indexing with copyonread/write holding unexpectedly much files open > -- > > Key: OAK-5995 > URL: https://issues.apache.org/jira/browse/OAK-5995 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene >Affects Versions: 1.4.1 >Reporter: Dirk Rudolph >Assignee: Chetan Mehrotra > Attachments: lsofout2.txt > > > We recently faced the issue that our Oak-based enterprise content management > system ran into failures due to too many open files. Monitoring the lsof > output, we found that most of the files opened by the process are the > files within the configured localIndexDir of the LuceneIndexProviderService. > {code} > enableCopyOnReadSupport="true" > localIndexDir="tmp/index" > enableCopyOnWriteSupport="true" > {code} > See the attached lsof output: > {code} > ~ wc -l lsofout2.txt >20388 lsofout2.txt > ~ grep "tmp/index" lsofout2.txt | wc -l >13499 > {code} > i.e. more than 60% of the open files are below "tmp/index", the configured > {{localIndexDir}}, shortly after a restart of the process. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open
[ https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-5995: -- Attachment: lsofout2.txt > Lucene indexing with copyonread/write holding unexpectedly much files open > -- > > Key: OAK-5995 > URL: https://issues.apache.org/jira/browse/OAK-5995 > Project: Jackrabbit Oak > Issue Type: Bug > Components: indexing >Affects Versions: 1.4.1 >Reporter: Dirk Rudolph > Attachments: lsofout2.txt > > > We recently faced the issue that our Oak-based enterprise content management > system ran into failures due to too many open files. Monitoring the lsof > output, we found that most of the files opened by the process are the > files within the configured localIndexDir of the LuceneIndexProviderService. > {code} > enableCopyOnReadSupport="true" > localIndexDir="tmp/index" > enableCopyOnWriteSupport="true" > {code} > See the attached lsof output: > {code} > ~ wc -l lsofout2.txt >20388 lsofout2.txt > ~ grep "tmp/index" lsofout2.txt | wc -l >13499 > {code} > i.e. more than 60% of the open files are below "tmp/index", the configured > {{localIndexDir}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
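As a quick sanity check, the two shell counts in the description and the resulting share can be reproduced in Java. `LsofShare` is a hypothetical helper. Note that `wc -l` also counts lsof's header line, which does not change the result noticeably.

```java
import java.util.List;

// Hypothetical helper mirroring `wc -l lsofout2.txt` (all lines) and
// `grep "tmp/index" lsofout2.txt | wc -l` (lines containing the needle).
final class LsofShare {

    static long countContaining(List<String> lines, String needle) {
        return lines.stream().filter(line -> line.contains(needle)).count();
    }

    // Share of part in total, in percent (0 for an empty input).
    static double percent(long part, long total) {
        return total == 0 ? 0.0 : 100.0 * part / total;
    }
}
```

`percent(13499, 20388)` evaluates to roughly 66.2. This is consistent with the "more than 60%" stated in the description.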
[jira] [Updated] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open
[ https://issues.apache.org/jira/browse/OAK-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dirk Rudolph updated OAK-5995: -- Description: We recently faced the issue that our Oak-based enterprise content management system ran into failures due to too many open files. Monitoring the lsof output, we found that most of the files opened by the process are the files within the configured localIndexDir of the LuceneIndexProviderService. {code} enableCopyOnReadSupport="true" localIndexDir="tmp/index" enableCopyOnWriteSupport="true" {code} See the attached lsof output: {code} ~ wc -l lsofout2.txt 20388 lsofout2.txt ~ grep "tmp/index" lsofout2.txt | wc -l 13499 {code} i.e. more than 60% of the open files are below "tmp/index", the configured {{localIndexDir}}, shortly after a restart of the process. was: We recently faced the issue that our Oak-based enterprise content management system ran into failures due to too many open files. Monitoring the lsof output, we found that most of the files opened by the process are the files within the configured localIndexDir of the LuceneIndexProviderService. {code} enableCopyOnReadSupport="true" localIndexDir="tmp/index" enableCopyOnWriteSupport="true" {code} See the attached lsof output: {code} ~ wc -l lsofout2.txt 20388 lsofout2.txt ~ grep "tmp/index" lsofout2.txt | wc -l 13499 {code} i.e. more than 60% of the open files are below "tmp/index", the configured {{localIndexDir}}. > Lucene indexing with copyonread/write holding unexpectedly much files open > -- > > Key: OAK-5995 > URL: https://issues.apache.org/jira/browse/OAK-5995 > Project: Jackrabbit Oak > Issue Type: Bug > Components: indexing >Affects Versions: 1.4.1 >Reporter: Dirk Rudolph > Attachments: lsofout2.txt > > > We recently faced the issue that our Oak-based enterprise content management > system ran into failures due to too many open files. 
Monitoring the lsof > output, we found that most of the files opened by the process are the > files within the configured localIndexDir of the LuceneIndexProviderService. > {code} > enableCopyOnReadSupport="true" > localIndexDir="tmp/index" > enableCopyOnWriteSupport="true" > {code} > See the attached lsof output: > {code} > ~ wc -l lsofout2.txt >20388 lsofout2.txt > ~ grep "tmp/index" lsofout2.txt | wc -l >13499 > {code} > i.e. more than 60% of the open files are below "tmp/index", the configured > {{localIndexDir}}, shortly after a restart of the process. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (OAK-5995) Lucene indexing with copyonread/write holding unexpectedly much files open
Dirk Rudolph created OAK-5995: - Summary: Lucene indexing with copyonread/write holding unexpectedly much files open Key: OAK-5995 URL: https://issues.apache.org/jira/browse/OAK-5995 Project: Jackrabbit Oak Issue Type: Bug Components: indexing Affects Versions: 1.4.1 Reporter: Dirk Rudolph Attachments: lsofout2.txt We recently faced the issue that our Oak-based enterprise content management system ran into failures due to too many open files. Monitoring the lsof output, we found that most of the files opened by the process are the files within the configured localIndexDir of the LuceneIndexProviderService. {code} enableCopyOnReadSupport="true" localIndexDir="tmp/index" enableCopyOnWriteSupport="true" {code} See the attached lsof output: {code} ~ wc -l lsofout2.txt 20388 lsofout2.txt ~ grep "tmp/index" lsofout2.txt | wc -l 13499 {code} i.e. more than 60% of the open files are below "tmp/index", the configured {{localIndexDir}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
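The same lsof output can be grouped by the first directory level below tmp/index. This shows which index directories hold most of the handles, which bears on the later question of whether index size or the indexes without {{indexPath}} matter. `OpenFilesPerIndex` is a hypothetical helper. It assumes the open file's path appears in each lsof line, and that every index has its own directory directly below localIndexDir, like the hashed directory seen for /oak:index/ntBaseLucene in the IndexCopier log of this issue. Mapping the hashed names back to index paths is not attempted here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper. Assumptions: each lsof line contains the file path, and
// every index uses its own directory directly below localIndexDir.
final class OpenFilesPerIndex {

    // Maps each first-level directory below localIndexDir to its number of open files.
    static Map<String, Integer> countPerIndexDir(List<String> lsofLines, String localIndexDir) {
        String marker = localIndexDir.endsWith("/") ? localIndexDir : localIndexDir + "/";
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lsofLines) {
            int start = line.indexOf(marker);
            if (start < 0) {
                continue; // not below localIndexDir
            }
            String rest = line.substring(start + marker.length());
            int slash = rest.indexOf('/');
            String dir = (slash < 0 ? rest : rest.substring(0, slash)).trim();
            if (!dir.isEmpty()) {
                counts.merge(dir, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling `countPerIndexDir(lines, "tmp/index")` on the lines of lsofout2.txt would give one count per index directory. A few dominant entries would point at specific indexes rather than at overall index size.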