[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312687#comment-16312687
 ] 

Vikas Saurabh commented on OAK-7109:
------------------------------------

{quote}
(I know the "group by" and "count" are not currently supported by Oak).
Or are there other aspects I missed?
{quote}
Indeed, fundamentally that's what facets do - for usually a few properties 
(not 'all', unlike group by) they provide the values along with a count of how 
many documents matching the query carry each value. Lucene's faceting support 
also does ranges, although we don't support that yet - e.g. I could facet on 
"jcr:created" and the categories could turn out as "today", "within last 
week", etc. (I'm not completely sure about the API... I'm just trying to 
illustrate that faceted categories can potentially be something other than 
the actually stored value).
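
To illustrate the range idea, here is a minimal sketch in plain Python. It is not Lucene's or Oak's API; the bucket labels, the "older" category and the helper names {{created_bucket}} and {{range_facets}} are assumptions made up for this example, and "today" is approximated as "less than 24 hours old".

```python
from collections import Counter
from datetime import datetime, timedelta


def created_bucket(created, now):
    """Map a stored jcr:created timestamp to a derived facet label.

    The labels are illustrative assumptions, not values stored in the index.
    """
    age = now - created
    if age < timedelta(days=1):
        return "today"  # approximation: less than 24 hours old
    if age < timedelta(days=7):
        return "within last week"
    return "older"


def range_facets(created_values, now):
    """Count the matching documents per derived bucket."""
    return Counter(created_bucket(c, now) for c in created_values)


# jcr:created values of four hypothetical query hits.
now = datetime(2018, 1, 5, 12, 0)
hits = [
    datetime(2018, 1, 5, 9, 0),
    datetime(2018, 1, 3, 9, 0),
    datetime(2018, 1, 2, 9, 0),
    datetime(2017, 11, 1, 9, 0),
]
counts = range_facets(hits, now)
```

None of the labels in {{counts}} appears as a stored property value; they are all derived from the timestamps at query time.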

bq. What do you mean with "scoring"?
The scoring part is an entirely different issue, unrelated to facets - e.g. we 
won't (can't??) correctly order documents matching queries such as {{.... WHERE 
(CONTAINS(., 'text') AND foo1='bar') OR (CONTAINS(., 'text') AND foo2='bar' AND 
foo3='bar')}} (foo=bar could be a different fulltext clause too... the issue is 
that we can't quite merge scores coming out of separate Lucene queries)....
But, let's ignore the scoring for this issue.

bq. What if Lucene doesn't index all the constraints?
I have a very pessimistic view here: we should fail such queries. I mean, it's 
better to fail and prompt for the right index definition than to give 
incorrect results.
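
A rough sketch of that trade-off, using the three nodes from the issue description below. This is a toy model in Python, not Oak code: {{tag_facets}}, {{pushed_to_index}}, {{QUERY_CONSTRAINTS}} and {{UnsupportedFacetQuery}} are hypothetical names invented for the illustration.

```python
from collections import Counter

# The three nodes from the issue description: path -> (text, simple/tags values).
DOCS = {
    "/content1/test/foo": ("lorem ipsum", ["tag1", "tag2"]),
    "/content2/test/bar": ("lorem ipsum", ["tag1", "tag2"]),
    "/content3/test/bar": ("lorem ipsum", ["tag1", "tag2"]),
}

# Constraints of the example query: a fulltext term plus path restrictions.
QUERY_CONSTRAINTS = {"fulltext", "path"}


class UnsupportedFacetQuery(Exception):
    """Hypothetical error: the index evaluates only part of the query."""


def tag_facets(term, roots, pushed_to_index, strict=True):
    """Count tags over the documents the toy index returns.

    pushed_to_index names the constraints the index actually evaluates;
    anything missing from it is silently ignored unless strict is set.
    """
    missing = QUERY_CONSTRAINTS - set(pushed_to_index)
    if strict and missing:
        raise UnsupportedFacetQuery(
            "not evaluated by index: " + ", ".join(sorted(missing))
        )
    counts = Counter()
    for path, (text, tags) in DOCS.items():
        if "fulltext" in pushed_to_index and term not in text:
            continue
        if "path" in pushed_to_index and not any(
            path.startswith(root + "/") for root in roots
        ):
            continue
        counts.update(tags)
    return dict(counts)


roots = ["/content1", "/content2"]
# All constraints reach the index: tag1 and tag2 are counted twice each.
correct = tag_facets("ipsum", roots, {"fulltext", "path"})
# Only the fulltext term reaches the index: /content3 is counted as well.
inflated = tag_facets("ipsum", roots, {"fulltext"}, strict=False)
```

With {{strict}} left at its default, the second call raises instead of returning the inflated counts, which is the behaviour I'd prefer.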

> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
>                 Key: OAK-7109
>                 URL: https://issues.apache.org/jira/browse/OAK-7109
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.7
>            Reporter: Dirk Rudolph
>              Labels: facet
>         Attachments: facetsInMultipleRoots.patch, 
> restrictionPropagationTest.patch
>
>
> Complex queries in this case are queries that are passed to Lucene without 
> all of their original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner passes only ":fulltext:ipsum" to 
> Lucene, even though the index supports evaluating path constraints. 
> As the facets are counted on the raw Lucene result, the returned facets are 
> incorrect. For example, given the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>    + tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set contains 2 results and all documents have identical 
> content. The actual result is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by Lucene.
> To work around that, the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing one query for each of the disjunctive clauses. As this 
> expands exponentially, it's only a theoretical solution, nothing for 
> production. 
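
The exponential growth mentioned in the description can be made concrete with a small Python sketch. It assumes the query is given as a conjunction of OR-groups, and the helper name {{to_dnf}} is made up for this example; it only expands constraints and does not execute anything.

```python
from itertools import product


def to_dnf(and_of_ors):
    """Expand a conjunction of OR-groups into a list of pure conjunctions.

    Each resulting tuple is one sub-query that would have to be run separately.
    """
    return [tuple(choice) for choice in product(*and_of_ors)]


# The query from the description: one fulltext term AND (path1 OR path2).
query = [
    ["contains(a.[*], 'ipsum')"],
    ["isdescendantnode(a,'/content1')", "isdescendantnode(a,'/content2')"],
]
sub_queries = to_dnf(query)

# Ten AND-ed groups with two alternatives each already need 2**10 sub-queries.
worst_case = to_dnf([[f"p{i}a", f"p{i}b"] for i in range(10)])
```

Here {{sub_queries}} holds two conjunctions, one per path restriction, while {{worst_case}} holds 1024.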



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
