[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301325#comment-16301325
 ] 

Dirk Rudolph edited comment on OAK-7109 at 12/22/17 12:58 PM:
--------------------------------------------------------------

Yeah, support for unions with facets doesn't work well, as facets are extracted 
on each row even though they relate to the result, not to the rows. I will open 
an improvement for that as well, as this has some cost: basically calling 
getTopChildren() for each row while iterating the result set. 
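
To illustrate that cost, here is a minimal, self-contained Java sketch. It does not use Oak or Lucene APIs: `computeTopChildren()` and `LazyFacets` are hypothetical stand-ins for the per-row getTopChildren() call, and the returned counts are fixed example values. It only shows the idea of computing the facet counts once per result set and sharing them across rows, instead of recomputing them for every row.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

public class FacetsOncePerResult {

    // Counts how often the (hypothetical) expensive facet computation runs.
    static int computations = 0;

    // Hypothetical stand-in for calling getTopChildren() on the facets of the
    // whole result; the returned counts are fixed example values.
    static Map<String, Integer> computeTopChildren() {
        computations++;
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("tag1", 2);
        counts.put("tag2", 2);
        return counts;
    }

    // Computes the facets lazily on first access and reuses them for all rows.
    static final class LazyFacets {
        private final Supplier<Map<String, Integer>> source;
        private Map<String, Integer> cached;

        LazyFacets(Supplier<Map<String, Integer>> source) {
            this.source = source;
        }

        Map<String, Integer> get() {
            if (cached == null) {
                cached = source.get();
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("/content1/test/foo", "/content2/test/bar");

        // Per row: the facet computation runs once for every row.
        computations = 0;
        rows.forEach(row -> computeTopChildren());
        System.out.println("per row: " + computations + " computation(s)");

        // Per result: all rows share one lazily computed value.
        computations = 0;
        LazyFacets shared = new LazyFacets(FacetsOncePerResult::computeTopChildren);
        rows.forEach(row -> shared.get());
        System.out.println("per result: " + computations + " computation(s)");
    }
}
```

With two rows this prints 2 computations for the per-row variant and 1 for the shared variant; the gap grows linearly with the size of the result set.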

By splitting the result I didn't mean running the query as a union, but 
running individual queries, merging their RowIterators manually, and 
extracting the facets only from the first hit of each, merging those together 
as well. That basically works, but as I said, I would have to rewrite the query 
in DNF, as in the example:

{code:title=distribute and over or}
contains(a.[*], 'ipsum') and (isdescendantnode(a,'/content1') or 
isdescendantnode(a,'/content2'))
<=>
(contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')) or 
(contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2'))
{code}
{code:title=split and run query for each disjunctive statement}
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content1')
contains(a.[*], 'ipsum') and isdescendantnode(a,'/content2')
{code}
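
The DNF rewrite above (distributing AND over OR, then running one query per disjunct) can be sketched in Java (16+, for records). This is a hypothetical helper on a tiny expression tree, not part of Oak's query engine; `Expr`, `Atom`, `And`, `Or` and `toDnf` are illustrative names, and the atoms are kept as opaque constraint strings.

```java
import java.util.ArrayList;
import java.util.List;

public class DnfSplit {

    // Minimal boolean expression tree; atoms are opaque constraint strings.
    interface Expr {}
    record Atom(String text) implements Expr {}
    record And(List<Expr> parts) implements Expr {}
    record Or(List<Expr> parts) implements Expr {}

    // Returns the DNF as a list of conjunctions, each a list of atoms.
    static List<List<String>> toDnf(Expr e) {
        if (e instanceof Atom a) {
            List<List<String>> single = new ArrayList<>();
            single.add(List.of(a.text()));
            return single;
        }
        if (e instanceof Or o) {
            // An OR concatenates the disjuncts of its children.
            List<List<String>> result = new ArrayList<>();
            for (Expr part : o.parts()) {
                result.addAll(toDnf(part));
            }
            return result;
        }
        And and = (And) e;
        // An AND distributes over OR: take the cross product of the
        // children's disjuncts, concatenating the conjunctions.
        List<List<String>> result = new ArrayList<>();
        result.add(List.of());
        for (Expr part : and.parts()) {
            List<List<String>> partDnf = toDnf(part);
            List<List<String>> next = new ArrayList<>();
            for (List<String> left : result) {
                for (List<String> right : partDnf) {
                    List<String> merged = new ArrayList<>(left);
                    merged.addAll(right);
                    next.add(merged);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        Expr query = new And(List.of(
                new Atom("contains(a.[*], 'ipsum')"),
                new Or(List.of(
                        new Atom("isdescendantnode(a,'/content1')"),
                        new Atom("isdescendantnode(a,'/content2')")))));

        // One query per disjunct, as in the split above.
        for (List<String> conjunction : toDnf(query)) {
            System.out.println(String.join(" and ", conjunction));
        }
    }
}
```

Running `main` prints the two statements from the split above. Because an AND takes the cross product of its children, k OR-groups of two alternatives each expand to 2^k conjunctions, which is the exponential growth mentioned in the issue description.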

That basically works, but only if both queries hit the same index, since only 
then are the TF/IDF scores comparable (also across multiple queries). So the 
solutions I see are:
a) creating the DNF disjunctive statements of a query as alternatives (not sure 
whether the alternative currently created is in DNF) and supporting proper facet 
counting over union queries
b) filtering the results using the query plan's filter while counting facets, 
similar to the way it's done for ACLs
c) implementing a mode which translates any query as is into its Lucene 
equivalent

Both a) and b) probably come with a performance penalty; c) might not even be 
feasible. 
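
The manual merge mentioned earlier (running each disjunct as its own query, merging the rows and combining the facet counts) could look roughly like this Java sketch. It is hypothetical: `Hit`, `merge` and `mergeFacets` are illustrative names, not Oak API. It assumes each query returns hits sorted by descending score from the same index, so the TF/IDF scores are comparable, and it sums facet counts, which is only correct when no node can match two disjuncts (true for disjoint paths such as /content1 and /content2).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeMap;

public class MergeDisjuncts {

    record Hit(String path, double score) {}

    // k-way merge of per-query hit lists, each sorted by descending score.
    // Scores are only comparable if all queries ran against the same index.
    static List<Hit> merge(List<List<Hit>> perQuery) {
        // Each queue entry is {query index, position within that query's list}.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
                (x, y) -> Double.compare(
                        perQuery.get(y[0]).get(y[1]).score(),
                        perQuery.get(x[0]).get(x[1]).score()));
        for (int q = 0; q < perQuery.size(); q++) {
            if (!perQuery.get(q).isEmpty()) {
                queue.add(new int[] {q, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        while (!queue.isEmpty()) {
            int[] top = queue.poll();
            Hit hit = perQuery.get(top[0]).get(top[1]);
            if (seen.add(hit.path())) { // drop a node already returned by another disjunct
                merged.add(hit);
            }
            if (top[1] + 1 < perQuery.get(top[0]).size()) {
                queue.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }

    // Sums per-query facet counts; correct only if no node matches two disjuncts.
    static Map<String, Integer> mergeFacets(List<Map<String, Integer>> perQuery) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> facets : perQuery) {
            facets.forEach((label, count) -> total.merge(label, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<Hit> q1 = List.of(new Hit("/content1/test/foo", 0.9));
        List<Hit> q2 = List.of(new Hit("/content2/test/bar", 0.7));
        System.out.println(merge(List.of(q1, q2)));
        System.out.println(mergeFacets(List.of(
                Map.of("tag1", 1, "tag2", 1),
                Map.of("tag1", 1, "tag2", 1))));
    }
}
```

The rows are deduplicated by path, but the facet sums are not; if the disjuncts can overlap, the facets would have to be counted over the merged rows instead.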

In our real-world case the complexity comes not only from the path restriction; 
there are more restrictions conjoined with it. We already tried running one 
query per path, but even then the individual queries are too complex to be 
passed to Lucene with all constraints (not entirely sure why, though ...).

Edit: opened OAK-7110 for counting facets only once per result, not once per 
row.



> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
>                 Key: OAK-7109
>                 URL: https://issues.apache.org/jira/browse/OAK-7109
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.7
>            Reporter: Dirk Rudolph
>              Labels: facet
>         Attachments: facetsInMultipleRoots.patch
>
>
> Complex queries in this case are queries that are passed to Lucene without 
> all of the original constraints, for example queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner passes ":fulltext:ipsum" to Lucene, 
> even though the index supports evaluating path constraints. 
> As the facets are counted on the raw Lucene result, the returned facets are 
> incorrect. For example, given the following content
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> {code}
> the expected facet counts for the dimension simple/tags and the query above are 
> - tag1: 2
> - tag2: 2
> since the result set contains 2 rows and all documents carry the same tags. 
> The actual facet counts are 
> - tag1: 3
> - tag2: 3
> because the path constraint is not handled by Lucene.
> To work around that, the only solution that came to my mind is building the 
> [disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] 
> of my complex query and executing a query for each of the disjunctive 
> statements. As this expands exponentially, it's only a theoretical solution, 
> nothing for production. 
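
The miscount described in the issue can be reproduced with a small, self-contained Java simulation of the example content. It uses neither Oak nor Lucene: `Doc`, `countTags` and `descendantOf` are hypothetical, the simple/tags child node is flattened into a field, and fulltext matching is simplified to a substring check. Counting over every fulltext match (what Lucene sees) gives 3 per tag, while counting only over documents that also pass the path constraint, in the spirit of option b) from the comment, gives the expected 2.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class FacetMiscount {

    // The simple/tags child node is flattened into a field for brevity.
    record Doc(String path, String text, List<String> tags) {}

    // Counts tag values over all documents accepted by the filter.
    static Map<String, Integer> countTags(List<Doc> docs, Predicate<Doc> filter) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc doc : docs) {
            if (filter.test(doc)) {
                for (String tag : doc.tags()) {
                    counts.merge(tag, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Hypothetical equivalent of isdescendantnode(a, root).
    static Predicate<Doc> descendantOf(String root) {
        return doc -> doc.path().startsWith(root + "/");
    }

    public static void main(String[] args) {
        List<String> tags = List.of("tag1", "tag2");
        List<Doc> docs = List.of(
                new Doc("/content1/test/foo", "lorem ipsum", tags),
                new Doc("/content2/test/bar", "lorem ipsum", tags),
                new Doc("/content3/test/bar", "lorem ipsum", tags));

        // What the index is given: only ":fulltext:ipsum" (simplified to a substring match).
        Predicate<Doc> fulltext = doc -> doc.text().contains("ipsum");
        // The full constraint, including the OR of both path restrictions.
        Predicate<Doc> full = fulltext.and(
                descendantOf("/content1").or(descendantOf("/content2")));

        System.out.println("raw:      " + countTags(docs, fulltext));
        System.out.println("filtered: " + countTags(docs, full));
    }
}
```

This prints `{tag1=3, tag2=3}` for the raw counts and `{tag1=2, tag2=2}` for the filtered ones, matching the actual and expected results above.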



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
