[ https://issues.apache.org/jira/browse/LUCENE-8867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866830#comment-16866830 ]

Adrien Grand commented on LUCENE-8867:
--------------------------------------

Sorry, reading my comment again I realize it wasn't clear. I see two distinct
changes in the pull request. One adds a new storage strategy for the case
where a leaf only has a handful of unique values; I'm +1 on it. The second
one takes advantage of this special case to avoid computing a relation with
the same byte[] over and over again; that solution is a bit more
controversial in my opinion.
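
A minimal sketch of the first change, assuming the values of a leaf arrive
sorted (as they do in a BKD leaf): consecutive equal values collapse into
(value, cardinality) pairs. The names LowCardinalityLeaf, Run and encode are
hypothetical, and this is not Lucene's actual on-disk format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LowCardinalityLeaf {

  /** One distinct packed value and how many consecutive entries share it. */
  static final class Run {
    final byte[] value;
    final int cardinality;

    Run(byte[] value, int cardinality) {
      this.value = value;
      this.cardinality = cardinality;
    }
  }

  /** Collapses consecutive equal values; sorted input means each distinct value yields one run. */
  static List<Run> encode(byte[][] sortedValues) {
    List<Run> runs = new ArrayList<>();
    int i = 0;
    while (i < sortedValues.length) {
      int j = i + 1;
      // Extend the run while the next value is byte-for-byte identical.
      while (j < sortedValues.length && Arrays.equals(sortedValues[i], sortedValues[j])) {
        j++;
      }
      runs.add(new Run(sortedValues[i], j - i));
      i = j;
    }
    return runs;
  }

  public static void main(String[] args) {
    byte[][] leaf = { {1}, {1}, {1}, {2}, {2}, {7} };
    for (Run run : encode(leaf)) {
      System.out.println(Arrays.toString(run.value) + " x " + run.cardinality);
    }
  }
}
```

For this leaf it prints "[1] x 3", "[2] x 2" and "[7] x 1": six stored values
shrink to three distinct values plus their counts.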

bq. another option would be to change the interface more radically and add a
matches(byte[]) method that returns a boolean and then use the visit(docID)
method.

Right, this is what I had in mind when I said this is only a problem if you
have data dimensions: if you don't, then you could call
IntersectVisitor.compare(A, A) as a way to know whether value A matches, and we
wouldn't need any new API?
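
The compare(A, A) idea can be sketched with a hypothetical stand-in for
IntersectVisitor: the Visitor interface and Relation enum below only mimic
Lucene's names and are not the real API. With min == max == A, one compare
call decides the whole run of docs that share value A. As noted above, this
only holds when there are no data dimensions.

```java
import java.util.ArrayList;
import java.util.List;

public class SameValueRun {

  // Hypothetical mirror of Lucene's relation constants.
  enum Relation { CELL_INSIDE_QUERY, CELL_OUTSIDE_QUERY, CELL_CROSSES_QUERY }

  // Hypothetical stand-in for IntersectVisitor; not Lucene's interface.
  interface Visitor {
    void visit(int docID);
    void visit(int docID, byte[] packedValue);
    Relation compare(byte[] minPackedValue, byte[] maxPackedValue);
  }

  /** Visits a run of docs sharing one value, asking the visitor about that value only once. */
  static void visitRun(Visitor visitor, int[] docIDs, byte[] value) {
    Relation relation = visitor.compare(value, value);
    if (relation == Relation.CELL_INSIDE_QUERY) {
      // Every doc matches: no need to pass the value again.
      for (int doc : docIDs) visitor.visit(doc);
    } else if (relation == Relation.CELL_CROSSES_QUERY) {
      // Defensive fallback: let the visitor check each doc with its value.
      for (int doc : docIDs) visitor.visit(doc, value);
    }
    // CELL_OUTSIDE_QUERY: no doc in the run matches, skip them all.
  }

  /** Example visitor: matches a single unsigned byte dimension in [lo, hi]. */
  static final class RangeVisitor implements Visitor {
    final int lo;
    final int hi;
    final List<Integer> matches = new ArrayList<>();
    int compareCalls = 0;

    RangeVisitor(int lo, int hi) {
      this.lo = lo;
      this.hi = hi;
    }

    public void visit(int docID) {
      matches.add(docID);
    }

    public void visit(int docID, byte[] packedValue) {
      int v = Byte.toUnsignedInt(packedValue[0]);
      if (v >= lo && v <= hi) matches.add(docID);
    }

    public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
      compareCalls++;
      int min = Byte.toUnsignedInt(minPackedValue[0]);
      int max = Byte.toUnsignedInt(maxPackedValue[0]);
      if (max < lo || min > hi) return Relation.CELL_OUTSIDE_QUERY;
      if (min >= lo && max <= hi) return Relation.CELL_INSIDE_QUERY;
      return Relation.CELL_CROSSES_QUERY;
    }
  }

  public static void main(String[] args) {
    RangeVisitor query = new RangeVisitor(2, 5);
    visitRun(query, new int[] {10, 11, 12}, new byte[] {3});
    visitRun(query, new int[] {13, 14}, new byte[] {9});
    // prints "[10, 11, 12] after 2 compare calls"
    System.out.println(query.matches + " after " + query.compareCalls + " compare calls");
  }
}
```

Five docs are resolved with two compare calls, one per distinct value,
instead of one value check per doc.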

> Optimise BKD tree for low cardinality leaves
> --------------------------------------------
>
>                 Key: LUCENE-8867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8867
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a leaf of the BKD tree contains only a few distinct values, the
> leaf is treated the same way as if all values were different. In many cases it
> can be much more efficient to store each distinct value with its cardinality.
> In addition, in this case the method IntersectVisitor#visit(docId, byte[]) is
> called n times with the same byte array but a different docID. This issue
> proposes adding a new method to the interface that accepts an array of docs,
> so that implementors can override it and gain search performance.
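
One possible shape for the method proposed in the description, sketched with
a hypothetical Visitor interface (not Lucene's IntersectVisitor): a default
bulk visit(int[], byte[]) keeps the current one-call-per-doc behaviour, and an
implementor can override it to test the shared value once per run.

```java
import java.util.Arrays;

public class BulkVisit {

  // Hypothetical visitor; only the per-doc method is abstract.
  interface Visitor {
    void visit(int docID, byte[] packedValue);

    /** Hypothetical bulk method: all docIDs share packedValue. Default = old behaviour. */
    default void visit(int[] docIDs, byte[] packedValue) {
      for (int docID : docIDs) visit(docID, packedValue);
    }
  }

  /** Counts docs whose value equals target, overriding the bulk method. */
  static final class ExactMatchCounter implements Visitor {
    final byte[] target;
    int count = 0;
    int valueChecks = 0;

    ExactMatchCounter(byte[] target) {
      this.target = target;
    }

    @Override
    public void visit(int docID, byte[] packedValue) {
      valueChecks++;
      if (Arrays.equals(target, packedValue)) count++;
    }

    @Override
    public void visit(int[] docIDs, byte[] packedValue) {
      // One comparison decides the whole batch.
      valueChecks++;
      if (Arrays.equals(target, packedValue)) count += docIDs.length;
    }
  }

  public static void main(String[] args) {
    ExactMatchCounter counter = new ExactMatchCounter(new byte[] {4});
    counter.visit(new int[] {1, 2, 3, 4}, new byte[] {4});
    counter.visit(new int[] {5, 6}, new byte[] {8});
    // prints "4 matches, 2 value checks"
    System.out.println(counter.count + " matches, " + counter.valueChecks + " value checks");
  }
}
```

Visitors that do not override the bulk method still work unchanged, because
the default simply delegates to visit(docID, packedValue) for each doc.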



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)