[ 
https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-9087:
---------------------------------
    Attachment: Study of BKD tree performance with different values for max 
points per leaf.pdf
        Status: Open  (was: Open)

I have done an experiment about the effect of changing the default of 
maxPointsInLeafNode (attached the results in the issue). After this experience 
I would like to propose:

 
 * It makes sense to change the way we construct trees, so we always construct 
unbalanced trees as we are doing for the 1D case. This gives us more control 
abut the expected performance.
 *  Lower the default number of max points in leaf node to 512. Even though in 
many cases lowering even further can give greater performance, it would not 
behave so well in edge cases.

 

 

> Should the BKD tree use a fixed maxPointsInLeafNode? 
> -----------------------------------------------------
>
>                 Key: LUCENE-9087
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9087
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>         Attachments: Study of BKD tree performance with different values for 
> max points per leaf.pdf
>
>
> Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the 
> constructor. For the current default codec the value is set to 1024. This is 
> a good compromise between memory usage and performance of the BKD tree.
> Lowering this value can increase search performance but it has a penalty in 
> memory usage. Now that the BKD tree can be load off-heap, this can be less of 
> a concern. Note that lowering too much that value can hurt performance as 
> well as the tree becomes too deep and benefits are gone.
> For data types that use the tree as an effective R-tree (ranges and shapes 
> datatypes) the benefits are larger as it can minimise the overlap between 
> leaf nodes. 
> Finally, creating too many leaf nodes can be dangerous at write time as 
> memory usage depends on the number of leaf nodes created. The writer creates 
> a long array of length = numberOfLeafNodes.
> What I am wondering here is if we can improve this situation in order to 
> create the most efficient tree? My current ideas are:
>  
>  * We can adapt the points per leaf depending on that number so we create a 
> tree with the best depth and best points per leaf. Note that for the for 1D 
> case we have an upper estimation of the number of points that the tree will 
> be indexing. 
>  * Add a mechanism so field types can easily define their best points per 
> leaf. In the case, field types like ranges or shapes can define its own value 
> to minimise overlap.
>  * Maybe the default is just too high now that we can load the tree off-heap.
>  
> Any thoughts?
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to