[ 
https://issues.apache.org/jira/browse/LUCENE-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574539#comment-16574539
 ] 

Ignacio Vera commented on LUCENE-8452:
--------------------------------------

I think one of the keys to understand the performance of BKD-based shape 
indexing is to analyse  how 6-d bounding boxes are transformed into 2-d 
bounding boxes.  Our input is a 6d point representing the minimum value and 6d 
point representing the maximum value of a bounding box and we need to transform 
it into a 2d bounding box to check the relationship with the query shape. The 
logic to get the 2d bounding box goes as follow: 

In a 6-d points we have three dimensions corresponding to latitudes (0, 2, 4) 
and the other three for longitude (1,3,5). To build the 2d minimum point of the 
bounding box from the 6d minimum point, we compute the lat and lon using the 
minimum value of the dimensions for the corresponding lat/lon entries.  To 
build the 2d maximum point of the bounding box from the 6d maximum point, we 
compute the lat and lon using the maximum value of the dimensions for the 
corresponding lat/lon entries.  

For example: We have the following bounding box in 6d and the corresponding 
bounding box in 2d:

Min: (0,0,0,0,0,0) -> (0,0)

Max: (90,90,90,90,90,90)  -> (90,90)

 

Split first dimension in half:

 

Left tree:

Min: (0,0,0,0,0,0) -> (0,0)

Max: (*45*,90,90,90,90,90)  -> (90,90)

Right tree:

Min: (*45*,0,0,0,0,0) -> (0,0)

Max: (90,90,90,90,90,90)  -> (90,90)

 

Using left tree, split  third dimension in half :

 

Left-Left tree:

Min: (0,0,0,0,0,0) -> (0,0)

Max: (*45*,90,*45*,90,90,90)  -> (90,90)

Left-Right tree:

Min: (*45*,0,*45*,0,0,0) -> (0,0)

Max: (90,90,90,90,90,90)  -> (90,90)

 

Using left tree, split fifth dimension in half :

 

Left-Left-Left tree:

Min: (0,0,0,0,0,0) -> (0,0)

Max: (*45*,90,*45*,90,*45*,90)  -> (*45*,90)

Left-Left-Right tree:

Min: (*45*,0,*45*,0,*45*,0) -> (*45*,0)

Max: (90,90,90,90,90,90)  -> (90,90)

 

 We have to split all three dimension for latitude to actually have an effect 
on the 2-d bounding box. Which means that we need to visit many nodes of the 
tree before we actually filter out some of the branches.

[~nknize], does it make sense to you?

> BKD-based shape indexing benchmarks
> -----------------------------------
>
>                 Key: LUCENE-8452
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8452
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/sandbox
>            Reporter: Ignacio Vera
>            Priority: Major
>
> Initial benchmarking of the new BKD-based shape indexing suggest that 
> searches can be somewhat under-performing.   I open this ticket to share the 
> findings and to open a discussion how to speed up the solution.
>  
> The first benchmark is done by using the current benchmark in luceneutils for 
> indexing points and search by bounding box. We would expect {{LatLonShape}} 
> to be slower that {{LatLonPoint}} but still having a good performance. The 
> results of running such benchmark in my computer looks like:
>  
> LatLonPoint:
> 89.717239531 sec to index
> INDEX SIZE: 0.5087761553004384 GB
> READER MB: 0.6098232269287109
> maxDoc=60844404
> totHits=221118844
> BEST M hits/sec: 72.91056132596746
> BEST QPS: 74.19031323419311 
>  
> LatLonShape:
> 89.388678805 sec to index
> INDEX SIZE: 1.3028179928660393 GB
> READER MB: 0.8827085494995117
> maxDoc=60844404
> totHits=221118844
> BEST M hits/sec: 1.0053836784184809
> BEST QPS: 1.0230305276205143
>  
> A second benchmark has been performed indexing around 10 million 4-side 
> polygons and around 3 million points. Searches are performed using bounding 
> boxes. The results are compared with spatial trees alternatives. Spatial 
> trees use a composite strategy, precision=0.001 degrees and distErrPct=0.25:
>  
> s2 (Geo3d):
> 1191.732124301 sec to index part 0
> INDEX SIZE: 3.2086284114047885 GB
> READER MB: 19.453557014465332
> maxDoc=12949519
> totHits=705758537
> BEST M hits/sec: 13.311369588840462
> BEST QPS: 4.243743434150063
>  
> quad (JTS):
> 3252.62925159 sec to index part 0
> INDEX SIZE: 4.5238002222031355 GB
> READER MB: 41.15725612640381
> maxDoc=12949519
> totHits=705758357
> BEST M hits/sec: 35.54591930673003
> BEST QPS: 11.332252412866938
>  
> LatLonShape:
> 30.32712009 sec to index part 0
> INDEX SIZE: 0.5627057952806354 GB
> READER MB: 0.29498958587646484
> maxDoc=12949519
> totHits=705758228
> BEST M hits/sec: 3.4130465326433357
> BEST QPS: 1.0880999177593018
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to