[ 
https://issues.apache.org/jira/browse/LUCENE-7862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604279#comment-16604279
 ] 

Ignacio Vera commented on LUCENE-7862:
--------------------------------------

I have been playing with this approach to see if we can make the BKD tree more 
efficient. I have done some benchmarks that confirm a big boost in performance 
whenever there is correlation between the dimensions so it is particularly good 
for ranges. In addition this boost quite a bit the newly introduced approach 
for indexing shapes in LUCENE-8396.

Maybe we should only ad this extra information to the index when number of 
dimensions > 1. Here are a few benchmarks:

1) Ranges: Using the data points from GeoBench, the ranges have been created as 
following:
{code:java}
double lat = Double.parseDouble(parts[1]);
double lon = Double.parseDouble(parts[2]);
double latEnd = lat + 0.1 * Math.abs(lon);
double lonEnd = lon + 0.1 * Math.abs(lat);{code}
Where latitude is used in even dimension and longitude in uneven ones.

IT = Index time (sec)
 FM = Force merge time (sec)
 IS = Index size (GB)
 RH = Reader heap (MB
|| Approach|| Dimensions||IT Dev||IT Base||IT Diff||IFM Dev||FM Base||FM 
Diff||IS Dev||IS Base||IS Diff||RH Dev||RH Base||IRH Diff||
|ranges|1|102.7s|101.5s|1%|0.0s|0.0s|0%|0.81|0.81|0%|0.80|0.80|-0%|
|ranges|2|96.0s|94.4s|2%|0.0s|0.0s|0%|1.59|1.58|0%|1.02|1.03|-1%|
|ranges|3|93.1s|84.6s|10%|0.0s|0.0s|0%|2.29|2.29|0%|1.18|1.01|17%|
|ranges|4|101.0s|91.0s|11%|0.0s|0.0s|0%|3.07|3.05|0%|1.04|1.12|-7%|

HS = Hits per second
 QPS= Queries per second
 Hits = Total hits
||Approach|| Dimensions||HS Dev||HS Base||HS Diff||QPS Dev||QPS Base||QPS 
Diff||Hits Dev||Hits Base||Hits Diff||
|ranges|1|131.17|116.20|13%|10.81|9.58|13%|2730282630|2730282630|0%|
|ranges|2|34.07|10.83|215%|16.93|5.38|215%|452860046|452860046|0%|
|ranges|3|34.41|3.83|799%|17.10|1.90|799%|452860046|452860046|0%|
|ranges|4|26.01|3.27|696%|12.92|1.62|696%|452860046|452860046|0%|.  |

 

3) GeoBench: comparison of Geo benchmark using points (LatLonPoint), geo3d 
(Geo3DPoint) and shapes (LaLonShape)

IT = Index time (sec)
 FM = Force merge time (sec)
 IS = Index size (GB)
 RH = Reader heap (MB
|| Approach||IT Dev||IT Base||IT Diff||IFM Dev||FM Base||FM Diff||IS Dev||IS 
Base||IS Diff||RH Dev||RH Base||IRH Diff||
|points|103.3s|100.0s|3%|0.0s|0.0s|0%|0.51|0.51|0%|0.61|0.61|0%|
|geo3d|105.6s|102.9s|3%|0.0s|0.0s|0%|0.72|0.72|0%|0.62|0.62|-0%|
|shapes|108.3s|100.6s|8%|0.0s|0.0s|0%|1.31|1.30|0%|0.87|0.87|-0%|

HS = Hits per second
 QPS= Queries per second
 Hits = Total hits
||Approach|| Shape||HS Dev||HS Base||HS Diff||QPS Dev||QPS Base||QPS Diff||Hits 
Dev||Hits Base||Hits Diff||
|points|polyRussia|17.19|17.19|0%|4.90|4.90|0%|3508846|3508846|0%|
|points|polyMedium|9.51|9.26|3%|116.51|113.40|3%|2693559|2693559|0%|
|points|distance|77.56|77.33|0%|45.57|45.43|0%|382961957|382961957|0%|
|points|nearest 10|0.05|0.05|-0%|4651.95|4664.57|-0%|60844404|60844404|0%|
|points|box|81.95|83.07|-1%|83.39|84.53|-1%|221118844|221118844|0%|
|points|poly 10|80.37|79.78|1%|50.83|50.45|1%|355809475|355809475|0%|
|points|sort|33.25|31.26|6%|33.83|31.81|6%|221118844|221118844|0%|
|geo3d|polyRussia|0.51|0.50|2%|0.15|0.14|2%|3508671|3508671|0%|
|geo3d|polyMedium|0.42|0.43|-0%|5.20|5.23|-0%|2693545|2693545|0%|
|geo3d|distance|64.36|63.47|1%|37.77|37.25|1%|383371884|383371884|0%|
|geo3d|box|49.29|51.60|-4%|50.15|52.50|-4%|221118844|221118844|0%|
|geo3d|poly 10|39.40|39.43|-0%|24.91|24.93|-0%|355855227|355855227|0%|
|shapes|polyRussia|2.81|0.31|819%|0.80|0.09|819%|3508846|3508846|0%|
|shapes|polyMedium|0.52|0.08|539%|6.38|1.00|539%|2693559|2693559|0%|
|shapes|box|11.07|1.46|661%|11.27|1.48|661%|221118844|221118844|0%|
|shapes|poly 10|16.83|1.46|1052%|10.64|0.92|1052%|355809475|355809475|0%|

 

> Should BKD cells store their min/max packed values?
> ---------------------------------------------------
>
>                 Key: LUCENE-7862
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7862
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7862.patch, LUCENE-7862.patch
>
>
> The index of the BKD tree already allows to know lower and upper bounds of 
> values in a given dimension. However the actual range of values might be more 
> narrow than what the index tells us, especially if splitting on one dimension 
> reduces the range of values in at least one other dimension. For instance 
> this tends to be the case with range fields: since we enforce that lower 
> bounds are less than upper bounds, splitting on one dimension will also 
> affect the range of values in the other dimension.
> So I'm wondering whether we should store the actual range of values for each 
> dimension in leaf blocks, this will hopefully allow to figure out that either 
> none or all values match in a block without having to check them all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to