[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952042#comment-16952042 ]

Michael Sokolov commented on LUCENE-8928:
-----------------------------------------

> I'm curious whether that also helps the 3D field you use to filter deals

Yes, it will be interesting to see! It might take us a little while, and there tend to be more "real" deals around certain times of year ...

> BKDWriter could make splitting decisions based on the actual range of values
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8928
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8928
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently BKDWriter assumes that splitting on one dimension has no effect on
> values in other dimensions. While this may be ok for geo points, this is
> usually not true for ranges (or geo shapes, which are ranges too). Maybe we
> could get better indexing by re-computing the range of values on each
> dimension before making the choice of the split dimension?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
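The idea in the issue description quoted above can be illustrated with a small sketch. This is not BKDWriter code: the class and method names are hypothetical, and points are plain int arrays rather than Lucene's packed byte values. It contrasts picking the split dimension from bounds inherited from the parent cell with picking it from bounds recomputed over the points actually in the cell.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Lucene's BKDWriter code. */
class SplitDimSketch {

  /** Returns the dimension with the widest range, given per-dimension min/max values. */
  static int widestDim(int[] min, int[] max) {
    int best = 0;
    for (int d = 1; d < min.length; d++) {
      // Use long arithmetic so that very wide ranges cannot overflow.
      if ((long) max[d] - min[d] > (long) max[best] - min[best]) {
        best = d;
      }
    }
    return best;
  }

  /** Recomputes exact per-dimension bounds over the points, then picks the widest dimension. */
  static int splitDimFromActualRange(int[][] points) {
    int dims = points[0].length;
    int[] min = new int[dims];
    int[] max = new int[dims];
    Arrays.fill(min, Integer.MAX_VALUE);
    Arrays.fill(max, Integer.MIN_VALUE);
    for (int[] p : points) {
      for (int d = 0; d < dims; d++) {
        min[d] = Math.min(min[d], p[d]);
        max[d] = Math.max(max[d], p[d]);
      }
    }
    return widestDim(min, max);
  }
}
```

For example, suppose a parent cell with bounds [0, 12] x [1, 13] is split on dimension 0 at value 6, and the left child holds the points (0, 1), (5, 2) and (6, 3). The inherited bounds [0, 6] x [1, 13] suggest splitting on dimension 1 next, while the recomputed bounds [0, 6] x [1, 3] show that dimension 0 is actually the wider one.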
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951846#comment-16951846 ]

Adrien Grand commented on LUCENE-8928:
--------------------------------------

I'm curious whether that also helps the 3D field you use to filter deals [~mikemccand] [~sokolov].
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951232#comment-16951232 ]

ASF subversion and git services commented on LUCENE-8928:
---------------------------------------------------------

Commit e719ea5a42c2b4ec4bec2668056708e910f85e05 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e719ea5 ]

LUCENE-8928: Check that point is inside an edge bounding box when checking if the point belongs to the edge
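The kind of check the commit message above describes can be sketched in two dimensions. This is a generic illustration, not the code changed by the commit: the class and method names are hypothetical, and exact double comparison is used for simplicity. A point that is collinear with an edge only belongs to the edge if it also falls inside the edge's bounding box.

```java
/** Hypothetical 2D illustration; not the code touched by the commit. */
class EdgeContainsSketch {

  /** Returns true if (x, y) lies on the segment from (ax, ay) to (bx, by). */
  static boolean pointOnEdge(double ax, double ay, double bx, double by, double x, double y) {
    // Bounding-box check: a point outside the edge's box cannot be on the edge,
    // even if it lies on the infinite line through the two endpoints.
    if (x < Math.min(ax, bx) || x > Math.max(ax, bx)
        || y < Math.min(ay, by) || y > Math.max(ay, by)) {
      return false;
    }
    // Collinearity check: the cross product of (b - a) and (p - a) must be zero.
    double cross = (bx - ax) * (y - ay) - (by - ay) * (x - ax);
    return cross == 0;
  }
}
```

With the edge from (0, 0) to (2, 2), the point (3, 3) is collinear with the endpoints but is rejected by the bounding-box check, which is the case a collinearity test alone would get wrong.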
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951230#comment-16951230 ]

ASF subversion and git services commented on LUCENE-8928:
---------------------------------------------------------

Commit 1d97e25a8a77c44d60ea5350b344cce1e48dcc71 in lucene-solr's branch refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1d97e25 ]

LUCENE-8928: Check that point is inside an edge bounding box when checking if the point belongs to the edge
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951189#comment-16951189 ]

Michael McCandless commented on LUCENE-8928:
--------------------------------------------

Wow :) Awesome!
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950859#comment-16950859 ]

Adrien Grand commented on LUCENE-8928:
--------------------------------------

[~daddywri] The above might be interesting to you.
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950858#comment-16950858 ]

Adrien Grand commented on LUCENE-8928:
--------------------------------------

The nightly benchmarks liked this change: http://people.apache.org/~mikemccand/geobench.html
 - 32% faster distance filtering with Geo3D
 - 38% faster 10-gons filtering with Geo3D
 - 20% faster 10-gons filtering with shapes
 - 30% faster box filtering with Geo3D
 - 22% faster box filtering with shapes
 - 1% space reduction for shapes

However
 - ~7% slower indexing for Geo3D
 - ~8% slower indexing for shapes
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949591#comment-16949591 ]

ASF subversion and git services commented on LUCENE-8928:
---------------------------------------------------------

Commit a9c77504023b3f1e0b81dbe52537fa19f4586200 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a9c7750 ]

LUCENE-8928: Compute exact bounds every N splits (#926)

When building a kd-tree for dimensions n > 2, compute exact bounds for an inner node every N splits to improve the quality of the tree. N is defined by SPLITS_BEFORE_EXACT_BOUNDS which is set to 4.
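The policy in the commit message above can be sketched roughly as follows. Only the constant name and its value of 4 come from the commit message; the class, the methods, and the way the counter is tracked are assumptions made for illustration and may differ from what BKDWriter actually does.

```java
/** Hypothetical sketch of the "exact bounds every N splits" policy; not BKDWriter itself. */
class ExactBoundsPolicySketch {

  // Name and value taken from the commit message.
  static final int SPLITS_BEFORE_EXACT_BOUNDS = 4;

  /**
   * Decides whether an inner node should recompute exact bounds before choosing a split dimension.
   *
   * @param numDims number of indexed dimensions
   * @param splitsSinceExactBounds splits performed since bounds were last recomputed
   */
  static boolean shouldComputeExactBounds(int numDims, int splitsSinceExactBounds) {
    // With 2 or fewer dimensions the policy never triggers.
    return numDims > 2 && splitsSinceExactBounds >= SPLITS_BEFORE_EXACT_BOUNDS;
  }

  /** Walks one root-to-leaf path of the given depth and counts exact-bounds recomputations. */
  static int countRecomputations(int numDims, int depth) {
    int recomputations = 0;
    int splitsSinceExactBounds = 0;
    for (int level = 0; level < depth; level++) {
      if (shouldComputeExactBounds(numDims, splitsSinceExactBounds)) {
        recomputations++;
        splitsSinceExactBounds = 0; // bounds are exact again at this node
      }
      splitsSinceExactBounds++; // this node is split before descending
    }
    return recomputations;
  }
}
```

Under these assumptions, a path of 12 splits in a 3-dimensional tree recomputes bounds twice (at depths 4 and 8), while a 2-dimensional tree never does.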
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938429#comment-16938429 ]

Ignacio Vera commented on LUCENE-8928:
--------------------------------------

I ran some benchmarks comparing this new approach with the previous approach. They show similar query performance but a much faster indexing rate:

||Approach||Index time (sec)||Index time (sec)|| ||Force merge time (sec)||Force merge time (sec)|| ||Index size (GB)||Index size (GB)|| ||Reader heap (MB)||Reader heap (MB)|| ||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|geo3d|163.5s|218.4s|-25%|0.0s|0.0s| 0%|0.71|0.71|-0%|1.75|1.75|-0%|
|shapes|227.8s|319.6s|-29%|0.0s|0.0s| 0%|1.27|1.27| 0%|1.78|1.78| 0%|

||Approach||Shape||M hits/sec||M hits/sec|| ||QPS||QPS|| ||Hit count||Hit count|| ||
|| || ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|geo3d|box|55.58|57.53|-3%|56.56|58.54|-3%|221118844|221118844| 0%|
|geo3d|polyRussia|0.56|0.56|-1%|0.16|0.16|-1%|3508671|3508671| 0%|
|geo3d|poly 10|48.87|51.25|-5%|30.90|32.41|-5%|355855227|355855227| 0%|
|geo3d|polyMedium|0.62|0.63|-1%|7.64|7.67|-1%|2693545|2693545| 0%|
|geo3d|distance|68.16|69.70|-2%|40.00|40.91|-2%|383371884|383371884| 0%|
|shapes|box|45.99|46.52|-1%|46.80|47.34|-1%|221118844|221118844| 0%|
|shapes|polyRussia|6.64|7.01|-5%|1.89|2.00|-5%|3508846|3508846| 0%|
|shapes|poly 10|33.40|34.69|-4%|21.12|21.93|-4%|355809475|355809475| 0%|
|shapes|polyMedium|3.07|3.30|-7%|37.62|40.43|-7%|2693559|2693559| 0%|
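The Diff columns in the tables above appear to be the relative change of Dev against Base. The comment does not state the formula, so the sketch below is an assumption (with hypothetical names) that happens to reproduce the reported index-time figures after rounding.

```java
/** Hypothetical helper; the formula is inferred from the reported numbers, not stated in the comment. */
class DiffSketch {

  /** Relative change of dev against base, in percent (negative means dev is smaller). */
  static double percentDiff(double dev, double base) {
    return (dev - base) / base * 100.0;
  }
}
```

For the geo3d index time, (163.5 - 218.4) / 218.4 is about -25.1%, which matches the -25% in the table. For an index-time column a negative value means faster indexing, whereas for the QPS columns a negative value means slightly slower queries.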
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936519#comment-16936519 ]

Ignacio Vera commented on LUCENE-8928:
--------------------------------------

I have played a bit more with this idea and wondered whether we need to compute exact bounds for every split. I modified [~jpountz]'s patch so that, instead of computing the bounds for every split, it computes them every N splits. This is controlled by a static property called {{SPLITS_BEFORE_EXACT_BOUNDS}}. The patch can be found here: https://github.com/iverase/lucene-solr/commit/e63f8c73a86c46ec406143fcd0cb31a8371dfe63

My tests show that setting this value to 4 (compute exact bounds every 4 splits) reduces the indexing overhead to around 10% and keeps almost the same performance as the previous approach. Maybe we can find a better heuristic to set this value.

In addition, this patch does not apply when the number of dimensions is <= 2; in that case the split algorithm reverts to the original one.