[ 
https://issues.apache.org/jira/browse/LUCENE-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356757#comment-16356757
 ] 

Ignacio Vera edited comment on LUCENE-8126 at 2/8/18 3:35 PM:
--------------------------------------------------------------

I have added the performance test classes on the branch in case you want to 
have a look to check results are not bias.

I have done two test (They do not include index size as I have no found a 
reliable way of getting that info), I can only show you for now some tipical 
results.

1) Indexing 5k random polygons and executing 50 Random queries. Trees have 
precision set to 1e-4 and distErrPct to 5%:

load geohash : 132,15 indexed shapes per second
 query geohash recursive : 10,88 queries per second
 query geohash composite : 7,29 queries per second

load quad : 299,04 indexed shapes per second
 query quad recursive : 15,13 queries per second
 query quad composite : 10,89 queries per second

load s2 arity 4 : 623,67 indexed shapes per second
 query s2 arity 4 recursive : 40,00 queries per second
 query s2 arity 4 composite : 21,51 queries per second

load s2 arity 16 : 283,46 indexed shapes per second
 query s2 arity 16 recursive : 12,22 queries per second
 query s2 arity 16 composite : 9,99 queries per second

load s2 arity 64 : 159,05 indexed shapes per second
 query s2 arity 64 recursive : 5,13 queries per second
 query s2 arity 64 composite : 3,58 queries per second

 

1) Indexing 50k random points and executing 50 Random queries. Trees have 
precision set to 1e-6 and distErrPct to 0%:

load geohash : 14898,69 indexed shapes per second
 query geohash recursive : 2,31 queries per second
 query geohash composite : 2,29 queries per second

load quad : 5068,42 indexed shapes per second
 query quad recursive : 5,10 queries per second
 query quad composite : 5,11 queries per second

load s2 arity 4 : 9748,49 indexed shapes per second
 query s2 arity 4 recursive : 11,34 queries per second
 query s2 arity 4 composite : 11,38 queries per second

load s2 arity 16 : 15513,50 indexed shapes per second
 query s2 arity 16 recursive : 3,86 queries per second
 query s2 arity 16 composite : 3,83 queries per second

load s2 arity 64 : 16886,19 indexed shapes per second
 query s2 arity 64 recursive : 1,56 queries per second
 query s2 arity 64 composite : 1,56 queries per second

 

Some data of my use case: I need to index ~3M  documents. 2M are points, 0.5M 
polygons and 0.5M multi-polygons with an averge size of 20. They need to be 
indexed on the celestial sphere (unit sphere). All polygons are squared (4 
points) with different sizes, from 1 square degree to very tiny ones, all 
distributed mainly on the south hemisphere of the sphere. Moving from Geohash 
to S2 has provided the following benefits:

1) 20% reduccion of index size.

2) 75% reduccion on indexing time (yuhu!).

3) 2.5 times faster queries.

 

Anyway what we need is a benchmark for SPT the same way that there is one for 
BKD tree. Probably my next mini-project.

Conclusion: If you use Geo3D, you probably want to use S2 :)

 


was (Author: ivera):
I have added the performance test classes on the branch in case you want to 
have a look to check results are not bias.

I have done two test (They do not include index size as I have no found a 
reliable way of getting that info), I can only show you for now some tipical 
results.

1) Indexing 5k random polygons and executing 50 Random queries. Trees have 
precision set to 1e-4 and distErrPct to 5%:

load geohash : 132,15 indexed shapes per second
query geohash recursive : 10,88 queries per second
query geohash composite : 7,29 queries per second

load quad : 299,04 indexed shapes per second
query quad recursive : 15,13 queries per second
query quad composite : 10,89 queries per second

load s2 arity 4 : 623,67 indexed shapes per second
query s2 arity 4 recursive : 40,00 queries per second
query s2 arity 4 composite : 21,51 queries per second

load s2 arity 16 : 283,46 indexed shapes per second
query s2 arity 16 recursive : 12,22 queries per second
query s2 arity 16 composite : 9,99 queries per second

load s2 arity 64 : 159,05 indexed shapes per second
query s2 arity 64 recursive : 5,13 queries per second
query s2 arity 64 composite : 3,58 queries per second

 

1) Indexing 50k random points and executing 50 Random queries. Trees have 
precision set to 1e-6 and distErrPct to 0%:

load geohash : 14898,69 indexed shapes per second
query geohash recursive : 2,31 queries per second
query geohash composite : 2,29 queries per second

load quad : 5068,42 indexed shapes per second
query quad recursive : 5,10 queries per second
query quad composite : 5,11 queries per second

load s2 arity 4 : 9748,49 indexed shapes per second
query s2 arity 4 recursive : 11,34 queries per second
query s2 arity 4 composite : 11,38 queries per second

load s2 arity 16 : 15513,50 indexed shapes per second
query s2 arity 16 recursive : 3,86 queries per second
query s2 arity 16 composite : 3,83 queries per second

load s2 arity 64 : 16886,19 indexed shapes per second
query s2 arity 64 recursive : 1,56 queries per second
query s2 arity 64 composite : 1,56 queries per second

 

Some data of my use case: I need to index ~3M  documents. 2M are points, 0.5M 
polygons and 0.5M multi-polygons with an averge size of 20. They need to be 
indexed on the celestial sphere (unit sphere). All polygons are squared (4 
points) with different sizes, from 1 square degree to very tiny ones, all 
distributed mainly on the south hemisphere of the sphere. Moving from Geohash 
to S2 has provided the following benefits:

1) 20% reduccion of index size.

2) 75% reduccion on indexing time (yuhu!). 

3) 2.5 times faster queries.

 

Anyway what we need is a benchmark for SPT the same way that there is one for 
BKD tree. Probably my next mini-project.

Conclusion: If you use Geo3D, you probably want to use S2 :)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

> Spatial prefix tree based on S2 geometry
> ----------------------------------------
>
>                 Key: LUCENE-8126
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8126
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial-extras
>            Reporter: Ignacio Vera
>            Assignee: Ignacio Vera
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hi [~dsmiley],
> I have been working on a prefix tree based on goggle S2 geometry 
> (https://s2geometry.io/) to be used mainly with Geo3d shapes with very 
> promising results, in particular for complex shapes (e.g polygons). Using 
> this pixelization scheme reduces the size of the index, improves the 
> performance of the queries and reduces the loading time for non-point shapes. 
> If you are ok with this contribution and before providing any code I would 
> like to understand what is the correct/prefered approach:
> 1) Add new depency to the S2 library 
> (https://mvnrepository.com/artifact/io.sgr/s2-geometry-library-java). It has 
> Apache 2.0 license so it should be ok.
> 2) Create a utility class with all methods necessary to navigate the S2 tree 
> and create shapes from S2 cells (basically port what we need from the library 
> into Lucene).
> What do you think?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to