[ https://issues.apache.org/jira/browse/LUCENE-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356757#comment-16356757 ]
Ignacio Vera edited comment on LUCENE-8126 at 2/8/18 3:35 PM: -------------------------------------------------------------- I have added the performance test classes on the branch in case you want to have a look to check results are not bias. I have done two test (They do not include index size as I have no found a reliable way of getting that info), I can only show you for now some tipical results. 1) Indexing 5k random polygons and executing 50 Random queries. Trees have precision set to 1e-4 and distErrPct to 5%: load geohash : 132,15 indexed shapes per second query geohash recursive : 10,88 queries per second query geohash composite : 7,29 queries per second load quad : 299,04 indexed shapes per second query quad recursive : 15,13 queries per second query quad composite : 10,89 queries per second load s2 arity 4 : 623,67 indexed shapes per second query s2 arity 4 recursive : 40,00 queries per second query s2 arity 4 composite : 21,51 queries per second load s2 arity 16 : 283,46 indexed shapes per second query s2 arity 16 recursive : 12,22 queries per second query s2 arity 16 composite : 9,99 queries per second load s2 arity 64 : 159,05 indexed shapes per second query s2 arity 64 recursive : 5,13 queries per second query s2 arity 64 composite : 3,58 queries per second 1) Indexing 50k random points and executing 50 Random queries. Trees have precision set to 1e-6 and distErrPct to 0%: load geohash : 14898,69 indexed shapes per second query geohash recursive : 2,31 queries per second query geohash composite : 2,29 queries per second load quad : 5068,42 indexed shapes per second query quad recursive : 5,10 queries per second query quad composite : 5,11 queries per second load s2 arity 4 : 9748,49 indexed shapes per second query s2 arity 4 recursive : 11,34 queries per second query s2 arity 4 composite : 11,38 queries per second load s2 arity 16 : 15513,50 indexed shapes per second query s2 arity 16 recursive : 3,86 queries per second query s2 arity 16 composite : 3,83 queries per second load s2 arity 64 : 16886,19 indexed shapes per second query s2 arity 64 recursive : 1,56 queries per second query s2 arity 64 composite : 1,56 queries per second Some data of my use case: I need to index ~3M documents. 2M are points, 0.5M polygons and 0.5M multi-polygons with an averge size of 20. They need to be indexed on the celestial sphere (unit sphere). All polygons are squared (4 points) with different sizes, from 1 square degree to very tiny ones, all distributed mainly on the south hemisphere of the sphere. Moving from Geohash to S2 has provided the following benefits: 1) 20% reduccion of index size. 2) 75% reduccion on indexing time (yuhu!). 3) 2.5 times faster queries. Anyway what we need is a benchmark for SPT the same way that there is one for BKD tree. Probably my next mini-project. Conclusion: If you use Geo3D, you probably want to use S2 :) was (Author: ivera): I have added the performance test classes on the branch in case you want to have a look to check results are not bias. I have done two test (They do not include index size as I have no found a reliable way of getting that info), I can only show you for now some tipical results. 1) Indexing 5k random polygons and executing 50 Random queries. Trees have precision set to 1e-4 and distErrPct to 5%: load geohash : 132,15 indexed shapes per second query geohash recursive : 10,88 queries per second query geohash composite : 7,29 queries per second load quad : 299,04 indexed shapes per second query quad recursive : 15,13 queries per second query quad composite : 10,89 queries per second load s2 arity 4 : 623,67 indexed shapes per second query s2 arity 4 recursive : 40,00 queries per second query s2 arity 4 composite : 21,51 queries per second load s2 arity 16 : 283,46 indexed shapes per second query s2 arity 16 recursive : 12,22 queries per second query s2 arity 16 composite : 9,99 queries per second load s2 arity 64 : 159,05 indexed shapes per second query s2 arity 64 recursive : 5,13 queries per second query s2 arity 64 composite : 3,58 queries per second 1) Indexing 50k random points and executing 50 Random queries. Trees have precision set to 1e-6 and distErrPct to 0%: load geohash : 14898,69 indexed shapes per second query geohash recursive : 2,31 queries per second query geohash composite : 2,29 queries per second load quad : 5068,42 indexed shapes per second query quad recursive : 5,10 queries per second query quad composite : 5,11 queries per second load s2 arity 4 : 9748,49 indexed shapes per second query s2 arity 4 recursive : 11,34 queries per second query s2 arity 4 composite : 11,38 queries per second load s2 arity 16 : 15513,50 indexed shapes per second query s2 arity 16 recursive : 3,86 queries per second query s2 arity 16 composite : 3,83 queries per second load s2 arity 64 : 16886,19 indexed shapes per second query s2 arity 64 recursive : 1,56 queries per second query s2 arity 64 composite : 1,56 queries per second Some data of my use case: I need to index ~3M documents. 2M are points, 0.5M polygons and 0.5M multi-polygons with an averge size of 20. They need to be indexed on the celestial sphere (unit sphere). All polygons are squared (4 points) with different sizes, from 1 square degree to very tiny ones, all distributed mainly on the south hemisphere of the sphere. Moving from Geohash to S2 has provided the following benefits: 1) 20% reduccion of index size. 2) 75% reduccion on indexing time (yuhu!). 3) 2.5 times faster queries. Anyway what we need is a benchmark for SPT the same way that there is one for BKD tree. Probably my next mini-project. Conclusion: If you use Geo3D, you probably want to use S2 :) > Spatial prefix tree based on S2 geometry > ---------------------------------------- > > Key: LUCENE-8126 > URL: https://issues.apache.org/jira/browse/LUCENE-8126 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial-extras > Reporter: Ignacio Vera > Assignee: Ignacio Vera > Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Hi [~dsmiley], > I have been working on a prefix tree based on goggle S2 geometry > (https://s2geometry.io/) to be used mainly with Geo3d shapes with very > promising results, in particular for complex shapes (e.g polygons). Using > this pixelization scheme reduces the size of the index, improves the > performance of the queries and reduces the loading time for non-point shapes. > If you are ok with this contribution and before providing any code I would > like to understand what is the correct/prefered approach: > 1) Add new depency to the S2 library > (https://mvnrepository.com/artifact/io.sgr/s2-geometry-library-java). It has > Apache 2.0 license so it should be ok. > 2) Create a utility class with all methods necessary to navigate the S2 tree > and create shapes from S2 cells (basically port what we need from the library > into Lucene). > What do you think? -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org