[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527338#comment-14527338 ]

Michael McCandless commented on LUCENE-6450:
--------------------------------------------

Here's the OSM subset I'm using for the benchmarks:
http://people.apache.org/~mikemccand/latlon.subsetPlusAllLondon.txt.lzma

It's a random 1/50th of the latest OSM export (as of last week), but
includes all points within London, UK.
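One way a subset like this could be built is sketched below in Python: keep every point inside a London bounding box, plus a seeded random 1-in-50 sample of everything else. The box coordinates, the `subset` helper and the (lat, lon) tuple input are hypothetical. The comment does not say how the file was actually produced.

```python
import random

# Hypothetical, approximate bounding box for Greater London; the exact
# region used for the benchmark subset is not stated in the comment.
LONDON_BOX = (51.28, 51.70, -0.51, 0.34)  # (min_lat, max_lat, min_lon, max_lon)

def in_box(lat, lon, box):
    """True if (lat, lon) lies inside the axis-aligned box (edges inclusive)."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def subset(points, rate=1 / 50, box=LONDON_BOX, seed=0):
    """Yield every point inside `box`, plus a random `rate` fraction of the rest."""
    rng = random.Random(seed)  # seeded so the subset is reproducible
    for lat, lon in points:
        if in_box(lat, lon, box) or rng.random() < rate:
            yield lat, lon
```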

The search benchmark then runs a fixed set of 225 axis-aligned
rectangle "intersects" queries around London.
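When comparing total hit counts across engines, a brute-force scan gives an exact reference answer for rectangle queries. The Python sketch below is hypothetical: the `hits`/`total_hits` helpers and tuple layouts are illustrative, treating rectangle edges as inclusive is an assumption, and it is not how any of the benchmarked engines evaluates queries.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]               # (lat, lon)
Rect = Tuple[float, float, float, float]  # (min_lat, max_lat, min_lon, max_lon)

def hits(points: List[Point], rect: Rect) -> int:
    """Count points inside an axis-aligned rectangle (edges inclusive)."""
    min_lat, max_lat, min_lon, max_lon = rect
    return sum(1 for lat, lon in points
               if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon)

def total_hits(points: List[Point], rects: Iterable[Rect]) -> int:
    """Sum of per-query hit counts, i.e. a 'total hits' figure for a query set."""
    return sum(hits(points, r) for r in rects)
```

Because boundary handling is a choice, points lying exactly on a rectangle edge are one place where engines can legitimately disagree.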

Look for the Index/SearchOSM/GeoPoint.java/py files in luceneutil.

I ran the same benchmarks (except for Packed/QuadPrefixTree):

*GeoPoint*

  Index time: 157.3 sec (incl. forceMerge)
  Index size: 1.8 GB
  Mean query time: 0.077 sec
  Total hits: 221,119,062

*GeoHashPrefixTree*

  Index time: 628.5 sec (incl. forceMerge)
  Index size: 4.2 GB
  Mean query time: 0.039 sec
  Total hits: 221,120,027

*libspatialindex* (via the Python Rtree wrapper)

  Index time: 469.6 sec
  Index size: 2.6 GB
  Mean query time: 0.158 sec
  Total hits: 221,118,844
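The relative differences are easier to see as ratios against GeoPoint. The short Python snippet below only redoes that arithmetic on the figures reported above (nothing is re-measured); hit deltas are taken against libspatialindex.

```python
# Figures copied from the benchmark results above.
results = {
    "GeoPoint":          {"index_sec": 157.3, "size_gb": 1.8, "query_sec": 0.077, "hits": 221_119_062},
    "GeoHashPrefixTree": {"index_sec": 628.5, "size_gb": 4.2, "query_sec": 0.039, "hits": 221_120_027},
    "libspatialindex":   {"index_sec": 469.6, "size_gb": 2.6, "query_sec": 0.158, "hits": 221_118_844},
}

def relative_to(base: str, name: str, key: str) -> float:
    """How many times larger `name`'s metric is than `base`'s."""
    return results[name][key] / results[base][key]

def hit_delta(name: str, reference: str = "libspatialindex") -> int:
    """Difference in total hits versus the reference engine."""
    return results[name]["hits"] - results[reference]["hits"]

for name in results:
    print(f"{name:18s} index x{relative_to('GeoPoint', name, 'index_sec'):.2f} "
          f"size x{relative_to('GeoPoint', name, 'size_gb'):.2f} "
          f"query x{relative_to('GeoPoint', name, 'query_sec'):.2f} "
          f"hits {hit_delta(name):+d}")
```

Worked out by hand: GeoHashPrefixTree takes about 4.0x GeoPoint's index time and 2.3x its index size, but about half its mean query time, while libspatialindex takes about 3.0x the index time, 1.4x the size and 2.1x the query time. GeoPoint reports 218 more hits than libspatialindex, and GeoHashPrefixTree 1,183 more.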

The first GeoPoint patch here got exactly the same total hit count as
libspatialindex, but now the counts differ, I think because of the
precision setting that controls how deeply the ranges recurse.  I think
it's also expected that geohash won't get the same hit count, since it's
doing a bit of quantizing (level 11 ... not sure what that equates to
in meters).
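A rough estimate of the level-11 cell size is sketched below, assuming that a GeoHashPrefixTree level N cell corresponds to an N-character geohash (5 bits per base-32 character, bits alternating longitude/latitude starting with longitude). It uses a flat ~111.32 km-per-degree approximation, so the results are ballpark figures only; the helper names are illustrative.

```python
import math

METERS_PER_DEGREE = 111_320.0  # approx. length of one degree at the equator

def geohash_cell_degrees(level: int) -> tuple:
    """(lat_deg, lon_deg) cell size for a geohash with `level` characters.

    Each character carries 5 bits; bits alternate lon, lat, lon, ...
    so longitude gets the extra bit when the total is odd.
    """
    bits = 5 * level
    lon_bits = (bits + 1) // 2
    lat_bits = bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits

def geohash_cell_meters(level: int, lat: float = 0.0) -> tuple:
    """Approximate cell (height, width) in meters at latitude `lat`."""
    dlat, dlon = geohash_cell_degrees(level)
    height = dlat * METERS_PER_DEGREE
    # Longitude degrees shrink with cos(latitude).
    width = dlon * METERS_PER_DEGREE * math.cos(math.radians(lat))
    return height, width

print(geohash_cell_meters(11))        # at the equator
print(geohash_cell_meters(11, 51.5))  # around London
```

Under those assumptions a level-11 cell is roughly 15 cm tall and about 9 cm wide at London's latitude, so quantization would plausibly only move points lying very close to a query rectangle's edges in or out of the result.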

I'm surprised the Rtree impl is so slow ...


> Add simple encoded GeoPointField type to core
> ---------------------------------------------
>
>                 Key: LUCENE-6450
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6450
>             Project: Lucene - Core
>          Issue Type: New Feature
>    Affects Versions: Trunk, 5.x
>            Reporter: Nicholas Knize
>            Priority: Minor
>         Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, 
> LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch
>
>
> At the moment all spatial capabilities, including basic point-based indexing
> and querying, require the lucene-spatial module. The spatial module, designed
> to handle all things geo, brings dependency overhead (Spatial4j, JTS) to
> provide spatial rigor for even the simplest spatial search use cases (e.g.,
> lat/lon bounding box, point in polygon, distance search). This feature trims
> the overhead by adding a new GeoPointField type to core, along with
> GeoBoundingBoxQuery and GeoPolygonQuery classes in the .search package. This
> field is intended as a straightforward, lightweight type for the most basic
> geo point use cases, without the overhead.
> The field uses simple bit-twiddling operations (currently Morton hashing) to
> encode lat/lon into a single long term. The queries use simple multi-phase
> filtering: NumericRangeQuery first reduces the candidate terms, deferring
> the more expensive mathematics to the smaller candidate sets.
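To make the encoding and the multi-phase filtering concrete, here is a minimal Python sketch, not Lucene's implementation. It quantizes lat/lon onto a hypothetical 32-bit grid, interleaves the bits into one 64-bit Morton code, and answers a bounding box query by first scanning one coarse range of sorted codes (standing in for the NumericRangeQuery phase; the real queries, per the comment above, recurse into finer ranges subject to a precision setting) and then exactly checking each candidate. Names such as `morton` and `bbox_query` are illustrative.

```python
import bisect
from typing import List, Tuple

# Hypothetical quantization onto an unsigned 32-bit grid per coordinate;
# Lucene's actual GeoPointField encoding may differ in scaling and bit order.
BITS = 32
SCALE = (1 << BITS) - 1

def quantize(value: float, lo: float, hi: float) -> int:
    return int((value - lo) / (hi - lo) * SCALE)

def spread(v: int) -> int:
    """Move bit i of v to bit 2*i, leaving zeros in between."""
    out = 0
    for i in range(BITS):
        out |= ((v >> i) & 1) << (2 * i)
    return out

def compact(v: int) -> int:
    """Inverse of spread: gather the even bits of v back together."""
    out = 0
    for i in range(BITS):
        out |= ((v >> (2 * i)) & 1) << i
    return out

def morton(lat: float, lon: float) -> int:
    """Interleave quantized lon (even bits) and lat (odd bits) into one 64-bit term."""
    return spread(quantize(lon, -180.0, 180.0)) | (spread(quantize(lat, -90.0, 90.0)) << 1)

def unmorton(code: int) -> Tuple[int, int]:
    """Return the quantized (lat, lon) encoded in a Morton code."""
    return compact(code >> 1), compact(code)

def bbox_query(sorted_codes: List[int], min_lat: float, max_lat: float,
               min_lon: float, max_lon: float) -> List[int]:
    # Phase 1: one coarse range scan. Z-order is monotone in each coordinate,
    # so every code inside the box lies between the codes of its lower-left
    # and upper-right corners (along with many codes outside the box).
    lo, hi = morton(min_lat, min_lon), morton(max_lat, max_lon)
    start = bisect.bisect_left(sorted_codes, lo)
    end = bisect.bisect_right(sorted_codes, hi)
    # Phase 2: exact check, only on the candidates from phase 1.
    qlat = (quantize(min_lat, -90.0, 90.0), quantize(max_lat, -90.0, 90.0))
    qlon = (quantize(min_lon, -180.0, 180.0), quantize(max_lon, -180.0, 180.0))
    matches = []
    for code in sorted_codes[start:end]:
        lat_q, lon_q = unmorton(code)
        if qlat[0] <= lat_q <= qlat[1] and qlon[0] <= lon_q <= qlon[1]:
            matches.append(code)
    return matches
```

A single corner-to-corner range can admit many false candidates because the Z-curve wanders outside the box; splitting it into smaller sub-ranges trades more range terms for fewer candidates to check exactly.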



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
