[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527338#comment-14527338 ]
Michael McCandless commented on LUCENE-6450:
--------------------------------------------
Here's the OSM subset I'm using for the benchmarks:
http://people.apache.org/~mikemccand/latlon.subsetPlusAllLondon.txt.lzma
It's a random 1/50th of the latest OSM export (as of last week), but
includes all points within London, UK.
The search benchmark then runs a fixed set of 225 axis-aligned
rectangle-intersection queries around London.
The benchmark code is in luceneutil: look for Index/SearchOSM/GeoPoint.java/py.
I ran the same benchmarks (except for Packed/QuadPrefixTree):
||Implementation||Index time||Index size||Mean query time||Total hits||
|GeoPoint|157.3 sec (incl. forceMerge)|1.8 GB|0.077 sec|221,119,062|
|GeoHashPrefixTree|628.5 sec (incl. forceMerge)|4.2 GB|0.039 sec|221,120,027|
|libspatialindex (using Python Rtree wrapper)|469.6 sec|2.6 GB|0.158 sec|221,118,844|
The first geopoint patch here got exactly the same total hit count as
libspatialindex, but now the count differs, I think because of the
precision setting that controls how deep the range recursion goes. I
also expect geohash not to match the hit count exactly, since it does
a bit of quantizing (level 11 ... not sure what that equates to in
meters).
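To put a rough number on the level 11 question, here is a minimal Java sketch. It assumes (not confirmed in this thread) that GeoHashPrefixTree level N corresponds to an N-character geohash, with 5 bits per character interleaved starting from longitude. `GeohashCellSize` and its methods are hypothetical helpers, and 111,320 m per degree is an equatorial approximation.

```java
// Hypothetical sketch: approximate cell size of GeoHashPrefixTree level N,
// ASSUMING level N means an N-character geohash (5 bits per character,
// bits alternating lon, lat, lon, ... starting with longitude).
final class GeohashCellSize {
    // Approximate meters per degree of latitude (and of longitude at the equator).
    static final double METERS_PER_DEGREE = 111_320.0;

    // Longitude gets the extra bit when the total bit count is odd.
    static int lonBits(int level) { return (5 * level + 1) / 2; }
    static int latBits(int level) { return (5 * level) / 2; }

    static double lonDegrees(int level) { return 360.0 / (1L << lonBits(level)); }
    static double latDegrees(int level) { return 180.0 / (1L << latBits(level)); }

    // Cell height in meters is (nearly) independent of latitude.
    static double heightMeters(int level) {
        return latDegrees(level) * METERS_PER_DEGREE;
    }

    // Cell width in meters shrinks with cos(latitude).
    static double widthMeters(int level, double latitude) {
        return lonDegrees(level) * METERS_PER_DEGREE * Math.cos(Math.toRadians(latitude));
    }

    public static void main(String[] args) {
        int level = 11;
        double london = 51.5; // approximate latitude of London
        System.out.printf("level %d: %d lon bits, %d lat bits%n",
                level, lonBits(level), latBits(level));
        System.out.printf("cell ~%.3f m wide x %.3f m tall near London%n",
                widthMeters(level, london), heightMeters(level));
    }
}
```

If that assumption holds, level 11 gives 28 longitude bits and 27 latitude bits, i.e. cells roughly 15 cm tall and 9 cm wide at London's latitude, so the quantizing would only affect points within about that distance of a query rectangle's edge.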
I'm surprised the Rtree impl is so slow ...
> Add simple encoded GeoPointField type to core
> ---------------------------------------------
>
> Key: LUCENE-6450
> URL: https://issues.apache.org/jira/browse/LUCENE-6450
> Project: Lucene - Core
> Issue Type: New Feature
> Affects Versions: Trunk, 5.x
> Reporter: Nicholas Knize
> Priority: Minor
> Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch,
> LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch
>
>
> At the moment all spatial capabilities, including basic point based indexing
> and querying, require the lucene-spatial module. The spatial module, designed
> to handle all things geo, requires dependency overhead (Spatial4j, JTS) to
> provide spatial rigor for even the simplest spatial search use-cases (e.g.,
> lat/lon bounding box, point-in-polygon, distance search). This feature trims the
> overhead by adding a new GeoPointField type to core along with
> GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This
> field is intended as a straightforward lightweight type for the most basic
> geo point use-cases without the overhead.
> The field uses simple bit-twiddling operations (currently Morton hashing) to
> encode lat/lon into a single long term. The queries use simple multi-phase
> filtering: NumericRangeQuery first narrows the candidate terms, deferring the
> more expensive mathematics to the smaller candidate sets.
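The Morton-hashing and multi-phase filtering described in the issue can be sketched in plain Java. This is a hypothetical illustration, not the patch's code: `MortonSketch`, its methods, the 31-bits-per-coordinate quantization, and the single coarse range are all assumptions, and a plain array scan stands in for the NumericRangeQuery phase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Morton (Z-order) encoding of lon/lat into one long,
// plus a two-phase bounding-box filter. Not the LUCENE-6450 patch's code.
final class MortonSketch {
    // Assumption: 31 bits per coordinate, so the 62-bit interleaved code is
    // always non-negative and compares correctly as a signed long.
    static final long MAX_Q = (1L << 31) - 1;

    // Map value in [min, max] to an integer in [0, MAX_Q]; monotone in value.
    static long quantize(double value, double min, double max) {
        return (long) ((value - min) / (max - min) * MAX_Q);
    }

    static double dequantize(long q, double min, double max) {
        return min + (double) q / MAX_Q * (max - min);
    }

    // Spread the low 32 bits of x so that bit i lands at bit 2*i.
    static long spread(long x) {
        x &= 0xFFFFFFFFL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
        x = (x | (x << 8)) & 0x00FF00FF00FF00FFL;
        x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x << 2)) & 0x3333333333333333L;
        x = (x | (x << 1)) & 0x5555555555555555L;
        return x;
    }

    // Inverse of spread: gather the even-position bits back into the low 32 bits.
    static long compact(long x) {
        x &= 0x5555555555555555L;
        x = (x | (x >>> 1)) & 0x3333333333333333L;
        x = (x | (x >>> 2)) & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x >>> 4)) & 0x00FF00FF00FF00FFL;
        x = (x | (x >>> 8)) & 0x0000FFFF0000FFFFL;
        x = (x | (x >>> 16)) & 0x00000000FFFFFFFFL;
        return x;
    }

    // Longitude bits go to even positions, latitude bits to odd positions.
    static long encode(double lon, double lat) {
        return spread(quantize(lon, -180, 180)) | (spread(quantize(lat, -90, 90)) << 1);
    }

    static long lonBits(long code) { return compact(code); }
    static long latBits(long code) { return compact(code >>> 1); }
    static double decodeLon(long code) { return dequantize(lonBits(code), -180, 180); }
    static double decodeLat(long code) { return dequantize(latBits(code), -90, 90); }

    // Returns indices of codes whose point lies in the box (up to quantization).
    static List<Integer> bboxFilter(long[] codes, double minLon, double minLat,
                                    double maxLon, double maxLat) {
        // Phase 1 (cheap): Z-order is monotone in each coordinate, so every
        // in-box point has a code in [lo, hi]; the range also admits out-of-box points.
        long lo = encode(minLon, minLat);
        long hi = encode(maxLon, maxLat);
        // Phase 2 (exact): per-dimension check in quantized space, which avoids
        // rounding false negatives that comparing decoded doubles could cause.
        long qMinLon = quantize(minLon, -180, 180), qMaxLon = quantize(maxLon, -180, 180);
        long qMinLat = quantize(minLat, -90, 90), qMaxLat = quantize(maxLat, -90, 90);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < codes.length; i++) {
            long code = codes[i];
            if (code < lo || code > hi) continue; // rejected by phase 1
            long qLon = lonBits(code), qLat = latBits(code);
            if (qLon >= qMinLon && qLon <= qMaxLon && qLat >= qMinLat && qLat <= qMaxLat) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        long[] codes = {
            encode(-0.1276, 51.5072), // central London
            encode(2.3522, 48.8566),  // Paris
            encode(-0.0754, 51.5055), // Tower Bridge, London
        };
        // Box roughly around Greater London.
        System.out.println(bboxFilter(codes, -0.5, 51.3, 0.3, 51.7)); // prints [0, 2]
    }
}
```

Because Z-order is monotone in each coordinate, the single range [encode(minLon, minLat), encode(maxLon, maxLat)] never misses an in-box point, but it can cover far more of the keyspace than the box itself. Splitting it into smaller sub-ranges, recursing down to some precision, tightens the coarse phase at the cost of more range terms.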
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)