[
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106429#comment-14106429
]
David Smiley commented on LUCENE-4922:
--------------------------------------
This is awesome [~varunshenoy]! It's been a pleasure working with you over the
summer. I'll add some more context to those reading along...
What we have here is a new SPT impl that aims to be fast, and flexibly
trade-off index size with search speed similar to how Lucene's
single-dimensional Trie fields let you pick a precisionStep. We call it "flex"
for short; the class is FlexPrefixTree2D. But it's more flexible than simply
picking the equivalent of a precisionStep as it can be configured to vary at
each level (step). As I suspected, we concluded the fastest results were when
the highest and lowest levels had many sub-cells, but few in the middle. At
its best, targeting a similar precision & index size as geohash, I found Flex
to do point-radius queries that were twice as fast on my machine. Your mileage
will certainly vary; Varun's numbers above are on the conservative end. Note
that currently the sub-cells must be a power of 4 and limited to 4, 16, or 64
since it uses a single byte per level, and some bits are reserved for something
called a leaf. Adding support for 256 should be easy, so long as you only
index points since points have no leaf bits.
Another interesting thing about Flex is that it internally uses a square world
model. This means each cell is square, which is more desirable for heat-map
display usage than rectangles produced from Geohash (and Quad when given
rectangular world bounds). This also means the top level is only half-used but
it's of marginal consequence.
Flex internally has it's spatial coordinates expressed in integers on a power
of 2 grid, which eliminates floating point division in lieu of bit shifting.
Since on the surface the shapes in & out are floating point, there is a
conversion step, but it's optimized away in the case of a rectangle, hence the
performance boost when doing rectangle queries. Use of integers vs longs
hinders maximum precision but if it becomes a problem (I doubt it), it could be
easily changed. Using an integer space also better lends itself to spatial
applications that are not floating point (like using points to represent time
durations). More work is needed to fully realize that.
As followers here may infer by it's absence, there is no Hilbert Curve ordering
in Flex yet. It was lower on the priority list because I suspect whatever
performance boost comes of it will be substantially less than the variable grid
size support.
> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> ----------------------------------------------------------------------
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/spatial
> Reporter: David Smiley
> Assignee: David Smiley
> Labels: gsoc2014
> Attachments: HilbertConverter.zip, LUCENE-4922.patch
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point
> is to avoid super-skinny cells which occurs towards the poles and degrades
> performance.
> All of this requires some basic performance benchmarks to measure the effects
> of these characteristics.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]