[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106429#comment-14106429 ] David Smiley commented on LUCENE-4922: -- This is awesome [~varunshenoy]! It's been a pleasure working with you over the summer. I'll add some more context to those reading along... What we have here is a new SPT impl that aims to be fast, and flexibly trade-off index size with search speed similar to how Lucene's single-dimensional Trie fields let you pick a precisionStep. We call it "flex" for short; the class is FlexPrefixTree2D. But it's more flexible than simply picking the equivalent of a precisionStep as it can be configured to vary at each level (step). As I suspected, we concluded the fastest results were when the highest and lowest levels had many sub-cells, but few in the middle. At its best, targeting a similar precision & index size as geohash, I found Flex to do point-radius queries that were twice as fast on my machine. Your mileage will certainly vary; Varun's numbers above are on the conservative end. Note that currently the sub-cells must be a power of 4 and limited to 4, 16, or 64 since it uses a single byte per level, and some bits are reserved for something called a leaf. Adding support for 256 should be easy, so long as you only index points since points have no leaf bits. Another interesting thing about Flex is that it internally uses a square world model. This means each cell is square, which is more desirable for heat-map display usage than rectangles produced from Geohash (and Quad when given rectangular world bounds). This also means the top level is only half-used but it's of marginal consequence. Flex internally has it's spatial coordinates expressed in integers on a power of 2 grid, which eliminates floating point division in lieu of bit shifting. Since on the surface the shapes in & out are floating point, there is a conversion step, but it's optimized away in the case of a rectangle, hence the performance boost when doing rectangle queries. Use of integers vs longs hinders maximum precision but if it becomes a problem (I doubt it), it could be easily changed. Using an integer space also better lends itself to spatial applications that are not floating point (like using points to represent time durations). More work is needed to fully realize that. As followers here may infer by it's absence, there is no Hilbert Curve ordering in Flex yet. It was lower on the priority list because I suspect whatever performance boost comes of it will be substantially less than the variable grid size support. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2014 > Attachments: HilbertConverter.zip, LUCENE-4922.patch > > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104962#comment-14104962 ] Varun V Shenoy commented on LUCENE-4922: - As my GSOC project, I have implemented a spatial prefix tree with the above mentioned features. This tree, named "FlexPrefixTree" has the following features. - FPT performs 25% faster for circular queries and 36% faster for rectangular queries. - The implementation should be easy to adapt to a future integer based spatial application when floating point isn’t desired. - Very compact index sizes are possible. Upto 14% lessar than GeoHashPrefixTree and 60% lessar than QuadPrefixTree implementations. - Flexible trade-offs between index size and search speed. Variable number of sub-cells allows us to achieve very small index sizes. - FPT internally uses int, to speed up spatial calculations. - Due to reuse of a FlexCell per level, BytesRef per FPT, Rectangle shape per level, FlexPrefixTreeIterator per level and compact index sizes, we could achieve upto 48% reduction in average used memory and 38% in average total used memory. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2014 > Attachments: HilbertConverter.zip > > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976526#comment-13976526 ] Varun V Shenoy commented on LUCENE-4922: - Hi, I will be tackling this issue as an accepted GSOC student and below is my branch on GitHub. https://github.com/shenoyvvarun/lucene-solr/tree/4992 > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2014 > Attachments: HilbertConverter.zip > > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925320#comment-13925320 ] Varun V Shenoy commented on LUCENE-4922: - Thanks Hatim, I have been going through your proposal, and it helped me a lot. I am getting really excited. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2014 > Attachments: HilbertConverter.zip > > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918257#comment-13918257 ] Hatim Hakeel commented on LUCENE-4922: -- Hi, I have provided a link to the proposal I prepared last year for the benefit of anyone looking for related background material like documentation, papers etc. Hope it would be helpful. http://hatimhakeel.blogspot.com/2014/03/this-is-my-gsoc-2013-project-proposal.html > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2014 > Attachments: HilbertConverter.zip > > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
Hi, I have provided a link to the proposal I prepared last year for the benefit of anyone looking for related background material like documentation, papers etc. Hope it would be helpful. http://hatimhakeel.blogspot.com/2014/03/this-is-my-gsoc-2013-project-proposal.html On Mon, Mar 3, 2014 at 12:32 PM, Varun V Shenoy (JIRA) wrote: > > [ > https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917781#comment-13917781] > > Varun V Shenoy commented on LUCENE-4922: > - > > I am Varun and I will be really interested to pursue this as my GSOC2014 > project. Any documentation, references, papers etc on this topic will be > extremely helpful. > > > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > > -- > > > > Key: LUCENE-4922 > > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > > Project: Lucene - Core > > Issue Type: New Feature > > Components: modules/spatial > >Reporter: David Smiley > >Assignee: David Smiley > > Labels: gsoc2014 > > Attachments: HilbertConverter.zip > > > > > > My wish-list for an ideal SpatialPrefixTree has these properties: > > * Hilbert Curve ordering > > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 > for all in-between) > > * Compact binary encoding (so-called "Morton number") > > * Works for geodetic (i.e. lat & lon) and non-geodetic > > Some bonus wishes for use in geospatial: > > * Use an equal-area projection such that each cell has an equal area to > all others at the same level. > > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > > All of this requires some basic performance benchmarks to measure the > effects of these characteristics. > > > > -- > This message was sent by Atlassian JIRA > (v6.2#6252) > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917781#comment-13917781 ] Varun V Shenoy commented on LUCENE-4922: - I am Varun and I will be really interested to pursue this as my GSOC2014 project. Any documentation, references, papers etc on this topic will be extremely helpful. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2014 > Attachments: HilbertConverter.zip > > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657418#comment-13657418 ] John Berryman commented on LUCENE-4922: --- That's fine. Pull me into the conversations you have with Hatim and I can probably help things move along faster. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > Attachments: HilbertConverter.zip > > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655944#comment-13655944 ] David Smiley commented on LUCENE-4922: -- Thanks for your interest in this John. You've gotten the ball rolling. I confess I don't do any byte or bit manipulation work so it'll take some time to fully digest what's going on in this code which uses a lot of it. More comments would have helped. It might be interesting to look at how Lucene Trie floating point fields work. Thanks for the reference to efficient ways to interleave bits; I'll re-post here for convenient reference: http://www-graphics.stanford.edu/~seander/bithacks.html#InterleaveBMN (there are 2 approaches there). I'm not sure when I'll have significant time to contribute more directly other than add to the conversation. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > Attachments: HilbertConverter.zip > > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655668#comment-13655668 ] John Berryman commented on LUCENE-4922: --- I just attached code called "HilbertConverter". This is a Java example of converting from a set of x,y values (within specified bounds) to a HilbertOrdered value. The strategy is to # Convert the x,y value so doubles between 0 and 1. # Convert these doubles to large integers with max value 256^numBytes (user specifies numBytes). Note: there's probably a way to do these two last steps simultaneously and regain some precision. Note: numBytes must be <= 7 for now. # Interleave the bits so that you get a z-order value (note, I did this the "obvious way". In the code I've pointed to a website with a much more efficient method - so called morten numbers. # Convert the z-ordered value to a hilbert-ordered value. How to do this last step really deserves a big whiteboard session - it's an inherently visual discussion. However, as a clue to what's happening in the code: * There are only 4 shapes that compose the Hilbert curve. In the code I've called them "D,U,n, and C" because of the way they look. These are the states of a state machine. * I convert from z to hilbert 2 bits at a time. * On the first iteration I assume that I'm in the "D" state. In this simplistic case, I convert from 2 z-ordered bits to 2 hilbert-ordered bits based upon a lookup table that goes with the D state. I replace the z-ordered bits with the hilbert-ordered bits. * I then check for which state I should go to next based upon a different lookup table that goes with the D state. It directs me to another state. * I then get the next 2 bits from the byte array and repeat this method until I'm out of bits. I've spot checked the input/output and it looks good (you'll see where I've done this in the code). No tests! Also. This method could be slower than expected because I'm doing all operations on 2 bits at a time. As is, the method in the python code might even be faster because (correct me if I'm wrong) a double multiply can take place in one clock cycle. That said, the methods here can be extended to operate at the processor word size. What's more, I'm working with lookup tables. I suspect that you could use magic numbers and bitmasks and all that jazz and make something that took up very little space or time. Also, I couldn't find them now, but there exist known efficient algorithms for doing this conversion. (I guess this weekend I just felt like making my own.:-/) I've run across algorithms that are even for higher dimension Hilbert Curves > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > Attachments: HilbertConverter.zip > > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653095#comment-13653095 ] John Berryman commented on LUCENE-4922: --- Hmmm... integer representation huh. Well here's a thought then: As a first got at this idea, let's define something like a geohash where x are interleaved, but here's how we do it. At the top level, number squares from 0 to 3. {noformat} 0 --- 1 | | 2 --- 3 {noformat} At the next level, number thing similarly, {noformat} 00 -- 01 -- 10 -- 11 | | | | 02 -- 03 -- 12 -- 13 | | | | 20 -- 21 -- 30 -- 31 | | | | 22 -- 23 -- 32 -- 33 {noformat} Even though this *looks* like the hilbert thing I did above, notice that this is actually the Z-ordering and it's much easier to compute. In this case, the first two bits encodes which of the four big boxes the point is in, the next two bits encodes which of the four sub boxes the point is in, etc. So for example [0.375, 0.625] would be encoded to a depth of 2 by "03" which can be stored in half a byte. Got it? So... now since we have the original point encoded in z-ordering. We can create a new hilbert_point algorithm that takes a byte array representing the z-ordering encoding of a point rather than a 2-vector of doubles. And the code looks much the same except that instead of the "val[0]/2" we're actually just iterating through the byte array 2 bits at a time (with no backtracking or lookahead). This would make for some exquisitely indecipherable code. But ultimately it might not help that much - it largely depends upon how complex the z-ordering encoding is. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652934#comment-13652934 ] David Smiley commented on LUCENE-4922: -- Cool John. I suggest that any future code-samples be attached to the issue. BTW another thing about the implementation of this grid I want to explore is the efficacy of working with integers and fixed precision (configured), versus floating point math. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652785#comment-13652785 ] John Berryman commented on LUCENE-4922: --- # Converts from an x,y coordinate (vals[0],vals[1]) between min and max values to a point on the Hilbert curve in base4 # def hilbert_point(vals,depth,mins=[],maxs=[]): """ cell_names 3 2 0 1 """ if(mins): # then convert vals to some place in 1-by-1 grid at the origin vals[0] = (vals[0]-mins[0])/(maxs[0]-mins[0]) vals[1] = (vals[1]-mins[1])/(maxs[1]-mins[1]) if(depth == 0): return '' if( vals[0]<=0.5 and vals[1]<=0.5 ): cell_name = '0' x = vals[1]*2 y = vals[0]*2 elif( vals[0]>=0.5 and vals[1]<=0.5 ): cell_name = '1' x = (vals[0] - 0.5)*2 y = vals[1]*2 elif( vals[0]>=0.5 and vals[1]>=0.5 ): cell_name = '2' x = (vals[0] - 0.5)*2 y = (vals[1] - 0.5)*2 elif( vals[0]<=0.5 and vals[1]>=0.5 ): cell_name = '3' x = 1 - (vals[1] - 0.5)*2 y = 1 - vals[0]*2 return cell_name + hilbert_point([x,y],depth-1) > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639242#comment-13639242 ] Hatim Hakeel commented on LUCENE-4922: -- Hi David, Thanks for the compliments :). I'm eagerly looking forward to the announcement of accepted proposals with fingers crossed. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639152#comment-13639152 ] David Smiley commented on LUCENE-4922: -- Thanks for the proposal Hatim! That's a good proposal. I enjoyed reading some of the research papers you referenced. I'm excited to think that at the end of the summer, there's going to be an awesome optimized PrefixTree; and I'm a tad jealous that a lucky student is going to build it instead of me :-) p.s. I independently developed my own outline of tasks to do, which I'll share with the student that is accepted (which is likely you but theoretically could be another student if there is interest from others) > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638856#comment-13638856 ] Hatim Hakeel commented on LUCENE-4922: -- Hi David, Thanks for the feedback. I have drafted a proposal based on your comments and stuff I figured out by digging into the background. https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2013/hatimhakeel/1 Hope it satisfies the requirements of the project. Thanks, Hatim. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630054#comment-13630054 ] David Smiley commented on LUCENE-4922: -- Thanks for your interest Hatim! Your assessment of just needing to extend SpatialPrefixTree is mostly correct. It also involves implementing Cell (formerly known as Node) which is tightly related. There are lots of potential optimizations here. As part of this task, modifying SpatialPrefixTree & Cell itself is absolutely okay if we find better (more optimized) ways of doing things, though we need to retain backwards compatibility with the two existing SpatialPrefixTree implementations -- geohash & quad. *If* retaining that backwards compatibility turns out to be too constraining on making this SPT be even better, then we could make SPT an interface. But the general API of SPT & Cell should be the same as it is now. What will help in this task is to know that there are extensive tests of the PrefixTree based spatial search strategies, such that if supplying a new SPT causes a failure, then you know you've got a bug ;-). But there are few unit level tests of the SPT itself, and so knowing precisely what is broken could be hard. You are correct that the bonus wishes are purely optional optimizations, time permitting. Very optional; in general I want to see that without these optimizations that the SPT is as good as it can reasonably be. It _may_ turn out that these geo specific optimizations have extra runtime costs (such as calculating the sin() a lot or complexity that make it not worthwhile; I don't know. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629809#comment-13629809 ] Hatim Hakeel commented on LUCENE-4922: -- Hi David, I believe this would require to implement a class that extends the SpatialPrefixTree class. It should support all porperties mentioned. We get it to perform Hilbert space filling curve based spatial data structure traversal in making range queries etc. Apart from performance benchmarks to measure effects of the new feature, are the bonus wishes for use in geospatial to be considered as optimizations given availability of time? Thanks, Hatim. > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes > -- > > Key: LUCENE-4922 > URL: https://issues.apache.org/jira/browse/LUCENE-4922 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/spatial >Reporter: David Smiley >Assignee: David Smiley > Labels: gsoc2013, mentor, newdev > > My wish-list for an ideal SpatialPrefixTree has these properties: > * Hilbert Curve ordering > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for > all in-between) > * Compact binary encoding (so-called "Morton number") > * Works for geodetic (i.e. lat & lon) and non-geodetic > Some bonus wishes for use in geospatial: > * Use an equal-area projection such that each cell has an equal area to all > others at the same level. > * When advancing a grid level, if a cell's width is less than half its > height. then divide it as 4 vertically stacked instead of 2 by 2. The point > is to avoid super-skinny cells which occurs towards the poles and degrades > performance. > All of this requires some basic performance benchmarks to measure the effects > of these characteristics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org