[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2014-08-21 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106429#comment-14106429
 ] 

David Smiley commented on LUCENE-4922:
--

This is awesome [~varunshenoy]!  It's been a pleasure working with you over the 
summer.  I'll add some more context to those reading along... 

What we have here is a new SPT impl that aims to be fast, and flexibly 
trade-off index size with search speed similar to how Lucene's 
single-dimensional Trie fields let you pick a precisionStep.  We call it "flex" 
for short; the class is FlexPrefixTree2D.  But it's more flexible than simply 
picking the equivalent of a precisionStep as it can be configured to vary at 
each level (step).  As I suspected,  we concluded the fastest results were when 
the highest and lowest levels had many sub-cells, but few in the middle.  At 
its best, targeting a similar precision & index size as geohash, I found Flex 
to do point-radius queries that were twice as fast on my machine. Your mileage 
will certainly vary; Varun's numbers above are on the conservative end.  Note 
that currently the sub-cells must be a power of 4 and limited to 4, 16, or 64 
since it uses a single byte per level, and some bits are reserved for something 
called a leaf.  Adding support for 256 should be easy, so long as you only 
index points since points have no leaf bits.

Another interesting thing about Flex is that it internally uses a square world 
model.  This means each cell is square, which is more desirable for heat-map 
display usage than rectangles produced from Geohash (and Quad when given 
rectangular world bounds).  This also means the top level is only half-used but 
it's of marginal consequence.

Flex internally has it's spatial coordinates expressed in integers on a power 
of 2 grid, which eliminates floating point division in lieu of bit shifting.  
Since on the surface the shapes in & out are floating point, there is a 
conversion step, but it's optimized away in the case of a rectangle, hence the 
performance boost when doing rectangle queries.  Use of integers vs longs 
hinders maximum precision but if it becomes a problem (I doubt it), it could be 
easily changed.  Using an integer space also better lends itself to spatial 
applications that are not floating point (like using points to represent time 
durations).  More work is needed to fully realize that.

As followers here may infer by it's absence, there is no Hilbert Curve ordering 
in Flex yet.  It was lower on the priority list because I suspect whatever 
performance boost comes of it will be substantially less than the variable grid 
size support.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2014
> Attachments: HilbertConverter.zip, LUCENE-4922.patch
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2014-08-20 Thread Varun V Shenoy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104962#comment-14104962
 ] 

Varun  V Shenoy commented on LUCENE-4922:
-

As my GSOC project, I have implemented a spatial prefix tree with the above 
mentioned features. This tree, named "FlexPrefixTree" has the following 
features.
- FPT performs 25% faster for circular queries and 36% faster for rectangular 
queries.
- The implementation should be easy to adapt to a future integer based spatial 
application when floating point isn’t desired.
- Very compact index sizes are possible. Upto 14% lessar than GeoHashPrefixTree 
and 60% lessar than QuadPrefixTree implementations.
- Flexible trade-offs between index size and search speed. Variable number of 
sub-cells allows us to achieve very small index sizes.
- FPT internally uses int, to speed up spatial calculations.
- Due to reuse of a FlexCell per level, BytesRef per FPT, Rectangle shape per 
level, FlexPrefixTreeIterator per level and compact index sizes, we could 
achieve upto 48% reduction in average used memory and 38% in average total used 
memory.


> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2014
> Attachments: HilbertConverter.zip
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2014-04-22 Thread Varun V Shenoy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976526#comment-13976526
 ] 

Varun  V Shenoy commented on LUCENE-4922:
-

Hi,
I will be tackling this issue as an accepted GSOC student and below is my 
branch on GitHub.
https://github.com/shenoyvvarun/lucene-solr/tree/4992

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2014
> Attachments: HilbertConverter.zip
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2014-03-09 Thread Varun V Shenoy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925320#comment-13925320
 ] 

Varun  V Shenoy commented on LUCENE-4922:
-

Thanks Hatim, I have been going through your proposal, and it helped me a lot. 
I am getting really excited.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2014
> Attachments: HilbertConverter.zip
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2014-03-03 Thread Hatim Hakeel (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918257#comment-13918257
 ] 

Hatim Hakeel commented on LUCENE-4922:
--

Hi,
I have provided a link to the proposal I prepared last year for the benefit of 
anyone looking for related background material like documentation, papers etc. 
Hope it would be helpful.

http://hatimhakeel.blogspot.com/2014/03/this-is-my-gsoc-2013-project-proposal.html

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2014
> Attachments: HilbertConverter.zip
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2014-03-03 Thread Hatim Hakeel
Hi,
I have provided a link to the proposal I prepared last year for the benefit
of anyone looking for related background material like documentation,
papers etc. Hope it would be helpful.

http://hatimhakeel.blogspot.com/2014/03/this-is-my-gsoc-2013-project-proposal.html



On Mon, Mar 3, 2014 at 12:32 PM, Varun V Shenoy (JIRA) wrote:

>
> [
> https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917781#comment-13917781]
>
> Varun  V Shenoy commented on LUCENE-4922:
> -
>
> I am Varun and I will be really interested to pursue this as my GSOC2014
> project. Any documentation, references, papers etc on this topic will be
> extremely helpful.
>
> > A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> > --
> >
> > Key: LUCENE-4922
> > URL: https://issues.apache.org/jira/browse/LUCENE-4922
> > Project: Lucene - Core
> >  Issue Type: New Feature
> >  Components: modules/spatial
> >Reporter: David Smiley
> >Assignee: David Smiley
> >  Labels: gsoc2014
> > Attachments: HilbertConverter.zip
> >
> >
> > My wish-list for an ideal SpatialPrefixTree has these properties:
> > * Hilbert Curve ordering
> > * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16
> for all in-between)
> > * Compact binary encoding (so-called "Morton number")
> > * Works for geodetic (i.e. lat & lon) and non-geodetic
> > Some bonus wishes for use in geospatial:
> > * Use an equal-area projection such that each cell has an equal area to
> all others at the same level.
> > * When advancing a grid level, if a cell's width is less than half its
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point
> is to avoid super-skinny cells which occurs towards the poles and degrades
> performance.
> > All of this requires some basic performance benchmarks to measure the
> effects of these characteristics.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2014-03-02 Thread Varun V Shenoy (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917781#comment-13917781
 ] 

Varun  V Shenoy commented on LUCENE-4922:
-

I am Varun and I will be really interested to pursue this as my GSOC2014 
project. Any documentation, references, papers etc on this topic will be 
extremely helpful.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2014
> Attachments: HilbertConverter.zip
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-05-14 Thread John Berryman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657418#comment-13657418
 ] 

John Berryman commented on LUCENE-4922:
---

That's fine. Pull me into the conversations you have with Hatim and I can 
probably help things move along faster.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
> Attachments: HilbertConverter.zip
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-05-13 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655944#comment-13655944
 ] 

David Smiley commented on LUCENE-4922:
--

Thanks for your interest in this John. You've gotten the ball rolling. I 
confess I don't do any byte or bit manipulation work so it'll take some time to 
fully digest what's going on in this code which uses a lot of it.  More 
comments would have helped. It might be interesting to look at how Lucene Trie 
floating point fields work.  Thanks for the reference to efficient ways to 
interleave bits; I'll re-post here for convenient reference: 
http://www-graphics.stanford.edu/~seander/bithacks.html#InterleaveBMN
 (there are 2 approaches there).  I'm not sure when I'll have significant time 
to contribute more directly other than add to the conversation.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
> Attachments: HilbertConverter.zip
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-05-12 Thread John Berryman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655668#comment-13655668
 ] 

John Berryman commented on LUCENE-4922:
---

I just attached code called "HilbertConverter". This is a Java example of 
converting from a set of x,y values (within specified bounds) to a 
HilbertOrdered value.

The strategy is to

# Convert the x,y value so doubles between 0 and 1.
# Convert these doubles to large integers with max value 256^numBytes (user 
specifies numBytes). Note: there's probably a way to do these two last steps 
simultaneously and regain some precision. Note: numBytes must be <= 7 for now.
# Interleave the bits so that you get a z-order value (note, I did this the 
"obvious way". In the code I've pointed to a website with a much more efficient 
method - so called morten numbers.
# Convert the z-ordered value to a hilbert-ordered value.

How to do this last step really deserves a big whiteboard session - it's an 
inherently visual discussion. However, as a clue to what's happening in the 
code:

* There are only 4 shapes that compose the Hilbert curve. In the code I've 
called them "D,U,n, and C" because of the way they look. These are the states 
of a state machine.
* I convert from z to hilbert 2 bits at a time.
* On the first iteration I assume that I'm in the "D" state. In this simplistic 
case, I convert from 2 z-ordered bits to 2 hilbert-ordered bits based upon a 
lookup table that goes with the D state. I replace the z-ordered bits with the 
hilbert-ordered bits.
* I then check for which state I should go to next based upon a different 
lookup table that goes with the D state. It directs me to another state.
* I then get the next 2 bits from the byte array and repeat this method until 
I'm out of bits.

I've spot checked the input/output and it looks good (you'll see where I've 
done this in the code). No tests!

Also. This method could be slower than expected because I'm doing all 
operations on 2 bits at a time. As is, the method in the python code might even 
be faster because (correct me if I'm wrong) a double multiply can take place in 
one clock cycle.

That said, the methods here can be extended to operate at the processor word 
size.

What's more, I'm working with lookup tables. I suspect that you could use magic 
numbers and bitmasks and all that jazz and make something that took up very 
little space or time.

Also, I couldn't find them now, but there exist known efficient algorithms for 
doing this conversion. (I guess this weekend I just felt like making my 
own.:-/) I've run across algorithms that are even for higher dimension Hilbert 
Curves


> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
> Attachments: HilbertConverter.zip
>
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-05-09 Thread John Berryman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653095#comment-13653095
 ] 

John Berryman commented on LUCENE-4922:
---

Hmmm... integer representation huh. Well here's a thought then:

As a first got at this idea, let's define something like a geohash where x are 
interleaved, but here's how we do it. At the top level, number squares from 0 
to 3.

{noformat}
   0 --- 1
   | |
   2 --- 3
{noformat}

At the next level, number thing similarly, 

{noformat}
00 -- 01 -- 10 -- 11
| | | |
02 -- 03 -- 12 -- 13
| | | |
20 -- 21 -- 30 -- 31
| | | |
22 -- 23 -- 32 -- 33
{noformat}

Even though this *looks* like the hilbert thing I did above, notice that this 
is actually the Z-ordering and it's much easier to compute.

In this case, the first two bits encodes which of the four big boxes the point 
is in, the next two bits encodes which of the four sub boxes the point is in, 
etc. So for example [0.375, 0.625] would be encoded to a depth of 2 by "03" 
which can be stored in half a byte.

Got it? So... now since we have the original point encoded in z-ordering. We 
can create a new hilbert_point algorithm that takes a byte array representing 
the z-ordering encoding of a point rather than a 2-vector of doubles. And the 
code looks much the same except that instead of the "val[0]/2" we're actually 
just iterating through the byte array 2 bits at a time (with no backtracking or 
lookahead).

This would make for some exquisitely indecipherable code. But ultimately it 
might not help that much - it largely depends upon how complex the z-ordering 
encoding is.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-05-09 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652934#comment-13652934
 ] 

David Smiley commented on LUCENE-4922:
--

Cool John.  I suggest that any future code-samples be attached to the issue.

BTW another thing about the implementation of this grid I want to explore is 
the efficacy of working with integers and fixed precision (configured), versus 
floating point math.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-05-08 Thread John Berryman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652785#comment-13652785
 ] 

John Berryman commented on LUCENE-4922:
---

# Converts from an x,y coordinate (vals[0],vals[1]) between min and max values 
to a point on the Hilbert curve in base4
#
def hilbert_point(vals,depth,mins=[],maxs=[]):
"""
cell_names

3 2
0 1

"""
if(mins):
# then convert vals to some place in 1-by-1 grid at the origin
vals[0] = (vals[0]-mins[0])/(maxs[0]-mins[0])
vals[1] = (vals[1]-mins[1])/(maxs[1]-mins[1])

if(depth == 0):
return ''

if( vals[0]<=0.5 and vals[1]<=0.5 ):
cell_name = '0'
x = vals[1]*2
y = vals[0]*2
elif( vals[0]>=0.5 and vals[1]<=0.5 ):
cell_name = '1'
x = (vals[0] - 0.5)*2
y = vals[1]*2
elif( vals[0]>=0.5 and vals[1]>=0.5 ):
cell_name = '2'
x = (vals[0] - 0.5)*2
y = (vals[1] - 0.5)*2
elif( vals[0]<=0.5 and vals[1]>=0.5 ):
cell_name = '3'
x = 1 - (vals[1] - 0.5)*2
y = 1 - vals[0]*2

return cell_name + hilbert_point([x,y],depth-1)

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-04-23 Thread Hatim Hakeel (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639242#comment-13639242
 ] 

Hatim Hakeel commented on LUCENE-4922:
--

Hi David,

Thanks for the compliments :). I'm eagerly looking forward to the announcement 
of accepted proposals with fingers crossed.



> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-04-23 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639152#comment-13639152
 ] 

David Smiley commented on LUCENE-4922:
--

Thanks for the proposal Hatim!  That's a good proposal.  I enjoyed reading some 
of the research papers you referenced.  I'm excited to think that at the end of 
the summer, there's going to be an awesome optimized PrefixTree; and I'm a tad 
jealous that a lucky student is going to build it instead of me :-) 

p.s. I independently developed my own outline of tasks to do, which I'll share 
with the student that is accepted (which is likely you but theoretically could 
be another student if there is interest from others)

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-04-23 Thread Hatim Hakeel (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638856#comment-13638856
 ] 

Hatim Hakeel commented on LUCENE-4922:
--

Hi David,

Thanks for the feedback. I have drafted a proposal based on your comments and 
stuff I figured out by digging into the background.

https://google-melange.appspot.com/gsoc/proposal/review/google/gsoc2013/hatimhakeel/1

Hope it satisfies the requirements of the project. 

Thanks,
Hatim.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-04-12 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630054#comment-13630054
 ] 

David Smiley commented on LUCENE-4922:
--

Thanks for your interest Hatim!

Your assessment of just needing to extend SpatialPrefixTree is mostly correct.  
It also involves implementing Cell (formerly known as Node) which is tightly 
related.  There are lots of potential optimizations here.  As part of this 
task, modifying SpatialPrefixTree & Cell itself is absolutely okay if we find 
better (more optimized) ways of doing things, though we need to retain 
backwards compatibility with the two existing SpatialPrefixTree implementations 
-- geohash & quad.  *If* retaining that backwards compatibility turns out to be 
too constraining on making this SPT be even better, then we could make SPT an 
interface.  But the general API of SPT & Cell should be the same as it is now.  

What will help in this task is to know that there are extensive tests of the 
PrefixTree based spatial search strategies, such that if supplying a new SPT 
causes a failure, then you know you've got a bug ;-).  But there are few unit 
level tests of the SPT itself, and so knowing precisely what is broken could be 
hard.

You are correct that the bonus wishes are purely optional optimizations, time 
permitting.  Very optional; in general I want to see that without these 
optimizations that the SPT is as good as it can reasonably be.  It _may_ turn 
out that these geo specific optimizations have extra runtime costs (such as 
calculating the sin() a lot or complexity that make it not worthwhile; I don't 
know.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes

2013-04-11 Thread Hatim Hakeel (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629809#comment-13629809
 ] 

Hatim Hakeel commented on LUCENE-4922:
--

Hi David,

I believe this would require to implement a class that extends the 
SpatialPrefixTree class. It should support all porperties mentioned. We get it 
to perform Hilbert space filling curve based spatial data structure traversal 
in making range queries etc.
Apart from performance benchmarks to measure effects of the new feature, are 
the bonus wishes for use in geospatial to be considered as optimizations given 
availability of time?

Thanks,
Hatim.

> A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
> --
>
> Key: LUCENE-4922
> URL: https://issues.apache.org/jira/browse/LUCENE-4922
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/spatial
>Reporter: David Smiley
>Assignee: David Smiley
>  Labels: gsoc2013, mentor, newdev
>
> My wish-list for an ideal SpatialPrefixTree has these properties:
> * Hilbert Curve ordering
> * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for 
> all in-between)
> * Compact binary encoding (so-called "Morton number")
> * Works for geodetic (i.e. lat & lon) and non-geodetic
> Some bonus wishes for use in geospatial:
> * Use an equal-area projection such that each cell has an equal area to all 
> others at the same level.
> * When advancing a grid level, if a cell's width is less than half its 
> height. then divide it as 4 vertically stacked instead of 2 by 2. The point 
> is to avoid super-skinny cells which occurs towards the poles and degrades 
> performance.
> All of this requires some basic performance benchmarks to measure the effects 
> of these characteristics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org