[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541840#comment-14541840
 ] 

Uwe Schindler commented on LUCENE-6450:
---

bq. As a side note, I'm finishing up a patch that uses precision_step for 
indexing the longs at variable resolution to take advantage of the postings 
list and not visit every term

Cool! :-)

 Add simple encoded GeoPointField type to core
 -

 Key: LUCENE-6450
 URL: https://issues.apache.org/jira/browse/LUCENE-6450
 Project: Lucene - Core
  Issue Type: New Feature
Affects Versions: Trunk, 5.x
Reporter: Nicholas Knize
Priority: Minor
 Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, 
 LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch


 At the moment all spatial capabilities, including basic point based indexing 
 and querying, require the lucene-spatial module. The spatial module, designed 
 to handle all things geo, requires dependency overhead (s4j, jts) to provide 
 spatial rigor for even the most simplistic spatial search use-cases (e.g., 
 lat/lon bounding box, point in poly, distance search). This feature trims the 
 overhead by adding a new GeoPointField type to core along with 
 GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This 
 field is intended as a straightforward lightweight type for the most basic 
 geo point use-cases without the overhead. 
 The field uses simple bit twiddling operations (currently morton hashing) to 
 encode lat/lon into a single long term.  The queries leverage simple 
 multi-phase filtering that starts by leveraging NumericRangeQuery to reduce 
 candidate terms deferring the more expensive mathematics to the smaller 
 candidate sets.
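
 As a rough illustration of the bit twiddling described above, here is a 
 minimal sketch of Morton (Z-order) interleaving, assuming each coordinate is 
 scaled to 32 bits before the bits are interleaved. The class and method names 
 (MortonSketch, scale, spread, compact) are hypothetical and are not taken from 
 the patch.

```java
final class MortonSketch {
    // Scale a value in [min, max] to an unsigned 32-bit integer held in a long.
    static long scale(double v, double min, double max) {
        return (long) ((v - min) / (max - min) * 0xFFFFFFFFL);
    }

    // Spread the low 32 bits of v so they occupy the even bit positions of a long.
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2)) & 0x3333333333333333L;
        v = (v | (v << 1)) & 0x5555555555555555L;
        return v;
    }

    // Inverse of spread: gather the even bit positions back into the low 32 bits.
    static long compact(long v) {
        v &= 0x5555555555555555L;
        v = (v | (v >>> 1)) & 0x3333333333333333L;
        v = (v | (v >>> 2)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v >>> 4)) & 0x00FF00FF00FF00FFL;
        v = (v | (v >>> 8)) & 0x0000FFFF0000FFFFL;
        v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL;
        return v;
    }

    // Longitude bits go to even positions, latitude bits to odd positions (an assumed layout).
    static long encode(double lat, double lon) {
        return spread(scale(lon, -180.0, 180.0)) | (spread(scale(lat, -90.0, 90.0)) << 1);
    }

    static double decodeLon(long hash) {
        return compact(hash) / (double) 0xFFFFFFFFL * 360.0 - 180.0;
    }

    static double decodeLat(long hash) {
        return compact(hash >>> 1) / (double) 0xFFFFFFFFL * 180.0 - 90.0;
    }
}
```

 Because the high-order bits of the interleaved value identify progressively 
 smaller quad cells, a range over the encoded longs corresponds to a set of 
 cells, which is what lets a numeric range query prune candidates before any 
 exact geometry test runs.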



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-13 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541831#comment-14541831
 ] 

Nicholas Knize commented on LUCENE-6450:


bq. does lucene efficiently support field types of that length?

Yes. This patch (and PackedQuadTree) uses longs for encoding 2d points. I went 
ahead and opened a separate issue [LUCENE-6480 |  
https://issues.apache.org/jira/browse/LUCENE-6480] for investigating the 3d 
case so we can carry the discussion over there.  The goal for this field is to 
provide a framework for search so all we have to worry about is trying out 
different encoding techniques.

As a side note, I'm finishing up a patch that uses precision_step for indexing 
the longs at variable resolution to take advantage of the postings list and not 
visit every term. The index will be slightly bigger but it should provide the 
foundation for faster search on large polygons and bounding boxes. I'll add 
mercator projection after to reduce precision error over large search regions 
and then switch to geo3d and benchmark.
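
To make the variable-resolution idea concrete, here is a minimal sketch of how 
a precision step turns one 64-bit value into several coarser terms by dropping 
low-order bits. It only illustrates the concept; the class and method names are 
hypothetical, and Lucene's own numeric term encoding (prefix coding, sign 
handling) is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

final class PrecisionStepSketch {
    // For shift = 0, step, 2*step, ... below 64, emit {shift, value with the low `shift` bits dropped}.
    static List<long[]> prefixTerms(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<long[]> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            terms.add(new long[] {shift, value >>> shift});
        }
        return terms;
    }
}
```

With a step of 16, each value yields four terms (shifts 0, 16, 32, 48), which 
is why the index grows. The payoff is that a large range can be covered by a 
few coarse terms rather than by every full-precision term inside it.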




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-13 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541948#comment-14541948
 ] 

David Smiley commented on LUCENE-6450:
--

bq. As a side note, I'm finishing up a patch that uses precision_step for 
indexing the longs at variable resolution to take advantage of the postings 
list and not visit every term. The index will be slightly bigger but it should 
provide the foundation for faster search on large polygons and bounding boxes.

If I'm not mistaken, the term auto-prefixing that Mike worked on means we need 
not do that here, especially just for point data; no?




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-13 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541960#comment-14541960
 ] 

Nicholas Knize commented on LUCENE-6450:


yes yes!  That's the idea anyway.  I've tinkered with this a bit already.  It 
took the same amount of time to build the Automaton as it did the ranges (no 
surprises since it used the same logic) but queries were on the order of 10x 
slower (0.8sec/query on 60M points).  Thinking maybe there's some optimization 
to the automaton that needs to be done?  I figured first make progress here and 
post a separate issue for the automaton WIP.




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542429#comment-14542429
 ] 

Uwe Schindler commented on LUCENE-6450:
---

Looks OK regarding my comments about subclassing.

One thing: could you make the fields final in the query? Query should be 
immutable, so the min/maxLat/Lon doubles and polygon array are unmodifiable.

I will have a closer look later, I just skimmed through the patch.
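
A minimal sketch of what the immutability request could look like, assuming a 
plain holder class rather than a real Lucene Query subclass. The name 
PolygonQueryParams and its accessors are hypothetical. Note that marking an 
array field final only fixes the reference, so the sketch also copies the 
arrays on the way in and out.

```java
final class PolygonQueryParams {
    private final double minLon;
    private final double maxLon;
    private final double minLat;
    private final double maxLat;
    private final double[] polyLons;
    private final double[] polyLats;

    PolygonQueryParams(double[] polyLons, double[] polyLats) {
        if (polyLons.length == 0 || polyLons.length != polyLats.length) {
            throw new IllegalArgumentException("polygon arrays must be non-empty and equal in length");
        }
        // Defensive copies: callers keep no handle on the internal state.
        this.polyLons = polyLons.clone();
        this.polyLats = polyLats.clone();
        double loLon = Double.POSITIVE_INFINITY;
        double hiLon = Double.NEGATIVE_INFINITY;
        double loLat = Double.POSITIVE_INFINITY;
        double hiLat = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < polyLons.length; i++) {
            loLon = Math.min(loLon, polyLons[i]);
            hiLon = Math.max(hiLon, polyLons[i]);
            loLat = Math.min(loLat, polyLats[i]);
            hiLat = Math.max(hiLat, polyLats[i]);
        }
        // Bounding box derived once from the polygon; final fields are assigned exactly once.
        this.minLon = loLon;
        this.maxLon = hiLon;
        this.minLat = loLat;
        this.maxLat = hiLat;
    }

    double minLon() { return minLon; }
    double maxLon() { return maxLon; }
    double minLat() { return minLat; }
    double maxLat() { return maxLat; }

    // Return copies so the stored polygon cannot be modified through the accessor.
    double[] polyLons() { return polyLons.clone(); }
    double[] polyLats() { return polyLats.clone(); }
}
```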

 Add simple encoded GeoPointField type to core
 -

 Key: LUCENE-6450
 URL: https://issues.apache.org/jira/browse/LUCENE-6450
 Project: Lucene - Core
  Issue Type: New Feature
Affects Versions: Trunk, 5.x
Reporter: Nicholas Knize
Priority: Minor
 Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, 
 LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, 
 LUCENE-6450.patch





[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-13 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541578#comment-14541578
 ] 

Karl Wright commented on LUCENE-6450:
-

I have some ideas for a geohash given (x,y,z) values that may turn out to be of 
interest.  This geohash would have acceptable precision (a few meters) when 
packed in a long (64 bits).  Question: does lucene efficiently support field 
types of that length?
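
As a back-of-the-envelope check on the "few meters" figure, here is a small 
sketch that assumes the 64 bits are split evenly across three axes (21 bits 
each, one bit spare) and that each axis spans the diameter of a sphere with a 
mean earth radius of about 6371 km. The comment does not say how the bits 
would actually be allocated, so both the split and the names below are 
assumptions.

```java
final class XyzResolutionSketch {
    // Size of one quantization cell along an axis that spans [-radius, +radius].
    static double cellSizeMeters(int bitsPerAxis, double radiusMeters) {
        return 2.0 * radiusMeters / (double) (1L << bitsPerAxis);
    }

    public static void main(String[] args) {
        // 64 bits / 3 axes = 21 bits per axis (assumed split).
        System.out.println(cellSizeMeters(21, 6_371_000.0) + " m per cell");
    }
}
```

Under those assumptions each cell is roughly six meters on a side, which is 
consistent with the precision quoted above.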





[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-12 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539533#comment-14539533
 ] 

Karl Wright commented on LUCENE-6450:
-

Thinking about this further, I think that the simplicity of the GeoPointField 
solution is based on two things:
(1) A good binary packing of a geohash value of a fixed depth
(2) The ability to reconstruct the lat/lon from the geohash value directly

For a Geo3d equivalent, nobody to date has come up with a decent (x,y,z) 
geohash with the right locality properties.  I'm not certain how important 
those locality properties actually *are* for GeoPointField, though.  If they 
are important, then effectively we'd need to pack the *same* lat/lon based 
geohash value that GeoPointField uses, but have a lightning fast way of 
converting it to (x,y,z).  A lookup table would suffice but would be huge at 
9mm resolution. :-)  Adjusting the resolution would be a potential solution.  
If locality is *not* needed, then really a geohash would be any workable 
fixed-resolution binary encoding of the (x,y,z) values of any given lat/lon.




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538052#comment-14538052
 ] 

Karl Wright commented on LUCENE-6450:
-

A word of caution.  For polygons, most cartographers would not consider the 
following to accurately represent a polygon on a sphere:

{code}
+public static boolean pointInPolygon(double[] x, double[] y, double lat, double lon) {
+    assert x.length == y.length;
+    boolean inPoly = false;
+
+    for (int i = 1; i < x.length; i++) {
+        if (x[i] < lon && x[i-1] >= lon || x[i-1] < lon && x[i] >= lon) {
+            if (y[i] + (lon - x[i]) / (x[i-1] - x[i]) * (y[i-1] - y[i]) < lat) {
+                inPoly = !inPoly;
+            }
+        }
+    }
+    return inPoly;
+}
{code}

This is a cartesian approximation only -- polygon edges would appear to be 
curved on a globe.  This will limit usage to applications for which cartesian 
approximations are acceptable.
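
For readers who want to try the routine quoted above, here is a self-contained 
sketch of the same even-odd crossing test. The comparison operators are 
assumed to follow the standard form of this algorithm, and the sketch assumes 
the ring is closed (first vertex repeated as the last), since the loop starts 
at index 1. PipSketch is a hypothetical name.

```java
final class PipSketch {
    // x holds longitudes and y holds latitudes of a closed ring (first vertex repeated last).
    static boolean pointInPolygon(double[] x, double[] y, double lat, double lon) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        boolean inPoly = false;
        for (int i = 1; i < x.length; i++) {
            // Does the edge (i-1, i) straddle the meridian through the test point?
            boolean straddles = (x[i] < lon && x[i - 1] >= lon) || (x[i - 1] < lon && x[i] >= lon);
            if (straddles) {
                // Latitude of the edge at the test longitude, by straight-line (planar) interpolation.
                double latOnEdge = y[i] + (lon - x[i]) / (x[i - 1] - x[i]) * (y[i - 1] - y[i]);
                if (latOnEdge < lat) {
                    inPoly = !inPoly; // each edge crossed below the point flips the parity
                }
            }
        }
        return inPoly;
    }
}
```

The linear interpolation of latitude along each edge is exactly the planar 
assumption in question: on a sphere the shortest path between two vertices is 
a great-circle arc, which generally does not follow that straight line in 
lat/lon space.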






[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-11 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538465#comment-14538465
 ] 

Nicholas Knize commented on LUCENE-6450:


bq. It's only loosely coupled to the spatial module right now.

I've seen the code. It's well designed.  My point is that it's in the spatial 
module right now. It would need to be taken out of the spatial module for it to 
be usable in core and by this patch. Sounds like a separate issue for lucene 
committer buy-in.

bq. Most GIS code can readily map a point on an earth model to a point on a 
sphere.

I'm not arguing with you. My point refers to the, maybe 2%, class of lucene geo 
users you mentioned. Those are the same cartographers who would counter-argue 
that geo3d is not ISO 19107 compliant, so sphere accuracy is not good enough 
for them (enjoy those passionate discussions at FOSS4G). Don't misunderstand 
me - I'm not calling your baby ugly. There's great stuff there and I'm all for 
investigating using it to improve the simple approach in this patch.  In the 
meantime, since it sounds like the accuracy concerns for the large poly use 
case may be real, it might be easier just to add the few lines of mercator 
re-projection code to the post-filter and investigate using geo3d in a phase 2 
performance improvement (if it becomes available to core).




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-11 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538321#comment-14538321
 ] 

Nicholas Knize commented on LUCENE-6450:


Yes. Geo3d would be a nice fit here for using planar geometry to better 
approximate the earth's surface to a sphere. That would require it be decoupled 
from the spatial module and committed stand alone to a core.geometry package. 
(maybe not a bad idea?)

It does, though, diverge from the original intent of this patch, which is to 
provide core with a simple lightweight/scalable geo field and API that applies 
to the 95% use case not requiring cartographic precision.  The advanced GIS use case 
can use the spatial module, but in that case I'd point out that most 
cartographers/photogrammetrists would still consider geo3d inaccurate since the 
sphere is not an accurate representation of the earth's surface. There would 
need to be support for reprojection using appropriate EPSG/CRS datum (something 
best fit to remain in the spatial module, not core).  For that I'm exploring 
the use of SIS (something I'll sandbox for collaboration).

In the meantime, if there's concern about this patch struggling with accuracy 
on large polygons, a relatively straightforward and fast approach should be to 
overestimate the bbox, reproject intermediate results to a cylindrical 
projection, and use the same simple cartesian based PIP.  Any enhancement that 
defers the complex computation to smaller result sets is a win.
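
As an illustration of the reprojection step, here is a minimal sketch of a 
spherical Mercator projection, one common cylindrical projection. The choice 
of Mercator, the sphere radius of 6378137 m, and the latitude clamp near 
±85.05 degrees are assumptions made for the sketch, and MercatorSketch is a 
hypothetical name.

```java
final class MercatorSketch {
    static final double SPHERE_RADIUS_M = 6378137.0; // assumed radius for the sketch
    static final double MAX_LAT_DEG = 85.05112878;   // assumed clamp; Mercator diverges at the poles

    // Returns {x, y} in meters for a lat/lon given in degrees.
    static double[] project(double latDeg, double lonDeg) {
        double lat = Math.max(-MAX_LAT_DEG, Math.min(MAX_LAT_DEG, latDeg));
        double x = SPHERE_RADIUS_M * Math.toRadians(lonDeg);
        double y = SPHERE_RADIUS_M * Math.log(Math.tan(Math.PI / 4.0 + Math.toRadians(lat) / 2.0));
        return new double[] {x, y};
    }
}
```

In the flow described above, only the candidates that survive the 
overestimated bounding box would be projected, along with the polygon 
vertices, before the planar point-in-polygon test runs on the projected 
coordinates.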




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538584#comment-14538584
 ] 

Karl Wright commented on LUCENE-6450:
-

I agree that, to consider it seriously, geo3d would have to be moved to core.  
So somebody would need to understand the benefit.

If that happens, though, I could foresee an almost-completely-parallel 
implementation of the GeoPointField type, let's call it Geo3DPointField, which 
would encode X,Y,Z instead of lat and lon.  Almost all the same ideas and 
architecture would work, AFAICT.  The cons are that the index would be bigger 
(3 floating values instead of two, etc.), and slower to build (more math).  The 
search, on the other hand, would be not much slower I believe.
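
To show the extra math involved at index time, here is a minimal sketch of 
converting lat/lon to (x,y,z) on a unit sphere using the standard spherical 
formulas. It is not taken from geo3d, and UnitSphereSketch is a hypothetical 
name.

```java
final class UnitSphereSketch {
    // Returns {x, y, z} on the unit sphere for a lat/lon given in degrees.
    static double[] toXYZ(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {
            cosLat * Math.cos(lon), // x: toward lat 0, lon 0
            cosLat * Math.sin(lon), // y: toward lat 0, lon 90E
            Math.sin(lat)           // z: toward the north pole
        };
    }
}
```

Each indexed point costs a few trigonometric calls and yields three values to 
encode instead of two, which matches the index size and build time trade-offs 
noted above.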

As for the politics of ISO 19107, let's remember that FOSS4G exists to 
implement standards.  I'm interested, though, in the search problem, as are 
you.  And there's a world of difference between the cartesian view of things 
and a 3D one.  I think there's room for both.







[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-11 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538618#comment-14538618
 ] 

Nicholas Knize commented on LUCENE-6450:


bq.  I could foresee an almost-completely-parallel implementation of the 
GeoPointField type, let's call it Geo3DPointField, which would encode X,Y,Z 
instead of lat and lon.

+++1  This was the intent of the PackedQuadTree patch for lucene spatial and 
can certainly be added to core.  Add a 3d morton encoding to GeoUtils using 3 
bits.  Thus naive morton Hypercube ordering w/ 0/1 representing Left/Right || 
Bottom/Top || Front/Hind at level 1 gives 000:111.  This is where locality 
really matters though, so phase 2 would extend this into something more 
organized, e.g., Hilbert ordering at level 1 is: 010, 011, 111, 110, 100, 101, 
001, 000.  There are numerous issues with this; ordering becomes non-trivial, 
encoded values do not fit nicely in Longs, term size becomes larger.
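
A minimal sketch of the naive 3d Morton idea, assuming 21 bits per axis so 
that three axes use 63 of a long's 64 bits. The axis-to-bit assignment and the 
names are hypothetical. The sketch also checks that the level-1 Hilbert 
sequence listed above changes exactly one bit between consecutive cells, which 
is the locality property that plain Morton ordering lacks.

```java
final class Morton3dSketch {
    static final int BITS_PER_AXIS = 21; // 3 * 21 = 63 bits, leaving one bit of the long unused

    // Place bit i of v at position 3*i, leaving two free bits between each pair.
    static long spread3(long v) {
        long out = 0L;
        for (int i = 0; i < BITS_PER_AXIS; i++) {
            out |= ((v >>> i) & 1L) << (3 * i);
        }
        return out;
    }

    // x takes bit 0 of each 3-bit group, y bit 1, z bit 2 (an assumed layout).
    static long encode(long x, long y, long z) {
        return spread3(x) | (spread3(y) << 1) | (spread3(z) << 2);
    }

    // True if every consecutive pair of cells differs in exactly one bit.
    static boolean singleBitSteps(int[] cells) {
        for (int i = 1; i < cells.length; i++) {
            if (Integer.bitCount(cells[i] ^ cells[i - 1]) != 1) {
                return false;
            }
        }
        return true;
    }
}
```

With plain Morton order the level-1 cells are visited as 000, 001, ..., 111, 
so some consecutive cells differ in several bits (for example 011 to 100) and 
are not spatial neighbors.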

But I agree, I don't believe search will be much slower. The first phase here 
is to investigate improving this patch w/ Auto-prefix.  I think if we can get 
that working and leave prefix matching to the automaton then the hardest part 
becomes optimizing the encoding?  Maybe I'll open an issue around this for 
separate discussion so as to not dilute this issue's comments.

On topic here; I propose we sandbox geo3d along w/ these field types.  Get 
something incubated before proposing the addition to core?




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538646#comment-14538646
 ] 

Karl Wright commented on LUCENE-6450:
-

bq. Maybe I'll open an issue around this for separate discussion so as to not 
dilute this issue's comments.

+1

bq.  I propose we sandbox geo3d along w/ these field types. Get something 
incubated before proposing the addition to core?

+1 from me.  I'll be happy to provide Geo3D guidance if you need it; I'll be 
less helpful with the encoding.





[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538373#comment-14538373
 ] 

Karl Wright commented on LUCENE-6450:
-

bq. That would require it be decoupled from the spatial module and committed 
stand alone to a core.geometry package. (maybe not a bad idea?)

It's only loosely coupled to the spatial module right now.  There's a single 
class integrating it with spatial4j (not counting the random tests).

bq. but in that case I'd point out that most cartographers/photogrammetrists 
would still consider geo3d inaccurate since the sphere is not an accurate 
representation of the earth's surface.

Most GIS code can readily map a point on an earth model to a point on a sphere. 
Even if you don't do that, and just use the sphere, your accuracy for 
whether a given lat/lon is within a given shape is a few meters at most, 
whereas the error for a cartesian projection can be thousands of kilometers.  
Given that geo3d is also designed to be very fast, and equally compatible with 
geohash construction, I would think you might be interested in integrating your 
simple geopoint proposal with it.
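
To make the size of that error concrete, here is a minimal sketch comparing a 
great-circle distance on a sphere with a naive planar distance that treats 
lat/lon degrees as x/y.  The mean earth radius, the class name, and the choice 
of "degrees as a plane" as the cartesian model are assumptions for 
illustration; this is not geo3d code.

```java
// Hypothetical sketch: sphere vs. naive planar distance.
final class SphereVsPlaneSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius (assumed)

    // Convert lat/lon in degrees to a point on the unit sphere.
    static double[] toUnitVector(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        return new double[] {
            Math.cos(lat) * Math.cos(lon),
            Math.cos(lat) * Math.sin(lon),
            Math.sin(lat)
        };
    }

    // Great-circle distance from the angle between the two unit vectors.
    static double greatCircleKm(double lat1, double lon1, double lat2, double lon2) {
        double[] a = toUnitVector(lat1, lon1);
        double[] b = toUnitVector(lat2, lon2);
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        dot = Math.max(-1.0, Math.min(1.0, dot)); // guard against rounding
        return EARTH_RADIUS_KM * Math.acos(dot);
    }

    // Naive planar distance: degrees treated as flat x/y coordinates.
    static double planarKm(double lat1, double lon1, double lat2, double lon2) {
        double kmPerDegree = Math.toRadians(1.0) * EARTH_RADIUS_KM;
        return Math.hypot(lat2 - lat1, lon2 - lon1) * kmPerDegree;
    }
}
```

For two points at latitude 89 with longitudes 0 and 180, the great-circle 
distance is about two degrees of arc over the pole, while the planar model 
sees them as 180 degrees apart.  Near the equator and over tiny extents the 
two agree closely.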






[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538092#comment-14538092
 ] 

Michael McCandless commented on LUCENE-6450:


bq. This will limit usage to applications for which cartesian approximations 
are acceptable.

But users who care about the earth's curvature, which this approximation 
ignores, can use geo3d, right?

Also, isn't the curvature less substantial the smaller the perimeter of the 
polygon?

E.g. http://apartments.com makes a big deal about doing polygon searching for 
your apartment, but such users are likely to draw minuscule (relative to 
earth's curvature) polygons, on already-projected maps, so the curvature 
wouldn't matter for these users?

I'm very much a spatial newbie so I could be completely wrong :)




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-11 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538099#comment-14538099
 ] 

Karl Wright commented on LUCENE-6450:
-

If you are talking about tiny polygons, not near a pole, then you are probably 
fine.  Earlier comments in this ticket, however, mention navigation and 
scientific usage.  A cartesian approximation is probably not reasonable for 
those.





[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527392#comment-14527392
 ] 

Michael McCandless commented on LUCENE-6450:


This new approach is nice!

I don't fully understand all the geo math, but I think I get the gist:
you recursively approximate the target shape using smaller and smaller
ranges from the morton encoding, and then record when that z-shape is
fully within the query and avoid the post-filtering for those ranges.
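
A toy sketch of that recursion, as I understand it, on a small integer grid
rather than real lat/lon.  The 16-bit grid, the class and field names, and
the maxDepth parameter are assumptions for illustration and are not taken
from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cover a rectangle with morton-code ranges.
final class MortonRangeSketch {
    static final int BITS = 16; // bits per dimension in this toy grid

    static final class Range {
        final long start, end;     // inclusive morton code bounds
        final boolean fullyWithin; // true: no post-filtering needed
        Range(long start, long end, boolean fullyWithin) {
            this.start = start;
            this.end = end;
            this.fullyWithin = fullyWithin;
        }
    }

    // Move bit i of the low 16 bits of v to bit 2*i.
    static long spread(long v) {
        v &= 0xFFFFL;
        v = (v | (v << 8)) & 0x00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0FL;
        v = (v | (v << 2)) & 0x33333333L;
        v = (v | (v << 1)) & 0x55555555L;
        return v;
    }

    // x in the even bits, y in the odd bits.
    static long interleave(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }

    // Ranges come out sorted by start; bounds are inclusive cell coordinates.
    static List<Range> decompose(int minX, int minY, int maxX, int maxY, int maxDepth) {
        List<Range> out = new ArrayList<>();
        recurse(0, 0, BITS, minX, minY, maxX, maxY, maxDepth, out);
        return out;
    }

    // The current cell has origin (cellX, cellY) and side length 2^level.
    private static void recurse(int cellX, int cellY, int level,
                                int minX, int minY, int maxX, int maxY,
                                int maxDepth, List<Range> out) {
        int size = 1 << level;
        int cellMaxX = cellX + size - 1;
        int cellMaxY = cellY + size - 1;
        if (cellMaxX < minX || cellX > maxX || cellMaxY < minY || cellY > maxY) {
            return; // disjoint from the query: skip the whole range
        }
        boolean within = cellX >= minX && cellMaxX <= maxX
                      && cellY >= minY && cellMaxY <= maxY;
        if (within || BITS - level >= maxDepth) {
            long start = interleave(cellX, cellY);
            long end = start + (1L << (2 * level)) - 1;
            out.add(new Range(start, end, within));
            return;
        }
        int half = size >> 1;
        // Children in z-order so that emitted ranges stay sorted.
        recurse(cellX, cellY, level - 1, minX, minY, maxX, maxY, maxDepth, out);
        recurse(cellX + half, cellY, level - 1, minX, minY, maxX, maxY, maxDepth, out);
        recurse(cellX, cellY + half, level - 1, minX, minY, maxX, maxY, maxDepth, out);
        recurse(cellX + half, cellY + half, level - 1, minX, minY, maxX, maxY, maxDepth, out);
    }
}
```

Ranges flagged fullyWithin can accept every term without decoding it; the
rest still need the per-term check.  A smaller maxDepth yields fewer,
coarser ranges at the cost of more post-filtering.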

This visits fewer terms than the original patch, which did just a
single range that can (w/ the right 'adversary') visit a great many
false terms.

It's impressive how fast this is, without using any NumericField
prefix terms.  I think we can explore that later and we should commit
this approach now ..

Maybe add a test case w/ more data, e.g. a randomized test?  It could
index a bunch of random points, then run random rects/shapes, do the
dumb slow check against every single doc, and confirm the query hits
agree.





[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-04 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527410#comment-14527410
 ] 

Nicholas Knize commented on LUCENE-6450:


That's right.  The old patch was a naive scan-the-world approach, really 
unusable at scale.  As you said, this one approximates the bounding box as a 
set of ranges on the space-filling curve.  I think [~dsmiley] had also suggested 
random testing, which is definitely necessary.  I'll add some randomized testing 
and post a new patch.




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527489#comment-14527489
 ] 

Uwe Schindler commented on LUCENE-6450:
---

Hi,

I will look into this tomorrow (it is too late now)...  This looks like it has a 
completely separate TermsEnum and query impl.  Why not extend MultiTermQuery 
directly and let NRQ live on its own?

Uwe




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527496#comment-14527496
 ] 

Uwe Schindler commented on LUCENE-6450:
---

bq. It's impressive how fast this is, without using any NumericField
prefix terms. I think we can explore that later and we should commit
this approach now ..

We should first compare how this behaves on *large* bboxes, so a random test / 
perf test spanning large parts of the world and large indexes with maaany 
points would be good (whole Atlantic, whole Africa,...).  It is also mentioned 
that it does not allow crossing the date line, which is easy to support by 
splitting into 2 queries, one left of the date line, one right.  I can help 
with that.  Then we should also test perf with queries spanning the whole 
Pacific :-)
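
A minimal sketch of the date line split, assuming the common convention that 
a box with minLon > maxLon is meant to wrap across +/-180.  The BBox type and 
method names are hypothetical and not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a date-line-crossing box into two plain boxes.
final class DateLineSplitSketch {
    static final class BBox {
        final double minLon, minLat, maxLon, maxLat;
        BBox(double minLon, double minLat, double maxLon, double maxLat) {
            this.minLon = minLon;
            this.minLat = minLat;
            this.maxLon = maxLon;
            this.maxLat = maxLat;
        }
        boolean contains(double lon, double lat) {
            return lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat;
        }
    }

    // Assumption: minLon > maxLon signals that the box wraps across the date line.
    static List<BBox> split(double minLon, double minLat, double maxLon, double maxLat) {
        List<BBox> parts = new ArrayList<>();
        if (minLon <= maxLon) {
            parts.add(new BBox(minLon, minLat, maxLon, maxLat));
        } else {
            parts.add(new BBox(minLon, minLat, 180.0, maxLat));  // up to the date line
            parts.add(new BBox(-180.0, minLat, maxLon, maxLat)); // from the date line on
        }
        return parts;
    }

    // A point matches if either part contains it (a logical OR of the sub-queries).
    static boolean contains(List<BBox> parts, double lon, double lat) {
        for (BBox part : parts) {
            if (part.contains(lon, lat)) {
                return true;
            }
        }
        return false;
    }
}
```

In query terms the two parts would be combined as an OR of two ordinary 
bounding box queries, each of which stays inside [-180, 180].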




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-04 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527300#comment-14527300
 ] 

David Smiley commented on LUCENE-6450:
--

Nice code Nick!  LGTM.




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-04 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527318#comment-14527318
 ] 

David Smiley commented on LUCENE-6450:
--

Just curious; how did that Python RTree benchmark compare? 
https://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/src/python/SearchOSM.py?spec=svn188e330ea8c34a9720cbf0414d2ed19f6a843a3d&r=188e330ea8c34a9720cbf0414d2ed19f6a843a3d#1




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527338#comment-14527338
 ] 

Michael McCandless commented on LUCENE-6450:


Here's the OSM subset I'm using for the benchmarks:
http://people.apache.org/~mikemccand/latlon.subsetPlusAllLondon.txt.lzma

It's a random 1/50th of the latest OSM export (as of last week), but
includes all points within London, UK.

The search benchmark then runs a fixed set (225 total) of axis-aligned
rectangle intersects queries around London.

Look for Index/SearchOSM/GeoPoint.java/py in luceneutil...

I ran the same benchmarks (except for Packed/QuadPrefixTree):

*Geopoint*

  Index time: 157.3 sec (incl. forceMerge)
  Index size: 1.8 GB
  Mean query time: .077 sec
  221,119,062 total hits

*GeoHashPrefixTree*

  Index time: 628.5 sec (incl. forceMerge)
  Index size: 4.2 GB
  Mean query time: .039 sec
  221,120,027 total hits

*libspatialindex* (using Python Rtree wrapper)

  Index time: 469.6 sec
  Index size: 2.6 GB
  Mean query time: .158 sec
  221,118,844 total hits

The first geopoint patch here got exactly the same total hit count as
libspatialindex, but now it's different, I think because of the
precision setting that controls how deep the ranges recurse.  I think
it's also expected that geohash won't get the same hit count since it's
doing a bit of quantizing (level 11 ... not sure what that equates to
in meters).

I'm surprised the Rtree impl is so slow ...





[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-04 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527148#comment-14527148
 ] 

David Smiley commented on LUCENE-6450:
--

Can you please direct me to the luceneutil and geo benchmark?  I'm curious 
what that's about.

The numbers look nice.  Small indexes and fast index time :-)  It'd be 
interesting to try GeoHashPrefixTree, which will have smaller indexes than 
Quad.  I'll check out your code shortly.




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-05-04 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527159#comment-14527159
 ] 

Nicholas Knize commented on LUCENE-6450:


Updated w/ GeoHashPrefixTree benchmarks

reference to luceneutil: 
https://code.google.com/a/apache-extras.org/p/luceneutil/




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510688#comment-14510688
 ] 

Michael McCandless commented on LUCENE-6450:


This patch looks wonderful [~nknize]!

Hmm in this code:

{noformat}
+@Override
+protected final AcceptStatus accept(BytesRef term) {
+  AcceptStatus status = super.accept(term);
+  if (status != AcceptStatus.YES) {
+    return status;
+  }
{noformat}

... do you also need to more carefully handle the
AcceptStatus.YES_AND_SEEK case?  Oh I see: NumericRangeTermsEnum never
returns this ... maybe add an assert status != AcceptStatus.YES_AND_SEEK?

Can we just expose simple ctors for these queries instead of the
static factory methods?

For GeoPolygonQuery, why do we have a public factory method that takes
the bbox?  Shouldn't this be private (bbox is computed from the
polygon's points)?  Or is this for expert usage or something?

In GeoPolygonTermsEnum, the comment "// final-filter by bounding box"
should really be "// second-pass filter by bounding box", right?  The
final filter is the polygon check...

That GeoUtils.pointInPolygon method is magic to me :)  I had no idea
it was so simple to check if a point is inside a polygon.  Is there a
requirement that these poly points be in clockwise or counter-clockwise
order or something?
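
For reference, a minimal sketch of the classic even-odd (ray casting) test,
which is one common way to do this check.  Whether GeoUtils.pointInPolygon
works this way is an assumption on my part; the class and parameter names
below are made up for the example.

```java
// Hypothetical sketch: even-odd ray casting, not the patch's GeoUtils code.
final class PointInPolygonSketch {
    // xs/ys hold the polygon vertices; the last vertex connects back to the first.
    static boolean pointInPolygon(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Does the edge (j -> i) straddle the horizontal line through py?
            boolean straddles = (ys[i] > py) != (ys[j] > py);
            if (straddles) {
                // x coordinate where the edge crosses that horizontal line.
                double xCross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                if (px < xCross) {
                    inside = !inside; // each crossing to the right flips the state
                }
            }
        }
        return inside;
    }
}
```

Because this variant only counts how many edges a ray crosses, it gives the
same answer for clockwise and counter-clockwise vertex order.  Points exactly
on an edge are not handled consistently by this simple form.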


 Add simple encoded GeoPointField type to core
 -

 Key: LUCENE-6450
 URL: https://issues.apache.org/jira/browse/LUCENE-6450
 Project: Lucene - Core
  Issue Type: New Feature
Affects Versions: 5.x
Reporter: Nicholas Knize
Priority: Minor
 Attachments: LUCENE-6450.patch


 At the moment all spatial capabilities, including basic point based indexing 
 and querying, require the lucene-spatial module. The spatial module, designed 
 to handle all things geo, requires dependency overhead (s4j, jts) to provide 
 spatial rigor for even the most simplistic spatial search use-cases (e.g., 
 lat/lon bounding box, point in poly, distance search). This feature trims the 
 overhead by adding a new GeoPointField type to core along with 
 GeoBoundingBoxQuery, GeoPolygonQuery, and GeoDistanceQuery classes to the 
 .search package. This field is intended as a straightforward lightweight type 
 for the most basic geo point use-cases without the overhead. 
 The field uses simple bit twiddling operations (currently morton hashing) to 
 encode lat/lon into a single long term.  The queries leverage simple 
 multi-phase filtering that starts by leveraging NumericRangeQuery to reduce 
 candidate terms deferring the more expensive mathematics to the smaller 
 candidate sets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510717#comment-14510717
 ] 

Michael McCandless commented on LUCENE-6450:


Hmm: another question about overriding the accept() method for 
NumericRangeTermsEnum: it looks like you extract the full precision lat/lon 
from the incoming term, yet this term could be a shifted term right (lost some 
of its lower bits)?  Is the morton encoding ok with that lost precision?  I'm 
a little confused how it can work properly ... it seems like it needs to see 
the full precision terms under a shifted range...
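
To make the concern about shifted terms concrete, here is a minimal sketch of 2D Morton interleaving and of what clearing the low bits does to the decoded values. The names (MortonShiftSketch, spread, compact, dropLowBits) are hypothetical and not taken from the patch; the masks are the standard bit-interleaving constants, and the sketch assumes one unsigned 32-bit value per dimension.

```java
/** Morton (Z-order) interleaving of two 32-bit values; illustrative names only. */
final class MortonShiftSketch {

  /** Spreads the low 32 bits of v into the even bit positions of a long. */
  static long spread(long v) {
    v &= 0xFFFFFFFFL;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
    v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
    v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
    v = (v | (v << 2)) & 0x3333333333333333L;
    v = (v | (v << 1)) & 0x5555555555555555L;
    return v;
  }

  /** Inverse of spread: gathers the even bit positions back into 32 bits. */
  static long compact(long v) {
    v &= 0x5555555555555555L;
    v = (v | (v >>> 1)) & 0x3333333333333333L;
    v = (v | (v >>> 2)) & 0x0F0F0F0F0F0F0F0FL;
    v = (v | (v >>> 4)) & 0x00FF00FF00FF00FFL;
    v = (v | (v >>> 8)) & 0x0000FFFF0000FFFFL;
    v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL;
    return v;
  }

  /** x occupies the even bits, y the odd bits. */
  static long interleave(long x, long y) {
    return spread(x) | (spread(y) << 1);
  }

  static long xOf(long z) {
    return compact(z);
  }

  static long yOf(long z) {
    return compact(z >>> 1);
  }

  /** Mimics a shifted (reduced precision) term: clears the lowest `shift` bits, 0 <= shift <= 63. */
  static long dropLowBits(long z, int shift) {
    return (z >>> shift) << shift;
  }

  public static void main(String[] args) {
    long z = interleave(0xDEADBEEFL, 0x12345678L);
    long shifted = dropLowBits(z, 16);
    System.out.println(Long.toHexString(xOf(z)) + " " + Long.toHexString(yOf(z)));
    System.out.println(Long.toHexString(xOf(shifted)) + " " + Long.toHexString(yOf(shifted)));
  }
}
```

Clearing 16 low bits of the interleaved value wipes the low 8 bits of both coordinates at once, so a shifted term identifies a whole cell rather than a single point.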




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510757#comment-14510757
 ] 

Uwe Schindler commented on LUCENE-6450:
---

Hi,
looks nice! I have not yet looked fully into it, but I am fine. I have the same 
question regarding shifted terms in the terms enum like Mike.

One small API issue: It is fine to subclass NRQ, but instead of making the 
constructor now protected, it should be package private (without access 
modifier).

The static ctors are needed in NRQ to be able to handle the different data 
types. For the geo queries which just have one way to instantiate them, a 
simple public ctor is ok, no static factory needed.




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510761#comment-14510761
 ] 

Uwe Schindler commented on LUCENE-6450:
---

bq. Hmm: another question about overriding the accept() method for 
NumericRangeTermsEnum: it looks like you extract the full precision lat/lon 
from the incoming term, yet this term could be a shifted term right (lost some 
of its lower bits)? 

I think I figured it out: the Geo queries always use precStep=64, so shifted terms 
will never appear. I just have one question: why do this? The whole point of 
NumericRangeQuery is to use the lower-precision terms to avoid visiting too many 
terms... Could you, Nicholas, please explain why we don't want to use the 
prefix terms? I agree it makes it more complicated, but I think the current 
encoding could use it. And the terms are not wasteful! :-)




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511048#comment-14511048
 ] 

Michael McCandless commented on LUCENE-6450:


bq. I think I figured out: the Geo queries always use precStep=64, so shifted 
terms will never appear. 

Ahhh, ok, that makes sense.




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510980#comment-14510980
 ] 

Robert Muir commented on LUCENE-6450:
-

I'm curious about the precisionstep as well. The extra terms should give range 
queries a huge speedup if we can use them.  

Since there is a custom encoding, i do wonder if we should add sortednumeric dv 
field as well. Sorting and faceting could then work easily etc. We could 
provide valuesource thingies, add support for this to expressions/, facet/, and 
so on.

In general, that is what i like overall about the patch: it's simple and 
contained, and it solves the 99% majority use case of spatial. And we could 
expose this everywhere (like adding syntax to queryparser and you name it: 
please no geo-geek syntax, same approach, 99% case for the masses).





[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511247#comment-14511247
 ] 

Yonik Seeley commented on LUCENE-6450:
--

Great stuff!  Should this be used as the underlying implementation for Solr's 
LatLonType (which currently does not have multi-valued support)?  Any downsides 
for the single-valued case?

 Add simple encoded GeoPointField type to core
 -

 Key: LUCENE-6450
 URL: https://issues.apache.org/jira/browse/LUCENE-6450
 Project: Lucene - Core
  Issue Type: New Feature
Affects Versions: Trunk, 5.x
Reporter: Nicholas Knize
Priority: Minor
 Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, 
 LUCENE-6450.patch


 At the moment all spatial capabilities, including basic point based indexing 
 and querying, require the lucene-spatial module. The spatial module, designed 
 to handle all things geo, requires dependency overhead (s4j, jts) to provide 
 spatial rigor for even the most simplistic spatial search use-cases (e.g., 
 lat/lon bounding box, point in poly, distance search). This feature trims the 
 overhead by adding a new GeoPointField type to core along with 
 GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This 
 field is intended as a straightforward lightweight type for the most basic 
 geo point use-cases without the overhead. 
 The field uses simple bit twiddling operations (currently morton hashing) to 
 encode lat/lon into a single long term.  The queries leverage simple 
 multi-phase filtering that starts by leveraging NumericRangeQuery to reduce 
 candidate terms deferring the more expensive mathematics to the smaller 
 candidate sets.






[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511388#comment-14511388
 ] 

Nicholas Knize commented on LUCENE-6450:


It certainly could be used as the implementation for the LatLonType.  Might be 
worthwhile exploring as a separate issue?




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511182#comment-14511182
 ] 

Nicholas Knize commented on LUCENE-6450:


bq. ds: particularly once a point-radius (circle) query is added. Did you 
forget or are you planning to add that in the future? The other super-common 
use-case is distance sorting...

I'm working on adding both point-radius query and distance sorting. I wanted to 
get the first version out for initial feedback. Seemed to work out nicely with 
all of the great suggestions so far.

bq. uwe: so shifted terms will never appear. I just have the question (why do 
this?).

Space-filling curves are highly sensitive to precision - especially the morton 
or lebesgue curve since they don't do a great job of preserving locality. 
Indexing reduced precision terms can lead to (potentially significant) false 
positives (with nonlinear error). Here's a great bl.ock visualizing the range 
query from a bounding box over a morton curve:  
http://bl.ocks.org/jaredwinick/raw/5073432/  For an average example: encoding 
32.9482, -96.4538 with step = 32 results in two terms/geo points that are 500m 
apart. The error gets worse as this precision step is lowered. With the single 
high precision encoded term the error is 1e-7 decimal degrees.
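
A rough back-of-the-envelope check of these magnitudes can be sketched as follows. It assumes 32 bits per dimension, scaled linearly over the full 360 degree longitude and 180 degree latitude ranges; that quantization is an assumption for illustration and may differ from the patch, and the class and method names are hypothetical.

```java
/** Cell size left after dropping interleaved low bits; assumes 32 bits per dimension. */
final class ReducedPrecisionCellSketch {

  // Assumed layout: a 64-bit term with half of the bits for each dimension.
  static final int BITS_PER_DIM = 32;

  /** Bits lost per dimension when `shift` interleaved bits are dropped (shift must be even). */
  static int droppedPerDim(int shift) {
    if (shift < 0 || shift > 62 || (shift & 1) != 0) {
      throw new IllegalArgumentException("shift must be even and in [0, 62]: " + shift);
    }
    return shift / 2;
  }

  /** Width in degrees of the longitude cell a shifted term still identifies. */
  static double lonCellDegrees(int shift) {
    // scalb multiplies by an exact power of two.
    return Math.scalb(360.0, -(BITS_PER_DIM - droppedPerDim(shift)));
  }

  /** Height in degrees of the latitude cell a shifted term still identifies. */
  static double latCellDegrees(int shift) {
    return Math.scalb(180.0, -(BITS_PER_DIM - droppedPerDim(shift)));
  }

  public static void main(String[] args) {
    for (int shift : new int[] {0, 16, 32}) {
      System.out.println(
          "shift=" + shift + " lonCell=" + lonCellDegrees(shift) + " latCell=" + latCellDegrees(shift));
    }
  }
}
```

Under these assumptions the full-precision cell is 360 / 2^32, a little under 1e-7 degrees, while a shift of 32 leaves a longitude cell of 360 / 65536, roughly 0.0055 degrees, which is on the order of a few hundred meters at the equator.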

bq. mm: Is there a requirement that these poly points are clockwise or 
counter-clockwise order or something?

There is. The points have to be cw or ccw and the polygon cannot be 
self-crossing.  It won't throw any exceptions, it just won't behave as 
expected. I went ahead and updated the javadoc comment to make sure that is 
clear.

bq. mm: For GeoPolygonQuery, why do we have public factory method that takes 
the bbox? Shouldn't this be private (bbox is computed from the polygon's 
points)? Or is this for expert usage or something?

The idea here is that polygons can contain a significant number of points, and 
users may already have the BBox (cached or otherwise precomputed). I thought 
this provided a nice way to save unnecessary processing if the caller can 
provide the bbox. 
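
For reference, deriving the bbox from the polygon is a single pass over the vertices, which is the work a caller with a cached bbox would skip. A minimal sketch, with hypothetical names and no handling of polygons that cross the dateline:

```java
/** Computes the bounding box of a polygon's vertices; illustrative only. */
final class PolygonBBoxSketch {

  /** Returns {minLon, minLat, maxLon, maxLat} for the given vertices. */
  static double[] boundingBox(double[] lons, double[] lats) {
    if (lons.length == 0 || lons.length != lats.length) {
      throw new IllegalArgumentException("need equally sized, non-empty coordinate arrays");
    }
    double minLon = Double.POSITIVE_INFINITY;
    double minLat = Double.POSITIVE_INFINITY;
    double maxLon = Double.NEGATIVE_INFINITY;
    double maxLat = Double.NEGATIVE_INFINITY;
    // One pass: O(number of vertices), which adds up for large polygons.
    for (int i = 0; i < lons.length; i++) {
      minLon = Math.min(minLon, lons[i]);
      maxLon = Math.max(maxLon, lons[i]);
      minLat = Math.min(minLat, lats[i]);
      maxLat = Math.max(maxLat, lats[i]);
    }
    return new double[] {minLon, minLat, maxLon, maxLat};
  }

  public static void main(String[] args) {
    double[] box = boundingBox(new double[] {-96.5, -96.4, -96.45}, new double[] {32.9, 32.95, 33.0});
    System.out.println(java.util.Arrays.toString(box));
  }
}
```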

bq. ds: Have you thought about a way to use GeoPointFieldType with 
pre-projected data

Yes, this can potentially be left as an enhancement but the intent is to have 
this apply to the most basic use cases. So I'm curious as to what others 
think about adding this capability or just leaving that to the spatial module. 

bq. ds: GeoPointFieldType has DocValues enabled yet I see that these queries 
don't use that; or did I miss something?

Not using them yet. The intent was to use them for sorting.

bq. ds: I would love to see some randomized testing of round-trip encode-decode 
of the morton numbers.

Agree.  I'll be adding randomized testing for sure.
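
A round-trip test of this kind could look roughly like the sketch below. It is self-contained rather than based on the patch: the encode and decode methods, the 32 bits per dimension scaling, and the use of java.util.Random with a fixed seed (instead of Lucene's randomized testing infrastructure) are all assumptions made for illustration.

```java
import java.util.Random;

/** Randomized encode/decode round trip with an error tolerance; illustrative only. */
final class MortonRoundTripSketch {

  // Assumed quantization: 2^32 cells across each dimension's full range.
  static final double LON_SCALE = (double) (1L << 32) / 360.0;
  static final double LAT_SCALE = (double) (1L << 32) / 180.0;

  static long encode(double lon, double lat) {
    // Truncate to a cell index and clamp so the upper edge stays in range.
    long x = Math.min((long) ((lon + 180.0) * LON_SCALE), 0xFFFFFFFFL);
    long y = Math.min((long) ((lat + 90.0) * LAT_SCALE), 0xFFFFFFFFL);
    long z = 0;
    for (int i = 0; i < 32; i++) {
      z |= ((x >>> i) & 1L) << (2 * i);     // lon bits go to even positions
      z |= ((y >>> i) & 1L) << (2 * i + 1); // lat bits go to odd positions
    }
    return z;
  }

  static double decodeLon(long z) {
    long x = 0;
    for (int i = 0; i < 32; i++) {
      x |= ((z >>> (2 * i)) & 1L) << i;
    }
    return x / LON_SCALE - 180.0;
  }

  static double decodeLat(long z) {
    long y = 0;
    for (int i = 0; i < 32; i++) {
      y |= ((z >>> (2 * i + 1)) & 1L) << i;
    }
    return y / LAT_SCALE - 90.0;
  }

  /** Largest absolute lat or lon error seen over n random points. */
  static double maxRoundTripError(long seed, int n) {
    Random random = new Random(seed);
    double max = 0.0;
    for (int i = 0; i < n; i++) {
      double lon = -180.0 + 360.0 * random.nextDouble();
      double lat = -90.0 + 180.0 * random.nextDouble();
      long z = encode(lon, lat);
      max = Math.max(max, Math.abs(decodeLon(z) - lon));
      max = Math.max(max, Math.abs(decodeLat(z) - lat));
    }
    return max;
  }

  public static void main(String[] args) {
    System.out.println("max round-trip error (degrees): " + maxRoundTripError(42L, 100_000));
  }
}
```

Because the encoding truncates to a cell, the test has to compare against a tolerance of about one cell width rather than expect exact equality.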




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511391#comment-14511391
 ] 

Michael McCandless commented on LUCENE-6450:


Thanks [~nknize], new patch looks great ... but can you add 
@lucene.experimental to all class-level javadocs so users know the index format 
is subject to change?

I think these classes really do belong in core: they cover the common case 
for spatial search.  But maybe we should start with sandbox for now since we 
may make changes that break the index format?

E.g. I think we should find a way to make use of index-time prefix terms (auto 
prefix or numeric field), because with the patch now we will visit O(N) terms 
and O(N) docs in the common case (no docs have exactly the same geo point), but 
if we can use prefix terms, we visit O(log(N)) terms and the same O(N) docs.  
The default block postings format is a far more efficient decode than the block 
terms dict, so offloading the work from terms dict to postings should be a big 
win (and the post-filtering work would be unchanged, but would have to use doc 
values not the term).

We could do smart things in that case, e.g. carefully pick which prefix terms 
to make use of because they are 100% contained by the shape, and then OR that 
with another query that matches the edge cells that must do post-filtering.
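
The split between fully contained cells and edge cells can be sketched as a three-way classification of a prefix cell against the query shape, here simplified to a bounding box. The enum and method names are hypothetical, and mapping a prefix term to its cell rectangle is left out.

```java
/** Classifies a prefix cell against a query bounding box; illustrative only. */
final class CellRelationSketch {

  enum Relation {
    DISJOINT, // skip the cell entirely
    WITHIN,   // every point in the cell matches: no post-filtering needed
    CROSSES   // edge cell: its points must be post-filtered one by one
  }

  static Relation relate(
      double cMinLon, double cMinLat, double cMaxLon, double cMaxLat,
      double qMinLon, double qMinLat, double qMaxLon, double qMaxLat) {
    if (cMaxLon < qMinLon || cMinLon > qMaxLon || cMaxLat < qMinLat || cMinLat > qMaxLat) {
      return Relation.DISJOINT;
    }
    if (cMinLon >= qMinLon && cMaxLon <= qMaxLon && cMinLat >= qMinLat && cMaxLat <= qMaxLat) {
      return Relation.WITHIN;
    }
    return Relation.CROSSES;
  }

  public static void main(String[] args) {
    // Query box: lon [0, 10], lat [0, 10].
    System.out.println(relate(2, 2, 4, 4, 0, 0, 10, 10));
    System.out.println(relate(8, 8, 12, 12, 0, 0, 10, 10));
    System.out.println(relate(20, 20, 22, 22, 0, 0, 10, 10));
  }
}
```

With such a classification, WITHIN cells could be answered from prefix-term postings alone, and only CROSSES cells would pay for the per-point check.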

Maybe we try a different space filling curve, e.g. I think Hilbert curves would 
be good since they have better spatial locality?  They do have higher 
index-time cost to encode, which is fine, and if we have to cutover to doc 
values for post-filtering anyway (if we use the prefix terms) then we wouldn't 
need to pay a Hilbert decode cost at search time.
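
To illustrate the locality argument, here is a sketch of the well-known iterative Hilbert index computation on a small n-by-n grid, plus a check that consecutive indices always land in edge-adjacent cells. The class and method names are hypothetical, and nothing here is taken from the patch.

```java
/** Hilbert curve index on an n-by-n grid (n a power of two); illustrative only. */
final class HilbertLocalitySketch {

  /** Position of cell (x, y) along the Hilbert curve. */
  static int hilbertIndex(int n, int x, int y) {
    int d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
      int rx = (x & s) > 0 ? 1 : 0;
      int ry = (y & s) > 0 ? 1 : 0;
      d += s * s * ((3 * rx) ^ ry);
      // Rotate/flip the quadrant so the sub-curve is oriented consistently.
      if (ry == 0) {
        if (rx == 1) {
          x = n - 1 - x;
          y = n - 1 - y;
        }
        int t = x;
        x = y;
        y = t;
      }
    }
    return d;
  }

  /** True if every pair of consecutive indices maps to cells that share an edge. */
  static boolean consecutiveCellsAdjacent(int n) {
    int[] xs = new int[n * n];
    int[] ys = new int[n * n];
    for (int x = 0; x < n; x++) {
      for (int y = 0; y < n; y++) {
        int d = hilbertIndex(n, x, y);
        xs[d] = x;
        ys[d] = y;
      }
    }
    for (int d = 1; d < n * n; d++) {
      int dist = Math.abs(xs[d] - xs[d - 1]) + Math.abs(ys[d] - ys[d - 1]);
      if (dist != 1) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(consecutiveCellsAdjacent(8));
  }
}
```

A Morton curve does not have this property: for example, Morton index 1 and index 2 decode to cells (1, 0) and (0, 1), which only touch diagonally. The extra rotate/flip step per level is the higher encode cost mentioned above.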

But this all should come later: I think this patch is a huge step forward 
already.




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511795#comment-14511795
 ] 

Uwe Schindler commented on LUCENE-6450:
---

bq. I'm curious about the precisionstep as well. The extra terms should give 
range queries a huge speedup if we can use them.

The problem is the following: The NRQ approach only works for ranges (at the 
bounds of the range, we use more precision terms, but in the center we use 
lower precision terms). The problem here is that we do the actual range to 
filter out those which can never match. But for those that are in the range we 
have to check if they are in the bbox. To do this, we need full precision. So 
we cannot use lower prec.

bq. Great stuff! Should this be used as the underlying implementation for 
Solr's LatLonType (which currently does not have multi-valued support)? Any 
downsides for the single-valued case?

The problem is: if you have a large bbox, and many distinct points you have to 
visit many terms, because this does not use trie algorithm of NRQ. It extends 
NRQ, but does not use the algorithm. It is a standard TermRangeQuery with some 
extra filtering. It does not even seek the terms enum! So for the single value 
case I would always prefer the 2 NRQ queries. In the worst case (bbox on whole 
earth), you have to visit *all* terms and get their postings = more or less 
the same as a default term range.
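
The "range plus extra filtering" behavior described here can be sketched on a small integer grid. This is a simplified linear scan, not a TermsEnum, and all names are hypothetical; coordinates are assumed to be non-negative ints so that signed comparison of the interleaved values is safe.

```java
/** Two-phase bbox filtering over Morton codes on an integer grid; illustrative only. */
final class TwoPhaseBBoxSketch {

  /** Interleaves x into the even bits and y into the odd bits. */
  static long interleave(int x, int y) {
    long z = 0;
    for (int i = 0; i < 32; i++) {
      z |= (((long) (x >>> i)) & 1L) << (2 * i);
      z |= (((long) (y >>> i)) & 1L) << (2 * i + 1);
    }
    return z;
  }

  /** Returns {candidatesInRange, matchesInBox} for points given as parallel arrays. */
  static int[] filter(int[] xs, int[] ys, int minX, int minY, int maxX, int maxY) {
    // Every point inside the box has a code between the codes of the two corners,
    // because interleaving is monotone in each coordinate.
    long lower = interleave(minX, minY);
    long upper = interleave(maxX, maxY);
    int candidates = 0;
    int matches = 0;
    for (int i = 0; i < xs.length; i++) {
      long z = interleave(xs[i], ys[i]);
      if (z < lower || z > upper) {
        continue; // phase 1: cheap range check on the encoded value
      }
      candidates++;
      // Phase 2: exact check, needed because the range also covers cells outside the box.
      if (xs[i] >= minX && xs[i] <= maxX && ys[i] >= minY && ys[i] <= maxY) {
        matches++;
      }
    }
    return new int[] {candidates, matches};
  }

  public static void main(String[] args) {
    int[] xs = {0, 1, 2, 3, 3};
    int[] ys = {2, 1, 2, 3, 0};
    int[] result = filter(xs, ys, 1, 1, 2, 2);
    System.out.println("candidates=" + result[0] + " matches=" + result[1]);
  }
}
```

The gap between candidates and matches is the false-positive set that the second phase has to discard, and it grows with the size of the box, which is the cost being pointed out for large bounding boxes.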

One workaround would be: if we used Hilbert curves, we could calculate the 
square box around the center of the bbox that is representable as a single 
numeric range (one where no post filtering is needed). This range could be 
executed by the default NRQ algorithm using shifted values. For the 
remaining stuff around we can visit only the high-prec terms. With the current 
Morton/Z-Curve we cannot do this. So if we don't fix this now, we must for sure 
put this into sandbox, so we have the chance to change the algorithm.

Another alternative is to just use plain NRQ (ideally also with more locality 
using Hilbert curves) and post-filter the actual results (using doc values). 
This would also be preferable for polygons.

The current implementation is not useable for large bounding boxes covering 
many different positions! E.g. in my case (PANGAEA), we have lat/lon 
coordinates around the whole world including poles and scientists generally 
select large bboxes... It is perfectly fine for searching for shops in towns, 
of course :-)

 Add simple encoded GeoPointField type to core
 -

 Key: LUCENE-6450
 URL: https://issues.apache.org/jira/browse/LUCENE-6450
 Project: Lucene - Core
  Issue Type: New Feature
Affects Versions: Trunk, 5.x
Reporter: Nicholas Knize
Priority: Minor
 Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, 
 LUCENE-6450.patch, LUCENE-6450.patch





[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-24 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511796#comment-14511796
 ] 

Ishan Chattopadhyaya commented on LUCENE-6450:
--

+1, this looks good! Just skimmed through the patch, though.




[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core

2015-04-23 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510431#comment-14510431
 ] 

David Smiley commented on LUCENE-6450:
--

*This is really nice Nick; thanks for contributing!*

I looked over the patch.  I like the light-weight-ness and ease of use; and I 
suspect this will perform admirably.  It should address many use-cases, 
particularly once a point-radius (circle) query is added.  Did you forget or 
are you planning to add that in the future?  The other super-common use-case is 
distance sorting, and that's not here.  In pointing this out I don't mean to 
take away from the great stuff you have here.  Further comments:
* I can see that GeoPolygonQuery extends GeoBBoxQuery for re-use, but I think 
it's confusing.  Instead, how about a base AbstractGeoTermRangeQuery (or 
something like that) with GeoBBoxQuery being a fairly simple subclass?
* Somehow, I think the user should understand that these queries only work with 
a GeoPointFieldType.  Javadocs are minimally sufficient, but might the queries 
be named accordingly, such as GeoPointInPolygonQuery and GeoPointInBBoxQuery?  
Those names aren't even that long yet I think it's an important clarification 
to help differentiate geo/spatial stuff generally.
* GeoPointFieldType has DocValues enabled yet I see that these queries don't 
use that; or did I miss something?  Even if they don't, I can only surmise that 
the intention is to use them for distance sorting, although that's not here.
* If you derived those Morton number magic constants and related code yourself, 
then you are a better man than I am, and than most any other coder who passes 
by and reads this.  Otherwise, please reference your sources.
** I would love to see some randomized testing of round-trip encode-decode of 
the Morton numbers.  I understand if there needs to be an error tolerance.
* I'd like to see more javadocs on spatial matters like:
** Does the GeoBBoxQuery support dateline wrap?  Ditto for GeoPolygonQuery.
** Are lat and lon in degrees or radians?
** What mathematical model does the polygon query operate in?  (The answer is 
Cartesian, even though the coordinates are lat-lon on the surface of a sphere; 
warn the user of the implications.)
* Have you thought about a way to use GeoPointFieldType with pre-projected data 
(x,y with a pre-configured range boundary)?
* I'm wondering what [~mikemccand]'s opinion is on what changes will be 
necessary, if any, to work with the experimental auto-prefix term stuff.
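
The round-trip check requested above can be sketched as follows.  This is an 
illustrative sketch only, not the code from the patch: the class name 
MortonRoundTripSketch, the 31-bits-per-dimension quantization, the layout with 
lon in the even bits and lat in the odd bits, and the 1e-6 degree tolerance are 
all assumptions made for the example.

```java
import java.util.Random;

// Hypothetical sketch; NOT the encoding used by the LUCENE-6450 patch.
final class MortonRoundTripSketch {
    private static final int BITS = 31; // assumed bits per dimension
    private static final double LON_SCALE = ((1L << BITS) - 1) / 360.0;
    private static final double LAT_SCALE = ((1L << BITS) - 1) / 180.0;

    // Spread the low 32 bits of v onto the even bit positions of a long.
    static long interleave(long v) {
        v &= 0x00000000FFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2)) & 0x3333333333333333L;
        v = (v | (v << 1)) & 0x5555555555555555L;
        return v;
    }

    // Inverse of interleave: gather the even bits back into the low 32 bits.
    static long deinterleave(long v) {
        v &= 0x5555555555555555L;
        v = (v | (v >>> 1)) & 0x3333333333333333L;
        v = (v | (v >>> 2)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v >>> 4)) & 0x00FF00FF00FF00FFL;
        v = (v | (v >>> 8)) & 0x0000FFFF0000FFFFL;
        v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL;
        return v;
    }

    // Quantize lat/lon (degrees) to fixed point and interleave into one long.
    static long encode(double lat, double lon) {
        long lonBits = (long) ((lon + 180.0) * LON_SCALE);
        long latBits = (long) ((lat + 90.0) * LAT_SCALE);
        return interleave(lonBits) | (interleave(latBits) << 1);
    }

    static double decodeLon(long hash) {
        return deinterleave(hash) / LON_SCALE - 180.0;
    }

    static double decodeLat(long hash) {
        return deinterleave(hash >>> 1) / LAT_SCALE - 90.0;
    }

    // Encode and decode random points; return the largest error in degrees.
    static double maxRoundTripError(long seed, int iterations) {
        Random rnd = new Random(seed);
        double maxErr = 0.0;
        for (int i = 0; i < iterations; i++) {
            double lat = -90.0 + 180.0 * rnd.nextDouble();
            double lon = -180.0 + 360.0 * rnd.nextDouble();
            long hash = encode(lat, lon);
            maxErr = Math.max(maxErr, Math.abs(decodeLat(hash) - lat));
            maxErr = Math.max(maxErr, Math.abs(decodeLon(hash) - lon));
        }
        return maxErr;
    }

    public static void main(String[] args) {
        double err = maxRoundTripError(42L, 100_000);
        if (err >= 1e-6) {
            throw new AssertionError("round-trip error too large: " + err);
        }
        System.out.println("max round-trip error (degrees): " + err);
    }
}
```

Quantizing to a fixed-point grid is lossy, so the check compares against a 
tolerance instead of requiring exact equality, and the fixed seed keeps any 
failure reproducible.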

RE contributing to core: 

_I really wonder what other people think_.  If others think it's great then I'm 
definitely not going to stand in the way.  But I am concerned about confusion 
this may introduce about where spatial stuff is and how it's related.  Javadocs 
could help some.  I think it's a slippery slope to identify what scope of 
spatial should go in Lucene-core versus Lucene-spatial.  Perhaps you might 
think the dependencies decide it?  I think that's not a great differentiator, 
especially if one considers that the old spatial module (Lucene 3.x and prior, 
by Patrick O'Leary) had no dependencies, and I think you'd be hard pressed to 
argue that it belonged in Lucene-core if you were to see it.

This reminds me a little of the perceptual confusion between Solr's LatLonType 
(internally composed of two double fields) and a Solr field type for RPT.  
A new user is easily led to believe that they should use LatLonType if they 
have point data, especially because of its name (hey yeah I have lats and 
lons), and, say, want to do simple bounding-box or point-radius queries.  Sure 
it works for that, and it may very well be the best choice, but depending on 
the scale and various factors it _may_ be less performant compared to RPT, which 
perceptually appears as a more advanced choice to the user even though it's 
almost as easy to use for the simple cases.  By the same token, that could 
occur here if it's in core, but wouldn't if all this were wrapped up in a 
Lucene-spatial SpatialStrategy facade.  Not a big deal, I think; but a concern.

Thanks again for contributing this Nick.



 Add simple encoded GeoPointField type to core
 -

 Key: LUCENE-6450
 URL: https://issues.apache.org/jira/browse/LUCENE-6450
 Project: Lucene - Core
  Issue Type: New Feature
Affects Versions: 5.x
Reporter: Nicholas Knize
Priority: Minor
 Attachments: LUCENE-6450.patch


 At the moment all spatial capabilities, including basic point based indexing 
 and querying, require the lucene-spatial module. The spatial module, designed 
 to handle all things geo, requires dependency overhead (s4j, jts) to provide 
 spatial rigor for even the most simplistic spatial search use-cases (e.g., 
 lat/lon bounding box, point in poly, distance search). This feature trims the