[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541840#comment-14541840 ]

Uwe Schindler commented on LUCENE-6450:
---------------------------------------

bq. As a side note, I'm finishing up a patch that uses precision_step for indexing the longs at variable resolution to take advantage of the postings list and not visit every term

Cool! :-)

Add simple encoded GeoPointField type to core
---------------------------------------------

Key: LUCENE-6450
URL: https://issues.apache.org/jira/browse/LUCENE-6450
Project: Lucene - Core
Issue Type: New Feature
Affects Versions: Trunk, 5.x
Reporter: Nicholas Knize
Priority: Minor
Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch, LUCENE-6450.patch

At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead.

The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
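For readers new to the "morton hashing" named in the issue description, here is a minimal sketch of interleaving a quantized lon/lat pair into one long. It is not the code from the attached patches: the class name MortonSketch, the choice of 31 bits per dimension, and the lon-on-even-bits layout are all assumptions made for illustration.

```java
// Hypothetical sketch, not the code from the LUCENE-6450 patches.
final class MortonSketch {
    // Assumption: 31 bits per dimension, so the interleaved hash uses 62 bits.
    private static final int BITS = 31;
    private static final double LON_SCALE = ((1L << BITS) - 1) / 360.0;
    private static final double LAT_SCALE = ((1L << BITS) - 1) / 180.0;

    /** Moves the low 32 bits of v onto the even bit positions of a long. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2)) & 0x3333333333333333L;
        v = (v | (v << 1)) & 0x5555555555555555L;
        return v;
    }

    /** Inverse of spread: gathers the even bit positions into the low 32 bits. */
    static long compact(long v) {
        v &= 0x5555555555555555L;
        v = (v | (v >>> 1)) & 0x3333333333333333L;
        v = (v | (v >>> 2)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v >>> 4)) & 0x00FF00FF00FF00FFL;
        v = (v | (v >>> 8)) & 0x0000FFFF0000FFFFL;
        v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL;
        return v;
    }

    /** Quantizes lon/lat (degrees) and interleaves them: lon on even bits, lat on odd bits. */
    static long encode(double lon, double lat) {
        long x = (long) ((lon + 180.0) * LON_SCALE);
        long y = (long) ((lat + 90.0) * LAT_SCALE);
        return spread(x) | (spread(y) << 1);
    }

    /** Recovers the (truncated) longitude in degrees. */
    static double decodeLon(long hash) {
        return compact(hash) / LON_SCALE - 180.0;
    }

    /** Recovers the (truncated) latitude in degrees. */
    static double decodeLat(long hash) {
        return compact(hash >>> 1) / LAT_SCALE - 90.0;
    }
}
```

Because the two coordinates share the high-order bits of the hash, points that are close on the map tend to share long prefixes, which is what lets a numeric range over the encoded longs act as a coarse first-phase filter.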
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541831#comment-14541831 ]

Nicholas Knize commented on LUCENE-6450:
----------------------------------------

bq. does lucene efficiently support field types of that length?

Yes. This patch (and PackedQuadTree) uses longs for encoding 2d points. I went ahead and opened a separate issue [LUCENE-6480 | https://issues.apache.org/jira/browse/LUCENE-6480] for investigating the 3d case so we can carry the discussion over there. The goal for this field is to provide a framework for search so all we have to worry about is trying out different encoding techniques.

As a side note, I'm finishing up a patch that uses precision_step for indexing the longs at variable resolution to take advantage of the postings list and not visit every term. The index will be slightly bigger but it should provide the foundation for faster search on large polygons and bounding boxes. I'll add mercator projection afterwards to reduce precision error over large search regions and then switch to geo3d and benchmark.
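To make the precision_step idea in the comment above concrete, here is a simplified model of indexing one long at several resolutions. It is a sketch of the concept only, not Lucene's numeric term encoding; the class PrecisionStepSketch and its methods are hypothetical.

```java
// Hypothetical model of indexing a value at several resolutions.
final class PrecisionStepSketch {
    /**
     * Returns the value at successively coarser resolutions: element i is the
     * value with its lowest i * precisionStep bits dropped.
     * Assumes 1 <= precisionStep <= 64.
     */
    static long[] prefixes(long value, int precisionStep) {
        int count = (64 + precisionStep - 1) / precisionStep;
        long[] out = new long[count];
        for (int i = 0; i < count; i++) {
            out[i] = value >>> (i * precisionStep);
        }
        return out;
    }

    /** True if both values fall into the same cell once the lowest shift bits are dropped. */
    static boolean sameCell(long a, long b, int shift) {
        return (a >>> shift) == (b >>> shift);
    }
}
```

Each extra resolution adds terms to the index, which matches the "slightly bigger" index mentioned above; in exchange, a large bounding box can be covered by a handful of coarse cells instead of every full-precision term inside it.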
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541948#comment-14541948 ]

David Smiley commented on LUCENE-6450:
--------------------------------------

bq. As a side note, I'm finishing up a patch that uses precision_step for indexing the longs at variable resolution to take advantage of the postings list and not visit every term. The index will be slightly bigger but it should provide the foundation for faster search on large polygons and bounding boxes.

If I'm not mistaken, the term auto-prefixing that Mike worked on means we need not do that here, especially just for point data; no?
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541960#comment-14541960 ]

Nicholas Knize commented on LUCENE-6450:
----------------------------------------

yes yes! That's the idea anyway. I've tinkered with this a bit already. It took the same amount of time to build the Automaton as it did the ranges (no surprises since it used the same logic) but queries were on the order of 10x slower (0.8sec/query on 60M points). Thinking maybe there's some optimization to the automaton that needs to be done? I figured first make progress here and post a separate issue for the automaton WIP.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542429#comment-14542429 ]

Uwe Schindler commented on LUCENE-6450:
---------------------------------------

Looks OK regarding my comments about subclassing. One thing: could you make the fields final in the query? Query should be immutable, so the min/maxLat/Lon doubles and polygon array are unmodifiable. I will have a closer look later, I just skimmed through the patch.
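As an illustration of the immutability request above, here is a minimal sketch of a holder for the bounding box and polygon values. The class ImmutableGeoShapeSketch and its accessors are hypothetical and are not the query classes from the patch. Note that marking an array field final only fixes the reference, so the sketch also copies the arrays on the way in and on the way out.

```java
// Hypothetical value holder; not GeoBoundingBoxQuery or GeoPolygonQuery.
final class ImmutableGeoShapeSketch {
    private final double minLon;
    private final double minLat;
    private final double maxLon;
    private final double maxLat;
    private final double[] polyLons;
    private final double[] polyLats;

    ImmutableGeoShapeSketch(double minLon, double minLat, double maxLon, double maxLat,
                            double[] polyLons, double[] polyLats) {
        if (polyLons.length != polyLats.length) {
            throw new IllegalArgumentException("polygon arrays must have equal length");
        }
        this.minLon = minLon;
        this.minLat = minLat;
        this.maxLon = maxLon;
        this.maxLat = maxLat;
        // Defensive copies: later changes to the caller's arrays cannot leak in.
        this.polyLons = polyLons.clone();
        this.polyLats = polyLats.clone();
    }

    double minLon() { return minLon; }
    double minLat() { return minLat; }
    double maxLon() { return maxLon; }
    double maxLat() { return maxLat; }

    // Return copies so callers cannot modify the internal state.
    double[] polyLons() { return polyLons.clone(); }
    double[] polyLats() { return polyLats.clone(); }
}
```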
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541578#comment-14541578 ]

Karl Wright commented on LUCENE-6450:
-------------------------------------

I have some ideas for a geohash given (x,y,z) values that may turn out to be of interest. This geohash would have acceptable precision (a few meters) when packed in a long (64 bits).

Question: does lucene efficiently support field types of that length?
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539533#comment-14539533 ]

Karl Wright commented on LUCENE-6450:
-------------------------------------

Thinking about this further, I think that the simplicity of the GeoPointField solution is based on two things:

(1) A good binary packing of a geohash value of a fixed depth
(2) The ability to reconstruct the lat/lon from the geohash value directly

For a Geo3d equivalent, nobody to date has come up with a decent (x,y,z) geohash with the right locality properties. I'm not certain how important those locality properties actually *are* for GeoPointField, though. If they are important, then effectively we'd need to pack the *same* lat/lon based geohash value that GeoPointField uses, but have a lightning fast way of converting it to (x,y,z). A lookup table would suffice but would be huge at 9mm resolution. :-) Adjusting the resolution would be a potential solution.

If locality is *not* needed, then really a geohash would be any workable fixed-resolution binary encoding of the (x,y,z) values of any given lat/lon.
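A sketch of the last case above, a fixed-resolution binary encoding of (x,y,z) with no locality properties, might look like the following. Everything here is an assumption for illustration: the class XyzPackSketch, a unit sphere, and 21 bits per axis (63 bits in total). With a mean Earth radius of about 6371 km, one quantization step is roughly 2 * 6371 km / 2^21, or about 6 m, which is in line with "a few meters" in a 64-bit long.

```java
// Hypothetical sketch: packs unit-sphere (x, y, z) into one long, 21 bits per axis.
final class XyzPackSketch {
    static final int BITS = 21;
    static final long MAX = (1L << BITS) - 1;

    /** Maps v in [-1, 1] to an integer in [0, MAX]. */
    static long quantize(double v) {
        return Math.round((v + 1.0) / 2.0 * MAX);
    }

    /** Maps an integer in [0, MAX] back to [-1, 1]. */
    static double dequantize(long q) {
        return q / (double) MAX * 2.0 - 1.0;
    }

    /** Converts lat/lon (degrees) to unit-sphere x, y, z and packs them as x | y | z. */
    static long pack(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double x = Math.cos(lat) * Math.cos(lon);
        double y = Math.cos(lat) * Math.sin(lon);
        double z = Math.sin(lat);
        return (quantize(x) << (2 * BITS)) | (quantize(y) << BITS) | quantize(z);
    }

    /** Reconstructs {lat, lon} in degrees directly from the packed value. */
    static double[] unpack(long packed) {
        double x = dequantize((packed >>> (2 * BITS)) & MAX);
        double y = dequantize((packed >>> BITS) & MAX);
        double z = dequantize(packed & MAX);
        double lat = Math.toDegrees(Math.atan2(z, Math.hypot(x, y)));
        double lon = Math.toDegrees(Math.atan2(y, x));
        return new double[] {lat, lon};
    }
}
```

This covers point (2) of the comment (lat/lon can be reconstructed from the value) but not the locality question: neighbouring points do not share useful prefixes in this layout.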
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538052#comment-14538052 ]

Karl Wright commented on LUCENE-6450:
-------------------------------------

A word of caution. For polygons, most cartographers would not consider the following to accurately represent a polygon on a sphere:

{code}
+  public static boolean pointInPolygon(double[] x, double[] y, double lat, double lon) {
+    assert x.length == y.length;
+    boolean inPoly = false;
+
+    for (int i = 1; i < x.length; i++) {
+      if (x[i] < lon && x[i-1] >= lon || x[i-1] < lon && x[i] >= lon) {
+        if (y[i] + (lon - x[i]) / (x[i-1] - x[i]) * (y[i-1] - y[i]) < lat) {
+          inPoly = !inPoly;
+        }
+      }
+    }
+    return inPoly;
+  }
{code}

This is a cartesian approximation only -- polygon edges would appear to be curved on a globe. This will limit usage to applications for which cartesian approximations are acceptable.
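To give a feel for the size of the effect described above, the following sketch computes where a great-circle edge actually runs between two vertices. The class GreatCircleSketch is hypothetical and assumes a unit sphere; it does not handle antipodal endpoints, where the midpoint is undefined.

```java
// Hypothetical sketch: great-circle midpoint latitude on a unit sphere.
final class GreatCircleSketch {
    /** Converts lat/lon in degrees to a unit vector {x, y, z}. */
    static double[] toXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        return new double[] {
            Math.cos(lat) * Math.cos(lon),
            Math.cos(lat) * Math.sin(lon),
            Math.sin(lat)
        };
    }

    /** Latitude in degrees of the great-circle midpoint between two points. */
    static double midpointLat(double lat1, double lon1, double lat2, double lon2) {
        double[] a = toXyz(lat1, lon1);
        double[] b = toXyz(lat2, lon2);
        // The normalized sum of two unit vectors points at the arc midpoint;
        // atan2 makes the explicit normalization unnecessary.
        double x = a[0] + b[0];
        double y = a[1] + b[1];
        double z = a[2] + b[2];
        return Math.toDegrees(Math.atan2(z, Math.hypot(x, y)));
    }
}
```

For an edge from (lat 45, lon 0) to (lat 45, lon 90), the cartesian test treats the edge as lying at latitude 45 throughout, while midpointLat returns about 54.74 degrees. That is a gap of nearly 10 degrees of latitude, on the order of 1,000 km on the Earth, so points in that band are classified differently by the two models.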
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538465#comment-14538465 ]

Nicholas Knize commented on LUCENE-6450:
----------------------------------------

bq. It's only loosely coupled to the spatial module right now.

I've seen the code. It's well designed. My point is that it's in the spatial module right now. It would need to be taken out of the spatial module for it to be usable in core and by this patch. Sounds like a separate issue for lucene committer buy in.

bq. Most GIS code can readily map a point on an earth model to a point on a sphere.

I'm not arguing with you. My point refers to the, maybe 2%, class of lucene geo users you mentioned. Those same Cartographers that would counter-argue that geo3d is not ISO 19107 compliant so sphere accuracy is not good enough for them (enjoy those passionate discussions at FOSS4G).

Don't misunderstand me - I'm not calling your baby ugly. There's great stuff there and I'm all for investigating using it to improve the simple approach in this patch. In the meantime, since it sounds like accuracy concerns for the large poly use case are for real?, it might be easier just to add the few lines of mercator re-projection code to the post-filter and investigate using geo3d in a phase 2 performance improvement (if it becomes available to core).
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538321#comment-14538321 ]

Nicholas Knize commented on LUCENE-6450:
----------------------------------------

Yes. Geo3d would be a nice fit here for using planar geometry to better approximate the earth's surface to a sphere. That would require it be decoupled from the spatial module and committed stand alone to a core.geometry package. (maybe not a bad idea?) Though it does diverge from the original intent of this patch, which is to provide core with a simple lightweight/scalable geo field and API that applies to the 95% use case not requiring cartographic precision.

The advanced GIS use case can use the spatial module, but in that case I'd point out that most cartographers/photogrammetrists would still consider geo3d inaccurate since the sphere is not an accurate representation of the earth's surface. There would need to be support for reprojection using appropriate EPSG/CRS datum (something best fit to remain in the spatial module, not core). For that I'm exploring the use of SIS (something I'll sandbox for collaboration).

In the meantime, if there's concern about this patch struggling with accuracy on large polygons, a relatively straightforward and fast approach should be to overestimate the bbox, reproject intermediate results to a cylindrical projection, and use the same simple cartesian based PIP. Any enhancement that saves the complex computation to smaller result sets is a win.
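The post-filter idea above (reproject the candidates to a cylindrical projection, then reuse the cartesian point-in-polygon test) can be sketched as follows. This assumes a spherical Mercator projection and a closed polygon whose first vertex is repeated at the end; the class MercatorSketch and its methods are hypothetical and not taken from the patch.

```java
// Hypothetical sketch: spherical Mercator reprojection ahead of a cartesian PIP test.
final class MercatorSketch {
    /** Mercator y for a latitude in degrees: ln(tan(pi/4 + phi/2)). */
    static double projectLat(double latDeg) {
        double phi = Math.toRadians(latDeg);
        return Math.log(Math.tan(Math.PI / 4.0 + phi / 2.0));
    }

    /** Mercator x is the longitude in radians. */
    static double projectLon(double lonDeg) {
        return Math.toRadians(lonDeg);
    }

    /** Even-odd crossing test, as in the thread, run on projected coordinates. */
    static boolean pointInPolygon(double[] lons, double[] lats, double lat, double lon) {
        if (lons.length != lats.length) {
            throw new IllegalArgumentException("polygon arrays must have equal length");
        }
        double px = projectLon(lon);
        double py = projectLat(lat);
        boolean inPoly = false;
        for (int i = 1; i < lons.length; i++) {
            double x1 = projectLon(lons[i]);
            double y1 = projectLat(lats[i]);
            double x0 = projectLon(lons[i - 1]);
            double y0 = projectLat(lats[i - 1]);
            // Only edges that straddle the point's x coordinate can be crossed.
            if ((x1 < px && x0 >= px) || (x0 < px && x1 >= px)) {
                if (y1 + (px - x1) / (x0 - x1) * (y0 - y1) < py) {
                    inPoly = !inPoly;
                }
            }
        }
        return inPoly;
    }
}
```

Straight edges in Mercator space are lines of constant bearing rather than great-circle arcs, so this changes which approximation is being made rather than removing it; whether it suits a given use case is the open question in the thread.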
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538584#comment-14538584 ]

Karl Wright commented on LUCENE-6450:
-------------------------------------

I agree that, to consider it seriously, geo3d would have to be moved to core. So somebody would need to understand the benefit.

If that happens, though, I could foresee an almost-completely-parallel implementation of the GeoPointField type, let's call it Geo3DPointField, which would encode X,Y,Z instead of lat and lon. Almost all the same ideas and architecture would work, AFAICT. The cons are that the index would be bigger (3 floating values instead of two, etc.), and slower to build (more math). The search, on the other hand, would be not much slower I believe.

As for the politics of ISO 19107, let's remember that FOSS4G exists to implement standards. I'm interested, though, in the search problem, as are you. And there's a world of difference between the cartesian view of things and a 3D one. I think there's room for both.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538618#comment-14538618 ]

Nicholas Knize commented on LUCENE-6450:
----------------------------------------

bq. I could foresee an almost-completely-parallel implementation of the GeoPointField type, let's call it Geo3DPointField, which would encode X,Y,Z instead of lat and lon.

+++1

This was the intent of the PackedQuadTree patch for lucene spatial and can certainly be added to core. Add a 3d morton encoding to GeoUtils using 3 bits. Thus naive morton Hypercube ordering w/ 0/1 representing Left/Right || Bottom/Top || Front/Hind at level 1 gives 000:111. This is where locality really matters though, so phase 2 would extend this into something more organized, e.g., Hilbert ordering at level 1 is: 010, 011, 111, 110, 100, 101, 001, 000. There are numerous issues with this; ordering becomes non-trivial, encoded values do not fit nicely in Longs, term size becomes larger. But I agree, I don't believe search will be much slower.

The first phase here is to investigate improving this patch w/ Auto-prefix. I think if we can get that working and leave prefix matching to the automaton then the hardest part becomes optimizing the encoding? Maybe I'll open an issue around this for separate discussion so as to not dilute this issue's comments.

On topic here; I propose we sandbox geo3d along w/ these field types. Get something incubated before proposing the addition to core?
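The 3-bits-per-level Morton encoding described above can be sketched as follows, together with a check of why the listed Hilbert order has better locality than the naive 000:111 order. The class Morton3dSketch, the 21 bits per axis (63 bits in total), and the mapping of x, y, z to bit positions are assumptions for illustration; only the level-1 Hilbert sequence is taken from the comment.

```java
// Hypothetical sketch of a 3D Morton code and the level-1 octant orderings.
final class Morton3dSketch {
    // Assumption: 21 bits per axis so that the code fits in 63 bits of a long.
    static final int BITS = 21;

    /** Interleaves the low 21 bits of x, y, z: x on bit 3i, y on 3i+1, z on 3i+2. */
    static long interleave3(int x, int y, int z) {
        long code = 0L;
        for (int i = 0; i < BITS; i++) {
            code |= ((x >>> i) & 1L) << (3 * i);
            code |= ((y >>> i) & 1L) << (3 * i + 1);
            code |= ((z >>> i) & 1L) << (3 * i + 2);
        }
        return code;
    }

    /** Naive Morton order of the eight level-1 octants, 000 through 111. */
    static final int[] MORTON_LEVEL1 = {0b000, 0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111};

    /** Level-1 Hilbert order as listed in the comment above. */
    static final int[] HILBERT_LEVEL1 = {0b010, 0b011, 0b111, 0b110, 0b100, 0b101, 0b001, 0b000};

    /** Largest number of axes that change between consecutive octants in an ordering. */
    static int maxBitFlips(int[] order) {
        int max = 0;
        for (int i = 1; i < order.length; i++) {
            max = Math.max(max, Integer.bitCount(order[i] ^ order[i - 1]));
        }
        return max;
    }
}
```

maxBitFlips returns 1 for the Hilbert sequence, meaning consecutive octants always share a face, and 3 for the naive Morton sequence, where the step from 011 to 100 jumps to the diagonally opposite octant.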
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538646#comment-14538646 ]

Karl Wright commented on LUCENE-6450:
-------------------------------------

bq. Maybe I'll open an issue around this for separate discussion so as to not dilute this issues comments.

+1

bq. I propose we sandbox geo3d along w/ these field types. Get something incubated before proposing the addition to core?

+1 from me. I'll be happy to provide Geo3D guidance if you need it; I'll be less helpful with encoding help.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538373#comment-14538373 ]

Karl Wright commented on LUCENE-6450:
-------------------------------------

bq. That would require it be decoupled from the spatial module and committed stand alone to a core.geometry package. (maybe not a bad idea?)

It's only loosely coupled to the spatial module right now. There's a single class integrating it with spatial4j (not counting the random tests).

bq. but in that case I'd point out that most cartographers/photogrammetrists would still consider geo3d inaccurate since the sphere is not an accurate representation of the earth's surface.

Most GIS code can readily map a point on an earth model to a point on a sphere. And even if you don't do that, and just use the sphere, your error for whether a given lat/lon is within a given shape is a few meters at most, whereas the error for a cartesian projection can be thousands of kilometers. Given that geo3d is also designed to be very fast, and equally compatible with geohash construction, I would think you might be interested in integrating your simple geopoint proposal with it.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538092#comment-14538092 ]

Michael McCandless commented on LUCENE-6450:
--------------------------------------------

bq. This will limit usage to applications for which cartesian approximations are acceptable.

But users who care about the error this approximation introduces from the earth's curvature can use geo3d, right?

Also, isn't the curvature less substantial the smaller the perimeter of the polygon? E.g. http://apartments.com makes a big deal about doing polygon searching for your apartment, but such users are likely to draw minuscule (relative to earth's curvature) polygons, on already-projected maps, so the curvature wouldn't matter for these users?

I'm very much a spatial newbie so I could be completely wrong :)
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538099#comment-14538099 ]

Karl Wright commented on LUCENE-6450:
-------------------------------------

If you are talking about tiny polygons, not near a pole, then you are probably fine. Earlier comments in this ticket, however, mention navigation and scientific usage. That's probably not reasonable.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527392#comment-14527392 ]

Michael McCandless commented on LUCENE-6450:
--------------------------------------------

This new approach is nice!

I don't fully understand all the geo math, but I think I get the gist: you recursively approximate the target shape using smaller and smaller ranges from the morton encoding, and then record when that z-shape is fully within the query and avoid the post-filtering for those ranges. This visits fewer terms than the original patch, which did just a single range that can (w/ the right 'adversary') visit a great many false terms.

It's impressive how fast this is, without using any NumericField prefix terms. I think we can explore that later and we should commit this approach now ..

Maybe add a test case w/ more data, e.g. a randomized test? It could index a bunch of random points, then run random rects/shapes, do the dumb, slow check of every single doc, and confirm the query hits agree.
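The recursive range approximation described in the comment above can be sketched roughly as follows. This is not the code from the patch: it works on a small integer grid (8 bits per dimension) instead of lat/lon, and the names MortonRangeSketch, Range, cover and maxDepth are hypothetical. Each cell that lies fully within the query box yields a range needing no post-filter; cells still straddling the boundary at the depth limit yield ranges that do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cover a query box with ranges on a Morton (Z-order) curve.
final class MortonRangeSketch {
    static final int BITS = 8; // bits per dimension, i.e. a 256 x 256 grid

    static final class Range {
        final long min;
        final long max;
        final boolean needsPostFilter; // false when the cell is fully inside the box

        Range(long min, long max, boolean needsPostFilter) {
            this.min = min;
            this.max = max;
            this.needsPostFilter = needsPostFilter;
        }
    }

    // Interleave x (even bits) and y (odd bits); used to check the ranges.
    static long morton(int x, int y) {
        long z = 0;
        for (int i = 0; i < BITS; i++) {
            z |= ((long) (x >>> i) & 1L) << (2 * i);
            z |= ((long) (y >>> i) & 1L) << (2 * i + 1);
        }
        return z;
    }

    static List<Range> cover(int minX, int minY, int maxX, int maxY, int maxDepth) {
        List<Range> out = new ArrayList<>();
        recurse(0L, 0, 0, 0, minX, minY, maxX, maxY, maxDepth, out);
        return out;
    }

    private static void recurse(long prefix, int depth, int x0, int y0,
                                int minX, int minY, int maxX, int maxY,
                                int maxDepth, List<Range> out) {
        int size = 1 << (BITS - depth);
        int x1 = x0 + size - 1;
        int y1 = y0 + size - 1;
        if (x1 < minX || x0 > maxX || y1 < minY || y0 > maxY) {
            return; // cell is disjoint from the query box
        }
        boolean within = x0 >= minX && x1 <= maxX && y0 >= minY && y1 <= maxY;
        if (within || depth >= maxDepth) {
            int shift = 2 * (BITS - depth); // low bits not fixed by the prefix
            long lo = prefix << shift;
            long hi = lo | ((1L << shift) - 1);
            out.add(new Range(lo, hi, !within));
            return;
        }
        int half = size >>> 1;
        for (int c = 0; c < 4; c++) { // child index: bit 0 selects x half, bit 1 selects y half
            recurse((prefix << 2) | c, depth + 1,
                    x0 + (c & 1) * half, y0 + (c >>> 1) * half,
                    minX, minY, maxX, maxY, maxDepth, out);
        }
    }
}
```

A smaller maxDepth produces fewer, coarser ranges with more post-filtering, which is the trade-off behind a precision control on how deep the ranges recurse.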
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527410#comment-14527410 ]

Nicholas Knize commented on LUCENE-6450:
----------------------------------------

That's right. The old patch was a naive scan-the-world approach. Really unusable at scale. As said, this one approximates the bounding box as the set of ranges on the space filling curve.

I think [~dsmiley] had also suggested random testing, which is definitely necessary. I'll add some randomized testing and post a new patch.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527489#comment-14527489 ]

Uwe Schindler commented on LUCENE-6450:
---------------------------------------

Hi,

I will look into this tomorrow (it is too late now)... This looks like it has a completely separate TermsEnum and query impl. Why not extend MultiTermQuery directly and let NRQ live on its own?

Uwe
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527496#comment-14527496 ]

Uwe Schindler commented on LUCENE-6450:
---------------------------------------

bq. It's impressive how fast this is, without using any NumericField prefix terms. I think we can explore that later and we should commit this approach now ..

We should first compare how this behaves on *large* bboxes, so a random test / perf test spanning large parts of the world and large indexes with maaany points would be good (whole Atlantic, whole Africa,...).

It is also mentioned that it does not allow crossing the date line, which is easy to do by splitting into 2 queries, one left of the date line, one right. I can help with that. Then we should also test perf with queries spanning the whole Pacific :-)
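The date line split suggested in the comment above could look roughly like this. It is a sketch only: the class DatelineSplitSketch, the {minLon, minLat, maxLon, maxLat} array layout, and the convention that minLon > maxLon signals a box crossing the date line are assumptions, not anything taken from the patch.

```java
// Hypothetical sketch: split a bounding box that crosses the date line
// into two boxes that do not.
final class DatelineSplitSketch {

    // Each box is {minLon, minLat, maxLon, maxLat} in degrees.
    // Assumed convention: minLon > maxLon means the box wraps across +/-180.
    static double[][] split(double minLon, double minLat, double maxLon, double maxLat) {
        if (minLon <= maxLon) {
            return new double[][] {{minLon, minLat, maxLon, maxLat}};
        }
        return new double[][] {
            {minLon, minLat, 180.0, maxLat},  // part left of the date line
            {-180.0, minLat, maxLon, maxLat}  // part right of the date line
        };
    }

    public static void main(String[] args) {
        for (double[] box : split(170.0, -10.0, -170.0, 10.0)) {
            System.out.println(java.util.Arrays.toString(box));
        }
    }
}
```

The two resulting boxes could then each back one bounding box query, with the pair combined as optional (SHOULD) clauses of a BooleanQuery; how the patch would actually wire this up is not shown in the thread.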
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527300#comment-14527300 ]

David Smiley commented on LUCENE-6450:
--------------------------------------

Nice code Nick! LGTM.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527318#comment-14527318 ]

David Smiley commented on LUCENE-6450:
--------------------------------------

Just curious; how did that Python RTree benchmark compare?
https://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/src/python/SearchOSM.py?spec=svn188e330ea8c34a9720cbf0414d2ed19f6a843a3d&r=188e330ea8c34a9720cbf0414d2ed19f6a843a3d#1
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527338#comment-14527338 ]

Michael McCandless commented on LUCENE-6450:
--------------------------------------------

Here's the OSM subset I'm using for the benchmarks: http://people.apache.org/~mikemccand/latlon.subsetPlusAllLondon.txt.lzma

It's a random 1/50th of the latest OSM export (as of last week), but includes all points within London, UK. The search benchmark then runs a fixed set (225 total) of axis-aligned rectangle intersects queries around London.

Look for Index/SearchOSM/GeoPoint.java/py in luceneutil...

I ran the same benchmarks (except for Packed/QuadPrefixTree):

*Geopoint*
Index time: 157.3 sec (incl. forceMerge)
Index size: 1.8 GB
Mean query time: .077 sec
221,119,062 total hits

*GeoHashPrefixTree*
Index time: 628.5 sec (incl. forceMerge)
Index size: 4.2 GB
Mean query time: .039 sec
221,120,027 total hits

*libspatialindex* (using Python Rtree wrapper)
Index time: 469.6 sec
Index size: 2.6 GB
Mean query time: .158 sec
221,118,844 total hits

The first geopoint patch here got exactly the same total hit count as libspatialindex, but now it's different, I think because of the precision control that limits how deep the ranges recurse.

I think it's also expected geohash won't get the same hit count since it's doing a bit of quantizing (level 11 ... not sure what that equates to in meters).

I'm surprised the Rtree impl is so slow ...
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527148#comment-14527148 ]

David Smiley commented on LUCENE-6450:
--------------------------------------

Can you please direct me to the luceneutil and geo benchmark? I'm curious what that's about.

The numbers look nice. Small indexes and fast index time :-) It'd be interesting to try GeoHashPrefixTree, which will have smaller indexes than Quad.

I'll check out your code shortly.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527159#comment-14527159 ]

Nicholas Knize commented on LUCENE-6450:
----------------------------------------

Updated w/ GeoHashPrefixTree benchmarks.

Reference to luceneutil: https://code.google.com/a/apache-extras.org/p/luceneutil/
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510688#comment-14510688 ]

Michael McCandless commented on LUCENE-6450:
--------------------------------------------

This patch looks wonderful [~nknize]!

Hmm in this code:

{noformat}
+    @Override
+    protected final AcceptStatus accept(BytesRef term) {
+      AcceptStatus status = super.accept(term);
+      if (status != AcceptStatus.YES) {
+        return status;
+      }
{noformat}

... do you also need to more carefully handle the AcceptStatus.YES_AND_SEEK case? Oh I see: NumericRangeTermsEnum never returns this ... maybe add an assert status != AcceptStatus.YES_AND_SEEK?

Can we just expose simple ctors for these queries instead of the static factory methods?

For GeoPolygonQuery, why do we have a public factory method that takes the bbox? Shouldn't this be private (bbox is computed from the polygon's points)? Or is this for expert usage or something?

In GeoPolygonTermsEnum, the comment "// final-filter by bounding box" should really be "// second-pass filter by bounding box", right? The final filter is the polygon check...

That GeoUtils.pointInPolygon method is magic to me :) I had no idea it was so simple to check if a point is inside a polygon. Is there a requirement that these poly points are in clockwise or counter-clockwise order or something?

Add simple encoded GeoPointField type to core
---------------------------------------------

Key: LUCENE-6450
URL: https://issues.apache.org/jira/browse/LUCENE-6450
Project: Lucene - Core
Issue Type: New Feature
Affects Versions: 5.x
Reporter: Nicholas Knize
Priority: Minor
Attachments: LUCENE-6450.patch

At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery, GeoPolygonQuery, and GeoDistanceQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead. The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms deferring the more expensive mathematics to the smaller candidate sets.
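On the point-in-polygon question in the comment above: GeoUtils.pointInPolygon itself is not quoted in this thread, so the following is only a sketch of the standard even-odd (ray crossing) test, which may or may not match what the patch does. The class name PointInPolygonSketch and the parallel-array signature are assumptions. With this particular test the vertex order does not matter, because it only counts how many edges a horizontal ray from the point crosses.

```java
// Hypothetical sketch of the even-odd ray crossing test; not the patch's GeoUtils code.
final class PointInPolygonSketch {

    // xs/ys hold the polygon vertices in order (either winding direction);
    // the edge from the last vertex back to the first is implied.
    static boolean pointInPolygon(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Does the edge (j -> i) straddle the horizontal line through py?
            boolean straddles = (ys[i] > py) != (ys[j] > py);
            // If so, is the crossing to the right of px? Each such crossing toggles the state.
            if (straddles && px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 1, 0};
        double[] ys = {0, 0, 1, 1};
        System.out.println(pointInPolygon(xs, ys, 0.5, 0.5));
        System.out.println(pointInPolygon(xs, ys, 1.5, 0.5));
    }
}
```

Points that fall exactly on an edge are a known soft spot of this test and are not handled specially here.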
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510717#comment-14510717 ]

Michael McCandless commented on LUCENE-6450:
--------------------------------------------

Hmm: another question about overriding the accept() method for NumericRangeTermsEnum: it looks like you extract the full precision lat/lon from the incoming term, yet this term could be a shifted term right (lost some of its lower bits)? Is the morton encoding ok with that lost precision? I'm a little confused how it can work properly ... it seems like it needs to see the full precision terms under a shifted range...
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510757#comment-14510757 ]
Uwe Schindler commented on LUCENE-6450:

Hi, looks nice! I have not yet looked at it fully, but I am fine with it. Like Mike, I have the same question regarding shifted terms in the terms enum.

One small API issue: it is fine to subclass NRQ, but instead of making the constructor protected, it should be package-private (no access modifier). The static factory methods are needed in NRQ to handle the different data types. For the geo queries, which have just one way to instantiate them, a simple public ctor is OK; no static factory is needed.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510761#comment-14510761 ]
Uwe Schindler commented on LUCENE-6450:

bq. mm: another question about overriding the accept() method for NumericRangeTermsEnum: it looks like you extract the full precision lat/lon from the incoming term, yet this term could be a shifted term right (lost some of its lower bits)?

I think I figured it out: the geo queries always use precStep=64, so shifted terms will never appear. My remaining question is: why do this? The whole point of NumericRangeQuery is to use the lower-precision terms to avoid visiting too many terms... Nicholas, could you please explain why we don't want to use the prefix terms? I agree it makes things more complicated, but I think the current encoding could use them. And the terms are not wasteful! :-)
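As a rough illustration of the "shifted terms" being discussed, the sketch below lists the prefix terms that a given precision step would produce for one 64-bit value. It is a simplified model, not Lucene code: `prefix_terms` and `VALUE_BITS` are hypothetical names, and the real term bytes also carry the shift in an encoded form that is ignored here.

```python
VALUE_BITS = 64  # width of the encoded long


def prefix_terms(value, precision_step):
    """Return the (shift, prefix) pairs indexed for one 64-bit value.

    Shift 0 is the full-precision term; every further term drops
    `precision_step` more low-order bits.
    """
    if precision_step < 1:
        raise ValueError("precision_step must be >= 1")
    terms = []
    shift = 0
    while shift < VALUE_BITS:
        terms.append((shift, value >> shift))
        shift += precision_step
    return terms


if __name__ == "__main__":
    encoded = 0x123456789ABCDEF0  # stand-in for an encoded lat/lon
    for step in (64, 32, 16):
        print(step, [(s, hex(p)) for s, p in prefix_terms(encoded, step)])
```

With a step of 64 the loop runs once, so only the full-precision term exists, which is why a terms enum that decodes lat/lon from every term never sees a shifted one in that configuration.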
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511048#comment-14511048 ]
Michael McCandless commented on LUCENE-6450:

bq. I think I figured out: the Geo queries always use precStep=64, so shifted terms will never appear.

Ahhh, ok, that makes sense.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510980#comment-14510980 ]
Robert Muir commented on LUCENE-6450:

I'm curious about the precision step as well. The extra terms should give range queries a huge speedup if we can use them.

Since there is a custom encoding, I do wonder if we should add a SortedNumeric doc values field as well. Sorting and faceting could then work easily, etc. We could provide ValueSource thingies, add support for this to expressions/, facet/, and so on.

In general, that is what I like about the patch: it's simple and contained, and it solves the 99% majority use case of spatial. And we could expose this everywhere (like adding syntax to the queryparser and you name it: please no geo-geek syntax, same approach, 99% case for the masses).
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511247#comment-14511247 ]
Yonik Seeley commented on LUCENE-6450:

Great stuff! Should this be used as the underlying implementation for Solr's LatLonType (which currently does not have multi-valued support)? Any downsides for the single-valued case?
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511388#comment-14511388 ]
Nicholas Knize commented on LUCENE-6450:

It certainly could be used as the implementation for the LatLonType. Might be worthwhile exploring as a separate issue?
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511182#comment-14511182 ]
Nicholas Knize commented on LUCENE-6450:

bq. ds: particularly once a point-radius (circle) query is added. Did you forget or are you planning to add that in the future? The other super-common use-case is distance sorting...

I'm working on adding both the point-radius query and distance sorting. I wanted to get the first version out for initial feedback. It seems to have worked out nicely, with all of the great suggestions so far.

bq. uwe: so shifted terms will never appear. I just have the question (why do this?).

Space-filling curves are highly sensitive to precision, especially the Morton (or Lebesgue) curve, since it doesn't do a great job of preserving locality. Indexing reduced-precision terms can lead to (potentially significant) false positives (with nonlinear error). Here's a great bl.ock visualizing the range query from a bounding box over a Morton curve: http://bl.ocks.org/jaredwinick/raw/5073432/

For an average example: encoding 32.9482, -96.4538 with step = 32 results in two terms/geo points that are 500m apart. The error gets worse as the precision step is lowered. With the single high-precision encoded term the error is 1e-7 decimal degrees.

bq. mm: Is there a requirement that these poly points are clockwise or counter-clockwise order or something?

There is. The points have to be in cw or ccw order and the polygon cannot be self-crossing. It won't throw any exceptions; it just won't behave as expected. I went ahead and updated the javadoc comment to make sure that is clear.

bq. mm: For GeoPolygonQuery, why do we have public factory method that takes the bbox? Shouldn't this be private (bbox is computed from the polygon's points)? Or is this for expert usage or something?

The idea here is that polygons can contain a significant number of points, and users may already have the bbox (cached or otherwise precomputed). I thought this provided a nice way to save unnecessary processing if the caller can provide the bbox.

bq. ds: Have you thought about a way to use GeoPointFieldType with pre-projected data

Yes. This can potentially be left as an enhancement, but the intent is to have this apply to the most basic use cases. So I'm curious what others think about adding this capability versus leaving it to the spatial module.

bq. ds: GeoPointFieldType has DocValues enabled yet I see that these queries don't use that; or did I miss something?

Not using them yet. The intent was to use them for sorting.

bq. ds: I would love to see some randomized testing of round-trip encode-decode of the morton numbers.

Agree. I'll be adding randomized testing for sure.
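The precision argument above can be made concrete with a small sketch of Morton (Z-order) encoding. This is not the patch's code: the 32-bits-per-dimension layout, the linear quantization and every function name below are assumptions chosen for illustration. It interleaves quantized lon/lat bits into one 64-bit code, checks a randomized encode/decode round trip, and shows how far the decoded point moves once the 32 lowest bits are dropped, as a shifted term would do.

```python
import random

BITS = 32  # bits per dimension, so one code fits in 64 bits (assumption)
SCALE = (1 << BITS) - 1


def quantize(value, lo, hi):
    """Map value in [lo, hi] onto an integer in [0, 2**BITS - 1]."""
    return int((value - lo) / (hi - lo) * SCALE)


def dequantize(q, lo, hi):
    return lo + q / SCALE * (hi - lo)


def interleave(x, y):
    """Put the bits of x on even positions and the bits of y on odd ones."""
    code = 0
    for i in range(BITS):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def deinterleave(code):
    x = y = 0
    for i in range(BITS):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


def encode(lat, lon):
    return interleave(quantize(lon, -180.0, 180.0), quantize(lat, -90.0, 90.0))


def decode(code):
    x, y = deinterleave(code)
    return dequantize(y, -90.0, 90.0), dequantize(x, -180.0, 180.0)


def drop_low_bits(code, shift):
    """Zero the `shift` lowest bits, as a reduced-precision term would."""
    return (code >> shift) << shift


def max_round_trip_error(samples, seed=0):
    """Largest per-coordinate error (degrees) over random points."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        lat, lon = rng.uniform(-90.0, 90.0), rng.uniform(-180.0, 180.0)
        dlat, dlon = decode(encode(lat, lon))
        worst = max(worst, abs(dlat - lat), abs(dlon - lon))
    return worst


if __name__ == "__main__":
    code = encode(32.9482, -96.4538)
    print("full precision:", decode(code))
    print("low 32 bits dropped:", decode(drop_low_bits(code, 32)))
    print("worst round-trip error:", max_round_trip_error(10_000))
```

Under these assumptions, dropping 32 bits leaves 16 bits per dimension, so a reduced-precision term stands for a cell of about 360/2^16 ≈ 0.0055 degrees of longitude by 180/2^16 ≈ 0.0027 degrees of latitude rather than a single point.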
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511391#comment-14511391 ]
Michael McCandless commented on LUCENE-6450:

Thanks [~nknize], new patch looks great ... but can you add @lucene.experimental to all class-level javadocs so users know the index format is subject to change?

I think these classes really do belong in core: they cover the common case for spatial search. But maybe we should start with sandbox for now, since we may make changes that break the index format?

E.g. I think we should find a way to make use of index-time prefix terms (auto prefix or numeric field), because with the patch as it stands we will visit O(N) terms and O(N) docs in the common case (no docs have exactly the same geo point), but if we can use prefix terms, we visit O(log(N)) terms and the same O(N) docs. The default block postings format is far more efficient to decode than the block terms dict, so offloading the work from the terms dict to postings should be a big win (and the post-filtering work would be unchanged, but would have to use doc values, not the term).

We could do smart things in that case, e.g. carefully pick which prefix terms to make use of because they are 100% contained by the shape, and then OR that with another query that matches the edge cells that must do post-filtering.

Maybe we try a different space-filling curve, e.g. I think Hilbert curves would be good since they have better spatial locality? They do have a higher index-time cost to encode, which is fine, and if we have to cut over to doc values for post-filtering anyway (if we use the prefix terms) then we wouldn't need to pay a Hilbert decode cost at search time.

But this all should come later: I think this patch is a huge step forward already.
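To illustrate why prefix terms cut the number of visited terms, here is a minimal sketch of trie-style range splitting, loosely in the spirit of what NumericRangeQuery is described as doing in this thread. It is not Lucene's implementation; `split_range`, `term_count` and `covered_spans` are hypothetical helpers, and small bit widths are used to keep the numbers readable.

```python
def split_range(lo, hi, step, bits):
    """Cover the inclusive range [lo, hi] of `bits`-bit values with prefix terms.

    Returns (shift, first, last) triples. Each triple stands for the prefix
    terms first..last at that shift, and one prefix term at shift s covers
    2**s consecutive full-precision values.
    """
    out = []
    shift = 0
    while lo <= hi:
        if shift + step >= bits:
            # Coarsest level: whatever is left is emitted here.
            out.append((shift, lo, hi))
            break
        mask = (1 << step) - 1
        has_lower = (lo & mask) != 0      # lo is not on a block boundary
        has_upper = (hi & mask) != mask   # hi is not at the end of a block
        next_lo = (lo >> step) + (1 if has_lower else 0)
        next_hi = (hi >> step) - (1 if has_upper else 0)
        if next_lo > next_hi:
            # No whole block in between, so stay at this precision.
            out.append((shift, lo, hi))
            break
        if has_lower:
            out.append((shift, lo, lo | mask))
        if has_upper:
            out.append((shift, hi & ~mask, hi))
        lo, hi = next_lo, next_hi
        shift += step
    return out


def term_count(pieces):
    return sum(last - first + 1 for _, first, last in pieces)


def covered_spans(pieces):
    """Full-precision (start, end) spans of the pieces, sorted by start."""
    return sorted((first << shift, ((last + 1) << shift) - 1)
                  for shift, first, last in pieces)


if __name__ == "__main__":
    pieces = split_range(3, 60000, step=4, bits=16)
    print("values in range:", 60000 - 3 + 1)
    print("prefix terms visited:", term_count(pieces))
    print(pieces)
```

The edges of the range are matched with fine terms and the middle with coarse ones, so the term count depends on the number of trie levels rather than on how many distinct values fall inside the range. The matching documents still have to be read from postings either way.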
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511795#comment-14511795 ]
Uwe Schindler commented on LUCENE-6450:

bq. I'm curious about the precisionstep as well. The extra terms should give range queries a huge speedup if we can use them.

The problem is the following: the NRQ approach only works for ranges (at the bounds of the range we use higher-precision terms, but in the center we use lower-precision terms). Here, we use the range only to filter out the terms that can never match. For the terms that are inside the range we still have to check whether they are in the bbox. To do this, we need full precision. So we cannot use lower precision.

bq. Great stuff! Should this be used as the underlying implementation for Solr's LatLonType (which currently does not have multi-valued support)? Any downsides for the single-valued case?

The problem is: if you have a large bbox and many distinct points, you have to visit many terms, because this does not use the trie algorithm of NRQ. It extends NRQ, but does not use the algorithm. It is a standard TermRangeQuery with some extra filtering. It does not even seek the terms enum! So for the single-valued case I would always prefer the two NRQ queries. In the worst case (a bbox covering the whole earth), you have to visit *all* terms and get their postings, which is more or less the same as a default term range query.

One workaround would be: if we used Hilbert curves, we could calculate the square box around the center of the bbox that is representable as a single numeric range (one where no post-filtering is needed). This range could be executed by the default NRQ algorithm using shifted values. For the remaining area around it we can visit only the high-precision terms. With the current Morton/Z-curve we cannot do this. So if we don't fix this now, we must for sure put this into sandbox, so we have the chance to change the algorithm.

Another alternative is to just use plain NRQ (ideally also with more locality using Hilbert curves) and post-filter the actual results (using doc values). This would also be preferable for polygons.

The current implementation is not usable for large bounding boxes covering many different positions! E.g. in my case (PANGAEA), we have lat/lon coordinates around the whole world, including the poles, and scientists generally select large bboxes... It is perfectly fine for searching for shops in towns, of course :-)
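The idea raised in this thread of splitting a query into cells that lie wholly inside the shape (no post-filtering needed) and edge cells (which must be post-filtered at full precision) can be sketched with a plain quad-tree descent. This is only an illustration under assumptions: `classify_cells` is a hypothetical helper, cells are simple lon/lat rectangles, and the 2-bits-per-level prefix assumes a Morton-style layout with the longitude bit below the latitude bit.

```python
WORLD = (-180.0, -90.0, 180.0, 90.0)  # (min_lon, min_lat, max_lon, max_lat)


def classify_cells(bbox, max_level, cell=WORLD, level=0, prefix=0):
    """Yield (level, prefix, needs_post_filter) for quad cells overlapping bbox.

    A cell fully inside the bbox is reported once, at the coarsest level at
    which it fits. A cell that only crosses the bbox is refined until
    `max_level`, where it is reported as needing a per-point check.
    """
    min_lon, min_lat, max_lon, max_lat = cell
    q_min_lon, q_min_lat, q_max_lon, q_max_lat = bbox
    if (max_lon <= q_min_lon or min_lon >= q_max_lon
            or max_lat <= q_min_lat or min_lat >= q_max_lat):
        return  # disjoint: nothing in this cell can match
    if (q_min_lon <= min_lon and max_lon <= q_max_lon
            and q_min_lat <= min_lat and max_lat <= q_max_lat):
        yield (level, prefix, False)
        return
    if level == max_level:
        yield (level, prefix, True)
        return
    mid_lon = (min_lon + max_lon) / 2
    mid_lat = (min_lat + max_lat) / 2
    for quad in range(4):  # bit 0 selects the lon half, bit 1 the lat half
        lon_hi = quad & 1
        lat_hi = (quad >> 1) & 1
        child = (mid_lon if lon_hi else min_lon,
                 mid_lat if lat_hi else min_lat,
                 max_lon if lon_hi else mid_lon,
                 max_lat if lat_hi else mid_lat)
        yield from classify_cells(bbox, max_level, child,
                                  level + 1, (prefix << 2) | quad)


if __name__ == "__main__":
    cells = list(classify_cells((-10.0, 35.0, 30.0, 60.0), max_level=6))
    inner = [c for c in cells if not c[2]]
    edge = [c for c in cells if c[2]]
    print(len(inner), "cells need no post-filtering;", len(edge), "edge cells do")
```

For a large bbox most of the area ends up in a handful of coarse inner cells, which is the case where visiting every full-precision term is most expensive.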
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511796#comment-14511796 ]
Ishan Chattopadhyaya commented on LUCENE-6450:

+1, this looks good! Just skimmed through the patch, though.
[jira] [Commented] (LUCENE-6450) Add simple encoded GeoPointField type to core
[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510431#comment-14510431 ]
David Smiley commented on LUCENE-6450:

*This is really nice Nick; thanks for contributing!*

I looked over the patch. I like the light-weight-ness and ease of use, and I suspect this will perform admirably. It should address many use-cases, particularly once a point-radius (circle) query is added. Did you forget, or are you planning to add that in the future? The other super-common use-case is distance sorting, and that's not here. In pointing this out I don't mean to take away from the great stuff you have here.

Further comments:
* I can see that GeoPolygonQuery extends GeoBBoxQuery for re-use, but I think it's confusing. Instead, how about a base AbstractGeoTermRangeQuery (or something like that) with GeoBBoxQuery being a fairly simple subclass?
* Somehow, I think the user should understand that these queries only work with a GeoPointFieldType. Javadocs are minimally sufficient, but might the queries be named accordingly, such as GeoPointInPolygonQuery and GeoPointInBBoxQuery? Those names aren't even that long, yet I think it's an important clarification to help differentiate geo/spatial stuff generally.
* GeoPointFieldType has DocValues enabled yet I see that these queries don't use that; or did I miss something? Even if they don't, I can only surmise the intention is to use them for distance sorting, although that's not here.
* If you derived those Morton number magic constants and related code yourself then you are a better man than I and most any other coder passerby to read this. Otherwise, please reference your sources.
** I would love to see some randomized testing of round-trip encode-decode of the Morton numbers. I understand if there needs to be an error tolerance.
* I'd like to see more javadocs on spatial matters like:
** does the GeoBBoxQuery support dateline wrap? Ditto for GeoPolygonQuery.
** are lat and lon in degrees or radians?
** What mathematical model does the polygon query operate in? (Answer is Cartesian despite lat-lon surface of sphere coordinates: warn the user of the implications)
* Have you thought about a way to use GeoPointFieldType with pre-projected data (x,y with a pre-configured range boundary)?
* I'm wondering what [~mikemccand]'s opinion is on what changes, if any, will be necessary to work with the experimental auto-prefix term stuff.

RE contributing to core: _I really wonder what other people think_. If others think it's great then I'm definitely not going to stand in the way. But I am concerned about the confusion this may introduce about where spatial stuff is and how it's related. Javadocs could help some. I think it's a slippery slope as to identifying what scope of spatial you think should go in Lucene-core versus Lucene-spatial. Perhaps you might think it's due to the dependencies? I think that's not a great differentiator, especially if one considers that the old spatial module (Lucene 3.x and prior, by Patrick O'Leary) had no dependencies, and I think you'd be hard pressed to think that belonged in Lucene-core if you were to see it.

This reminds me a little of some perceptual confusion of Solr's LatLonType (internally comprised of two double fields) versus a Solr field type for RPT. A new user is easily led to believe that they should use LatLonType if they have point data, especially because of its name (hey yeah, I have lats and lons), and say want to do simple bounding-box or point-radius queries. Sure it works for that, and it may very well be the best choice, but depending on the scale and various factors it _may_ be less performant compared to RPT, which perceptually appears as a more advanced choice to the user even though it's almost as easy to use for the simple cases.
By the same token, that could occur here if it's in core, but wouldn't if all this was wrapped up in a Lucene-spatial SpatialStrategy facade. Not a big deal, I think; but a concern. Thanks again for contributing this Nick. Add simple encoded GeoPointField type to core - Key: LUCENE-6450 URL: https://issues.apache.org/jira/browse/LUCENE-6450 Project: Lucene - Core Issue Type: New Feature Affects Versions: 5.x Reporter: Nicholas Knize Priority: Minor Attachments: LUCENE-6450.patch At the moment all spatial capabilities, including basic point based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the