[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511795#comment-14511795 ]

Uwe Schindler edited comment on LUCENE-6450 at 4/24/15 9:30 PM:
----------------------------------------------------------------

bq. I'm curious about the precisionstep as well. The extra terms should give 
range queries a huge speedup if we can use them.

The problem is the following: The NRQ approach only works for ranges (at the 
bounds of the range we use higher-precision terms, but in the center we use 
lower-precision terms). Here the range is only used to filter out terms that 
can never match. But for the terms that do fall in the range, we still have to 
check whether they lie in the bbox, and that check needs full precision. So we 
cannot use the lower-precision terms.
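To make the full-precision point concrete, here is a rough sketch of Morton/Z-order interleaving (illustrative only, not the actual GeoPointField code; the coordinate values and names are made up). Only the full-precision term decodes back to the exact coordinates needed for the bbox check; a shifted lower-precision trie term drops low bits of *both* coordinates:

```java
// Illustrative sketch only (not the actual GeoPointField code): two
// quantized 32-bit coordinates are bit-interleaved into one long
// (Morton/Z-order). The bbox check needs the exact coordinates back,
// and only the full-precision term can be decoded exactly.
public class MortonSketch {

  // Spread the lower 32 bits of v onto the even bit positions.
  static long spread(long v) {
    v &= 0xFFFFFFFFL;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
    v = (v | (v << 2))  & 0x3333333333333333L;
    v = (v | (v << 1))  & 0x5555555555555555L;
    return v;
  }

  // One long term: lon on even bits, lat on odd bits.
  static long interleave(int lon, int lat) {
    return spread(lon) | (spread(lat) << 1);
  }

  // Inverse of spread: collect the even bit positions.
  static int evenBits(long v) {
    v &= 0x5555555555555555L;
    v = (v | (v >>> 1))  & 0x3333333333333333L;
    v = (v | (v >>> 2))  & 0x0F0F0F0F0F0F0F0FL;
    v = (v | (v >>> 4))  & 0x00FF00FF00FF00FFL;
    v = (v | (v >>> 8))  & 0x0000FFFF0000FFFFL;
    v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL;
    return (int) v;
  }

  public static void main(String[] args) {
    int lonQ = 123457, latQ = 654321;   // hypothetical quantized coords
    long term = interleave(lonQ, latQ);
    // Full precision round-trips exactly, so the bbox check is exact.
    if (evenBits(term) != lonQ || evenBits(term >>> 1) != latQ)
      throw new AssertionError("round trip failed");
    // A lower-precision (shifted) trie term zeroes low bits of *both*
    // coordinates, so bbox membership can no longer be decided exactly.
    long coarse = term & ~0xFFL;
    System.out.println("coarse lon == exact lon? " + (evenBits(coarse) == lonQ));
  }
}
```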

bq. Great stuff! Should this be used as the underlying implementation for 
Solr's LatLonType (which currently does not have multi-valued support)? Any 
downsides for the single-valued case?

The problem is: if you have a large bbox and many distinct points, you have to 
visit many terms, because this does not use the trie algorithm of NRQ. It 
extends NRQ, but does not use its algorithm; it is effectively a standard 
TermRangeQuery with some extra filtering. It does not even seek the terms 
enum! So for the single-value case I would always prefer the two NRQ queries 
on lat and lon separately. In the worst case (a bbox covering the whole 
earth), you have to visit *all* terms and get their postings => more or less 
the same as a plain term range.
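A toy 8x8 grid shows why the flat Z-range visits so many non-matching terms (illustrative numbers, not Lucene's actual encoding): scanning every code between the corner codes of a bbox touches far more cells than actually match, and the ratio gets worse as the bbox grows:

```java
// Illustrative only: on an 8x8 grid, scan the flat Z-order range
// between the bbox corner codes and count how many scanned cells
// actually fall inside the bbox. The corner codes bound the box
// because the Z-code is monotone in each coordinate.
public class ZRangeScan {

  // Morton encode for 3-bit coordinates (8x8 grid).
  static int interleave3(int x, int y) {
    int z = 0;
    for (int i = 0; i < 3; i++) {
      z |= ((x >> i) & 1) << (2 * i);
      z |= ((y >> i) & 1) << (2 * i + 1);
    }
    return z;
  }

  // Returns { cells scanned, cells that actually match the bbox }.
  static int[] scan(int minX, int minY, int maxX, int maxY) {
    int zmin = interleave3(minX, minY), zmax = interleave3(maxX, maxY);
    int scanned = 0, matched = 0;
    for (int z = zmin; z <= zmax; z++) {
      scanned++;
      int x = 0, y = 0;                  // decode the cell back
      for (int i = 0; i < 3; i++) {
        x |= ((z >> (2 * i)) & 1) << i;
        y |= ((z >> (2 * i + 1)) & 1) << i;
      }
      if (x >= minX && x <= maxX && y >= minY && y <= maxY) matched++;
    }
    return new int[] { scanned, matched };
  }

  public static void main(String[] args) {
    int[] r = scan(2, 2, 5, 5);          // a 4x4 bbox = 16 cells
    System.out.println("scanned=" + r[0] + " matched=" + r[1]);
    // -> scanned=40 matched=16: 24 wasted visits even on this tiny
    // grid; a whole-earth bbox degenerates to visiting every term,
    // like a plain term range.
  }
}
```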

One workaround would be: If we used Hilbert curves, we could calculate the 
quadratic box around the center of the bbox that is representable as a single 
numeric range (one where no post-filtering is needed). This range could be 
executed by the default NRQ algorithm using shifted values. For the remaining 
area around it, we would visit only the high-precision terms. With the current 
Morton/Z-curve we cannot do this. So if we don't fix this now, we should 
definitely put this into the sandbox, so we have the chance to change the 
algorithm.
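The locality property the Hilbert workaround relies on can be demonstrated with the standard Hilbert index-to-coordinate conversion from the literature (illustrative only, not Lucene code): consecutive Hilbert codes are always adjacent cells, while the Z-curve makes long jumps at quadrant seams:

```java
// Illustrative only: compare step locality of a Hilbert curve and a
// Morton/Z-curve on an 8x8 grid by measuring the largest Manhattan
// distance between cells with consecutive codes.
public class CurveLocality {

  // Standard Hilbert d-to-(x,y) conversion for an n x n grid (n = 2^k).
  static int[] d2xy(int n, int d) {
    int x = 0, y = 0, t = d;
    for (int s = 1; s < n; s *= 2) {
      int rx = 1 & (t / 2);
      int ry = 1 & (t ^ rx);
      if (ry == 0) {                       // rotate the quadrant
        if (rx == 1) { x = s - 1 - x; y = s - 1 - y; }
        int tmp = x; x = y; y = tmp;
      }
      x += s * rx;
      y += s * ry;
      t /= 4;
    }
    return new int[] { x, y };
  }

  // Morton decode for 3-bit coordinates (8x8 grid).
  static int[] zDecode(int z) {
    int x = 0, y = 0;
    for (int i = 0; i < 3; i++) {
      x |= ((z >> (2 * i)) & 1) << i;
      y |= ((z >> (2 * i + 1)) & 1) << i;
    }
    return new int[] { x, y };
  }

  // Largest Manhattan distance between consecutively-coded cells.
  static int maxStep(java.util.function.IntFunction<int[]> decode, int cells) {
    int max = 0;
    int[] prev = decode.apply(0);
    for (int i = 1; i < cells; i++) {
      int[] cur = decode.apply(i);
      max = Math.max(max, Math.abs(cur[0] - prev[0]) + Math.abs(cur[1] - prev[1]));
      prev = cur;
    }
    return max;
  }

  public static void main(String[] args) {
    int hilbert = maxStep(d -> d2xy(8, d), 64);
    int morton  = maxStep(CurveLocality::zDecode, 64);
    // Hilbert never jumps (max step 1); the Z-curve leaps across the
    // quadrant seam (e.g. code 31 -> 32 goes from (7,3) to (0,4)).
    System.out.println("max Manhattan step: hilbert=" + hilbert + " morton=" + morton);
  }
}
```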

Another alternative is to just use plain NRQ (ideally also with better 
locality using Hilbert curves) and post-filter the actual results (using doc 
values). This would also be preferable for polygons.
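The two-phase idea can be sketched as follows (hedged: names and data are made up, and the exact per-doc coordinate lookup would go through doc values in Lucene; this is not the Lucene API). A coarse range yields candidates, then an exact point-in-polygon test (standard ray casting) keeps only true matches:

```java
// Illustrative sketch of NRQ + post-filtering: the coarse range has
// already produced candidate points (in Lucene, their coordinates
// would be read from doc values); an exact ray-casting test then
// decides polygon membership per candidate.
public class PostFilter {

  // Standard even-odd ray casting: is (px, py) inside the polygon
  // given by vertex arrays xs/ys?
  static boolean pointInPolygon(double[] xs, double[] ys, double px, double py) {
    boolean inside = false;
    for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
      if ((ys[i] > py) != (ys[j] > py)
          && px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
        inside = !inside;
      }
    }
    return inside;
  }

  public static void main(String[] args) {
    // Triangle (0,0)-(4,0)-(0,4); its bbox [0,4]x[0,4] is the coarse filter.
    double[] xs = {0, 4, 0}, ys = {0, 0, 4};
    // Hypothetical candidate points that passed the coarse bbox range;
    // (3,3) is inside the bbox but outside the triangle.
    double[][] candidates = { {1, 1}, {3, 3}, {3.5, 0.2} };
    for (double[] p : candidates) {
      System.out.println(p[0] + "," + p[1] + " -> "
          + pointInPolygon(xs, ys, p[0], p[1]));
    }
  }
}
```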

The current implementation is not usable for large bounding boxes covering 
many different positions! E.g. in my case (PANGAEA), we have lat/lon 
coordinates around the whole world including the poles, and scientists 
generally select large bboxes... It is perfectly fine for searching for shops 
in towns, of course :-)



> Add simple encoded GeoPointField type to core
> ---------------------------------------------
>
>                 Key: LUCENE-6450
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6450
>             Project: Lucene - Core
>          Issue Type: New Feature
>    Affects Versions: Trunk, 5.x
>            Reporter: Nicholas Knize
>            Priority: Minor
>         Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, 
> LUCENE-6450.patch, LUCENE-6450.patch
>
>
> At the moment all spatial capabilities, including basic point based indexing 
> and querying, require the lucene-spatial module. The spatial module, designed 
> to handle all things geo, requires dependency overhead (s4j, jts) to provide 
> spatial rigor for even the most simplistic spatial search use-cases (e.g., 
> lat/lon bounding box, point in poly, distance search). This feature trims the 
> overhead by adding a new GeoPointField type to core along with 
> GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This 
> field is intended as a straightforward lightweight type for the most basic 
> geo point use-cases without the overhead. 
> The field uses simple bit twiddling operations (currently morton hashing) to 
> encode lat/lon into a single long term.  The queries leverage simple 
> multi-phase filtering that starts by leveraging NumericRangeQuery to reduce 
> candidate terms deferring the more expensive mathematics to the smaller 
> candidate sets.



