[ https://issues.apache.org/jira/browse/LUCENE-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511795#comment-14511795 ]
Uwe Schindler edited comment on LUCENE-6450 at 4/24/15 9:30 PM:
----------------------------------------------------------------

bq. I'm curious about the precisionstep as well. The extra terms should give range queries a huge speedup if we can use them.

The problem is the following: the NRQ approach only works for ranges (at the bounds of the range we use higher-precision terms, but in the center we use lower-precision terms). Here, the numeric range only filters out terms that can never match; for the terms that are inside the range we still have to check whether they fall in the bbox, and that check needs full precision. So we cannot use the lower-precision terms.

bq. Great stuff! Should this be used as the underlying implementation for Solr's LatLonType (which currently does not have multi-valued support)? Any downsides for the single-valued case?

The problem is: if you have a large bbox and many distinct points, you have to visit many terms, because this does not use the trie algorithm of NRQ. It extends NRQ, but does not use its algorithm. It is a standard TermRangeQuery with some extra filtering. It does not even seek the terms enum! So for the single-value case I would always prefer the two NRQ queries on lat and lon separately. In the worst case (a bbox covering the whole earth), you have to visit *all* terms and get their postings => more or less the same as a plain term range.

One workaround would be: if we used Hilbert curves, we could calculate the quadratic box around the center of the bbox that is representable as a single numeric range (one where no post-filtering is needed). This range could be executed by the default NRQ algorithm using shifted values. For the remaining area around it we would only visit the high-precision terms. With the current Morton/Z-curve we cannot do this. So if we don't fix this now, we must definitely put this into the sandbox, so we have the chance to change the algorithm.
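For readers unfamiliar with the encoding under discussion, here is a minimal sketch of Morton/Z-curve interleaving (an illustrative encoding; the actual bit layout in the GeoPointField patch may differ). Interleaving puts lat and lon bits in alternating positions, which is why a lat/lon bbox does not map to one contiguous range of encoded terms:

```java
// Minimal sketch (assumed encoding, not the actual GeoPointField layout):
// interleave two 32-bit cell indices into one 64-bit Morton (Z-order) code.
public class MortonSketch {

    // Spread the lower 32 bits of v into the even bit positions of a long.
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2))  & 0x3333333333333333L;
        v = (v | (v << 1))  & 0x5555555555555555L;
        return v;
    }

    // Quantize a coordinate from [min, max] onto a 32-bit grid cell.
    static long quantize(double val, double min, double max) {
        return (long) ((val - min) / (max - min) * 0xFFFFFFFFL);
    }

    // lon bits on even positions, lat bits on odd positions.
    static long mortonEncode(double lat, double lon) {
        return spread(quantize(lon, -180.0, 180.0))
             | (spread(quantize(lat, -90.0, 90.0)) << 1);
    }
}
```

Because every other bit comes from the other dimension, two points that are close in one coordinate but far in the other can land far apart on the Z-curve, so a rectangular bbox decomposes into many disjoint code ranges.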
Another alternative is to just use plain NRQ (ideally also with more locality using Hilbert curves) and post-filter the actual results (using doc values). This would also be preferable for polygons. The current implementation is not usable for large bounding boxes covering many different positions! E.g. in my case (PANGAEA), we have lat/lon coordinates around the whole world including the poles, and scientists generally select large bboxes... It is perfectly fine for searching for shops in towns, of course :-)

> Add simple encoded GeoPointField type to core
> ---------------------------------------------
>
>                 Key: LUCENE-6450
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6450
>             Project: Lucene - Core
>          Issue Type: New Feature
>    Affects Versions: Trunk, 5.x
>            Reporter: Nicholas Knize
>            Priority: Minor
>        Attachments: LUCENE-6450-5x.patch, LUCENE-6450-TRUNK.patch, LUCENE-6450.patch, LUCENE-6450.patch
>
> At the moment all spatial capabilities, including basic point-based indexing and querying, require the lucene-spatial module. The spatial module, designed to handle all things geo, requires dependency overhead (s4j, jts) to provide spatial rigor for even the most simplistic spatial search use-cases (e.g., lat/lon bounding box, point in poly, distance search). This feature trims the overhead by adding a new GeoPointField type to core along with GeoBoundingBoxQuery and GeoPolygonQuery classes to the .search package. This field is intended as a straightforward lightweight type for the most basic geo point use-cases without the overhead.
> The field uses simple bit twiddling operations (currently morton hashing) to encode lat/lon into a single long term. The queries leverage simple multi-phase filtering that starts by leveraging NumericRangeQuery to reduce candidate terms, deferring the more expensive mathematics to the smaller candidate sets.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
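The multi-phase filtering described in the issue summary can be sketched roughly as follows. This is a hypothetical, simplified model (16 bits per dimension, in-memory point list), not the patch's actual code: phase 1 keeps candidates whose encoded value falls in the single range between the bbox's min and max corner codes (a valid but over-inclusive filter, since Z-order codes are monotone per coordinate), and phase 2 applies the exact full-precision bbox test that the comments above say cannot be decided from lower-precision terms:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the two-phase bbox query
// (illustrative only; not the code from the patch).
public class TwoPhaseSketch {

    // Spread the lower 16 bits of v into the even bit positions.
    static long spread16(long v) {
        v &= 0xFFFFL;
        v = (v | (v << 8)) & 0x00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0FL;
        v = (v | (v << 2)) & 0x33333333L;
        v = (v | (v << 1)) & 0x55555555L;
        return v;
    }

    // Encode lat/lon on a 16-bit-per-dimension Z-order grid.
    static long encode(double lat, double lon) {
        long la = (long) ((lat + 90.0) / 180.0 * 0xFFFFL);
        long lo = (long) ((lon + 180.0) / 360.0 * 0xFFFFL);
        return spread16(lo) | (spread16(la) << 1);
    }

    // points are {lat, lon} pairs.
    static List<double[]> search(List<double[]> points,
                                 double minLat, double maxLat,
                                 double minLon, double maxLon) {
        // Phase 1 bounds: codes of the bbox's min and max corners.
        // Over-inclusive, because a bbox is not one contiguous Z-curve
        // interval; points outside the bbox may still fall in this range.
        long lo = encode(minLat, minLon);
        long hi = encode(maxLat, maxLon);
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            long code = encode(p[0], p[1]);
            if (code < lo || code > hi) continue;      // phase 1: coarse range
            if (p[0] >= minLat && p[0] <= maxLat &&
                p[1] >= minLon && p[1] <= maxLon) {    // phase 2: exact bbox
                hits.add(p);
            }
        }
        return hits;
    }
}
```

The design point the thread makes is visible here: phase 1 is just one term range with extra filtering, so a bbox covering most of the earth makes the coarse range admit nearly all terms, and every candidate still pays the full-precision phase-2 check.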