[ https://issues.apache.org/jira/browse/LUCENE-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated LUCENE-4869:
---------------------------------
    Description:
LUCENE-4644 implemented the "IsWithin" predicate for a RecursivePrefixTree based field. It's slow since it looks across the whole world to ensure it doesn't match docs with data anywhere outside the query shape. It can be configured to only look outside the query shape using a very small buffer distance, and that will filter out documents spanning the query shape boundary, but not indexed shapes comprised of multiple disjoint parts. The solution proposed here is to index a point per disjoint part in such a way that it can be easily retrieved (e.g. DocValues), and then a post-process of WithinPrefixTreeFilter would remove false-positives.

This isn't particularly hard/advanced, but it requires some advances in some APIs that aren't quite there yet. Spatial4j's ShapeCollection (aka WKT GeometryCollection or Multi*) needs to get released, and it needs a vertex iterator. There needs to be code to read and write a set of points to a BinaryDocValues field (1/doc). And finally, of course, WithinPrefixTreeFilter needs to have a mode in which it uses the smallest buffer and then, at the end, checks the DocValues to remove false-positives.

  was:
LUCENE-4644 adds a useful initial capability to implement the "Within" predicate for a RecursivePrefixTree based field. But it will match false-positives for indexed shapes comprised of multiple disjoint parts. The solution to be worked out here is to index a point per disjoint part in such a way that it can be easily retrieved (e.g. DocValues), and then a post-process to WithinPrefixTreeFilter would remove false-positives.

I didn't call this a 'bug' because this addresses a known temporary limitation, and Within is still useful despite this.
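As a rough illustration of the "read and write a set of points" step and the final false-positive check described above, here is a minimal sketch in plain Java. It is not code from the issue or from Lucene: the class name PointSetCodec, its methods, the 16-bytes-per-point layout, and the use of an axis-aligned rectangle as the query shape are all assumptions made for the example. The byte[] it produces is simply the kind of value one might store per document in a BinaryDocValues field.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper (not part of Lucene or Spatial4j): packs one
 * representative point per disjoint part of an indexed shape into a byte[].
 */
final class PointSetCodec {

    private PointSetCodec() {}

    /** Encodes points given as {x, y} pairs, 16 bytes (two doubles) per point. */
    static byte[] encode(double[][] points) {
        ByteBuffer buf = ByteBuffer.allocate(points.length * 2 * Double.BYTES);
        for (double[] p : points) {
            buf.putDouble(p[0]).putDouble(p[1]);
        }
        return buf.array();
    }

    /** Decodes a byte[] produced by encode() back into {x, y} pairs. */
    static double[][] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double[][] points = new double[bytes.length / (2 * Double.BYTES)][];
        for (int i = 0; i < points.length; i++) {
            points[i] = new double[] {buf.getDouble(), buf.getDouble()};
        }
        return points;
    }

    /**
     * Post-filter check: returns true only if every stored point lies inside
     * the query rectangle. A document with any part outside the query shape
     * is a false positive and would be dropped. The rectangle is an
     * assumption; a real filter would test against the actual query shape.
     */
    static boolean allWithin(byte[] bytes, double minX, double maxX,
                             double minY, double maxY) {
        for (double[] p : decode(bytes)) {
            if (p[0] < minX || p[0] > maxX || p[1] < minY || p[1] > maxY) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a document whose shape has two disjoint parts with representative points (1, 2) and (50, 60) would pass allWithin for the rectangle [0, 100] x [0, 100] but be rejected for [0, 10] x [0, 10], even though one of its parts lies entirely inside the smaller rectangle.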
        Summary: Optimize IsWithin spatial RPT to use a point cache for false-positive removal  (was: Fix the Within spatial predicate PrefixTree to remove false-positives)

> Optimize IsWithin spatial RPT to use a point cache for false-positive removal
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4869
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4869
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: David Smiley