[jira] [Commented] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards

Michael McCandless (JIRA) Sun, 26 Jul 2015 14:12:18 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642094#comment-14642094
 ]


Michael McCandless commented on LUCENE-6685:
--------------------------------------------

bq. There was also a bug in the pointDistance query that added unnecessary high 
resolution ranges that fell within the bounding box but outside the actual 
pointRadius.

This was a performance bug but not a correctness bug?  Is there some
way we could get a test case that exposes it?

bq. Dynamically compute detail level based on query size (includes min/max 
bounds on detail level)

Hmm the javadocs refer to {{detailLevel}} as a parameter, but I think
you cannot change it right?  It's always 5% of the smallest lat or lon
range?  Or did you mean to make it configurable via the API?

This means the query is actually "pixelated" on the boundary?  So,
points outside of the shape but inside of a pixel crossing the shape's
boundary, will also be included?  Maybe improve the javadocs to
clarify this?

Also I found this confusing in the javadocs:

{quote}
The higher the resolution the greater the number of ranges along the query 
boundary. This results in visiting fewer terms
in the terms dictionary for the price of memory usage. 
{quote}

Shouldn't it be more not fewer?  I.e., higher resolution means better
accuracy (less pixelation), but slower CPU, more memory?

Hmm this (3) is a very low prec-step, i.e. index size will be much
larger (~ 22 terms per lat/lon I think, vs ~11 terms now):

{noformat}
 public final class GeoPointField extends Field {
-  public static final int PRECISION_STEP = 6;
+  public static final int PRECISION_STEP = 3;
{noformat}

Looks like this just added a typo?:

{noformat}
-// TODO: remove this?  Just absorb into its base class
+
+// TODO: remove this?  Just absorb into its base class5
{noformat}

Can you try to add { } around single-line-in-body if statements
(matches Lucene's coding style)?

{noformat}
@@ -436,9 +437,8 @@ public class TestGeoPointQuery extends LuceneTestCase {
                 verifyHits = new VerifyHits() {
                     @Override
                     protected Boolean shouldMatch(double pointLat, double 
pointLon) {
-                      if (Double.isNaN(pointLat) || Double.isNaN(pointLon)) {
+                      if (Double.isNaN(pointLat) || Double.isNaN(pointLon))
                         return null;
-                      }
                       if (radiusQueryCanBeWrong(centerLat, centerLon, 
pointLon, pointLat, radius)) {
                         return null;
                       } else {
{noformat}


> GeoPointInBBox/Distance queries should have safeguards
> ------------------------------------------------------
>
>                 Key: LUCENE-6685
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6685
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 5.3, Trunk
>
>         Attachments: LUCENE-6685.patch, LUCENE-6685.patch, LUCENE-6685.patch
>
>
> These queries build a big list of term ranges, where the size of the list is 
> in proportion to how many cells of the space filling curve are "crossed" by 
> the perimeter of the query (I think?).
> This can easily be 100s of MBs for a big enough query ... not to mention slow 
> to enumerate (we still do this again for each segment).
> I think the queries should have safeguards, much like we have 
> maxDeterminizedStates for Automaton based queries, to prevent accidental 
> OOMEs.
> But I think longer term we should either change the ranges to be enumerated 
> on-demand and never stored in entirety (like NumericRangeTermsEnum), or 
> change the query so it has a fixed budget of how many cells it's allowed to 
> visit and then within a crossing cell it uses doc values to post-filter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards

Reply via email to