[ 
https://issues.apache.org/jira/browse/LUCENE-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030789#comment-17030789
 ] 

Robert Muir commented on LUCENE-9154:
-------------------------------------

{quote}
But then I do not really understand why we are trying to match our custom 
numerical representation against full doubles. 
{quote}

Easy: the Java language only supports double and float. It has casts and 
conversion rules around them so programmers don't hit surprises.

If we changed this field to simply encode a float, and used the float data 
type, Lucene wouldn't be creating any inaccuracy anywhere. The user's compiler 
would guide them and it would be intuitive. The tradeoff is a larger loss of 
precision (in exchange for an expanded range, which is not useful).

On the other hand, if a user wants to try that out, they can index a 2D 
FloatPoint today and issue a bounding box (2D range) query against it very 
easily. 

Today the user passes double, but gets precision that is between a float and a 
double, using only the space of a float: that's how this field was designed, to 
specialize for a specific use-case. 
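
To make "between a float and a double" concrete, here is a rough sketch that 
compares the spacing of representable values at a given latitude. It assumes 
the field quantizes latitude into 32 bits over [-90, 90], so the grid step is 
180 / 2^32. The class and method names are made up for illustration and are 
not Lucene API. Note that float is actually finer than this grid very close to 
zero, so the comparison is meant for coordinates away from zero.

```java
// Sketch only: compares the gap between adjacent representable values.
final class PrecisionSketch {
    // Assumed grid step for a latitude quantized into 32 bits over [-90, 90].
    static final double LAT_STEP = 180.0 / (1L << 32);

    // Gap between adjacent float values at the given latitude.
    static double floatStep(double latitude) {
        return Math.ulp((float) latitude);
    }

    // Gap between adjacent double values at the given latitude.
    static double doubleStep(double latitude) {
        return Math.ulp(latitude);
    }

    public static void main(String[] args) {
        double lat = 89.5;
        System.out.println("float step:     " + floatStep(lat));
        System.out.println("quantized step: " + LAT_STEP);
        System.out.println("double step:    " + doubleStep(lat));
    }
}
```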

Such precision loss only needs to happen at index time, that is, when the 
value is stored. And it is transparent to the user (or developer debugging 
tests) because they can look at the docvalues field to see what the value 
became. There is no need to arbitrarily introduce more inaccuracy at 
query-time. In fact it is necessary NOT TO: tests can be exact and not have 
"fudge factors" and so on.
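
As a rough illustration of index-time-only quantization, the sketch below 
snaps a latitude down to a 32-bit grid and decodes it back, so a test can 
compare against the stored value exactly. It is written from scratch under the 
assumption of a floor-based encoding with step 180 / 2^32. The names 
QuantizeSketch, encode and decode are hypothetical, and this is not a copy of 
Lucene's implementation.

```java
// Sketch only: floor-based quantization of a latitude into a 32-bit int.
final class QuantizeSketch {
    // Assumed grid step for latitudes in [-90, 90].
    static final double LAT_STEP = 180.0 / (1L << 32);

    // Index time: snap the latitude down to the grid.
    static int encode(double latitude) {
        if (Double.isNaN(latitude) || latitude < -90.0 || latitude > 90.0) {
            throw new IllegalArgumentException("invalid latitude " + latitude);
        }
        // 90.0 has no grid cell of its own, so it falls into the topmost cell.
        if (latitude == 90.0) {
            latitude = Math.nextDown(latitude);
        }
        return (int) Math.floor(latitude / LAT_STEP);
    }

    // Recover the value that was actually stored.
    static double decode(int encoded) {
        return encoded * LAT_STEP;
    }

    public static void main(String[] args) {
        double original = 12.3456789;
        int encoded = encode(original);
        double stored = decode(encoded);
        System.out.println("original: " + original);
        System.out.println("stored:   " + stored);
        // Re-encoding the stored value is lossless, so tests can be exact.
        System.out.println("stable:   " + (encode(stored) == encoded));
    }
}
```

A test that queries with decode(encode(x)) instead of x sees exactly what the 
index sees, which is why no tolerance is needed on the query side.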

> Remove encodeCeil()  to encode bounding box queries
> ---------------------------------------------------
>
>                 Key: LUCENE-9154
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9154
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently have the following logic in LatLonPoint#newBoxQuery():
> {code:java}
> // exact double values of lat=90.0D and lon=180.0D must be treated special as they are not represented in the encoding
> // and should not drag in extra bogus junk! TODO: should encodeCeil just throw ArithmeticException to be less trappy here?
> if (minLatitude == 90.0) {
>   // range cannot match as 90.0 can never exist
>   return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLatitude=90.0");
> }
> if (minLongitude == 180.0) {
>   if (maxLongitude == 180.0) {
>     // range cannot match as 180.0 can never exist
>     return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLongitude=maxLongitude=180.0");
>   } else if (maxLongitude < minLongitude) {
>     // encodeCeil() with dateline wrapping!
>     minLongitude = -180.0;
>   }
> }
> byte[] lower = encodeCeil(minLatitude, minLongitude);
> byte[] upper = encode(maxLatitude, maxLongitude);
> {code}
>  
> IMO this is confusing and can lead to strange results. For example, a 
> query with {{minLatitude = maxLatitude = 90}} does not match points with 
> {{latitude = 90}}. On the other hand, a query with {{minLatitude = 
> maxLatitude = 89.99999996}} will match points at {{latitude = 90}}.
> I don't really understand the statement {{90.0 can never exist}}, as this 
> is equally true for any value > 89.99999995809048, which is the maximum 
> quantized value. By this argument, the same holds for all values between 
> quantized coordinates, as they do not exist in the index, so why is 90D so 
> special? I guess because it cannot be rounded up without overflowing the 
> encoding.
> Another argument for removing this function is that it opens the door to 
> false negatives in the query results. If a query has minLon = 
> 89.999999957, it won't match points with longitude = 89.999999957, as the 
> query bound is rounded up to 89.99999995809048.
> The only merit I can see in the current approach is that if you only index 
> points that are already quantized, then all queries would be exact. But does 
> it make sense for someone to index only quantized values and then query by 
> non-quantized bounding boxes?
>  
> I hope I am missing something, but my proposal is to remove encodeCeil 
> altogether and remove all the special handling at the positive pole and 
> positive dateline.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

