[ 
https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2395:
----------------------------------

    Description: 
After a chat with Chris Male, and drawing on my own ideas from implementing
this for PANGAEA, I thought about the broken distance query in contrib. It has
the following shortcomings:
- It needs a query/filter for the enclosing bbox (which is constant score)
- It needs a separate filter for filtering out hits that are too far away
(inside the bbox but outside the distance limit)
- It has no scoring, so anybody who wants to sort by distance needs to use the
custom sort. For that to work, spatial caches the distance calculation (which
is broken for multi-segment search)

The idea is now to combine all three things into one query, but customizable:

We first thought about extending CustomScoreQuery and calculating the distance
from FieldCache in the customScore method, returning a score of 1 for
distance=0, score=0 at the max distance and score<0 for farther hits that are
in the bounding box but not in the distance circle. To filter out such negative
scores, we would need to override the scorer in CustomScoreQuery, which is
private.
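
To make the sign of that score concrete, here is a minimal sketch of the
linear mapping described above. It is an illustration only: the class and
method names are made up for this example and are not part of contrib/spatial,
and the distance unit is an assumption.

```java
// Hypothetical illustration; not part of Lucene's contrib/spatial.
final class LinearDistanceScore {

    /** Maps a distance to a score: 1 at distance 0, 0 at maxDistance, negative beyond it. */
    static float score(double distance, double maxDistance) {
        return (float) ((maxDistance - distance) / maxDistance);
    }

    public static void main(String[] args) {
        double maxDistance = 10.0; // assumed unit, e.g. kilometers
        System.out.println(score(0.0, maxDistance));  // hit at the center point
        System.out.println(score(10.0, maxDistance)); // hit exactly on the distance circle
        System.out.println(score(12.0, maxDistance)); // hit in a bbox corner, outside the circle
    }
}
```

The third case is the one that causes trouble: the hit passes the bounding
box but gets a negative score, so something still has to drop it.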

My proposal is now to use a very stripped down CustomScoreQuery (but not extend
it) that calls a method getDistance(docId) in its scorer's advance() and
nextDoc(), which calculates the distance for the current doc. It also stores
this distance in the scorer. If the distance > maxDistance, it throws away the
hit and calls nextDoc() again. The score() method will return by default
weight.value*(maxDistance - distance)/maxDistance and uses the precalculated
distance. So the distance is only calculated once, in nextDoc()/advance().
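
The iteration logic above could look roughly like the following sketch. It is
deliberately not written against Lucene's Scorer API: the sorted int[] stands
in for the bounding-box DocIdSet, the IntToDoubleFunction stands in for
getDistance(docId), and all class and field names are assumptions made for
this example.

```java
import java.util.function.IntToDoubleFunction;

// Simplified stand-in for the proposed scorer; it does not use Lucene's Scorer API.
final class DistanceScorerSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] bboxDocs;                  // sorted doc ids inside the bounding box
    private final IntToDoubleFunction getDistance; // hypothetical per-doc distance lookup
    private final double maxDistance;
    private final float weightValue;
    private int pos = -1;     // index of the current candidate in bboxDocs
    private double distance;  // distance of the current doc, computed once

    DistanceScorerSketch(int[] bboxDocs, IntToDoubleFunction getDistance,
                         double maxDistance, float weightValue) {
        this.bboxDocs = bboxDocs;
        this.getDistance = getDistance;
        this.maxDistance = maxDistance;
        this.weightValue = weightValue;
    }

    /** Moves to the next bbox candidate that also lies within maxDistance. */
    int nextDoc() {
        while (++pos < bboxDocs.length) {
            double d = getDistance.applyAsDouble(bboxDocs[pos]);
            if (d <= maxDistance) {
                distance = d; // remember it so score() does not recompute
                return bboxDocs[pos];
            }
            // d > maxDistance: throw the hit away and try the next candidate
        }
        return NO_MORE_DOCS;
    }

    /** Skips candidates below target without computing their distance, then filters as nextDoc(). */
    int advance(int target) {
        while (pos + 1 < bboxDocs.length && bboxDocs[pos + 1] < target) {
            pos++;
        }
        return nextDoc();
    }

    /** Default score: weight.value * (maxDistance - distance) / maxDistance. */
    float score() {
        return (float) (weightValue * (maxDistance - distance) / maxDistance);
    }
}
```

Because the distance check lives in nextDoc(), hits outside the circle never
reach the collector, so no separate distance filter and no shared distance
cache are needed in this sketch.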

To be able to plug in custom scoring, the following hooks in the query can be
overridden or replaced:
- float getDistanceScore(double distance) - returns by default: (maxDistance -
distance)/maxDistance; allows score customization
- DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns a
DocIdSet for the bounding box. By default it returns e.g. the DocIdSet of a
NumericRangeFilter (NRF) or a cartesian tier filter. You can even plug in any
other DocIdSet, e.g. wrap a Query with QueryWrapperFilter
- a setter for the GeoDistanceCalculator that is used by the scorer to get the
distance.
- a LatLng provider (similar to CustomScoreProvider/ValueSource) that returns
the lat/lng for a given doc id. It is created once per IndexReader, at scorer
creation, and retrieves the coordinates. That way we support FieldCache or
whatever other source.
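
As a rough sketch of how these extension points might fit together, under the
assumption that they are plain Java interfaces and protected methods: every
type below is a hypothetical stand-in, not the actual contrib/spatial API
(in particular, DistanceCalculatorSketch only plays the role of
GeoDistanceCalculator). getBoundingBoxDocIdSet is left out because it depends
on Lucene types.

```java
// All types here are hypothetical stand-ins, not the contrib/spatial API.
final class LatLngPoint {
    final double lat, lng; // degrees
    LatLngPoint(double lat, double lng) { this.lat = lat; this.lng = lng; }
}

/** Returns the coordinates of a document, e.g. backed by a per-reader cache. */
interface LatLngProviderSketch {
    LatLngPoint get(int docId);
}

/** Stand-in for a pluggable distance calculator. */
interface DistanceCalculatorSketch {
    double distance(LatLngPoint a, LatLngPoint b);
}

/** Great-circle distance in kilometers using the haversine formula. */
final class HaversineKm implements DistanceCalculatorSketch {
    private static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (approximation)

    @Override
    public double distance(LatLngPoint a, LatLngPoint b) {
        double dLat = Math.toRadians(b.lat - a.lat);
        double dLng = Math.toRadians(b.lng - a.lng);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(a.lat)) * Math.cos(Math.toRadians(b.lat))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }
}

class DistanceQuerySketch {
    private final LatLngPoint center;
    protected final double maxDistance;
    private final LatLngProviderSketch provider;
    private DistanceCalculatorSketch calculator = new HaversineKm();

    DistanceQuerySketch(LatLngPoint center, double maxDistance, LatLngProviderSketch provider) {
        this.center = center;
        this.maxDistance = maxDistance;
        this.provider = provider;
    }

    /** Swaps in a different distance calculation. */
    void setDistanceCalculator(DistanceCalculatorSketch calculator) {
        this.calculator = calculator;
    }

    /** Distance from the query center to the given document. */
    double getDistance(int docId) {
        return calculator.distance(center, provider.get(docId));
    }

    /** Default scoring hook: (maxDistance - distance) / maxDistance. */
    protected float getDistanceScore(double distance) {
        return (float) ((maxDistance - distance) / maxDistance);
    }
}

/** Example override: quadratic instead of linear falloff. */
class QuadraticDistanceQuerySketch extends DistanceQuerySketch {
    QuadraticDistanceQuerySketch(LatLngPoint center, double maxDistance, LatLngProviderSketch provider) {
        super(center, maxDistance, provider);
    }

    @Override
    protected float getDistanceScore(double distance) {
        float linear = super.getDistanceScore(distance);
        return linear * linear;
    }
}
```

A caller would pass a provider such as docId -> new LatLngPoint(lats[docId],
lngs[docId]) over two arrays loaded once per reader, which is the role
FieldCache would play in the proposal.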

This query is almost finished in my head, it just needs coding :-)

  was:
In a chat with Chris Male and my own ideas when implemnting for PANGAEA, I 
thought about the broken distance query in contrib. It lacks the folloing 
features:
- It needs a query for the encldoing bbox (which is constant score)
- It needs a separate filter for filtering out distances
- It has no scoring, so if somebody wants to sort by distance, he needs to use 
the custom sort. For that to work, spatial caches distance calculation (which 
is borken for multi-segment search)

The idea is now to combine all three things into one query, but customizeable:

We first thought about extending CustomScoreQuery and calculate the distance 
from FieldCache in the customScore method and return a score of 1 for 
distance=0, score=0 on the max distance and score<0 for farer hits, that are in 
the bounding box but not in the distance circle. To filter out such negative 
scores, we would need to override the scorer in CustomScoreQuery which is 
priate.

My proposal is now to use a very stripped down CustomScoreQuery (but not extend 
it) that does call a method getDistance(docId) in its scorer's advance and 
nextDoc that calculates the distance for the current doc. It stores this 
distance also in the scorer. If the distance > maxDistance it throws away the 
hit and calls nextDoc() again. The score() method will reurn per default 
weight.value*(maxDistance - distance)/maxDistance and uses the precalculated 
distance. So the distance is only calculated one time in nextDoc()/advance().

To be able to plug in custom scoring, the following methods in the query can be 
overridden:
- float getDistanceScore(double distance) - returns per default: (maxDistance - 
distance)/maxDistance; allows score customization
- DocIdSet getBoundingBoxDocIdSet(Reader, LatLng sw, LatLng ne) - returns an 
DocIdSet for the bounding box. Per default it returns e.g. the docIdSet of a 
NRF or a cartesian tier filter. You can even plug in any other DocIdSet, e.g. 
wrap a Query with QueryWrapperFilter
- support a setter for the GeoDistanceCalculator that is used by the scorer to 
get the distance.

This query is almost finished in my head, it just needs coding :-)


> Add a scoring DistanceQuery that does not need caches and separate filters
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-2395
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2395
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/spatial
>            Reporter: Uwe Schindler
>             Fix For: 3.1
>
