[jira] Commented: (SOLR-2155) Geospatial search using geohash prefixes

Bill Bell (JIRA) Fri, 11 Feb 2011 18:24:23 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993802#comment-12993802
 ]


Bill Bell commented on SOLR-2155:
---------------------------------

David,

This seems to be pretty fast, since the results are constrained by d=<km> first 
and the closest points are then found by distance from pt. It is at least as 
fast as geodist(). geodist() uses the same algorithm, and if you were to 
duplicate the lat,long in separate rows, you would be searching on the same 
number of fields. The one area where we could improve performance is the 
split() regex call. We could put the values into separate fields to speed that 
up, but I am not an expert on the API for getting dynamic fields. For example: 
<dynamicField name="storemv_*" type="string" indexed="true" stored="true"/>. 
My question is: what is the API call to get the stored fields of a document 
whose names begin with "storemv_"? If we do that, we can use a copy field for 
the lat,long values.
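
As a rough illustration of the split() point, here is a minimal sketch of 
parsing a stored "lat,long" value without going through a regex at all. The 
class and method names (LatLonParser, parse) are hypothetical and are not part 
of the patch or of Solr; the sketch only assumes each stored value is a single 
"lat,long" string.

```java
/** Hypothetical helper (not part of Solr): parses "lat,lon" without String.split() or a regex. */
public final class LatLonParser {
    private LatLonParser() {}

    /** Returns {lat, lon} parsed from a value such as "40.7143,-74.006". */
    public static double[] parse(String value) {
        // Locate the single separator directly instead of splitting on a pattern.
        int comma = value.indexOf(',');
        if (comma < 0 || comma != value.lastIndexOf(',')) {
            throw new IllegalArgumentException("Expected \"lat,lon\" but got: " + value);
        }
        double lat = Double.parseDouble(value.substring(0, comma).trim());
        double lon = Double.parseDouble(value.substring(comma + 1).trim());
        return new double[] {lat, lon};
    }
}
```

Whether this is measurably faster than split() would need to be benchmarked 
against the actual field values.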

I copied the Haversine function that Grant added in 
./java/org/apache/solr/search/function/distance/HaversineConstFunction.java, 
because I felt geodist() and geomultidist() should use the same distance 
calculation, given their similar names. But you are right, we should just 
convert both to use the DistanceUtils class.
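
For reference, the haversine calculation itself is short. The following is a 
standalone sketch of the textbook formula, not the code from 
HaversineConstFunction or DistanceUtils; the class name HaversineSketch and the 
6371.0 km radius are assumptions made for illustration.

```java
/** Standalone sketch of the haversine great-circle distance (not Solr's implementation). */
public final class HaversineSketch {
    /** Approximate mean earth radius in kilometers; an assumed constant for this sketch. */
    public static final double EARTH_RADIUS_KM = 6371.0;

    private HaversineSketch() {}

    /** Great-circle distance in km between two points given in degrees. */
    public static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDLat = Math.sin((phi2 - phi1) / 2);
        double sinHalfDLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        // a is the squared half-chord length between the two points on the unit sphere.
        double a = sinHalfDLat * sinHalfDLat
                 + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLon * sinHalfDLon;
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```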

I cannot see how we can get accurate distances using boxes (but you know more 
about geohash than I do); it would only be an approximation. The boxes work 
great for filtering. Then we need something to calculate the distance from pt 
to the value in the index. If you want to approximate the distance then boxes 
would work, but you kinda have that with the filter, right? The use case that I 
am trying to solve is: millions of locations, but the user only selects 
d=10, 20, 50, or 100, and these results are smaller than the overall population 
of points. Then sort them by distance.
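
The use case can be sketched in memory as: for each document take the closest 
of its points to pt, drop documents farther than d, then sort by that exact 
distance. This is only an illustration of the logic; in Solr the filtering 
would be done against the index rather than by scanning, and the names below 
(NearestPointSort, Hit, withinAndSorted) and the 6371.0 km radius are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** In-memory illustration of "filter by d, then sort by exact distance" for multi-point documents. */
public final class NearestPointSort {
    private static final double EARTH_RADIUS_KM = 6371.0; // assumed mean radius

    /** A matching document together with the distance to its closest point. */
    public static final class Hit {
        public final String docId;
        public final double distanceKm;
        Hit(String docId, double distanceKm) {
            this.docId = docId;
            this.distanceKm = distanceKm;
        }
    }

    private NearestPointSort() {}

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double sLat = Math.sin((phi2 - phi1) / 2);
        double sLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sLat * sLat + Math.cos(phi1) * Math.cos(phi2) * sLon * sLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** docs maps a document id to its {lat, lon} points; returns hits within dKm, nearest first. */
    public static List<Hit> withinAndSorted(Map<String, double[][]> docs,
                                            double lat, double lon, double dKm) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, double[][]> e : docs.entrySet()) {
            // A multi-valued document is ranked by its closest point.
            double best = Double.POSITIVE_INFINITY;
            for (double[] p : e.getValue()) {
                best = Math.min(best, haversineKm(lat, lon, p[0], p[1]));
            }
            if (best <= dKm) {
                hits.add(new Hit(e.getKey(), best));
            }
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.distanceKm));
        return hits;
    }
}
```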

There is a use case that says show me the top 100 closest documents, and I 
don't care about the exact order. You solved that already with the filter.

I would vote for making geomultidist() work faster, but I need accurate 
distances. This code is pretty good; we can create a few test cases and submit 
it for inclusion, since it works with LatLon and geohash. For LatLon this is 
pretty much the best it gets.

Bill






> Geospatial search using geohash prefixes
> ----------------------------------------
>
>                 Key: SOLR-2155
>                 URL: https://issues.apache.org/jira/browse/SOLR-2155
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: David Smiley
>         Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
> GeoHashPrefixFilter.patch, SOLR.2155.p2.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on 
> documents that have a variable number of points.  This scenario occurs when 
> there is location extraction (i.e. via a "gazetteer") occurring on free text.  
> None, one, or many geospatial locations might be extracted from any given 
> document and users want to limit their search results to those occurring in a 
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr 
> with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
> earth.  Each successive character added further subdivides the box into a 4x8 
> (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
> step in this scheme is figuring out which geohash grid squares cover the 
> user's search query.  I've added various extra methods to GeoHashUtils (and 
> added tests) to assist in this purpose.  The next step is an actual Lucene 
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
> TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
> matching geohash grid is found, the points therein are compared against the 
> user's query to see if it matches.  I created an abstraction GeoShape 
> extended by subclasses named PointDistance... and CartesianBox.... to support 
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
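
The grid subdivision described in the issue summary above follows from how a 
geohash is built: each base-32 character carries 5 bits, alternating between 
longitude and latitude, so a character holds either 3 longitude and 2 latitude 
bits (8 columns by 4 rows) or the reverse. Below is a minimal sketch of the 
standard encoding; it is not the GeoHashUtils code from the patch, and the 
class name GeoHashSketch is hypothetical.

```java
/** Minimal sketch of standard geohash encoding (not Lucene's GeoHashUtils). */
public final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private GeoHashSketch() {}

    /** Encodes a point in degrees to a geohash of the requested length. */
    public static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0, value = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else            { value <<= 1;              lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else            { value <<= 1;              latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

Because each extra character only narrows the box of the previous ones, a 
shorter geohash of a point is always a prefix of a longer one, which is what 
makes prefix-based seeking in the term index possible.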


        
