[
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993703#comment-12993703
]
David Smiley commented on SOLR-2155:
------------------------------------
Nice, Bill. Why are you asking for the field to be character delimited instead
of asking for separate values (which translates to separate indexed terms)?
And I noticed your patch included haversine code; were you unaware of the same
code in a utility function in a DistanceUtils class (from memory)?
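For reference, the haversine great-circle calculation in question looks roughly
like this minimal standalone sketch (the class and method names here are
hypothetical; as noted, Lucene/Solr already ships an equivalent in a
DistanceUtils-style utility class, which a patch should reuse rather than
duplicate):

```java
// Minimal haversine sketch; hypothetical names, not the Solr utility itself.
public class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0; // mean earth radius

    // Great-circle distance in kilometers between two lat/lon points (degrees).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Boston to New York City is roughly 300 km great-circle.
        System.out.println(Haversine.haversineKm(42.3601, -71.0589, 40.7128, -74.0060));
    }
}
```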
Anyway... I was thinking about this problem last night. The main challenge I
see with distance sorting is scalability, not coming up with something that
merely works. If the use case is wanting to see the top X results out of
potentially a million, then I think a fast solution would be code that only
calculates that top X, and that leverages the geospatial index (geohashes). It
could start with the boxes covering the filter area and then keep contracting
the grid coverage area to the point that any further contraction wouldn't meet
the desired top-X threshold.

To do this efficiently, it needs a single filter bitset of all doc ids that are
actually in the search results, and it needs to know the center of the user
query, and the bounding box of the user query for its starting point. This
might be pretty fast, but it wouldn't be very cacheable if further search
refinements occur while keeping the same geospatial filter.

So the code would be simpler if my filter here recognized that a sort is
ultimately required, in which case it would go through every point (down to
the full precision) and put the doc ids in a sorted list. That's probably the
best approach, on balance.
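That simpler "walk every matching point" approach could look roughly like the
sketch below: keep a bounded max-heap of size X so only the X nearest doc ids
are ever retained, then drain it into ascending order. All names here are
hypothetical stand-ins (this is not Solr API), and the distances are assumed to
be precomputed, e.g. via haversine against each full-precision indexed point:

```java
import java.util.PriorityQueue;

// Hedged sketch of top-X nearest selection over all matching points.
// DocDist and the parallel-array inputs are illustrative, not Solr classes.
public class TopXByDistance {
    static final class DocDist {
        final int docId; final double dist;
        DocDist(int docId, double dist) { this.docId = docId; this.dist = dist; }
    }

    // docIds/dists: parallel arrays of matching docs and their distances
    // from the query center. Returns the x nearest doc ids, closest first.
    static int[] topX(int[] docIds, double[] dists, int x) {
        // Max-heap by distance: the root is the worst of the current top X.
        PriorityQueue<DocDist> heap =
            new PriorityQueue<>((a, b) -> Double.compare(b.dist, a.dist));
        for (int i = 0; i < docIds.length; i++) {
            if (heap.size() < x) {
                heap.add(new DocDist(docIds[i], dists[i]));
            } else if (dists[i] < heap.peek().dist) {
                heap.poll(); // evict the current worst
                heap.add(new DocDist(docIds[i], dists[i]));
            }
        }
        // Drain the heap back-to-front into ascending distance order.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll().docId;
        return result;
    }
}
```

This does O(N log X) work over N matching points rather than sorting all N,
which is the appeal when X is small and N is potentially a million.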
> Geospatial search using geohash prefixes
> ----------------------------------------
>
> Key: SOLR-2155
> URL: https://issues.apache.org/jira/browse/SOLR-2155
> Project: Solr
> Issue Type: Improvement
> Reporter: David Smiley
> Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
> GeoHashPrefixFilter.patch, SOLR.2155.p2.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on
> documents that have a variable number of points. This scenario occurs when
> there is location extraction (e.g. via a "gazetteer") occurring on free text.
> None, one, or many geospatial locations might be extracted from any given
> document and users want to limit their search results to those occurring in a
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr
> with a geohash prefix based filter. A geohash refers to a lat-lon box on the
> earth. Each successive character added further subdivides the box into a 4x8
> (or 8x4 depending on the even/odd length of the geohash) grid. The first
> step in this scheme is figuring out which geohash grid squares cover the
> user's search query. I've added various extra methods to GeoHashUtils (and
> added tests) to assist in this purpose. The next step is an actual Lucene
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in
> TermsEnum.seek() to skip to relevant grid squares in the index. Once a
> matching geohash grid is found, the points therein are compared against the
> user's query to see if it matches. I created an abstraction GeoShape
> extended by subclasses named PointDistance... and CartesianBox.... to support
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
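The geohash subdivision described above (each successive base-32 character
halving the lat/lon ranges bit by bit, so a shorter hash is a prefix covering a
larger box) can be sketched with a minimal encoder; this is the standard
geohash algorithm, not code from the patch:

```java
// Minimal geohash encoder sketch; standard algorithm, hypothetical class name.
public class GeoHash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode a lat/lon (degrees) to a geohash of the given character length.
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // bits interleave, starting with longitude
        int bit = 0, ch = 0;
        while (hash.length() < length) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // 5 bits per base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0; ch = 0;
            }
        }
        return hash.toString();
    }
}
```

The prefix property is what the filter exploits: every point whose geohash
starts with a given prefix lies inside that prefix's box, so TermsEnum can
seek to a prefix and scan only the terms under it.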
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]