[
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993703#comment-12993703
]
David Smiley commented on SOLR-2155:
------------------------------------
Nice, Bill. Why are you asking for the field to be character delimited instead
of asking for separate values (which translates to separate indexed terms)?
And I noticed your patch included haversine code; were you unaware of the same
code in a utility function in a DistanceUtils class (from memory)?
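For reference, the haversine great-circle calculation in question looks roughly
like this minimal standalone sketch (the class and method names here are
hypothetical; as noted, Lucene/Solr already ships an equivalent in a
DistanceUtils-style utility class, which a patch should reuse rather than
duplicate):

```java
// Minimal haversine sketch; hypothetical names, not the Solr utility itself.
public class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0; // mean earth radius

    // Great-circle distance in kilometers between two lat/lon points (degrees).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Boston to New York City is roughly 300 km great-circle.
        System.out.println(Haversine.haversineKm(42.3601, -71.0589, 40.7128, -74.0060));
    }
}
```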
Anyway... I was thinking about this problem last night. The main challenge I
see with distance sorting is scalability, not coming up with something that
merely works. If the use case is wanting to see the top X results out of
potentially a million, then I think a fast solution would be code that only
calculates that top X, and that leverages the geospatial index (geohashes). It
could start with the boxes covering the filter area and then keep contracting
the grid coverage area to the point that any further contraction wouldn't meet
the desired top-X threshold.

To do this efficiently, it needs a single filter bitset of all doc ids that are
actually in the search results, and it needs to know the center of the user
query, and the bounding box of the user query for its starting point. This
might be pretty fast, but it wouldn't be very cacheable if further search
refinements occur while keeping the same geospatial filter.

So the code would be simpler if my filter here recognized that a sort is
ultimately required, in which case it would go through every point (down to
the full precision) and put the doc ids in a sorted list. That's probably the
best approach, on balance.
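That simpler "walk every matching point" approach could look roughly like the
sketch below: keep a bounded max-heap of size X so only the X nearest doc ids
are ever retained, then drain it into ascending order. All names here are
hypothetical stand-ins (this is not Solr API), and the distances are assumed to
be precomputed, e.g. via haversine against each full-precision indexed point:

```java
import java.util.PriorityQueue;

// Hedged sketch of top-X nearest selection over all matching points.
// DocDist and the parallel-array inputs are illustrative, not Solr classes.
public class TopXByDistance {
    static final class DocDist {
        final int docId; final double dist;
        DocDist(int docId, double dist) { this.docId = docId; this.dist = dist; }
    }

    // docIds/dists: parallel arrays of matching docs and their distances
    // from the query center. Returns the x nearest doc ids, closest first.
    static int[] topX(int[] docIds, double[] dists, int x) {
        // Max-heap by distance: the root is the worst of the current top X.
        PriorityQueue<DocDist> heap =
            new PriorityQueue<>((a, b) -> Double.compare(b.dist, a.dist));
        for (int i = 0; i < docIds.length; i++) {
            if (heap.size() < x) {
                heap.add(new DocDist(docIds[i], dists[i]));
            } else if (dists[i] < heap.peek().dist) {
                heap.poll(); // evict the current worst
                heap.add(new DocDist(docIds[i], dists[i]));
            }
        }
        // Drain the heap back-to-front into ascending distance order.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll().docId;
        return result;
    }
}
```

This does O(N log X) work over N matching points rather than sorting all N,
which is the appeal when X is small and N is potentially a million.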
> Geospatial search using geohash prefixes
> ----------------------------------------
>
> Key: SOLR-2155
> URL: https://issues.apache.org/jira/browse/SOLR-2155
> Project: Solr
> Issue Type: Improvement
> Reporter: David Smiley
> Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
> GeoHashPrefixFilter.patch, SOLR.2155.p2.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on
> documents that have a variable number of points. This scenario occurs when
> there is location extraction (e.g. via a "gazetteer") occurring on free text.
> None, one, or many geospatial locations might be extracted from any given
> document and users want to limit their search results to those occurring in a
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr
> with a geohash prefix based filter. A geohash refers to a lat-lon box on the
> earth. Each successive character added further subdivides the box into a 4x8
> (or 8x4 depending on the even/odd length of the geohash) grid. The first
> step in this scheme is figuring out which geohash grid squares cover the
> user's search query. I've added various extra methods to GeoHashUtils (and
> added tests) to assist in this purpose. The next step is an actual Lucene
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in
> TermsEnum.seek() to skip to relevant grid squares in the index. Once a
> matching geohash grid is found, the points therein are compared against the
> user's query to see if it matches. I created an abstraction GeoShape
> extended by subclasses named PointDistance... and CartesianBox.... to support
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
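The geohash subdivision described above (each successive base-32 character
halving the lat/lon ranges bit by bit, so a shorter hash is a prefix covering a
larger box) can be sketched with a minimal encoder; this is the standard
geohash algorithm, not code from the patch:

```java
// Minimal geohash encoder sketch; standard algorithm, hypothetical class name.
public class GeoHash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode a lat/lon (degrees) to a geohash of the given character length.
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // bits interleave, starting with longitude
        int bit = 0, ch = 0;
        while (hash.length() < length) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // 5 bits per base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0; ch = 0;
            }
        }
        return hash.toString();
    }
}
```

The prefix property is what the filter exploits: every point whose geohash
starts with a given prefix lies inside that prefix's box, so TermsEnum can
seek to a prefix and scan only the terms under it.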
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]