[
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993802#comment-12993802
]
Bill Bell commented on SOLR-2155:
---------------------------------
David,
This seems to be pretty fast, since the results are constrained by d=<km> first
and only then sorted by distance from pt. It is at least as fast
as geodist(). geodist() uses the same algorithm, and if you were to duplicate
the lat,long in separate rows, you would be searching on the same number of
fields. The one area we could improve performance would be in the split() regex
call. We could put them into separate fields to speed that up, but I am not an
expert on the API to get dynamic fields. For example: <dynamicField
name="storemv_*" type="string" indexed="true" stored="true"/>. My question
is: what is the API call to get the fields stored for a document beginning
with "storemv_"? If we do that we can use a copy field for lat,long values.
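To illustrate the split() point, here is a minimal sketch of parsing one stored
value without a regex. It assumes each stored value is a single "lat,lon"
string; the class and method names are hypothetical and are not taken from the
patch, and I have not measured whether it is actually faster.

```java
// Hypothetical helper, not part of the patch: parses one "lat,lon" value
// with indexOf/substring instead of String.split().
final class LatLonParseSketch {

    /** Returns {lat, lon} parsed from a string such as "45.5,-122.25". */
    static double[] parseLatLon(String value) {
        int comma = value.indexOf(',');
        // Require exactly one comma so malformed values fail loudly.
        if (comma < 0 || comma != value.lastIndexOf(',')) {
            throw new IllegalArgumentException("expected \"lat,lon\" but got: " + value);
        }
        double lat = Double.parseDouble(value.substring(0, comma).trim());
        double lon = Double.parseDouble(value.substring(comma + 1).trim());
        return new double[] {lat, lon};
    }
}
```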
I copied the Haversine function that Grant added in
./java/org/apache/solr/search/function/distance/HaversineConstFunction.java
because I felt geodist() and geomultidist() should use the same distance
calculation, given how similarly they are named. But you are right, we should
just convert both of them to use the DistanceUtils class.
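For reference, the haversine calculation itself is short. This is a sketch of
the textbook formula, not the code in HaversineConstFunction or DistanceUtils;
the class name, method name, and the mean Earth radius constant are my own
assumptions.

```java
// Textbook haversine great-circle distance; names and radius are assumptions.
final class HaversineSketch {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0088;

    /** Distance in kilometers between two points given in decimal degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDLat = Math.sin((phi2 - phi1) / 2);
        double sinHalfDLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDLat * sinHalfDLat
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLon * sinHalfDLon;
        // Clamp against floating-point drift before taking asin.
        a = Math.min(1.0, a);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.sqrt(a));
    }
}
```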
I cannot see how we can get accurate distances using boxes (but you know more
about geohash than I do); it would only be an approximation. The boxes work
great for filtering. Then we need something to calculate the distance from pt
to the value in the index. If you want to approximate the distance then boxes
would work, but you kinda have that with the filter, right? The use case that I
am trying to solve is: millions of locations, but the user only selects
d=10, 20, 50, or 100, and those result sets are much smaller than the overall
population of points. Then sort by distance.
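Roughly, the use case looks like the sketch below: keep only documents that
have at least one point within d km of pt, then order them by their closest
point. This is plain Java for illustration, not Solr code; the in-memory map of
document id to points and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: filter by radius first, then sort the survivors.
final class FilterThenSortSketch {
    static final double EARTH_MEAN_RADIUS_KM = 6371.0088; // assumed radius

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double sLat = Math.sin((phi2 - phi1) / 2);
        double sLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sLat * sLat + Math.cos(phi1) * Math.cos(phi2) * sLon * sLon;
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }

    /** Ids of documents with a point within dKm of (lat, lon), nearest first. */
    static List<String> withinSortedByDistance(
            Map<String, double[][]> docs, double lat, double lon, double dKm) {
        Map<String, Double> hits = new HashMap<>();
        for (Map.Entry<String, double[][]> e : docs.entrySet()) {
            // A multi-valued document is represented by its closest point.
            double best = Double.POSITIVE_INFINITY;
            for (double[] p : e.getValue()) {
                best = Math.min(best, haversineKm(lat, lon, p[0], p[1]));
            }
            if (best <= dKm) {
                hits.put(e.getKey(), best);
            }
        }
        List<String> ids = new ArrayList<>(hits.keySet());
        ids.sort((x, y) -> Double.compare(hits.get(x), hits.get(y)));
        return ids;
    }
}
```

The sort only touches the documents that survived the d filter, which is why
the cost stays small when d is small relative to the whole index.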
There is a use case that says show me the top 100 closest documents, and I
don't care about the exact order. You solved that already with the filter.
I would vote for making geomultidist() work faster, but I need accurate
distances. This code is pretty good; we can create a few test cases and submit
it for inclusion, since it works with LatLon and geohash... For LatLon this is
pretty much the best it gets.
Bill
> Geospatial search using geohash prefixes
> ----------------------------------------
>
> Key: SOLR-2155
> URL: https://issues.apache.org/jira/browse/SOLR-2155
> Project: Solr
> Issue Type: Improvement
> Reporter: David Smiley
> Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
> GeoHashPrefixFilter.patch, SOLR.2155.p2.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on
> documents that have a variable number of points. This scenario occurs when
> there is location extraction (i.e. via a "gazetteer") occurring on free text.
> None, one, or many geospatial locations might be extracted from any given
> document and users want to limit their search results to those occurring in a
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr
> with a geohash prefix based filter. A geohash refers to a lat-lon box on the
> earth. Each successive character added further subdivides the box into a 4x8
> (or 8x4 depending on the even/odd length of the geohash) grid. The first
> step in this scheme is figuring out which geohash grid squares cover the
> user's search query. I've added various extra methods to GeoHashUtils (and
> added tests) to assist in this purpose. The next step is an actual Lucene
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in
> TermsEnum.seek() to skip to relevant grid squares in the index. Once a
> matching geohash grid is found, the points therein are compared against the
> user's query to see if it matches. I created an abstraction GeoShape
> extended by subclasses named PointDistance... and CartesianBox.... to support
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
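
To make the subdivision described in the issue text above concrete, here is a
minimal sketch of standard geohash encoding. It is not the GeoHashUtils code
from the patch; the class and method names are hypothetical. Each character
carries 5 bits, alternating between longitude and latitude, which is where the
32-cell (8x4 or 4x8) grid per character comes from.

```java
// Standard geohash encoding, written out for illustration only.
final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point (decimal degrees) as a geohash of the given length. */
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0, bitCount = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { bits = (bits << 1) | 1; lonMin = mid; }
                else { bits <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { bits = (bits << 1) | 1; latMin = mid; }
                else { bits <<= 1; latMax = mid; }
            }
            lonTurn = !lonTurn;
            if (++bitCount == 5) { // 5 bits make one base-32 character
                hash.append(BASE32.charAt(bits));
                bits = 0;
                bitCount = 0;
            }
        }
        return hash.toString();
    }
}
```

Because every extra character only narrows the same box, a shorter geohash is
always a prefix of a longer geohash for the same point, which is the property
a prefix-based filter relies on.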