[
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994256#comment-12994256
]
Bill Bell commented on SOLR-2155:
---------------------------------
I did more research. You cannot get from a document to multiple values in the
cache for a field; as far as I can see, that mapping does not exist. The
"docToTermOrd" property (type Direct8) is an array indexed by document ID that
holds a single value (the term ord). It is not easy to get a list of values
since there is only one per document. This structure was created so facets can
cheaply count documents (does a doc have 1 value or more?). I could do
something like the following, but it would be really slow:
Document doc = searcher.doc(id, fields);
It would be better if each lat/lon were copied into the index with a prefix
added to the sfield, like "store_1", "store_2", "store_3", when you index the
values. Then I could grab them easily. Of course you could also store them in
one field, as I did, but name it store_1 : "lat,lon|lat,lon". If we did this
during indexing, the bar-delimited approach would be easier for people to use
(no manual copying). Asking for 2, 3, or 4 term lists by document ID is
probably slower than just splitting on "|".
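A minimal sketch of unpacking that single-field encoding ("lat,lon|lat,lon")
into coordinate pairs. The delimiters follow the comment above; the class and
method names are illustrative, not the patch's code:

```java
import java.util.ArrayList;
import java.util.List;

public class PipeFieldSketch {
    // Parse a "lat,lon|lat,lon|..." field value into [lat, lon] pairs.
    static List<double[]> parsePoints(String fieldValue) {
        List<double[]> points = new ArrayList<>();
        if (fieldValue == null || fieldValue.isEmpty()) return points;
        // Improvement #4 below would replace split() with manual scanning.
        for (String pair : fieldValue.split("\\|")) {
            int comma = pair.indexOf(',');
            points.add(new double[] {
                Double.parseDouble(pair.substring(0, comma)),
                Double.parseDouble(pair.substring(comma + 1))
            });
        }
        return points;
    }

    public static void main(String[] args) {
        for (double[] p : parsePoints("44.9,-93.2|40.7,-74.0")) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```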
I keep going back to my patch, and I think it is still pretty good. I hope
others have not gone down this same path, since it was not fun.
Potential improvements:
1. Auto-populate sfieldmulti with the "|"-delimited values when indexing the geohash field
2. Multi-thread the brute-force scan for lat/lons
3. Use DistanceUtils for the haversine (hsin) calculation
4. Remove split() to improve performance
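For reference, item 3's "hsin" is the haversine great-circle distance. A
stdlib-only sketch of the formula (the actual patch would call DistanceUtils;
this class and its method name are illustrative):

```java
public class HaversineSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine great-circle distance between two lat/lon points, in km.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // One degree of longitude at the equator is roughly 111.2 km.
        System.out.printf("%.1f km%n", haversineKm(0, 0, 0, 1));
    }
}
```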
Bill
> Geospatial search using geohash prefixes
> ----------------------------------------
>
> Key: SOLR-2155
> URL: https://issues.apache.org/jira/browse/SOLR-2155
> Project: Solr
> Issue Type: Improvement
> Reporter: David Smiley
> Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
> GeoHashPrefixFilter.patch, SOLR.2155.p2.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on
> documents that have a variable number of points. This scenario occurs when
> there is location extraction (i.e. via a "gazetteer") occurring on free text.
> None, one, or many geospatial locations might be extracted from any given
> document and users want to limit their search results to those occurring in a
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr
> with a geohash prefix based filter. A geohash refers to a lat-lon box on the
> earth. Each successive character added further subdivides the box into a 4x8
> (or 8x4 depending on the even/odd length of the geohash) grid. The first
> step in this scheme is figuring out which geohash grid squares cover the
> user's search query. I've added various extra methods to GeoHashUtils (and
> added tests) to assist in this purpose. The next step is an actual Lucene
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in
> TermsEnum.seek() to skip to relevant grid squares in the index. Once a
> matching geohash grid is found, the points therein are compared against the
> user's query to see if it matches. I created an abstraction GeoShape
> extended by subclasses named PointDistance... and CartesianBox.... to support
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
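The prefix scheme in the quoted description can be illustrated with a minimal
standard base-32 geohash encoder (a sketch, not the patch's GeoHashUtils code):

```java
public class GeohashSketch {
    static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode lat/lon into a geohash of the given length by repeatedly
    // bisecting the bounding box, interleaving longitude and latitude bits.
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // geohash starts with a longitude bit
        int bit = 0, ch = 0;
        while (hash.length() < length) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { hash.append(BASE32.charAt(ch)); bit = 0; ch = 0; }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Each added character narrows the box, so every prefix of the
        // result also covers the point -- the property the filter seeks on.
        System.out.println(encode(42.605, -5.603, 5)); // prints "ezs42"
    }
}
```

Note how `encode(lat, lon, 3)` is a prefix of `encode(lat, lon, 5)`; that
containment is what lets the filter seek by prefix to relevant grid squares.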
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]