[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

David Smiley (Commented) (JIRA) Mon, 26 Sep 2011 22:57:41 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115244#comment-13115244
 ]


David Smiley commented on SOLR-2155:
------------------------------------

Your use-case is a feature I have intended to have LSP address in a direct 
manner when I have time. In the meantime, there are a couple of approaches that 
should work.

The first approach that comes to mind is to use the LSP QuadPrefixTree with 
LSP's ability to index rectangles.  You would treat the x dimension as time, 
and ignore the y dimension (use 0). What helps make this possible is LSP's 
unique ability to index shapes other than points, and in an efficient manner.  
The only spatial filter query operation that LSP supports right now is an 
intersection. If your query is simply a point (a specific time) then this is 
fine, or if it is a time duration and you want all stores that were open for at 
least part of this time, then it's fine. If your query is a time duration and 
you want it to reside completely _within_ an indexed time duration, then 
no-can-do for now.  Based on the nature of your use-case, it may suffice to use 
multiple spatial filter queries, each one a point (time) at each hour interval 
of the desired query duration.
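
To make the intersection semantics of the first approach concrete, here is a 
minimal sketch in plain Python. It is not LSP code: the interval tuples, the 
function names and the sample store hours are all hypothetical. It models an 
indexed rectangle with y fixed at 0 as a 1-D interval, and emulates "within" 
by requiring every sampled point query to intersect, the way multiple filter 
queries are ANDed together.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), both inclusive


def intersects(indexed: Interval, query: Interval) -> bool:
    """True if the two closed intervals share at least one point."""
    return indexed[0] <= query[1] and query[0] <= indexed[1]


def within_by_sampling(indexed: Interval, query: Interval, step: float) -> bool:
    """Approximate 'query lies within indexed' using only point intersections.

    One point query is issued per step across the query duration, plus one at
    its end; all of them must match.
    """
    start, end = query
    points = []
    t = start
    while t < end:
        points.append(t)
        t += step
    points.append(end)
    return all(intersects(indexed, (p, p)) for p in points)


store_hours = (9.0, 17.0)  # hypothetical store open from 9 to 17

print(intersects(store_hours, (16.0, 20.0)))               # True
print(within_by_sampling(store_hours, (16.0, 20.0), 1.0))  # False
print(within_by_sampling(store_hours, (10.0, 12.0), 1.0))  # True
```

The sampling is only as precise as the step. If a document can carry several 
durations, different sample points may be satisfied by different durations, 
so the step would need to be no coarser than the smallest gap you care about.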

The second approach is similar to your suggestion, but with y = closing time 
rather than the delta (x being the opening time), so y should always be > x. I 
just did some sample Venn diagrams to verify this approach. If you want to find 
documents whose indexed duration completely covers your query duration, then 
you do a bounding box filter query with x ranging from 0 to starttime and y 
ranging from endtime to max (where max is the maximum indexable time). When you 
initialize the LSP QuadPrefixTree you need to tell it the range 
of values.  Some time ago when writing tests, I discovered it simply can't 
handle Double.MAX_VALUE, but I imagine it will handle your 30,000.  If you want 
to use this patch (SOLR-2155) and not LSP then you will instead have to map 
your times to latitude-longitude ranges and use a Geohash grid length with 
granularity sufficient to differentiate your smallest unit of time (5min).
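
The second approach can be sketched in plain Python as follows. This is an 
illustration under assumptions, not LSP or SOLR-2155 code: the function names, 
the sample stores and the use of 30,000 as MAX_TIME are hypothetical. Each 
duration becomes the point (open, close), and the "completely covers" query 
becomes a box test. The last function is a rough sizing aid for the geohash 
route: it assumes times are spread linearly across the whole latitude range 
and that each time step should fall into its own cell, and it relies on a 
geohash of length n carrying floor(5n/2) latitude bits.

```python
from typing import Tuple

MAX_TIME = 30_000  # assumed upper bound of indexable time

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def to_point(open_time: float, close_time: float) -> Point:
    """Index a duration as the 2-D point (x = open, y = close)."""
    if not 0 <= open_time < close_time <= MAX_TIME:
        raise ValueError("expected 0 <= open < close <= MAX_TIME")
    return (open_time, close_time)


def covering_box(query_start: float, query_end: float) -> Box:
    """Box matching durations that open by query_start and close at or after query_end."""
    return (0, query_end, query_start, MAX_TIME)


def in_box(point: Point, box: Box) -> bool:
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def min_geohash_length(num_steps: int) -> int:
    """Smallest geohash length whose latitude axis has at least num_steps cells."""
    length = 1
    while 2 ** ((5 * length) // 2) < num_steps:
        length += 1
    return length


stores = {
    "A": to_point(900, 1700),
    "B": to_point(1000, 1200),
    "C": to_point(800, 2200),
}
box = covering_box(930, 1300)
matches = sorted(name for name, p in stores.items() if in_box(p, box))
print(matches)                     # ['A', 'C']
print(min_geohash_length(30_000))  # 6
```

Store B is excluded because it opens after the query starts; A and C both open 
before 930 and close after 1300.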

I think the 2nd approach is simplest and ideal based on what you've said about 
your needs.

If you want help with LSP then email me directly: [email protected]
                
> Geospatial search using geohash prefixes
> ----------------------------------------
>
>                 Key: SOLR-2155
>                 URL: https://issues.apache.org/jira/browse/SOLR-2155
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: Grant Ingersoll
>         Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
> GeoHashPrefixFilter.patch, 
> SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
> SOLR.2155.p3tests.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on 
> documents that have a variable number of points.  This scenario occurs when 
> there is location extraction (i.e. via a "gazetteer") occurring on free text.  
> None, one, or many geospatial locations might be extracted from any given 
> document and users want to limit their search results to those occurring in a 
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr 
> with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
> earth.  Each successive character added further subdivides the box into a 4x8 
> (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
> step in this scheme is figuring out which geohash grid squares cover the 
> user's search query.  I've added various extra methods to GeoHashUtils (and 
> added tests) to assist in this purpose.  The next step is an actual Lucene 
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
> TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
> matching geohash grid is found, the points therein are compared against the 
> user's query to see if it matches.  I created an abstraction GeoShape 
> extended by subclasses named PointDistance... and CartesianBox.... to support 
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
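
For readers unfamiliar with the prefix property that the issue summary relies 
on, here is a small self-contained geohash encoder in Python. It is the 
standard public algorithm, not code from GeoHashUtils or the attached patches, 
and the function names are hypothetical. It shows that a shorter geohash is 
always a prefix of the longer geohash of any point inside its box, which is 
what makes seeking by prefix in a sorted term index possible.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, length: int) -> str:
    """Encode a point by repeatedly halving the lon/lat ranges, 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    use_lon = True  # bits alternate between axes, starting with longitude
    for _ in range(length):
        value = 0
        for _ in range(5):
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if lon >= mid:
                    value = value * 2 + 1
                    lon_lo = mid
                else:
                    value = value * 2
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if lat >= mid:
                    value = value * 2 + 1
                    lat_lo = mid
                else:
                    value = value * 2
                    lat_hi = mid
            use_lon = not use_lon
        chars.append(BASE32[value])
    return "".join(chars)


def grid_shape(char_position: int) -> tuple:
    """(columns, rows) added by the character at a 1-based position.

    Odd positions consume 3 longitude bits and 2 latitude bits (8x4);
    even positions consume 2 longitude bits and 3 latitude bits (4x8).
    """
    return (8, 4) if char_position % 2 == 1 else (4, 8)


full = geohash_encode(42.6, -5.6, 5)
print(full)                                               # ezs42
print(full.startswith(geohash_encode(42.6, -5.6, 3)))     # True
print(grid_shape(1), grid_shape(2))                       # (8, 4) (4, 8)
```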

        
