[
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115244#comment-13115244
]
David Smiley commented on SOLR-2155:
------------------------------------
Your use-case is a feature I have intended to have LSP address in a direct
manner when I have time. In the meantime, there are a couple of approaches that
should work.
The first approach that comes to mind is to use the LSP QuadPrefixTree with
LSP's ability to index rectangles. You would treat the x dimension as time,
and ignore the y dimension (use 0). What helps make this possible is LSP's
unique ability to index shapes other than points, and in an efficient manner.
The only spatial filter query operation that LSP supports right now is an
intersection. If your query is simply a point (a specific time) then this is
fine, or if it is a time duration and you want all stores that were open for at
least part of this time, then it's fine. If your query is a time duration and
you want it to reside completely _within_ an indexed time duration, then that
is not supported for now. Based on the nature of your use-case, it may suffice
to use multiple spatial filter queries, each one a point (time) at each hour
interval of the desired query duration; since filter queries are ANDed, a store
matches only if it was open at every sampled time.
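A minimal sketch of that workaround in plain Python, not LSP or Solr code. It
assumes times are integer minutes and that each document has a single
(open, close) duration; the names intersects, sample_points and
open_throughout are hypothetical and only model the logic of ANDing several
point-intersection filters.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (open, close); integer minutes is an assumed unit


def intersects(indexed: Interval, query: Interval) -> bool:
    """Model of the one supported predicate: two closed intervals share a point."""
    return indexed[0] <= query[1] and query[0] <= indexed[1]


def sample_points(start: int, end: int, step: int = 60) -> List[int]:
    """Times every `step` units from `start`, always including `end` itself."""
    points = list(range(start, end, step))
    points.append(end)
    return points


def open_throughout(indexed: Interval, start: int, end: int, step: int = 60) -> bool:
    """Approximate 'query within indexed' by ANDing one point query per sample."""
    return all(intersects(indexed, (t, t)) for t in sample_points(start, end, step))


store = (540, 1020)  # open 09:00 to 17:00
print(open_throughout(store, 600, 720))   # every sampled time falls inside
print(open_throughout(store, 960, 1080))  # the 18:00 sample falls outside
```

This is only an approximation of "within": if a document carried several
durations, a closure shorter than the step could fall between two samples and
go unnoticed.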
The second approach is similar to your suggestion, but with y = closing time
rather than the delta, so each indexed duration becomes a point (x = opening
time, y = closing time). y should always be > x. I just did some sample Venn
diagrams to verify this approach. If you want to find documents with an indexed
duration that completely contains your query duration, then you do a bounding
box filter query with x from 0 to starttime and y from endtime to max (where
max is the maximum indexable time). When you initialize the LSP QuadPrefixTree
you need to tell it the range
of values. Some time ago when writing tests, I discovered it simply can't
handle Double.MAX_VALUE, but I imagine it will handle your 30,000. If you want
to use this patch (SOLR-2155) and not LSP then you will instead have to map
your times to latitude-longitude ranges and use a Geohash grid length with
granularity sufficient to differentiate your smallest unit of time (5min).
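To illustrate why the bounding box in the second approach works, here is a
small sketch in plain Python, again not LSP or Solr code. It assumes that
30,000 is the maximum indexable time; to_point, containment_box and in_box are
hypothetical helper names. A duration indexed as the point (open, close) falls
inside the box exactly when open <= starttime and close >= endtime, which is
the containment test.

```python
from typing import Tuple

MAX_TIME = 30_000  # assumed upper bound of the indexable range

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (min_x, max_x, min_y, max_y)


def to_point(open_time: int, close_time: int) -> Point:
    """Index a duration as a 2D point: x = opening time, y = closing time."""
    if not 0 <= open_time < close_time <= MAX_TIME:
        raise ValueError("expected 0 <= open < close <= MAX_TIME")
    return (open_time, close_time)


def containment_box(start: int, end: int) -> Box:
    """Box matching every duration that contains the query [start, end]."""
    return (0, start, end, MAX_TIME)


def in_box(point: Point, box: Box) -> bool:
    """Plain bounding-box test, standing in for the spatial filter query."""
    x, y = point
    min_x, max_x, min_y, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


store = to_point(540, 1020)  # open 09:00 to 17:00, in minutes
print(in_box(store, containment_box(600, 720)))   # query lies inside the duration
print(in_box(store, containment_box(960, 1080)))  # query runs past closing time
```

Because y is always greater than x, every indexed point sits above the
diagonal, and the box anchored at (starttime, endtime) selects only the
durations that start no later and end no earlier than the query.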
I think the 2nd approach is simplest and ideal based on what you've said about
your needs.
If you want help with LSP then email me directly: [email protected]
> Geospatial search using geohash prefixes
> ----------------------------------------
>
> Key: SOLR-2155
> URL: https://issues.apache.org/jira/browse/SOLR-2155
> Project: Solr
> Issue Type: Improvement
> Reporter: David Smiley
> Assignee: Grant Ingersoll
> Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
> GeoHashPrefixFilter.patch,
> SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch,
> SOLR.2155.p3tests.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on
> documents that have a variable number of points. This scenario occurs when
> there is location extraction (i.e. via a "gazetteer") occurring on free text.
> None, one, or many geospatial locations might be extracted from any given
> document and users want to limit their search results to those occurring in a
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr
> with a geohash prefix based filter. A geohash refers to a lat-lon box on the
> earth. Each successive character added further subdivides the box into a 4x8
> (or 8x4 depending on the even/odd length of the geohash) grid. The first
> step in this scheme is figuring out which geohash grid squares cover the
> user's search query. I've added various extra methods to GeoHashUtils (and
> added tests) to assist in this purpose. The next step is an actual Lucene
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in
> TermsEnum.seek() to skip to relevant grid squares in the index. Once a
> matching geohash grid is found, the points therein are compared against the
> user's query to see if it matches. I created an abstraction GeoShape
> extended by subclasses named PointDistance... and CartesianBox.... to support
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.