[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2012-04-18 Thread David Smiley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Description: 
{panel:title=NOTICE} The status of this issue is a plugin for Solr 3.x located 
here: https://github.com/dsmiley/SOLR-2155.  Look at the introductory readme 
and download the plugin .jar file.  Lucene 4's new spatial module is largely 
based on this code.  The Solr 4 glue for it should come very soon but as of 
this writing it's hosted temporarily at https://github.com/spatial4j.  For more 
information on using SOLR-2155 with Solr 3, see 
http://wiki.apache.org/solr/SpatialSearch#SOLR-2155  This JIRA issue is closed 
because it won't be committed in its current form.
{panel}

There currently isn't a solution in Solr for doing geospatial filtering on 
documents that have a variable number of points.  This scenario occurs when 
there is location extraction (i.e. via a gazetteer) occurring on free text.  
None, one, or many geospatial locations might be extracted from any given 
document, and users want to limit their search results to those occurring in a 
user-specified area.

I've implemented this by furthering the GeoHash-based work in Lucene/Solr with 
a geohash-prefix-based filter.  A geohash refers to a lat-lon box on the earth. 
Each successive character added further subdivides the box into a 4x8 (or 8x4, 
depending on the even/odd length of the geohash) grid.  The first step in this 
scheme is figuring out which geohash grid squares cover the user's search 
query.  I've added various extra methods to GeoHashUtils (and added tests) to 
assist in this purpose.  The next step is an actual Lucene Filter, 
GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to 
skip to relevant grid squares in the index.  Once a matching geohash grid 
square is found, the points therein are compared against the user's query 
shape to see if they match.  I created an abstraction, GeoShape, extended by 
subclasses named PointDistance... and CartesianBox, to support different 
queried shapes so that the filter need not care about these details.
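
To make the grid subdivision described above concrete, here is a minimal 
Python sketch of standard geohash decoding.  It is not code from the patch or 
from GeoHashUtils; the names geohash_bbox and children are hypothetical and 
chosen only for this illustration.

```python
# Standard geohash base-32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bbox(geohash):
    """Return (min_lat, min_lon, max_lat, max_lon) of a geohash cell."""
    lat = [-90.0, 90.0]
    lon = [-180.0, 180.0]
    use_lon = True  # bits alternate between lon and lat, starting with lon
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # bit 1: keep the upper half
            else:
                rng[1] = mid  # bit 0: keep the lower half
            use_lon = not use_lon
    return (lat[0], lon[0], lat[1], lon[1])


def children(geohash):
    """The 32 sub-cells obtained by appending one more character."""
    return [geohash + ch for ch in BASE32]


if __name__ == "__main__":
    print(geohash_bbox("s"))
    cols = {geohash_bbox(c)[1] for c in children("s")}
    rows = {geohash_bbox(c)[0] for c in children("s")}
    print(len(cols), "columns x", len(rows), "rows")
```

Each character carries five bits, so a cell always splits into 32 children; 
whether that is 8 columns by 4 rows or 4 columns by 8 rows depends on whether 
the next character starts on a longitude bit or a latitude bit.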

This work was presented at LuceneRevolution in Boston on October 8th.

  was:
{panel:title=NOTICE} There is a .zip attached to the issue with a .jar you can 
drop in to Solr 3.x.  Lucene 4's new spatial module is largely based on this 
code.  The Solr 4 glue for it should come very soon but as of this writing it's 
hosted temporarily at Spatial4j.com.  For more information on using SOLR-2155 
with Solr 3, see http://wiki.apache.org/solr/SpatialSearch#SOLR-2155  This JIRA 
issue is closed because it won't be committed in its current form.
{panel}



I updated this issue's leading description info box to point to my GitHub repo 
for this code and its evolution.  It's also where the releases are posted now.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 

[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-29 Thread David Smiley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Description: 
{panel:title=NOTICE} There is a .zip attached to the issue with a .jar you can 
drop in to Solr 3.x.  Lucene 4's new spatial module is largely based on this 
code.  The Solr 4 glue for it should come very soon but as of this writing it's 
hosted temporarily at Spatial4j.com.  For more information on using SOLR-2155 
with Solr 3, see http://wiki.apache.org/solr/SpatialSearch#SOLR-2155  This JIRA 
issue is closed because it won't be committed in its current form.
{panel}

There currently isn't a solution in Solr for doing geospatial filtering on 
documents that have a variable number of points.  This scenario occurs when 
there is location extraction (i.e. via a gazetteer) occurring on free text.  
None, one, or many geospatial locations might be extracted from any given 
document, and users want to limit their search results to those occurring in a 
user-specified area.

I've implemented this by furthering the GeoHash-based work in Lucene/Solr with 
a geohash-prefix-based filter.  A geohash refers to a lat-lon box on the earth. 
Each successive character added further subdivides the box into a 4x8 (or 8x4, 
depending on the even/odd length of the geohash) grid.  The first step in this 
scheme is figuring out which geohash grid squares cover the user's search 
query.  I've added various extra methods to GeoHashUtils (and added tests) to 
assist in this purpose.  The next step is an actual Lucene Filter, 
GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to 
skip to relevant grid squares in the index.  Once a matching geohash grid 
square is found, the points therein are compared against the user's query 
shape to see if they match.  I created an abstraction, GeoShape, extended by 
subclasses named PointDistance... and CartesianBox, to support different 
queried shapes so that the filter need not care about these details.
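
The prefix-seek idea above can be sketched roughly as follows.  This is an 
illustration in Python, not the GeoHashPrefixFilter code: a sorted list of 
terms with bisect stands in for the term dictionary and TermsEnum.seek(), a 
plain dict stands in for postings, and the accepts callback is a hypothetical 
stand-in for the GeoShape check.

```python
import bisect


def terms_with_prefix(sorted_terms, prefix):
    """Yield every indexed geohash term that starts with `prefix`.

    bisect_left jumps straight to the first candidate, which plays the role
    of a seek on a sorted term dictionary; the scan stops at the first term
    that no longer shares the prefix.
    """
    i = bisect.bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        yield sorted_terms[i]
        i += 1


def filter_docs(postings, covering_prefixes, accepts):
    """Return ids of documents having at least one accepted point.

    postings: dict mapping a full-length geohash term to a set of doc ids.
    covering_prefixes: geohash prefixes of the grid squares covering the query.
    accepts: predicate on a geohash term (assumed exact shape test).
    """
    sorted_terms = sorted(postings)
    matched = set()
    for prefix in covering_prefixes:
        for term in terms_with_prefix(sorted_terms, prefix):
            if accepts(term):
                matched.update(postings[term])
    return matched


if __name__ == "__main__":
    postings = {
        "9q8yy": {1},
        "9q8yz": {2},
        "9q9p1": {1, 3},
        "dr5ru": {4},
    }
    print(filter_docs(postings, ["9q8"], lambda term: True))
```

Because a document with several points simply appears under several terms, 
the same loop handles none, one, or many points per document.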

This work was presented at LuceneRevolution in Boston on October 8th.



 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch



[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-19 Thread David Smiley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Attachment: Solr2155-1.0.4-project.zip

I am attaching Solr2155-1.0.4-project.zip.
Changes:
* Fixed a bug in which Solr's XML response showed the field as a geohash 
instead of lat-lon. This bug was not present for other response formats.
* Include pre-built .jar in the zip for convenience.  README.txt enhanced a 
little too.

And FYI I added some info about SOLR-2155 on [Solr's SpatialSearch wiki page| 
http://wiki.apache.org/solr/SpatialSearch#SOLR-2155].

As I was looking through the source, I realized I _incorrectly_ once stated in 
the comments here that the stored value returned from a search would be the 
same no matter what geohash length you configure. That's not true; you'd have 
to use another field for the stored value if you want to retain the original 
precision.
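
The precision point above can be illustrated with a small Python sketch of 
standard geohash encoding and decoding.  It is not the plugin's code; the 
function names encode and decode_center are hypothetical, and the coordinates 
are just sample values.

```python
# Standard geohash base-32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, length):
    """Encode a point as a geohash with `length` characters."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < length:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value = value << 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def decode_center(geohash):
    """Return the (lat, lon) center of the cell named by `geohash`."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return ((lat_rng[0] + lat_rng[1]) / 2, (lon_rng[0] + lon_rng[1]) / 2)


if __name__ == "__main__":
    lat, lon = 57.64911, 10.40744
    for length in (4, 8, 11):
        print(length, encode(lat, lon, length),
              decode_center(encode(lat, lon, length)))
```

A shorter geohash names a larger cell, so decoding it can only return that 
cell's center; that is why a separate stored field is needed to keep the 
original coordinates when a short length is configured.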

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-1.0.4-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-17 Thread Harley Parks (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harley Parks updated SOLR-2155:
---

Comment: was deleted

(was: I used a fresh install of Solr 3.4, just in case our implementation had 
something to do with it; I'm still not able to return the lat,long and only 
get the geohash.

What's interesting is that if I change the field's type between SOLR-2155's 
geohash and Solr's geohash, the returned format changes from geohash to 
lat,long without reindexing.

Where in the code does the geohash value get returned in a query?

Help.)

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch





[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2012-03-13 Thread Harley Parks (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harley Parks updated SOLR-2155:
---

Comment: was deleted

(was: I am missing the package solr2155.lucene.spatial.geometry.shape.)

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch





[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2011-10-05 Thread Mikhail Khludnev (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-2155:
---

Attachment: Solr2155-for-1.0.2-3.x-port.patch

Hi, 
Solr2155-for-1.0.2-3.x-port.patch has small amendments for the backport:
# exception text for the absent sfield local param;
# a recommendation in README.txt to enable the cache (the cache name is a 
little confusing);
# a fix for the UnsupportedOperationException on debugQuery=on for the geodist 
func (but my toString() impl seems overcomplicated).

David, 
Please let me know if I can apply this to any codebase.

Thanks for the backport!

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-for-1.0.2-3.x-port.patch





[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2011-10-05 Thread David Smiley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Attachment: Solr2155-1.0.3-project.zip

Thanks Mikhail; I've uploaded a new version with these changes. I tweaked the 
formatting and another trivial thing or two.

I don't see the point in explicitly configuring the cache named 
fieldValueCache since Solr will create it for you automatically with 
reasonable defaults. But I kept your tip in the README anyway.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, 
 Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch





[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2011-09-29 Thread David Smiley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Attachment: Solr2155-1.0.2-project.zip

I ported SOLR-2155 to Solr 3.x and did so in a manner that plugs into an 
unpatched Solr. Any source that the patch modified was copied and moved into 
another package so I could keep this capability independent. The attached zip, 
Solr2155-1.0.2-project.zip, is a Maven-based project including .git/ for 
history.  You'll need to run mvn package to generate a jar file that you can 
throw into your classpath.  There is a skimpy README.txt that tells you what 
to do to your schema & solrconfig files. With this in place, you have 
multi-value geospatial filter & sort for indexed points. And if you use my 
query parser then you get explicit bounding box query filter capability.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip





[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-31 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Attachment: SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch

Attached is a new patch. The highlights are:
 * Requires the latest Solr trunk -- probably anything in the last few months: 
If this is ultimately going to get committed then this needed to happen.  There 
are only some slight differences so if you really need an earlier trunk then 
I'm sure you'll figure it out.
 * Adds support for sorting, including multi-value: Use the existing geodist() 
function query with a lat-lon constant and a reference to your geohash based 
field. Note that this works by loading all points from the field into memory, 
resolving each underlying full-length geohash into the lat & lon and storing 
them in a data structure which is a List<Point2D>[].  This is improved over 
Bill's patch, surely, but it could use some optimization.  It's not optimized 
for the single-value case either; that's a definite TODO.
 * Polygon/WKT features have been omitted due to LGPL licensing concerns of 
JTS. I've left hooks for their implementation so that adding this previously 
existing capability back is easy. You'll easily figure it out if you are so 
inclined.  I might add this as a patch shortly (not to be committed) when I get 
some time; but longer term it will re-surface under a separate project.  Don't 
worry; it'll be painless to use if you need it.
 * This might be controversial but as part of this patch, I removed the 
ghhsin() and geohash() function queries. Their presence was confusing; I simply 
don't see what point there is to them now that this patch fleshes out the 
geohash capability.
 * I decided to pre-register my SpatialGeoHashFilterQParser as geohashfilt, 
instead of requiring you to do so in solrconfig.xml.  You could use geofilt 
for point-radius queries but I prefer this one since I can specify the bbox 
explicitly.
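
As a rough illustration of the multi-value sorting bullet above, the Python 
sketch below keeps a list of points per document in memory and orders 
documents by great-circle distance to a query point.  It is not the patch's 
code or Solr's geodist() implementation: the names haversine_km and 
sort_by_nearest_point are hypothetical, the dict stands in for the per-document 
point lists, and using each document's closest point as its sort distance is 
an assumption made for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def sort_by_nearest_point(points_by_doc, lat, lon):
    """Return doc ids ordered by distance of their closest point.

    points_by_doc: dict mapping doc id -> list of (lat, lon) tuples.
    Documents without any point are left out (an assumption of this sketch).
    """
    def nearest(doc_id):
        return min(haversine_km(lat, lon, p_lat, p_lon)
                   for p_lat, p_lon in points_by_doc[doc_id])

    candidates = [doc for doc, pts in points_by_doc.items() if pts]
    return sorted(candidates, key=nearest)


if __name__ == "__main__":
    points_by_doc = {
        "a": [(40.7, -74.0), (51.5, -0.1)],  # two points in one document
        "b": [(48.85, 2.35)],
        "c": [],                             # no extracted location
    }
    print(sort_by_nearest_point(points_by_doc, 51.5, -0.1))
```

Holding every point for every document in memory is simple but costly, which 
matches the note above that the approach could use some optimization.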

There are a few slight changes to GeoHashPrefixFilter that crept in from 
unfinished work (notably tying sorting to filtering in an efficient way) but 
they are harmless.

Bill, thanks for kick-starting the multi-value sorting. I re-used most of your 
code.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Grant Ingersoll
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, 
 SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, 
 SOLR.2155.p3tests.patch


 There currently isn't a solution in Solr for doing geospatial filtering on 
 documents that have a variable number of points.  This scenario occurs when 
 there is location extraction (i.e. via a gazateer) occurring on free text.  
 None, one, or many geospatial locations might be extracted from any given 
 document and users want to limit their search results to those occurring in a 
 user-specified area.
 I've implemented this by furthering the GeoHash based work in Lucene/Solr 
 with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
 earth.  Each successive character added further subdivides the box into a 4x8 
 (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
 step in this scheme is figuring out which geohash grid squares cover the 
 user's search query.  I've added various extra methods to GeoHashUtils (and 
 added tests) to assist in this purpose.  The next step is an actual Lucene 
 Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
 TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
 matching geohash grid is found, the points therein are compared against the 
 user's query to see if it matches.  I created an abstraction GeoShape 
 extended by subclasses named PointDistance... and CartesianBox to support 
 different queried shapes so that the filter need not care about these details.
 This work was presented at LuceneRevolution in Boston on October 8th.
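
To make the subdivision concrete, here is a minimal Python sketch of standard geohash encoding. It is not code from the patch; geohash_encode and grid_shape are hypothetical helper names chosen for illustration. Each character carries 5 bits that alternate between longitude and latitude, which is where the alternating 8x4 and 4x8 grids come from.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, length):
    """Encode a lat/lon point as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    even = True  # even bit positions refine longitude, odd ones latitude
    while len(chars) < length:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def grid_shape(prefix_len):
    """(columns, rows) that the next character adds below a prefix."""
    # If an even number of bits has been consumed, the next 5 bits start
    # with longitude and so carry 3 longitude bits and 2 latitude bits.
    lon_bits = 3 if (5 * prefix_len) % 2 == 0 else 2
    lat_bits = 5 - lon_bits
    return 2 ** lon_bits, 2 ** lat_bits
```

Because a shorter hash is always a prefix of the longer hash for the same point, a prefix identifies an enclosing grid square, and that is what lets a prefix seek skip to the relevant part of the term index.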

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes

2011-02-15 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2155:


Attachment: SOLR.2155.p3tests.patch

Test cases for geomultidist() function.

Apply this together with SOLR.2155.p3.patch.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch





[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes

2011-02-14 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2155:


Attachment: SOLR.2155.p3.patch

This is the patch with some speed improvements. 

Example call:

http://localhost:8983/solr/select?q=*:*&fq={!geofilt}&sfieldmulti=storemv&pt=43.17614,-90.57341&d=100&sfield=store&sort=geomultidist%28%29%20asc&sfieldmultidir=asc
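
For readers scripting such a request, a small Python sketch follows. It only assembles the query string from the parameter names used in the example call; build_geomultidist_url is a hypothetical helper, the default field names are taken from the example, and the parameter boundaries are inferred from the parameter names described in this thread.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_geomultidist_url(base, pt, d, sfield="store",
                           sfieldmulti="storemv", direction="asc"):
    """Build a select URL that filters with geofilt and sorts by geomultidist()."""
    params = [
        ("q", "*:*"),
        ("fq", "{!geofilt}"),
        ("sfieldmulti", sfieldmulti),      # multi-valued point field
        ("pt", "%s,%s" % pt),              # query centre as "lat,lon"
        ("d", str(d)),                     # distance, as in the example call
        ("sfield", sfield),
        ("sort", "geomultidist() " + direction),
        ("sfieldmultidir", direction),     # asc or desc
    ]
    # urlencode percent-escapes reserved characters and joins with '&'
    return base + "?" + urlencode(params)


url = build_geomultidist_url("http://localhost:8983/solr/select",
                             (43.17614, -90.57341), 100)
```

The escaping differs cosmetically from the hand-written URL (for example, the space in the sort clause becomes "+"), but the decoded parameters are the same.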



 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p2.patch, SOLR.2155.p3.patch





[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes

2011-02-14 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2155:


Attachment: (was: SOLR.2155.p2.patch)

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p3.patch





[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes

2011-02-11 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2155:


Attachment: SOLR.2155.p2.patch

New file. This works with geohash and normal LatLon. Here is an example with 
LatLon. The new field added is storemv. It is bar delimited. New fields:

sfieldmultidir - asc or desc
sfieldmulti - name of the field

Can use for sorting or scoring. It will check all points in sfieldmulti field 
and find closest or farthest points.

{code}

http://localhost:8983/solr/select?rows=1000&q=_val_:%22geomultidist%28%29%22&fl=storemv,score,store&fq={!geofilt}&sfieldmultidir=asc&sfieldmulti=storemv&pt=45.17614,-93.87341&d=1&sfield=store&sort=geomultidist%28%29%20asc

{code}
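
As a rough illustration of what "check all points and find closest or farthest" means, here is a Python sketch. It is not the patch's Java code. It assumes the bar-delimited field holds "lat,lon" pairs separated by "|" (the exact stored format is not shown above), and it uses the haversine formula with a mean earth radius, which may differ from the constant the patch uses. multi_dist is a hypothetical name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def multi_dist(field_value, pt, direction="asc"):
    """Distance from pt to the closest (asc) or farthest (desc) stored point.

    field_value is assumed to look like "45.1,-93.8|44.9,-93.2".
    Returns None when the field holds no points.
    """
    points = [tuple(float(x) for x in part.split(","))
              for part in field_value.split("|") if part]
    if not points:
        return None
    dists = [haversine_km(lat, lon, pt[0], pt[1]) for lat, lon in points]
    return min(dists) if direction == "asc" else max(dists)
```

Sorting or scoring by this value then ranks a document by its nearest point when the direction is asc and by its farthest point when it is desc.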



 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p2.patch





[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes

2011-02-11 Thread Bill Bell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bell updated SOLR-2155:


Attachment: (was: SOLR.2155.p2.patch)

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch, SOLR.2155.p2.patch





[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes

2011-01-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Attachment: GeoHashPrefixFilter.patch

Here is another patch.  By the way, I'm using revision 1055285 of trunk.
* Removed @author tags.
* Introduced a constant threshold, GRIDLEN_SCAN_THRESHOLD, at which a term 
scan is done instead of divide & conquer.  It used to be 2, meaning if maxlen 
is 9 then once we get to grid level 7 the remaining leaves are scanned 
manually instead of making more boxes. I should make this configurable but it 
is not at this time.
* By setting GRIDLEN_SCAN_THRESHOLD to 4, I found the performance to be 
superior for the geonames data when the query shape was more complex than a 
bbox.  I haven't truly tuned this though.
* Added polygon search based on JTS that will handle any WKT (well known 
text) query string!  The JTS library (LGPL licensed) is downloaded similarly to 
how the bdb contrib module downloads sleepycat.  The only limitation with 
this is that I don't do any special world boundary processing, which mainly 
matters at the dateline.  That's a TODO.
* Added SpatialGeohashFilterQParser.  I don't like SpatialFilterQParser. This 
one handles point-radius, bounding box, polygon, and WKT geometry inputs.  The 
arguments and inputs were designed to be easily compatible with the geo 
extension to the open-search spec. If JTS is not on the classpath then this 
query parser should still work provided you don't do polygon or WKT (not 
verified but should work in theory).
* Added a test for doing a polygon search.  And I made the existing lat-lon 
test get executed for both geohash and latlon type.
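
The divide & conquer versus scan trade-off behind GRIDLEN_SCAN_THRESHOLD can be sketched in Python as follows. This is an illustration, not the patch's code: the index is modelled as a sorted list of full-length geohash terms, bisect stands in for seeking in the term enumeration, the query is a plain lat-lon box, and a leaf counts as a match when its cell centre falls inside the box. All function names here are hypothetical.

```python
import bisect

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
GRIDLEN_SCAN_THRESHOLD = 4


def cell_bbox(geohash):
    """(min_lat, min_lon, max_lat, max_lon) of a geohash cell."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    even = True  # bits alternate longitude, latitude, longitude, ...
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            bit = (idx >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return lat_lo, lon_lo, lat_hi, lon_hi


def relate(cell, query):
    """Relation of a cell box to the query box."""
    if (cell[2] < query[0] or cell[0] > query[2]
            or cell[3] < query[1] or cell[1] > query[3]):
        return "disjoint"
    if (cell[0] >= query[0] and cell[2] <= query[2]
            and cell[1] >= query[1] and cell[3] <= query[3]):
        return "within"
    return "intersects"


def center_in(term, query):
    min_lat, min_lon, max_lat, max_lon = cell_bbox(term)
    lat, lon = (min_lat + max_lat) / 2, (min_lon + max_lon) / 2
    return query[0] <= lat <= query[2] and query[1] <= lon <= query[3]


def terms_with_prefix(sorted_terms, prefix):
    lo = bisect.bisect_left(sorted_terms, prefix)
    hi = bisect.bisect_left(sorted_terms, prefix + "~")  # '~' sorts after 'z'
    return sorted_terms[lo:hi]


def search(sorted_terms, query, maxlen, prefix="", hits=None):
    """Collect terms matching the query box, descending one grid level at a time."""
    if hits is None:
        hits = []
    for ch in BASE32:
        cell = prefix + ch
        candidates = terms_with_prefix(sorted_terms, cell)
        if not candidates:
            continue  # nothing indexed under this cell
        rel = relate(cell_bbox(cell), query)
        if rel == "disjoint":
            continue
        if rel == "within":
            hits.extend(candidates)  # whole cell matches, no per-point check
        elif maxlen - len(cell) <= GRIDLEN_SCAN_THRESHOLD:
            # Close enough to the leaves: scan the terms instead of subdividing.
            hits.extend(t for t in candidates if center_in(t, query))
        else:
            search(sorted_terms, query, maxlen, cell, hits)
    return hits
```

With a threshold of 2 and maxlen 9, the scan branch is first taken at grid level 7, which matches the behaviour described in the list above. A larger threshold trades more per-term checks for fewer boxes to generate and test.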

Here is an updated benchmark.  I'm doing geohash of length 9 and this time with 
the threshold mentioned above at 4.  The query is a *circle* (no bbox).  This 
triggers the LatLonType field to do a completely different algorithm in which 
it _loads every value into memory_ via the field cache and does a brute force 
match.  This GeoHash prefix filter has never used the field cache!  It uses 
Lucene's index.  The places/query (which is an average) actually varied by 
one between both implementations.  Could indicate a bug or some math rounding 
issue at the edge. And another point is that these benchmarks almost certainly 
resulted in my OS disk cache putting the relevant index files into memory.

||km||places/query||ms/query (LatLon)||ms/query (geohash)||
|11|587|10.0|4.8|
|44|3,404|11.5|4.3|
|230|45,536|21.8|24.0|
|1800|1,319,692|288.5|142.3|

I'm pretty happy with it at this point and I'll sit on it for a while, 
gathering feedback.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
 GeoHashPrefixFilter.patch


You can reply to this email to add a comment to the issue online.





[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes

2011-01-18 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Attachment: GeoHashPrefixFilter.patch

Attached is my latest patch for geohash prefix based geospatial search.  This 
patch is a performance-centric update.  GeoHashes are still used, but I'm 
indexing a token for every geohash length per point indexed.  So a geohash 
length of 9 results in 9 tokens.  This solves the performance issues that 
occurred when a huge number of points matched a query.  See the table below:

||km||places/query||ms/query (LatLon)||ms/query (geohash)||
|11|692|3.8|5.0|
|44|4,043|4.8|6.0|
|230|57,200|15.0|17.5|
|1800|1,405,767|94.0|71.0|

The LatLon is using a pair of trie doubles at a precisionStep of 8.  I tried 6 
& 16 but 8 was about right.  The GeoHash length (a new configurable option) was 
chosen to be 9, which has plenty of precision for most uses (I recall it's a 
couple meters or less; I forget).  The queries are bounding lat-lon boxes.
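
The precision of a given geohash length can be worked out directly, since each character adds 5 bits split between longitude and latitude. The Python sketch below does that arithmetic. The function names are hypothetical and the metres-per-degree figure is a spherical-earth approximation, so treat the result as an estimate.

```python
import math

METERS_PER_DEGREE = 111_320.0  # assumption: approximate length of one degree of latitude


def geohash_cell_size_deg(length):
    """(height, width) of a geohash cell in degrees of latitude and longitude."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude takes the first bit, so the extra one when odd
    lat_bits = total_bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits


def geohash_cell_size_m(length, at_lat=0.0):
    """Approximate (height, width) of a cell in metres at the given latitude."""
    lat_deg, lon_deg = geohash_cell_size_deg(length)
    height = lat_deg * METERS_PER_DEGREE
    width = lon_deg * METERS_PER_DEGREE * math.cos(math.radians(at_lat))
    return height, width
```

For length 9 this gives a cell a little under 5 m on a side at the equator, so a point is at most about 2.4 m from its cell centre along each axis, which is consistent with the "couple meters" recollection above.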

What isn't in this performance table is the impact of this new algorithm on 
more complicated spatial queries.  It's superior to the algorithm that existed 
before it and it should also be superior to LatLonType.  Grid boxes that are 
completely within the query shape get efficiently added in one fell swoop.
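
A small Python sketch of why indexing one token per geohash length helps: once a grid box is known to lie completely within the query shape, every document with a point in that box can be fetched with a single term lookup. This models the index as a plain dictionary and is not the patch's code; the function names are hypothetical.

```python
from collections import defaultdict


def prefix_tokens(geohash):
    """One token per geohash length, e.g. 'u4p' -> ['u', 'u4', 'u4p']."""
    return [geohash[:n] for n in range(1, len(geohash) + 1)]


def build_index(docs):
    """docs maps doc id -> list of geohashes (a document may have many points).

    Returns a mapping of token -> set of doc ids, i.e. a toy inverted index.
    """
    index = defaultdict(set)
    for doc_id, hashes in docs.items():
        for gh in hashes:
            for token in prefix_tokens(gh):
                index[token].add(doc_id)
    return index


def docs_in_cells(index, covered_cells):
    """Union of the postings of every grid cell fully inside the query shape."""
    result = set()
    for cell in covered_cells:
        result |= index.get(cell, set())
    return result
```

The cost is index size: a point indexed at length 9 produces 9 terms instead of 1, in exchange for not having to visit each matching point individually when a large cell is fully covered.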

Code details:
* Most of the former patch, which included a lot of additions to GeoHashUtils, 
is no longer present in this new patch.  This was basically a rewrite.
* I abstracted use of GeoHashUtils to GridNode.GridReferenceSystem class so 
that in the future I can tinker with alternate more efficient encodings without 
breaking any code here.
* I needed a shape interface or abstract class and so I decided to embrace & 
extend org.apache.lucene.spatial.geometry.shape.Geometry2D instead of having my 
own like I did before.  I added PointDistanceGeom & MultiGeom.
* There is an extensive random data filter test in SpatialFilterTest that I 
added.  It's hard to follow but it teased out a few bugs.

Next patch real soon:
* I'm going to modify the build.xml to grab the LGPL licensed JTS library which 
has well-tested & high performance geometry code.  In particular, I'll use it 
to implement a polygon shape.  (Already done in another codebase; just needs to 
be ported to this patch)
* I'm going to include an alternative query parser to what comes with Solr.  
This one will do all of point-distance, lat-lon box, and polygon. (Already done 
in another codebase; just needs to be ported to this patch).

Future:
* Replace geohash with something more efficient.  Some basic testing suggests 
to me I could double-or-better the performance.
* Compatibility with distance sorting / relevancy boosting when not 
multi-valued.

I'd really like input from other geospatial birds-of-a-feather in Solr, 
especially committers.

As an aside, MongoDB has chosen a similar algorithm.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch





[jira] Updated: (SOLR-2155) Geospatial search using geohash prefixes

2010-10-12 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-2155:
---

Attachment: GeoHashPrefixFilter.patch

This attached patch is tested, both at the GeoHashUtils level and via 
SpatialFilterTest.  I added tests to both of these.  I added ASF headers.

 Geospatial search using geohash prefixes
 

 Key: SOLR-2155
 URL: https://issues.apache.org/jira/browse/SOLR-2155
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
 Attachments: GeoHashPrefixFilter.patch

