Re: LocalLucene or GeoHash for spatial search ?

2008-12-29 Thread patrick o'leary




Hey Marc

LocalLucene has been rewritten since then to use a Cartesian grid for
its bounding-box lookups:
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html

GeoHash is a method of consistent hashing that produces an id whose
length determines the precision of the point. For example, 123ab6789
might be (42.12345, -73.12345) and 123ab would be (42.12, -73.12).

It's a great way to store individual points or areas in a compressed
format, kind of like a tiny URL for a particular point on the globe.
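
As a rough illustration of the prefix property described above, here is a minimal Python sketch of the public geohash encoding (the scheme used by geohash.org). The function name geohash_encode and the sample coordinates are chosen for this example only; this is not LocalLucene code.

```python
# Geohash alphabet: base 32 without the letters a, i, l, o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, length=9):
    """Encode a point by repeatedly halving the lon/lat ranges.

    Bits alternate between longitude and latitude, starting with
    longitude; every 5 bits become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, nbits = [], 0, 0
    use_lon = True
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

point = (57.64911, 10.40744)  # arbitrary sample point
full = geohash_encode(*point, length=9)
short = geohash_encode(*point, length=5)
print(full, short)
```

The shorter hash is always a prefix of the longer one, which is why truncating an id simply widens the cell it names.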

LocalLucene works differently, by placing points within boxes at
different zoom levels.
At the minimum zoom level 0 (_localTier0) everything exists within 1 box,
at zoom level 1 it's 4 boxes,
at zoom level 2 it's 16 boxes,
...
at zoom level 15 it's 1,073,741,824 boxes.

Obviously the index will only contain box ids for the boxes that have
points inside them (thus if you're indexing only the land mass of the
planet, you're only going to use at most 30% of those boxes).
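
The box counts above follow from each zoom level splitting every box into four. The Python sketch below checks that arithmetic and shows why only occupied boxes end up in the index. The box_id function is a hypothetical row-major numbering on a plain lat/long grid, assumed purely for illustration; LocalLucene's real projection and id format are not described in this thread.

```python
def boxes_at_tier(level):
    """Each zoom level quarters every box, so a level has 4**level boxes."""
    return 4 ** level

def box_id(lat, lon, level):
    """Hypothetical id: row-major index in a 2**level by 2**level grid."""
    n = 2 ** level
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return row * n + col

for level in (0, 1, 2, 15):
    print(level, boxes_at_tier(level))

# Only boxes that actually contain points produce index terms.
points = [(51.5, -0.12), (51.6, -0.10), (40.7, -74.0)]  # sample data
level = 4
occupied = {box_id(lat, lon, level) for lat, lon in points}
print(len(occupied), "of", boxes_at_tier(level), "boxes are occupied")
```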

Based on the radius of your search, LocalLucene will select the
appropriate zoom level to find your results in.
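
One plausible way to pick a zoom level from a radius is sketched below in Python. The rule used here (take the finest level whose box is still at least as wide as the search diameter) and the names best_tier and EARTH_CIRCUMFERENCE_MILES are assumptions made for this example; the thread does not say how LocalLucene actually makes the choice.

```python
# Approximate equatorial circumference; an assumed constant for the sketch.
EARTH_CIRCUMFERENCE_MILES = 24902.0

def best_tier(radius_miles, max_tier=15):
    """Return the finest level whose box width covers the search diameter.

    Assumes a level-n box spans circumference / 2**n miles east to west.
    """
    diameter = 2.0 * radius_miles
    for level in range(max_tier, -1, -1):
        box_width = EARTH_CIRCUMFERENCE_MILES / (2 ** level)
        if box_width >= diameter:
            return level
    return 0  # radius larger than the globe: fall back to the single box

for radius in (1, 10, 100):
    print(radius, "miles ->", "tier", best_tier(radius))
```

A smaller radius maps to a finer level, so fewer candidate documents need an exact distance check.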

So while LocalLucene could benefit from changing our notation for box
ids to something similar to geohash to reduce index size, the concept
for search is different. A couple of us are looking at including
geohash in the LocalLucene code base. It would make our distance
calculation less memory intensive, since we would have to load only one
field cache for a point rather than the two (lat and long) fields we
currently use, but I have to test the decoding speed to see if it slows
us down.
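
To make the trade-off concrete, the Python sketch below decodes a single geohash string back to a lat/long pair and then runs a haversine distance on it. This is the extra per-document decoding step mentioned above. The names geohash_decode and haversine_miles, the Earth radius value, and the sample hash are assumptions for the example, not LocalLucene code.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_decode(geohash):
    """Return the (lat, lon) center of the cell named by a geohash."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    use_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (idx >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2

def haversine_miles(lat1, lon1, lat2, lon2, radius=3958.8):
    """Great-circle distance in miles on a sphere of the given radius."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))

# One stored field per document instead of separate lat and long fields.
doc_lat, doc_lon = geohash_decode("u4pruydqqvj")
print(doc_lat, doc_lon, haversine_miles(doc_lat, doc_lon, 57.0, 10.0))
```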

GeoHash's main benefit comes in the form of lookup by id, say for an
image or tile map at a point, or for geocoding.
It probably has more benefits than that, and I'm sure someone will
correct me on that.

I should also warn you that I'm the guy who wrote LocalLucene, so I
have a natural bias towards it, but to be honest this is how I see
most geo searches working.

- P

squaro wrote:

  Hello everybody

I would like to hear your opinions about spatial search techniques using
Lucene.

In your view, is it better to use LocalLucene
(http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm)
or to encode lat and long with Geohash (http://geohash.org/) and then
use a RangeFilter between the two boundary hashes?

I think using geohash should be better because the comparison is done on
one field only.

What is your opinion about it?

Best regards

Marc
  


-- 
Patrick O'Leary

AOL Local Search Technologies
Phone: + 1 703 265 8763

You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein






Re: LocalLucene or GeoHash for spatial search ?

2008-12-29 Thread Robert Muir
guys figured i would pass this along:
http://www.geospatialsemanticweb.com/2008/05/29/geohash-for-spatial-index-and-search

one comment there makes me a little afraid to use geohash for spatial
search:

That doesn't work too well for London, which straddles 0 longitude: either
side of 0 flips the MSB. These two places are pretty close to each other:

http://geohash.org/u10hb7951
http://geohash.org/gcpuzewfz
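
The boundary effect in that comment can be reproduced with a short Python sketch. It encodes two points a few hundred metres apart on either side of the prime meridian and compares their common prefix with that of a pair the same distance apart further east. The encoder follows the public geohash scheme; the function names and the coordinates near Greenwich are picked for illustration only.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, length=9):
    """Standard geohash: interleave lon/lat bisection bits, 5 bits per char."""
    ranges = {"lat": [-90.0, 90.0], "lon": [-180.0, 180.0]}
    values = {"lat": lat, "lon": lon}
    chars, bits, nbits = [], 0, 0
    axis = "lon"  # the first bit refines longitude
    while len(chars) < length:
        lo, hi = ranges[axis]
        mid = (lo + hi) / 2
        if values[axis] >= mid:
            bits = (bits << 1) | 1
            ranges[axis][0] = mid
        else:
            bits <<= 1
            ranges[axis][1] = mid
        axis = "lat" if axis == "lon" else "lon"
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

def common_prefix_len(a, b):
    """Number of leading characters two hashes share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

east = geohash_encode(51.4779, 0.0015)    # just east of 0 longitude
west = geohash_encode(51.4779, -0.0015)   # just west of 0 longitude
away_a = geohash_encode(51.4779, 0.0985)  # same spacing, away from the line
away_b = geohash_encode(51.4779, 0.1015)

print(east, west, common_prefix_len(east, west))
print(away_a, away_b, common_prefix_len(away_a, away_b))
```

The pair that straddles the meridian shares no prefix at all, so a single range or prefix query over the hash would miss one of them; a search near such a boundary has to cover the neighbouring cells as well.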



-- 
Robert Muir
rcm...@gmail.com

Re: LocalLucene or GeoHash for spatial search ?

2008-12-29 Thread Ryan McKinley

Geohash and the tier system (LocalLucene) each have their place.

Geohash is attractive since it is simple and could slip into Lucene
easily. The tier system is more complex, but supports more accurate
calculations and better behavior around the edges (even in New
Zealand and London).


I hope the spatial contrib will explore many approaches. Obviously
not every approach will be generally applicable, but they are good to
have in the toolbox.


Also check:
http://wiki.apache.org/lucene-java/SpatialSearch

ryan

