Re: LocalLucene or GeoHash for spatial search ?
Hey Marc,

LocalLucene has been rewritten since then to use a Cartesian grid for its bounding box lookups:
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html

GeoHash is a method of hashing a point to an id where the length of the id determines the precision of the point, as in 123ab6789 might be (42.12345, -73.12345) and 123ab would be (42.12, -73.12). It's a great way to store individual points or areas in a compressed format, kind of like a tiny url to a particular point on the globe.

Locallucene works differently, by placing points within boxes at different zoom levels. At the minimum zoom level 0 (_localTier0) everything exists within 1 box:

zoom level 1: 4 boxes
zoom level 2: 16 boxes
...
zoom level 15: 1,073,741,824 boxes

Obviously the index will only contain box ids for the boxes that have points inside them (thus if you're indexing only the land mass of the planet, you're only going to use at most 30% of those boxes). Based on the radius of your search, locallucene will select the appropriate zoom level to find your results in.

So while locallucene could benefit from changing our notation for box ids to something similar to geohash to reduce index size, the concept for search is different.

A couple of us are looking at including geohash in the locallucene code base. It would make our distance calculation less memory intensive, having to load only one field cache for a point rather than the 2 lat/long fields we currently use, but I have to test the decoding speed to see if it slows us down.

GeoHash's main benefit comes in the form of lookup by id, say for an image or tile map at a point, or for geocoding. It probably has more benefits than that, and I'm sure someone will correct me on that.

I should also warn you that I'm the guy who wrote locallucene, so I have a natural bias towards it, but I'll be honest: this is how I see most geo searches working.
- P

squaro wrote:

Hello everybody,

I would like your opinion on spatial search techniques using Lucene. In your view, is it better to use LocalLucene (http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene.htm) or to encode lat and long with Geohash (http://geohash.org/) and then use a RangeFilter between the two boundary hashes?

I think using geohash should be better because the comparison is done on one field only. What is your opinion?

Best regards
Marc

--
Patrick O'Leary
AOL Local Search Technologies
Phone: + 1 703 265 8763

You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
- Albert Einstein
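To make the two geohash ideas in this exchange concrete (a shorter id is a coarser version of the same point, and a range over one sorted field selects a cell), here is a minimal Python sketch. It uses the standard public geohash bit-interleaving scheme, not any LocalLucene or Lucene code. The names `geohash_encode` and `prefix_range` are made up for illustration, and the sorted list only stands in for an indexed term field that a RangeFilter would scan.

```python
import bisect

# Standard geohash alphabet; its order matches ASCII order, so plain
# string comparison orders hashes the same way as the underlying cells.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length=9):
    """Encode a point by repeatedly halving the lon/lat intervals."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def prefix_range(sorted_hashes, prefix):
    """Return all hashes inside the cell named by prefix.

    Hypothetical stand-in for a range filter over one indexed field:
    everything in the cell sorts between prefix and prefix + '{'
    ('{' is the ASCII character right after 'z').
    """
    lo = bisect.bisect_left(sorted_hashes, prefix)
    hi = bisect.bisect_left(sorted_hashes, prefix + "{")
    return sorted_hashes[lo:hi]


if __name__ == "__main__":
    full = geohash_encode(42.12345, -73.12345, 9)
    coarse = geohash_encode(42.12345, -73.12345, 5)
    print(full, coarse, full.startswith(coarse))
```

Truncating a hash never changes its leading characters, which is what makes the single-field range lookup possible. The range only covers one cell, though, so a real radius search would still need to handle neighbouring cells.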
Re: LocalLucene or GeoHash for spatial search ?
guys, figured i would pass this along:
http://www.geospatialsemanticweb.com/2008/05/29/geohash-for-spatial-index-and-search

one comment there makes me a little afraid to use geohash for spatial search:

"That doesn't work too well for London, which straddles 0 longitude: either side of 0 flips the MSB. These two places are pretty close to each other: http://geohash.org/u10hb7951 http://geohash.org/gcpuzewfz"

On Mon, Dec 29, 2008 at 12:34 PM, patrick o'leary polear...@aol.com wrote:

Hey Marc LocalLucene has been rewritten since then to use a Cartesian grid [...]

--
Robert Muir
rcm...@gmail.com
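The concern in that comment can be reproduced with a small sketch. It assumes the standard public geohash encoding, and the two test points (about 140 m apart on either side of 0 longitude at London's latitude) are made up for illustration rather than taken from the linked hashes. `geohash_encode`, `common_prefix_len` and `haversine_m` are illustrative helper names.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length=9):
    """Standard geohash: interleave longitude and latitude bisection bits."""
    ranges = {"lon": [-180.0, 180.0], "lat": [-90.0, 90.0]}
    value = {"lon": lon, "lat": lat}
    out = []
    bits = 0
    axis = "lon"  # longitude contributes the first (most significant) bit
    for i in range(length * 5):
        lo, hi = ranges[axis]
        mid = (lo + hi) / 2
        if value[axis] >= mid:
            bits = (bits << 1) | 1
            ranges[axis][0] = mid
        else:
            bits <<= 1
            ranges[axis][1] = mid
        axis = "lat" if axis == "lon" else "lon"
        if i % 5 == 4:  # flush one base32 character every 5 bits
            out.append(BASE32[bits])
            bits = 0
    return "".join(out)


def common_prefix_len(a, b):
    """Number of leading characters two hashes share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


if __name__ == "__main__":
    west = geohash_encode(51.5, -0.001)  # just west of the meridian
    east = geohash_encode(51.5, 0.001)   # just east of the meridian
    print(west, east, common_prefix_len(west, east))
    print(round(haversine_m(51.5, -0.001, 51.5, 0.001)), "m apart")
```

Because the very first bit encodes the sign of the longitude, the two hashes share no prefix at all, so a single prefix or range lookup around one of them cannot find the other. This matches the "g..." and "u..." hashes quoted in the comment.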
Re: LocalLucene or GeoHash for spatial search ?
geohash and the tier systems (local lucene) each have their place.

Geohash is attractive since it is simple and could slip into lucene easily. The tier system is more complex, but supports more accurate calculations and better behavior around the edges (even in New Zealand and London).

I hope the spatial contrib will explore many approaches. Obviously not every approach will be generally applicable, but it is good to have them in the toolbox.

Also check: http://wiki.apache.org/lucene-java/SpatialSearch

ryan

On Dec 29, 2008, at 2:35 PM, Robert Muir wrote:

guys figured i would pass this along: http://www.geospatialsemanticweb.com/2008/05/29/geohash-for-spatial-index-and-search [...]
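For comparison, the tier idea described earlier in the thread (4 boxes at level 1, 16 at level 2, and so on, with the level chosen from the search radius) can be sketched roughly as follows. This is not LocalLucene's actual box id scheme or tier selection rule, which the thread does not spell out. It assumes a plain lat/long grid of 2^tier by 2^tier boxes, a radius given in degrees, and a made-up rule for picking the tier. All function names are hypothetical.

```python
def boxes_at_tier(tier):
    """Each tier splits every box of the previous tier into 4."""
    return 4 ** tier


def box_for_point(lat, lon, tier):
    """(x, y) index of the grid box holding a point (assumed equal-angle grid)."""
    n = 2 ** tier
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def tier_for_radius(radius_deg, max_tier=15):
    """Hypothetical rule: deepest tier whose box height still covers the radius."""
    for tier in range(max_tier, -1, -1):
        if 180.0 / 2 ** tier >= radius_deg:
            return tier
    return 0


def boxes_for_search(lat, lon, radius_deg, tier):
    """All boxes overlapping the search bounding box.

    Simplifications: no wrap-around at the date line, and the longitude
    extent is not widened for latitude.
    """
    x0, y0 = box_for_point(max(lat - radius_deg, -90.0),
                           max(lon - radius_deg, -180.0), tier)
    x1, y1 = box_for_point(min(lat + radius_deg, 90.0),
                           min(lon + radius_deg, 180.0), tier)
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


if __name__ == "__main__":
    tier = tier_for_radius(0.01)
    west = box_for_point(51.5, -0.001, tier)
    east = box_for_point(51.5, 0.001, tier)
    boxes = boxes_for_search(51.5, -0.001, 0.01, tier)
    print(tier, west, east, west in boxes, east in boxes)
```

In this sketch two points on either side of 0 longitude fall into neighbouring boxes. Because the search enumerates every box that touches the bounding box, both are picked up, which is one way to read the "better behavior around the edges" remark.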