Re: Joining data using Latitude, Longitude

2015-03-13 Thread Ankur Srivastava
Hi everyone, thank you for your suggestions; based on them I was able to move forward. I am now generating a Geohash for all the lats and lons in our reference data and then creating a trie of all the Geohashes. I am then broadcasting that trie and using it to find the nearest Geohash for
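
A minimal sketch of the trie idea described above, in plain Python. The class name `GeohashTrie`, the `candidates` method, and the `"$"` leaf marker are hypothetical choices for illustration, not the poster's actual code. The lookup walks the longest matching prefix and returns every reference entry stored below that point; picking the true nearest entry would still need a distance check over those candidates. In PySpark, an object like this could presumably be shared with executors via `SparkContext.broadcast`.

```python
class GeohashTrie:
    """Character trie over geohash strings (illustrative, not the poster's code)."""

    LEAF = "$"  # marker key; "$" is not part of the geohash alphabet

    def __init__(self):
        self.root = {}

    def insert(self, geohash, value):
        # Walk or create one nested dict per character, then store the value.
        node = self.root
        for ch in geohash:
            node = node.setdefault(ch, {})
        node.setdefault(self.LEAF, []).append(value)

    def candidates(self, geohash):
        # Follow the query as deep as the trie allows.
        node = self.root
        depth = 0
        for ch in geohash:
            if ch not in node:
                break
            node = node[ch]
            depth += 1
        # Collect every value stored at or below the deepest matching node.
        found = []
        stack = [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key == self.LEAF:
                    found.extend(child)
                else:
                    stack.append(child)
        return depth, found


trie = GeohashTrie()
trie.insert("u4pru", "city A")
trie.insert("u4prv", "city B")
trie.insert("9q8yy", "city C")

depth, found = trie.candidates("u4prx")  # matches the 4-character prefix "u4pr"
```

The returned depth is the length of the shared prefix, so a caller can tell how coarse the match was before ranking the candidates by real distance.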

Re: Joining data using Latitude, Longitude

2015-03-12 Thread Andrew Musselman
Ted Dunning and Ellen Friedman's Time Series Databases has a section on this with some approaches to geo-encoding: https://www.mapr.com/time-series-databases-new-ways-store-and-access-data http://info.mapr.com/rs/mapr/images/Time_Series_Databases.pdf On Tue, Mar 10, 2015 at 3:53 PM, John Meehan

Re: Joining data using Latitude, Longitude

2015-03-11 Thread Ankur Srivastava
Thank you everyone!! I have started implementing the join using the geohash, with the first 4 characters of the hash as the key. Can I assign a confidence factor in terms of distance based on the number of characters matching in the hash code? I will also look at the other options listed here.
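
On the confidence question, one way to reason about it is through cell size. Two hashes that share their first n characters lie in the same n-character cell, so the shared prefix length gives an upper bound on how far apart the points can be. The reverse does not hold: two points on opposite sides of a cell boundary can be very close and still share few or no characters. The sketch below assumes the standard Geohash bit layout (5 bits per character, longitude bit first); the function names are hypothetical.

```python
def common_prefix_len(a, b):
    """Number of leading characters two geohash strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cell_size_deg(chars):
    """Return (latitude height, longitude width) in degrees of a geohash cell.

    Each character carries 5 bits; bits alternate longitude, latitude,
    starting with longitude, so longitude gets the extra bit when odd.
    """
    bits = 5 * chars
    lon_bits = (bits + 1) // 2
    lat_bits = bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits


height, width = cell_size_deg(4)  # (0.17578125, 0.3515625) degrees
shared = common_prefix_len("u4pruy", "u4prvz")  # 4
```

At 4 characters that is a cell of roughly 20 km by 39 km at the equator, so a 4-character key match says "within the same cell of about that size" rather than giving a precise distance.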

Re: Joining data using Latitude, Longitude

2015-03-11 Thread Manas Kar
There are a few techniques currently available. GeoMesa, which uses GeoHash, could also prove useful (https://github.com/locationtech/geomesa). Another potential candidate is https://github.com/Esri/gis-tools-for-hadoop, especially https://github.com/Esri/geometry-api-java for inner customization. If

Re: Joining data using Latitude, Longitude

2015-03-10 Thread Akhil Das
Are you using SparkSQL for the join? In that case I'm not quite sure you have a lot of options to join on the nearest co-ordinate. If you are using normal Spark code (by creating key-value pairs keyed on lat,lon) you can apply certain logic like trimming the lat,lon etc. If you want more specific computing
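
A minimal sketch of the "trimming" idea, assuming it means snapping each coordinate to a fixed-size grid cell and joining on the cell. The helper names `grid_key` and `join_on_grid` and the 0.1 degree cell size are hypothetical; plain Python lists stand in for the keyed Spark datasets.

```python
import math
from collections import defaultdict


def grid_key(lat, lon, cell_deg=0.1):
    """Snap a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def join_on_grid(users, cities, cell_deg=0.1):
    """Pair each user with every city that falls in the same grid cell."""
    by_cell = defaultdict(list)
    for name, lat, lon in cities:
        by_cell[grid_key(lat, lon, cell_deg)].append(name)
    return [
        (user_id, city)
        for user_id, lat, lon in users
        for city in by_cell.get(grid_key(lat, lon, cell_deg), [])
    ]


cities = [("San Francisco", 37.7749, -122.4194)]
users = [("u1", 37.7790, -122.4100), ("u2", 10.0, 10.0)]
matches = join_on_grid(users, cities)
```

A user just across a cell border from a city gets no match with this scheme; also looking up the eight neighbouring cells is a common way to soften that.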

Re: Joining data using Latitude, Longitude

2015-03-10 Thread John Meehan
There are some techniques you can use if you geohash (http://en.wikipedia.org/wiki/Geohash) the lat-lngs. They will naturally be sorted by proximity (with some edge cases, so watch out). If you go the join route, either by trimming the lat-lngs or geohashing them, you’re essentially grouping
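
For readers unfamiliar with the encoding, here is a small pure-Python sketch of the standard Geohash algorithm as described on the Wikipedia page linked above. The function name `geohash_encode` is hypothetical; a production job would more likely use an existing library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


h = geohash_encode(57.64911, 10.40744)  # "u4pru"
north_east = geohash_encode(0.0001, 0.0001, 1)   # "s"
south_west = geohash_encode(-0.0001, -0.0001, 1)  # "7"
```

The last two calls illustrate the edge cases mentioned above: the two points sit a few metres apart on either side of the equator and prime meridian, yet their hashes differ from the first character.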

Joining data using Latitude, Longitude

2015-03-09 Thread Ankur Srivastava
Hi, I am trying to join data based on the latitude and longitude. I have reference data which has city information with their latitude and longitude. I have a data source with user information with their latitude and longitude. I want to find the nearest city to the user's latitude and
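
As a baseline for the problem as stated, the nearest city can be found by brute force with the haversine great-circle distance. This is only a sketch: the function names, the sample city list, and the 6371 km mean Earth radius are assumptions for illustration, and comparing every user against every city will not scale to large reference data, which is what the replies address.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_city(user_lat, user_lon, cities):
    """Return the name of the city closest to the user (linear scan)."""
    best = min(cities, key=lambda c: haversine_km(user_lat, user_lon, c[1], c[2]))
    return best[0]


# Hypothetical reference data: (name, latitude, longitude)
cities = [
    ("San Francisco", 37.7749, -122.4194),
    ("Los Angeles", 34.0522, -118.2437),
    ("Seattle", 47.6062, -122.3321),
]

closest = nearest_city(37.80, -122.27, cities)
```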