Re: zip code search within x miles

paris lundis Mon, 18 Apr 2005 21:17:02 -0700

I'll share my creative workaround for geographic searches within x miles.

This is a theory I developed years ago and have used in live production
systems for the last five years. This method to be described is used
to produce approximately 100-200k sets of live nearby data per day online.

My approach is necessary for ColdFusion application server display of relative nearby geographic data on my very busy website: http://www.pubcrawler.com I use MySQL for the database storage with some MS SQL.

First, there are several different formulas readily available as math by which to re-engineer for your particular use. The formulas should be able to be ported to any platform/language given the math functions needed are available.

My concept is simple and I believe users are willing to accept some small margin of error. This meaning, a place say a mile or two over further in one direction.

Zipcodes are notoriously odd in their adoption in areas. They follow a North to South and East to West incrementation. Certain inferences can simply be made about a high zipcode or a low one belonging to either coast.

Nearby zipcodes are minimally useful, other than to say here are other zipcodes (say for instance of a real estate application). The wild goose chase.

By all means, you need to reduce the amount of math that goes into preparing a query and say a page in your application.

What I do, to avoid pre-compilation - that is making all the possible or common possible results already plotted and thereby pre-planned - is work on establishing the minimums and maximums at four values - these are a min and max for both lat and lon:

NW        |         NE
               |
----------------------
               |
SW          |        SE

I establish these as the max and mins based on converting the distance (fraction of mile to many miles) to degrees and doing the math to calculate each of the four points.

With these four points I can make a very simple and fast comparison where latitude =< maxlat and latitude => minlat and longitude =< maxlon and longitude >= minlon

This approach works on a square and gives you an error margin which is the difference of placing the distance exhibited as a circle within the same space exhibited as a square. The error or overlap areas are minimal and for terms of consumer based services will suffice. These areas are triangles with the inward side being an arc.

I use this approach to relate okay sized sets of data to one another on the fly (500k restaurants, 6000 beer establishments, 13k bed and breakfasts, 50k hotels, etc.)

All the math is and can be done on the fly this way on commodity hardware (800Mhz server) on the application server level. Bringing this to a pure SQL level would increase the speed further.

A correlation for nearby hotels with this approach will run in a multi-use environment on average in about four seconds. With further optimization this can be reduce to around 1 second..

This approach is a compromise to more exact math offered elsewhere on the internet.

Applying this same thing to apply to the 80k estimated US zipcodes currently would be pie and would run faster just due to the reduced dataset.

Zipcode databases are readily available with quarterly subscription updates available today. I believe the US Postal Service even sells a dataset now.


Richard Lynch wrote:

You can buy up-to-the-minute zips, or snag TIGER (or gazateer?) data
that's a bit old for free.

It's a 1-1 mapping of zip to long/lat.

The tricky bit is this.

There's about 65,000 zips, even in the out-dated list for free.

Let's say you've got, oh, 2000 records to search through.

You're gonna end up doing a JOIN with:
2,000 X 65,000 == 130,000,000 tuples (records/results).

130 MILLION tuples is *way* too many for your basic $20/month site.

So what you do is break the rules.

That's right, it's rule-breaking time.

First, add a longitude and latitude column to your 2000 record table,
default to NULL.

Next, add a trigger (in MySQL 4.1+ I think) so that any time you change
the country or zip, the latitude/longitude gets looked up in the zips
table, and filled in.

If you can't do triggers (in MySQL pre-4.1) you'll have to arrange your
Busicness Logic such that any time you UPDATE country or zip in your 2000
records table, you reset longitude/latitude to NULL.

Next, write a cron job to snag, say, 10 records from your table "WHERE
latitude is NULL" in random order.

Lookup the country and zip in your zips table, and get your $latitude and
$longitude for that country/zip then do:

update EXAMPLE set latitude = $latitude, longitude = $longitude where
country = '$country' and zip = '$zip'

Errr, I kinda slipped into PHP there, but you probably know what I mean. $x is a variable holding the values you looked up.

Run that cron job a whole lot at first, by hand, to get mostly all the
2000 records fixed up with the right long/lat.

The point of this is that *NOW* you only have to search through 2000
records for your distance function, which sure beats a join with 130
million records, eh?

Also, if you just want the NEAREST matches, forget all that complicated
crap about Great Circle distance and expensive trigonometric calculations.

You can just get the zip they want to search by, lookup $longitude and
$latitude for that zip, and then use:

select {application_data_here), abs(latitude - $latitude) + abs(longitude
- $longitude) as closest from ... order by closest

If MySQL won't let you use 'closest' in the where clause, just move the
calculations there.

I've done this before for clients who didn't want to muck with all this. It's a few hours' work. Call it $200.

PS I'm working on a GLOBAL POSTAL system which will data-gather OSS
long/lat and country/zip for the entire world, rather than the current
mess of having only the US for free, and maybe Canada and Australia, but
not much else is free, and often not even available.

I guess I'd better post here when I launch that.  No time-line available yet.

On Fri, April 15, 2005 3:37 pm, Scott Haneda said:

How are sites doing the search by zip and coming up with results within x
miles?  Is there some OSS zip code download that has been created for
this?
--
-------------------------------------------------------------
Scott Haneda                                Tel: 415.898.2602
<http://www.newgeo.com>                     Novato, CA U.S.A.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: zip code search within x miles

Reply via email to