> Not to nitpick, but the Maps TOS basically says, "See our primary TOS"

> - which explicitly forbids commercial usage.

Thanks for the update.

> radius searches, lat/long is rather important.

Yup.

> incorrect, which is worse than missing! 

Agreed.

> So I have about 75k addresses that need to be redone. 

That's enough to consider a commercial service with value-added data
corrections and warranty.  Both Eagle and Geocoder.US offer that.


> The few commercial services I've 
> looked at would be quite expensive.

Depends what you consider expensive, I guess. I like Geocoder's price
better, but I don't know whether their corrections are as accurate.  Someone
posted a quote on Eagle's pricing which is better than I thought I
remembered. If there's a business value to accurate data, either should
be well worth it.

> One option we've used in the past is just doing zipcode centroid 
> matching. You can get this information for ~$100. 

Can't you do that for free with the USPO web service?  But $100 is less
than the coding cost to scrape USPO, so ... yeah.

> But obviously this is 
> less accurate. In my case, I'm not sure if the hit in accuracy is too 
> much. I need to do more checking.

Yes, the question is how short the radius is, what the density of data
points is, and what industry you're cataloging.  If the normal radius is,
say, 25 miles, grouping all items in zipcode 12345 to the same lat/lon at
either the zipcode centroid or the post-office lat/lon should be just
fine. If you have sufficient density that the closest 10 hits would all
be within a mile, this would increase the error dramatically.
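
To put a number on that, here is a rough Python sketch (not Perl, and
not anything from the services discussed) that measures how far a zip
centroid sits from a true location using the standard haversine
formula.  The function name, the mean Earth radius constant, and the
coordinates in the usage note are my own assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

Usage would be something like haversine_miles(true_lat, true_lon,
centroid_lat, centroid_lon): if the result is a mile or two and your
search radius is 25 miles, the centroid is probably fine; if your
closest hits are all within a mile, it probably is not.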

You've got 75k data items.  What is the clustering?  Of the 99999
possible zipcodes, how many are represented in your 75k items, and how
many have high counts?  If spread evenly over the (back of the envelope,
3k*1k =) 3M sq miles of the lower 48, that's (3M/75k =) 40 square miles
per datum, i.e. one datum per 40 square miles, which would be OK.
However, 75k is several times the number of McDonald's restaurants in
the country, so they're probably fairly dense in the areas where they
cluster.
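
A quick way to answer the clustering question and redo the envelope
arithmetic, sketched in Python.  The function names are made up for
illustration, and the 3M sq mile figure is the same rough estimate as
above, not a surveyed area.

```python
from collections import Counter

LOWER_48_SQ_MILES = 3_000 * 1_000  # back-of-the-envelope estimate


def sq_miles_per_item(n_items, area_sq_miles=LOWER_48_SQ_MILES):
    """Average area per item if the items were spread evenly."""
    return area_sq_miles / n_items


def zip_clustering(zipcodes, top=5):
    """Return (number of distinct zipcodes, the `top` most common with counts)."""
    counts = Counter(zipcodes)
    return len(counts), counts.most_common(top)
```

Feed zip_clustering the zipcode column of the 75k rows; a handful of
zipcodes with very high counts would suggest the centroid shortcut will
hurt most exactly where the data is densest.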

I do something similar to Zip centroids -- for masking personal data,
when plotting HOUSES on my map, I plot the house on the 6-char
Maidenhead grid locator, which rounds the location, and the MHG is
sometimes taken from a POBox and sometimes from a street address. Good
enough usually. 
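
For anyone who hasn't met the Maidenhead grid, a minimal Python sketch
of the usual 6-character encoding follows.  It is my own illustration,
not the script behind the map; the function name and the lowercase
subsquare letters are choices I'm assuming here.

```python
def maidenhead6(lat, lon):
    """Encode lat/lon in degrees as a 6-character Maidenhead locator."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("lat/lon out of range")
    # Shift to non-negative values; nudge the exact upper edges back inside.
    x = min(lon + 180.0, 360.0 - 1e-9)
    y = min(lat + 90.0, 180.0 - 1e-9)
    # Field: 20 degrees of longitude by 10 degrees of latitude, letters A-R.
    field = chr(ord("A") + int(x // 20)) + chr(ord("A") + int(y // 10))
    # Square: 2 degrees by 1 degree, digits 0-9.
    square = str(int(x % 20 // 2)) + str(int(y % 10))
    # Subsquare: 5 minutes by 2.5 minutes, letters a-x.
    sub = chr(ord("a") + int(x % 2 * 12)) + chr(ord("a") + int(y % 1 * 24))
    return field + square + sub
```

A 6-character cell is 5 minutes of longitude by 2.5 minutes of latitude,
so every house in a cell lands on the same locator, which is the
masking effect I'm after.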

> Has anyone used commercial services and been happy with the 
> price/results? 

I've not paid for it, but I have used Eagle's free sample and GeoCoder's
online service sporadically.  I've found GeoCoder is more likely right
than MapQuest.

> In the case of using the TIGER/Line dataset, how 
> accurate is it?

Not sure how different GeoCoder's public webservice dataset is from the
raw TIGER/Line set. I'm assuming they're showing their value add, but don't
know that. I've found their web service is touchy about giving the right
abbreviations and is helped by giving it zipcodes.  It took me several
tries to get "1 Stanwood Street, Gloucester, MA", not sure why, works
now.  If you use Geo::Coder::US and TIGER/Line, you'll want to put a
USPO or your own cleanup routine in front of it to force Street to St,
and strip the .'s off abbrevs and add ,'s where it wants 'em (or parse
'em yourself into fields).  If your data has already been through the
USPO canonicalizer and stuffed into DB fields, you should be able to
stuff it into Geo::Coder::US API much more cleanly than using the
web-service with raw data like I'm doing, and get a much better hit
rate.  If I were doing more addresses, I'd have a script that used USPO
first before GeoCoder.us, but I'm only doing 3-4 a week on a busy week
;-)
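
A do-it-yourself cleanup pass could look roughly like this Python
sketch.  It is a toy, not the USPO canonicalizer and not part of
Geo::Coder::US: the suffix table is a tiny hypothetical sample, and it
only handles stripping periods, tidying commas and spaces, and
abbreviating a trailing street suffix.

```python
import re

# Hypothetical sample of suffix abbreviations; a real table is much longer.
SUFFIXES = {
    "street": "St", "avenue": "Ave", "road": "Rd",
    "drive": "Dr", "boulevard": "Blvd", "lane": "Ln",
}


def clean_address(raw):
    """Normalize punctuation and the street suffix of a one-line address."""
    # Drop periods, split on commas, trim, and discard empty pieces.
    parts = [p.strip() for p in raw.replace(".", "").split(",")]
    parts = [re.sub(r"\s+", " ", p) for p in parts if p]
    if parts:
        # Abbreviate the last word of the street part if it is a known suffix.
        words = parts[0].split(" ")
        abbrev = SUFFIXES.get(words[-1].lower())
        if abbrev:
            words[-1] = abbrev
        parts[0] = " ".join(words)
    return ", ".join(parts)
```

With that, "1 Stanwood Street, Gloucester, MA" and the messier
"1  Stanwood St.,Gloucester , MA." should both come out as
"1 Stanwood St, Gloucester, MA" before being handed to the geocoder.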

PUBLIC INVITATION ...

And do come visit our public radio field day events weekend of June
25/26, roughly noon Saturday to noon Sunday, at any of the sites
geocoded and plotted with Perl at http://ema.arrl.org/fd/fd_dir.php --
I'll be under the icon labeled "BARC", Brookline's Larz Anderson Park,
where we have the ONLY overnight permit of the year. (If you dig through
ALL the linkage from that icon, you'll find  a couple pictures of me ...
that's an easter egg challenge!)  The jittered icons are on
http://ema.arrl.org/fd/tour.php which shows who's trying to visit more
than one site.  What am I talking about? See the front page of the
sub-site for an explanation; see the About page for credits on basemaps
and tools.

Cheers,

>> Bill Ricker
>> Not speaking for The Firm
>> aka N1VUX
Yes, I *am* posting from two accounts
 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
