> Not to nitpick, but the Maps TOS basically says, "See our primary TOS"
> - which explicitly forbids commercial usage.

Thanks for the update.

> radius searches, lat/long is rather important.

Yup.

> incorrect, which is worse than missing!

Agreed.

> So I have about 75k addresses that need to be redone.

That's enough to consider a commercial service with value-added data
corrections and warranty. Both Eagle and Geocoder.US offer that.

> The few commercial services I've
> looked at would be quite expensive.

Depends what you consider expensive, I guess. I like Geocoder's price
better, but I don't know whether their corrections are as accurate.
Someone posted a quote on Eagle's pricing which is better than I thought
I remembered. If there's a business value to accurate data, either
should be well worth it.

> One option we've used in the past is just doing zipcode centroid
> matching. You can get this information for ~$100.

Can't you do that for free with the USPO web service? But $100 is less
than the coding cost to scrape USPO, so ... yeah.

> But obviously this is
> less accurate. In my case, I'm not sure if the hit in accuracy is too
> much. I need to do more checking.

Yes, the question is how short the radius is, what the density of data
points is, and what industry you're cataloging. If the normal radius is,
say, 25 miles, grouping all items in zipcode 12345 to the same lat/lon
at either the zipcode centroid or the post-office lat/lon should be just
fine. If you have sufficient density that the closest 10 hits would all
be within a mile, collapsing them to a centroid would increase the error
dramatically.

You've got 75k data items. What is the clustering? Of the 99999 possible
zipcodes, how many are represented in your 75k items, and how many have
high counts? If spread evenly over the (back of the envelope, 3k*1k=)
3M sq miles of the lower 48, that's (75k/3M = 25/1k = 1/40) one datum
per 40 square miles, which would be OK. However, 75k is several times
the number of McDonald's restaurants in the country, so they're probably
fairly dense in the areas where they cluster.
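Those clustering questions are quick to check before deciding. Here is a
rough sketch in Python, assuming the zipcodes are already available as a
list of strings. The function names and the sample zipcodes are made up
for illustration, and the area constant is just the back-of-the-envelope
3k*1k figure above, not a surveyed number.

```python
from collections import Counter

# Back-of-the-envelope area of the lower 48 from the post (3k * 1k miles).
LOWER_48_SQ_MILES = 3_000 * 1_000


def sq_miles_per_item(n_items, area_sq_miles=LOWER_48_SQ_MILES):
    """Average area per item if the items were spread perfectly evenly."""
    return area_sq_miles / n_items


def zip_clustering(zipcodes, top=10):
    """Return (number of distinct zipcodes, the `top` busiest with counts)."""
    counts = Counter(zipcodes)
    return len(counts), counts.most_common(top)


# Even-spread estimate for 75k items.
print(sq_miles_per_item(75_000))  # 40.0

# Hypothetical sample; real input would be the zipcode column of the data.
sample = ["02139", "02139", "02139", "01930", "12345"]
distinct, busiest = zip_clustering(sample, top=2)
print(distinct, busiest)
```

If a handful of zipcodes hold most of the items, centroid matching will
stack many of them on the same point, which is the case where the
accuracy hit matters most.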
I do something similar to zip centroids. For masking personal data when
plotting HOUSES on my map, I plot the house on its 6-character
Maidenhead grid locator, which rounds the location; the grid locator is
sometimes taken from a PO Box and sometimes from a street address. Good
enough, usually.

> Has anyone used commercial services and been happy with the
> price/results?

I've not paid for it, but I have used Eagle's free sample and GeoCoder's
online service sporadically. I've found GeoCoder is more likely to be
right than MapQuest.

> In the case of using the TIGER/Line dataset, how
> accurate is it?

Not sure how different GeoCoder's public web-service dataset is from the
raw TIGER/Line set. I'm assuming they're showing their value-add, but I
don't know that. I've found their web service is touchy about getting
the right abbreviations and is helped by giving it zipcodes. It took me
several tries to get "1 Stanwood Street, Gloucester, MA"; not sure why,
but it works now.

If you use Geo::Coder::US and TIGER/Line, you'll want to put USPO or
your own cleanup routine in front of it to force Street to St, strip the
.'s off abbreviations, and add ,'s where it wants 'em (or parse 'em
yourself into fields). If your data has already been through the USPO
canonicalizer and stuffed into DB fields, you should be able to feed it
to the Geo::Coder::US API much more cleanly than using the web service
with raw data like I'm doing, and get a much better hit rate. If I were
doing more addresses, I'd have a script that used USPO first before
GeoCoder.us, but I'm only doing 3-4 a week on a busy week ;-)

PUBLIC INVITATION ...

And do come visit our public radio field day events the weekend of
June 25/26, roughly noon Saturday to noon Sunday, at any of the sites
geocoded and plotted with Perl at http://ema.arrl.org/fd/fd_dir.php --
I'll be under the icon labeled "BARC", Brookline's Larz Anderson Park,
where we have the ONLY overnight permit of the year.
(If you dig through ALL the linkage from that icon, you'll find a couple
pictures of me ... that's an easter egg challenge!)

The jittered icons are on http://ema.arrl.org/fd/tour.php which shows
who's trying to visit more than one site.

What am I talking about? See the front page of the sub-site for an
explanation; see the About page for credits on basemaps and tools.

Cheers,
>> Bill Ricker
>> Not speaking for The Firm
>> aka N1VUX
Yes, I *am* posting from two accounts
_______________________________________________
Boston-pm mailing list
http://mail.pm.org/mailman/listinfo/boston-pm

