I use GeoNames for this sort of thing a lot. Because cities and administrative divisions are offered in a machine-readable format, it's pretty easy to encode places in a form that adheres to AACR2 or other cataloging rules. There are, of course, problems disambiguating city names when no country is given, but in general I get a pretty accurate response: probably greater than 76% when I have both the city and country, or the city and a geographic region.
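As a minimal sketch of what that GeoNames lookup looks like, the snippet below builds a query URL for the searchJSON web service, restricting results to populated places and optionally filtering by country code. The "demo" username is a placeholder; a real GeoNames account name is required in practice.

```python
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"

def geonames_query(city, country=None, username="demo", max_rows=5):
    """Build a GeoNames search URL; supplying an ISO country code
    narrows the candidate list and greatly improves disambiguation."""
    params = {
        "name_equals": city,
        "featureClass": "P",   # populated places only
        "maxRows": max_rows,
        "username": username,
    }
    if country:
        params["country"] = country  # ISO-3166 two-letter code
    return GEONAMES_SEARCH + "?" + urlencode(params)

# With a country code, "Springfield" is far less ambiguous:
print(geonames_query("Springfield", country="US"))
```

Fetching that URL (with a registered username) returns JSON candidates that can then be ranked, e.g. by population, when more than one place matches.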
Ethan

On Mon, Sep 17, 2012 at 3:16 PM, Eric Lease Morgan <emor...@nd.edu> wrote:

> On Sep 17, 2012, at 3:12 PM, <ddwigg...@historicnewengland.org> wrote:
>
> > But I'm having trouble coming up with an algorithm that can consistently
> > spit these out in the form we'd want to display given the data available
> > in TGN.
>
> A dense but rich, just-published article from D-Lib Magazine about
> geocoding -- Fulltext Geocoding Versus Spatial Metadata for Large Text
> Archives -- may give some guidance. From the conclusion:
>
>   Spatial information is playing an increasing role in the access
>   and mediation of information, driving interest in methods capable
>   of extracting spatial information from the textual contents of
>   large document archives. Automated approaches, even using fairly
>   basic algorithms, can achieve upwards of 76% accuracy when
>   recognizing, disambiguating, and converting to mappable
>   coordinates the references to individual cities and landmarks
>   buried deep within the text of a document. The workflow of a
>   typical geocoding system involves identifying potential
>   candidates from the text, checking those candidates for potential
>   matches in a gazetteer, and disambiguating and confirming those
>   candidates -- http://bit.ly/Ufl5k9
>
> --
> ELM