On Sep 17, 2012, at 3:12 PM, <ddwigg...@historicnewengland.org> wrote:
> But I'm having trouble coming up with an algorithm that can consistently spit
> these out in the form we'd want to display given the data available in TGN.

A dense but rich, just-published article from D-Lib Magazine about geocoding -- Fulltext Geocoding Versus Spatial Metadata for Large Text Archives -- may give some guidance. From the conclusion:

  Spatial information is playing an increasing role in the access and mediation of information, driving interest in methods capable of extracting spatial information from the textual contents of large document archives. Automated approaches, even using fairly basic algorithms, can achieve upwards of 76% accuracy when recognizing, disambiguating, and converting to mappable coordinates the references to individual cities and landmarks buried deep within the text of a document. The workflow of a typical geocoding system involves identifying potential candidates from the text, checking those candidates for potential matches in a gazetteer, and disambiguating and confirming those candidates

  -- http://bit.ly/Ufl5k9

-- 
ELM
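
For what it's worth, the three-step workflow quoted from the article (find candidates, check them against a gazetteer, disambiguate) can be sketched roughly as below. This is only an illustration under stated assumptions: the tiny in-memory gazetteer, its approximate coordinates and populations, the "region mentioned in the text, else most populous" tie-break, and every function name are hypothetical. None of it comes from the article or from TGN.

```python
import re

# Hypothetical toy gazetteer: place name -> list of possible places.
# Coordinates and populations are rough illustrative values, not TGN data.
GAZETTEER = {
    "Springfield": [
        {"name": "Springfield", "region": "Massachusetts",
         "lat": 42.10, "lon": -72.59, "population": 153000},
        {"name": "Springfield", "region": "Illinois",
         "lat": 39.80, "lon": -89.64, "population": 116000},
    ],
    "Boston": [
        {"name": "Boston", "region": "Massachusetts",
         "lat": 42.36, "lon": -71.06, "population": 620000},
    ],
}


def find_candidates(text):
    """Step 1: collect capitalized words or phrases as potential place names.

    This is deliberately crude; it also picks up sentence-initial words,
    which the gazetteer lookup in step 2 then filters out.
    """
    return re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text)


def lookup(candidate):
    """Step 2: return every gazetteer entry whose name matches the candidate."""
    return GAZETTEER.get(candidate, [])


def disambiguate(matches, text):
    """Step 3: pick one entry among several with the same name.

    Assumed heuristic: prefer an entry whose region is also named in the
    text; otherwise fall back to the most populous entry.
    """
    for match in matches:
        if match["region"] in text:
            return match
    return max(matches, key=lambda m: m["population"])


def geocode(text):
    """Run the three steps and return one resolved place per matched candidate."""
    resolved = []
    for candidate in find_candidates(text):
        matches = lookup(candidate)
        if matches:
            resolved.append(disambiguate(matches, text))
    return resolved


if __name__ == "__main__":
    for place in geocode("The house in Springfield, Illinois was moved from Boston."):
        print(place["name"], place["region"], place["lat"], place["lon"])
```

A real system would swap the dictionary for TGN lookups and would need a better candidate finder, but the shape of the problem is the same: the hard part is step 3, deciding which of several identically named places is meant.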