On 2012-01-14 15:59, Andrew Turner wrote:
On Fri, Jan 13, 2012 at 6:00 PM, slesage <sles...@geo.gob.bo> wrote:
Hi,

does anybody know of any open-source software dedicated to automatic
geocoding of text documents? The idea of that "black box" would be:
* give, as an input, a text document or a PDF,
* receive, as an output, a list of place names with their coordinates / a
map of POIs corresponding to those places.

Using the GeoNames database (http://www.geonames.org/), the solution appears
to be just a full-text search, which could be done using Lucene
(https://lucene.apache.org/java/docs/index.html).

I found the MetaCarta solution
(http://www.metacarta.com/products-platform-geotag.htm) but couldn't find
any open-source solution.

There isn't an open-source solution because it is Very Difficult.
Even geocoding is difficult, and until a short while ago there
weren't any decent open-source geocoders. So we worked with
Schuyler (formerly of MetaCarta) to build an open-source one [1].

Your idea of using the GeoNames gazetteer with Apache Lucene is
interesting and I think I've seen it suggested before. However, at best
it will find location names but will be missing any logic for
disambiguation of words or relative locations. So you could likely find
that "Paris" was mentioned, but not know whether it's Paris, France or
Paris, Texas, US.
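
To make the ambiguity concrete, here is a minimal Python sketch of a naive gazetteer lookup. It does not use GeoNames or Lucene: the three gazetteer rows are hand-made samples with approximate coordinates, and the names build_index and find_places are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical hand-made sample rows (name, country, lat, lon).
# Coordinates are approximate; a real gazetteer such as GeoNames is far larger.
GAZETTEER = [
    ("Paris", "FR", 48.86, 2.35),
    ("Paris", "US", 33.66, -95.56),
    ("La Paz", "BO", -16.50, -68.15),
]


def build_index(rows):
    """Map each lowercased place name to the list of places sharing that name."""
    index = defaultdict(list)
    for name, country, lat, lon in rows:
        index[name.lower()].append(
            {"name": name, "country": country, "lat": lat, "lon": lon}
        )
    return index


def find_places(text, index):
    """Return (matched_text, candidates) for every gazetteer name found in text."""
    # Try longer names first so multi-word names win over shorter overlaps.
    names = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE
    )
    return [(m.group(0), index[m.group(0).lower()]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    hits = find_places("She flew from La Paz to Paris.", build_index(GAZETTEER))
    for word, candidates in hits:
        countries = [c["country"] for c in candidates]
        print(word, "->", len(candidates), "candidate(s):", countries)
```

The lookup finds "Paris" but returns both candidates; nothing in a plain name match says which one the text means, which is the missing disambiguation logic described above.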

Gisgraphy [2] is an open-source option that says it provides Full-text
searching. I don't know more about it though.

Definitely share what else you find or try.

Andrew

Thanks for the links, Andrew, I will investigate them. I had seen Gisgraphy before, but did not really understand what exactly its purpose is. Has anybody used it? It seems to be developed by only one person; do you think the community is broader?

To refine my ideas on a geocoding tool: I think fully automatic processing would be very difficult, because of disambiguation and the need to fix false positives and false negatives. A semi-automatic approach would certainly be much more efficient, with validation by the user afterwards and a learning engine to record these decisions.

I think that kind of processing would be most efficient as a plugin for a text editor, allowing:
* geocoding of a word selected by the user (selection -> right click -> georeference, etc.)
* geocoding of a whole text, with a bubble for each word, and three buttons for post-validation: "OK", "disambiguate" (your example of Paris, Texas), "not a location"
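
As a rough illustration of the validation step, here is a minimal Python sketch of a store that records the three button decisions and reuses them the next time the same word appears. Everything in it is hypothetical (the DecisionMemory class, its method names, the status strings), and remembering one decision per word is a deliberate simplification: a real learning engine would also need to look at the surrounding context.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionMemory:
    """Records user decisions per word (hypothetical design, not an existing tool)."""

    chosen: dict = field(default_factory=dict)   # word -> candidate the user accepted
    rejected: set = field(default_factory=set)   # words marked "not a location"

    def record(self, word, action, candidate=None):
        """Store the outcome of one of the three validation buttons."""
        key = word.lower()
        if action == "not a location":
            self.rejected.add(key)
            self.chosen.pop(key, None)
        elif action in ("OK", "disambiguate"):
            if candidate is None:
                raise ValueError("a candidate is required for this action")
            self.chosen[key] = candidate
            self.rejected.discard(key)
        else:
            raise ValueError(f"unknown action: {action!r}")

    def suggest(self, word, candidates):
        """Return (status, candidate) to pre-fill the bubble shown for a word."""
        key = word.lower()
        if key in self.rejected:
            return ("skip", None)
        if key in self.chosen:
            return ("remembered", self.chosen[key])
        if len(candidates) == 1:
            return ("ok", candidates[0])
        if candidates:
            return ("ambiguous", None)  # the user has to pick one
        return ("skip", None)


if __name__ == "__main__":
    memory = DecisionMemory()
    paris = [("Paris", "FR"), ("Paris", "US")]
    print(memory.suggest("Paris", paris))          # ambiguous at first
    memory.record("Paris", "disambiguate", paris[1])
    print(memory.suggest("Paris", paris))          # now pre-filled from memory
```

The candidates could come from any gazetteer lookup; here they are plain tuples so the sketch stays self-contained.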

I don't know whether that sounds interesting or not. But without a doubt, it means a lot of development! So as not to reinvent the wheel, could anybody give me more hints on the two initiatives you mentioned (geocoding, gisgraphy), so I can better decide which one would be better to contribute to?

Thanks

Sylvain Lesage
_______________________________________________
Discuss mailing list
Discuss@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/discuss
