Hi Christian Ledermann, all,

This would be great and definitely a major improvement for Apache Stanbol! Writing an EnhancementEngine based on this should be relatively simple.
The only thing I would suggest is not to use CLAVIN's integrated LocationExtractor implementation, which is based on OpenNLP NER. Stanbol provides many more NLP options, so a Stanbol-specific implementation of the LocationExtractor interface [1] would allow the use of different NER implementations as well as custom-built NER models for OpenNLP. Such an implementation would need to parse the fise:TextAnnotations with dc:type dbpedia:Place from the enhancement metadata and return them as a "List<LocationOccurrence>".

Christian, would you have time to work on that? I would definitely help you with this.

best
Rupert

[1] https://github.com/Berico-Technologies/CLAVIN/blob/master/src/main/java/com/berico/clavin/extractor/LocationExtractor.java

On Thu, May 30, 2013 at 10:23 AM, Olivier Rossel <[email protected]> wrote:

> +1
> the demo looks great!!!!
>
>
>
> On Thu, May 30, 2013 at 9:51 AM, Christian Ledermann <
> [email protected]> wrote:
>
>> I just stumbled over this:
>>
>> "
>> CLAVIN (Cartographic Location And Vicinity INdexer) is an
>> award-winning open source
>> (apache 2 license)
>> software package for document geotagging and geoparsing that employs
>> context-based geographic entity resolution.
>>
>> It extracts location names from unstructured text and resolves them
>> against a gazetteer to produce data-rich geographic entities.
>>
>> CLAVIN does not simply "look up" location names – it uses intelligent
>> heuristics to identify exactly which "Springfield" (for example) was
>> intended by the author, based on the context of the document. CLAVIN
>> also employs fuzzy search to handle incorrectly-spelled location
>> names, and it recognizes alternative names (e.g., "Ivory Coast" and
>> "Côte d'Ivoire") as referring to the same geographic entity.
>>
>> By enriching text documents with structured geo data, CLAVIN enables
>> hierarchical geospatial search and advanced geospatial analytics on
>> unstructured data.
>> "
>>
>> http://clavin.bericotechnologies.com/
>>
>>
>> Maybe this could be used as an enhancer for stanbol?
>>
>>
>>
>> --
>> Best Regards,
>>
>> Christian Ledermann
>>
>> Nairobi - Kenya
>> Mobile : +254 702978914
>>
>> <*)))>{
>>
>> If you save the living environment, the biodiversity that we have left,
>> you will also automatically save the physical environment, too. But If
>> you only save the physical environment, you will ultimately lose both.
>>
>> 1) Don’t drive species to extinction
>>
>> 2) Don’t destroy a habitat that species rely on.
>>
>> 3) Don’t change the climate in ways that will result in the above.
>>
>> }<(((*>
>>

-- 
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
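As a rough illustration of the Stanbol-specific LocationExtractor suggested above, here is a minimal, dependency-free Java sketch (Java 16+ for records). It does not use the real Stanbol or CLAVIN APIs: `LocationOccurrence` and `LocationExtractor` are simplified stand-ins whose shape is only assumed to resemble CLAVIN's classes, and `TextAnnotation`, `TextAnnotationLocationExtractor` and `demo()` are hypothetical names. `TextAnnotation` is a flattened view of a fise:TextAnnotation (fise:selected-text, fise:start, fise:end, dc:type); a real engine would read these values from the enhancement metadata. Offsets are assumed to be end-exclusive.

```java
import java.util.ArrayList;
import java.util.List;

public class StanbolLocationExtractorSketch {

    // dc:type values as used for NER results (dbpedia:Place, dbpedia:Person)
    static final String DBPEDIA_PLACE = "http://dbpedia.org/ontology/Place";
    static final String DBPEDIA_PERSON = "http://dbpedia.org/ontology/Person";

    // Hypothetical, flattened view of one fise:TextAnnotation:
    // fise:selected-text, fise:start, fise:end (assumed end-exclusive) and dc:type
    record TextAnnotation(String selectedText, int start, int end, String dcType) {}

    // Simplified stand-in for CLAVIN's LocationOccurrence (name + character offset)
    record LocationOccurrence(String text, int position) {}

    // Simplified stand-in for CLAVIN's LocationExtractor interface (assumed shape)
    interface LocationExtractor {
        List<LocationOccurrence> extractLocationNames(String plainText);
    }

    // Reuses locations already found by Stanbol's NLP chain instead of running OpenNLP itself
    static final class TextAnnotationLocationExtractor implements LocationExtractor {
        private final List<TextAnnotation> annotations;

        TextAnnotationLocationExtractor(List<TextAnnotation> annotations) {
            this.annotations = List.copyOf(annotations);
        }

        @Override
        public List<LocationOccurrence> extractLocationNames(String plainText) {
            List<LocationOccurrence> places = new ArrayList<>();
            for (TextAnnotation ta : annotations) {
                // keep only annotations typed as dbpedia:Place
                if (!DBPEDIA_PLACE.equals(ta.dcType())) {
                    continue;
                }
                // skip annotations whose offsets fall outside the text
                if (ta.start() < 0 || ta.end() > plainText.length() || ta.start() >= ta.end()) {
                    continue;
                }
                // skip annotations whose selected text does not match the text at those offsets
                if (!plainText.substring(ta.start(), ta.end()).equals(ta.selectedText())) {
                    continue;
                }
                places.add(new LocationOccurrence(ta.selectedText(), ta.start()));
            }
            return places;
        }
    }

    // Small example with made-up annotations
    static List<LocationOccurrence> demo() {
        String text = "Christian lives in Nairobi, Rupert in Bischofshofen.";
        List<TextAnnotation> annotations = List.of(
                new TextAnnotation("Nairobi", 19, 26, DBPEDIA_PLACE),
                new TextAnnotation("Rupert", 28, 34, DBPEDIA_PERSON),
                new TextAnnotation("Bischofshofen", 38, 51, DBPEDIA_PLACE));
        return new TextAnnotationLocationExtractor(annotations).extractLocationNames(text);
    }

    public static void main(String[] args) {
        for (LocationOccurrence occ : demo()) {
            System.out.println(occ.text() + " @ " + occ.position());
        }
    }
}
```

Running `main` prints the two places ("Nairobi @ 19" and "Bischofshofen @ 38") while the person annotation is ignored. Checking the selected text against the offsets is one possible design choice, meant to guard against annotations that refer to a different version of the text; the resulting list would then be handed to CLAVIN's resolution step in place of the OpenNLP-based extractor output.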
