[ https://issues.apache.org/jira/browse/TIKA-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1106:
------------------------------------
    Issue Type: New Feature  (was: Wish)

> CLAVIN Integration
> ------------------
>
>                 Key: TIKA-1106
>                 URL: https://issues.apache.org/jira/browse/TIKA-1106
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.3
>         Environment: All
>            Reporter: Adam Estrada
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>              Labels: entity, geospatial, new-parser
>             Fix For: 1.8
>
>
> I've been evaluating CLAVIN as a way to extract location information from
> unstructured text. It seems like meshing it with Tika in some way would make
> a lot of sense. From the CLAVIN website:
> {quote}
> CLAVIN (*Cartographic Location And Vicinity INdexer*) is an open source
> software package for document geotagging and geoparsing that employs
> context-based geographic entity resolution. It combines a variety of open
> source tools with natural language processing techniques to extract location
> names from unstructured text documents and resolve them against gazetteer
> records. Importantly, CLAVIN does not simply "look up" location names;
> rather, it uses intelligent heuristics in an attempt to identify precisely
> which "Springfield" (for example) was intended by the author, based on the
> context of the document. CLAVIN also employs fuzzy search to handle
> incorrectly-spelled location names, and it recognizes alternative names
> (e.g., "Ivory Coast" and "Côte d'Ivoire") as referring to the same geographic
> entity. By enriching text documents with structured geo data, CLAVIN enables
> hierarchical geospatial search and advanced geospatial analytics on
> unstructured data.
> {quote}
> There was only one other mention of the word "clavin" on the ASF
> JIRA site, so I thought it was definitely worth posting here.
> https://github.com/Berico-Technologies/CLAVIN

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)