Anya Yun Li created TIKA-1614:
---------------------------------

             Summary: Geo Topic Parser
                 Key: TIKA-1614
                 URL: https://issues.apache.org/jira/browse/TIKA-1614
             Project: Tika
          Issue Type: New Feature
          Components: parser
            Reporter: Anya Yun Li


##Description

This program identifies geographic names (geonames) in unstructured text 
data for the NSF polar research project:
https://github.com/NSF-Polar-Cyberinfrastructure/datavis-hackathon/issues/1

This project is a content-based geotagging solution built from a variety of 
NLP tools, and it can be used for any geotagging purpose.

##Workflow

1. Plain text input is passed to the geoparser.

2. Location names are extracted from the text using OpenNLP NER.

3. Each extracted location name is given one of two roles: 
        * The most frequent location name is chosen as the best match for the 
input text.
        * All other extracted locations are treated as alternatives of equal 
rank.

4. For each location extracted above, the parser searches for the best 
GeoName object and returns the resolved objects with the fields name (as in 
the gazetteer), longitude, and latitude.
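
Step 3 can be illustrated with a minimal sketch. It is not taken from the patch: the class `LocationRoles` and the method `bestFirst` are hypothetical names, and the tie-breaking rule (the first-seen name wins) is an assumption, since the description above does not say how ties are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Tika patch.
final class LocationRoles {

    private LocationRoles() {}

    /**
     * Returns the distinct location names with the most frequent one first.
     * Element 0 is the best match; the remaining elements are the
     * alternatives, kept in first-seen order because they rank equally.
     * Assumption: on a frequency tie, the name seen first wins.
     */
    static List<String> bestFirst(List<String> extractedNames) {
        // LinkedHashMap preserves the order in which names were first seen.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : extractedNames) {
            counts.merge(name, 1, Integer::sum);
        }

        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }

        List<String> result = new ArrayList<>();
        if (best == null) {
            return result; // no locations were extracted
        }
        result.add(best);
        for (String name : counts.keySet()) {
            if (!name.equals(best)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Calling `LocationRoles.bestFirst` with the names found by the NER step gives the order in which step 4 would resolve them against the gazetteer.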

##How to Use
*Caution*: This program requires at least 1.2 GB of disk space to build the 
Lucene index.

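```Java
    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.tika.exception.TikaException;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;
    import org.xml.sax.SAXException;

    // Assumes geoparser (the Geo Topic Parser instance), gazetteerPath and
    // nerPath are defined by the caller. GeoParserConfig comes with this patch.
    void printGeoMetadata(InputStream stream)
            throws IOException, SAXException, TikaException {
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();

        // Point the parser at the gazetteer and the NER model.
        GeoParserConfig config = new GeoParserConfig();
        config.setGazetterPath(gazetteerPath);
        config.setNERModelPath(nerPath);
        context.set(GeoParserConfig.class, config);

        geoparser.parse(
                stream,
                new BodyContentHandler(),
                metadata,
                context);

        // Print every metadata field the parser produced.
        for (String name : metadata.names()) {
            String value = metadata.get(name);
            System.out.println(name + " " + value);
        }
    }
```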
The parser writes the geographic information it finds to Tika's Metadata 
object. 

Fields for the best-matched location:
```
Geographic_NAME
Geographic_LONGTITUDE
Geographic_LATITUDE
```
Fields for alternatives:
```
Geographic_NAME1
Geographic_LONGTITUDE1
Geographic_LATITUDE1

Geographic_NAME2
Geographic_LONGTITUDE2
Geographic_LATITUDE2

...

```
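
As a rough illustration of the naming scheme above, the following sketch collects the alternative location names from a set of fields. It is only a sketch under assumptions: `GeoFields` and `alternativeNames` are hypothetical names, a plain `Map` stands in for Tika's Metadata object so the example has no dependencies, and the numeric suffixes are assumed to be contiguous and to start at 1, as the listing suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Tika patch.
final class GeoFields {

    // Field name prefix as listed in this issue.
    static final String NAME = "Geographic_NAME";

    private GeoFields() {}

    /**
     * Collects the values of Geographic_NAME1, Geographic_NAME2, ... and
     * stops at the first missing index.
     * Assumption: suffixes are contiguous and start at 1.
     */
    static List<String> alternativeNames(Map<String, String> fields) {
        List<String> names = new ArrayList<>();
        for (int i = 1; fields.containsKey(NAME + i); i++) {
            names.add(fields.get(NAME + i));
        }
        return names;
    }
}
```

With a real Metadata object the same loop would test `metadata.get(key) != null` instead of `containsKey`, because `Metadata.get` returns null for a field that is not set.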
If you have any questions, contact me: anyayu...@gmail.com





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)