[ 
https://issues.apache.org/jira/browse/TIKA-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557791#comment-14557791
 ] 

Chris A. Mattmann commented on TIKA-1614:
-----------------------------------------

Did some minor cleanup, and all tests pass. Patch about to be attached:

{noformat}
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Tika parent ................................. SUCCESS [  1.516 s]
[INFO] Apache Tika core ................................... SUCCESS [ 20.335 s]
[INFO] Apache Tika parsers ................................ SUCCESS [02:27 min]
[INFO] Apache Tika XMP .................................... SUCCESS [  2.535 s]
[INFO] Apache Tika serialization .......................... SUCCESS [  2.583 s]
[INFO] Apache Tika batch .................................. SUCCESS [02:05 min]
[INFO] Apache Tika application ............................ SUCCESS [ 41.779 s]
[INFO] Apache Tika OSGi bundle ............................ SUCCESS [ 22.209 s]
[INFO] Apache Tika translate .............................. SUCCESS [  3.316 s]
[INFO] Apache Tika server ................................. SUCCESS [ 22.884 s]
[INFO] Apache Tika examples ............................... SUCCESS [  6.708 s]
[INFO] Apache Tika Java-7 Components ...................... SUCCESS [  2.623 s]
[INFO] Apache Tika ........................................ SUCCESS [  0.028 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:39 min
[INFO] Finished at: 2015-05-24T09:55:56-07:00
[INFO] Final Memory: 116M/1523M

{noformat}


> Geo Topic Parser
> ----------------
>
>                 Key: TIKA-1614
>                 URL: https://issues.apache.org/jira/browse/TIKA-1614
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Anya Yun Li
>            Assignee: Chris A. Mattmann
>              Labels: memex
>
> ##Description
> This program identifies geonames in unstructured text data for the NSF 
> polar research project. 
> https://github.com/NSF-Polar-Cyberinfrastructure/datavis-hackathon/issues/1
> It is a content-based geotagging solution, built from a variety of NLP 
> tools, and can be used for any geotagging purpose. 
> ##Workflow
> 1. Plain text input is passed to the geoparser.
> 2. Location names are extracted from the text using OpenNLP NER.
> 3. Each extracted name is given one of two roles: 
>       * The most frequent location name is chosen as the best match for the 
> input text.
>       * The other extracted locations are treated as alternatives (all equal).
> 4. For each location extracted above, the best GeoName object is looked up 
> and the resolved objects are returned with these fields: name in gazetteer, 
> longitude, latitude.
> ##How to Use
> *Caution*: This program requires at least 1.2 GB of disk space to build 
> the Lucene index.
> ```Java
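> import java.io.InputStream;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.sax.BodyContentHandler;
>
> // geoparser, gazetteerPath and nerPath are not defined in this snippet;
> // GeoParserConfig is the configuration class that comes with this parser.
> void printGeoMetadata(InputStream stream) throws Exception {
>     Metadata metadata = new Metadata();
>     ParseContext context = new ParseContext();
>
>     // Point the parser at the gazetteer and the NER model
>     GeoParserConfig config = new GeoParserConfig();
>     config.setGazetterPath(gazetteerPath);
>     config.setNERModelPath(nerPath);
>     context.set(GeoParserConfig.class, config);
>
>     geoparser.parse(stream, new BodyContentHandler(), metadata, context);
>
>     // Print every metadata field the parser produced
>     for (String name : metadata.names()) {
>         String value = metadata.get(name);
>         System.out.println(name + " " + value);
>     }
> }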
> ```
> This parser adds geographic information to Tika's Metadata object. 
> Fields for the best-matched location:
> ```
> Geographic_NAME
> Geographic_LONGTITUDE
> Geographic_LATITUDE
> ```
> Fields for alternatives:
> ```
> Geographic_NAME1
> Geographic_LONGTITUDE1
> Geographic_LATITUDE1
> Geographic_NAME2
> Geographic_LONGTITUDE2
> Geographic_LATITUDE2
> ...
> ```
> If you have any questions, contact me: anyayu...@gmail.com
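
For readers of the quoted description, step 3 of the workflow (most frequent name wins, the rest become alternatives) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: the class `LocationRanker` and its methods are hypothetical, the input list stands in for whatever OpenNLP NER returns, the gazetteer lookup for longitude and latitude is left out, and ordering the alternatives by frequency is an assumption (the description treats them as equal).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika or of the attached patch.
final class LocationRanker {

    /** Returns the distinct names, most frequent first; ties keep first-seen order. */
    static List<String> rankByFrequency(List<String> extractedNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : extractedNames) {
            counts.merge(name, 1, Integer::sum);
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // List.sort is stable, so equally frequent names stay in first-seen order
        ranked.sort((a, b) -> counts.get(b) - counts.get(a));
        return ranked;
    }

    /** Lays the ranked names out under the name fields listed in the description. */
    static Map<String, String> toNameFields(List<String> ranked) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < ranked.size(); i++) {
            // Index 0 is the best match (no suffix); alternatives get 1, 2, ...
            String key = (i == 0) ? "Geographic_NAME" : "Geographic_NAME" + i;
            fields.put(key, ranked.get(i));
        }
        return fields;
    }
}
```

Calling `LocationRanker.toNameFields(LocationRanker.rankByFrequency(names))` on the names extracted from a document yields a map whose first entry is the best match and whose remaining entries are the alternatives.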



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
