[
https://issues.apache.org/jira/browse/TIKA-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557920#comment-14557920
]
Hudson commented on TIKA-1614:
------------------------------
SUCCESS: Integrated in tika-trunk-jdk1.7 #704 (See
[https://builds.apache.org/job/tika-trunk-jdk1.7/704/])
fix for TIKA-1614 Geo Topic Parser contributed by aranyali
<[email protected]> and modified and updated by Chris Mattmann; this closes
#43. (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1681541)
* /tika/trunk/tika-app/pom.xml
* /tika/trunk/tika-bundle/pom.xml
* /tika/trunk/tika-parsers/pom.xml
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoParserConfig.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/GeoTag.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/geo/topic/NameEntityExtractor.java
* /tika/trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/geo
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/geo/topic
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/geo/topic/GeoParserTest.java
> Geo Topic Parser
> ----------------
>
> Key: TIKA-1614
> URL: https://issues.apache.org/jira/browse/TIKA-1614
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Anya Yun Li
> Assignee: Chris A. Mattmann
> Labels: memex
> Fix For: 1.9
>
> Attachments: TIKA-1614.Mattmann.Li.052405.patch.txt
>
>
> ##Description
> This program adds support for identifying geonames in any unstructured
> text data in the NSF polar research project.
> https://github.com/NSF-Polar-Cyberinfrastructure/datavis-hackathon/issues/1
> This project is a content-based geotagging solution built from a variety of
> NLP tools, and it can be used for any geotagging purpose.
> ##Workflow
> 1. Plain text input is passed to the geoparser.
> 2. Location names are extracted from the text using OpenNLP NER (named
> entity recognition).
> 3. The extracted names are assigned one of two roles:
> * The most frequent location name is chosen as the best match for the
> input text.
> * The other extracted locations are treated as alternatives (ranked equally).
> 4. For each location extracted above, the best-matching GeoName object is
> looked up in the gazetteer, and the resolved objects are returned with their
> fields (name in gazetteer, longitude, latitude).
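
The ranking in step 3 can be sketched as a plain frequency count over the NER output. This is a hypothetical illustration, not the GeoParser source. The class `LocationRanker` and method `rank` are made up for this example. Breaking ties by first appearance in the text is also an assumption, since the description does not specify it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of step 3: rank NER location mentions by frequency. */
public class LocationRanker {

    /**
     * Returns the distinct location names with the most frequent first.
     * Element 0 is the best match; the rest are equal alternatives.
     * Ties keep the order of first appearance (an assumption, not from the source).
     */
    public static List<String> rank(List<String> mentions) {
        // LinkedHashMap preserves first-occurrence order for tie-breaking
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String m : mentions) {
            counts.merge(m, 1, Integer::sum);
        }

        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Strict '>' means the earlier name wins a tie
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }

        List<String> ranked = new ArrayList<>();
        if (best == null) {
            return ranked; // no locations were extracted
        }
        ranked.add(best);
        for (String name : counts.keySet()) {
            if (!name.equals(best)) {
                ranked.add(name); // alternatives, in first-occurrence order
            }
        }
        return ranked;
    }

    public static void main(String[] args) {
        List<String> mentions = List.of("Greenland", "Antarctica", "Greenland", "Svalbard");
        System.out.println(rank(mentions));
    }
}
```

Running `main` prints `[Greenland, Antarctica, Svalbard]`. "Greenland" occurs twice, so it becomes the best match, and the remaining names follow as alternatives.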
> ##How to Use
> *Caution*: This program requires at least 1.2 GB of disk space to build the
> Lucene index.
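> ```Java
> import java.io.IOException;
> import java.io.InputStream;
>
> import org.apache.tika.exception.TikaException;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.geo.topic.GeoParser;
> import org.apache.tika.parser.geo.topic.GeoParserConfig;
> import org.apache.tika.sax.BodyContentHandler;
> import org.xml.sax.SAXException;
>
> public class GeoParserExample {
>     // gazetteerPath and nerPath: locations of the gazetteer and the OpenNLP NER model
>     public static void printGeoMetadata(InputStream stream, String gazetteerPath,
>             String nerPath) throws IOException, SAXException, TikaException {
>         Metadata metadata = new Metadata();
>         ParseContext context = new ParseContext();
>         GeoParserConfig config = new GeoParserConfig();
>         config.setGazetterPath(gazetteerPath);
>         config.setNERModelPath(nerPath);
>         context.set(GeoParserConfig.class, config);
>
>         GeoParser geoparser = new GeoParser();
>         geoparser.parse(stream, new BodyContentHandler(), metadata, context);
>
>         // Print every field the parser added to the metadata
>         for (String name : metadata.names()) {
>             System.out.println(name + " " + metadata.get(name));
>         }
>     }
> }
> ```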
> This parser adds the extracted geographic information to Tika's Metadata
> object.
> Fields for the best-matched location:
> ```
> Geographic_NAME
> Geographic_LONGTITUDE
> Geographic_LATITUDE
> ```
> Fields for the alternatives (numbered 1, 2, ...):
> ```
> Geographic_NAME1
> Geographic_LONGTITUDE1
> Geographic_LATITUDE1
> Geographic_NAME2
> Geographic_LONGTITUDE2
> Geographic_LATITUDE2
> ...
> ```
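
The alternatives reuse the same field names with a numeric suffix. A client can therefore read them back by counting up until a name field is missing. The helper below is hypothetical and not part of Tika; `GeoFields` and `Location` are illustrative names. It assumes the alternatives are numbered 1, 2, ... without gaps. It also uses the field names exactly as listed above (including the `LONGTITUDE` spelling), so the prefixes may need adjusting for a given Tika version. Because it takes a lookup function, it can be called as `GeoFields.read(metadata::get)` when Tika is on the classpath.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper for reading the GeoParser fields listed above. */
public class GeoFields {

    /** One resolved location: gazetteer name plus coordinates as raw strings. */
    public static final class Location {
        public final String name;
        public final String longitude;
        public final String latitude;

        Location(String name, String longitude, String latitude) {
            this.name = name;
            this.longitude = longitude;
            this.latitude = latitude;
        }

        @Override
        public String toString() {
            return name + " (lat " + latitude + ", lon " + longitude + ")";
        }
    }

    // Reads one location for the given suffix ("" means the best match)
    private static Location readOne(Function<String, String> get, String suffix) {
        String name = get.apply("Geographic_NAME" + suffix);
        if (name == null) {
            return null; // nothing stored under this suffix
        }
        return new Location(name,
                get.apply("Geographic_LONGTITUDE" + suffix),
                get.apply("Geographic_LATITUDE" + suffix));
    }

    /** Returns the best match first, followed by alternatives 1, 2, ... */
    public static List<Location> read(Function<String, String> get) {
        List<Location> result = new ArrayList<>();
        Location best = readOne(get, "");
        if (best == null) {
            return result;
        }
        result.add(best);
        // Assumes alternatives are numbered consecutively from 1
        for (int i = 1; ; i++) {
            Location alt = readOne(get, Integer.toString(i));
            if (alt == null) {
                break;
            }
            result.add(alt);
        }
        return result;
    }

    public static void main(String[] args) {
        // A plain map with sample values stands in for Tika's Metadata here
        Map<String, String> fake = new HashMap<>();
        fake.put("Geographic_NAME", "Nuuk");
        fake.put("Geographic_LONGTITUDE", "-51.72");
        fake.put("Geographic_LATITUDE", "64.18");
        fake.put("Geographic_NAME1", "Longyearbyen");
        fake.put("Geographic_LONGTITUDE1", "15.64");
        fake.put("Geographic_LATITUDE1", "78.22");
        for (Location loc : read(fake::get)) {
            System.out.println(loc);
        }
    }
}
```

Stopping at the first missing `Geographic_NAME<i>` avoids needing a separate count field, at the cost of silently skipping anything after a gap in the numbering.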
> If you have any questions, contact me: [email protected]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)