Revision: 17725
http://sourceforge.net/p/gate/code/17725
Author: ian_roberts
Date: 2014-03-21 15:06:40 +0000 (Fri, 21 Mar 2014)
Log Message:
-----------
Basic PR to send a document to the TextRazor online service for annotation
(http://www.textrazor.com). The PR produces "TREntity" output annotations; a
JAPE grammar is also provided to convert these to ANNIE-style Person, Location
and Organization annotations.
Added Paths:
-----------
gate/trunk/plugins/Tagger_TextRazor/
gate/trunk/plugins/Tagger_TextRazor/.classpath
gate/trunk/plugins/Tagger_TextRazor/.project
gate/trunk/plugins/Tagger_TextRazor/build.xml
gate/trunk/plugins/Tagger_TextRazor/creole.xml
gate/trunk/plugins/Tagger_TextRazor/doc/
gate/trunk/plugins/Tagger_TextRazor/lib/
gate/trunk/plugins/Tagger_TextRazor/lib/textrazor-1.0.0.jar
gate/trunk/plugins/Tagger_TextRazor/resources/
gate/trunk/plugins/Tagger_TextRazor/resources/jape/
gate/trunk/plugins/Tagger_TextRazor/resources/jape/TextRazor-to-ANNIE.jape
gate/trunk/plugins/Tagger_TextRazor/src/
gate/trunk/plugins/Tagger_TextRazor/src/gate/
gate/trunk/plugins/Tagger_TextRazor/src/gate/textrazor/
gate/trunk/plugins/Tagger_TextRazor/src/gate/textrazor/TextRazorServicePR.java
Index: gate/trunk/plugins/Tagger_TextRazor
===================================================================
--- gate/trunk/plugins/Tagger_TextRazor 2014-03-21 15:00:22 UTC (rev 17724)
+++ gate/trunk/plugins/Tagger_TextRazor 2014-03-21 15:06:40 UTC (rev 17725)
Property changes on: gate/trunk/plugins/Tagger_TextRazor
___________________________________________________________________
Added: svn:ignore
## -0,0 +1,2 ##
+classes
+TextRazorTagger.jar
Added: gate/trunk/plugins/Tagger_TextRazor/.classpath
===================================================================
--- gate/trunk/plugins/Tagger_TextRazor/.classpath (rev 0)
+++ gate/trunk/plugins/Tagger_TextRazor/.classpath 2014-03-21 15:06:40 UTC (rev 17725)
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<classpath>
+ <classpathentry kind="src" path="src"/>
+ <classpathentry exported="true" kind="lib" path="lib/textrazor-1.0.0.jar"/>
+ <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
+ <classpathentry combineaccessrules="false" exported="true" kind="src" path="/GATE"/>
+ <classpathentry kind="output" path="classes"/>
+</classpath>
Property changes on: gate/trunk/plugins/Tagger_TextRazor/.classpath
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+text/xml
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: gate/trunk/plugins/Tagger_TextRazor/.project
===================================================================
--- gate/trunk/plugins/Tagger_TextRazor/.project (rev 0)
+++ gate/trunk/plugins/Tagger_TextRazor/.project 2014-03-21 15:06:40 UTC (rev 17725)
@@ -0,0 +1,17 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<projectDescription>
+ <name>GATE-plugin-Tagger_TextRazor</name>
+ <comment></comment>
+ <projects>
+ </projects>
+ <buildSpec>
+ <buildCommand>
+ <name>org.eclipse.jdt.core.javabuilder</name>
+ <arguments>
+ </arguments>
+ </buildCommand>
+ </buildSpec>
+ <natures>
+ <nature>org.eclipse.jdt.core.javanature</nature>
+ </natures>
+</projectDescription>
Property changes on: gate/trunk/plugins/Tagger_TextRazor/.project
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+text/xml
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: gate/trunk/plugins/Tagger_TextRazor/build.xml
===================================================================
--- gate/trunk/plugins/Tagger_TextRazor/build.xml (rev 0)
+++ gate/trunk/plugins/Tagger_TextRazor/build.xml 2014-03-21 15:06:40 UTC (rev 17725)
@@ -0,0 +1,90 @@
+<project name="Tagger_TextRazor" basedir=".">
+ <!-- Prevent Ant from warning about includeantruntime not being set -->
+ <property name="build.sysclasspath" value="ignore" />
+
+ <property file="build.properties" />
+
+ <property name="gate.home" location="../.." />
+ <property name="gate.lib" location="${gate.home}/lib" />
+ <property name="lib" location="lib" />
+ <property name="gate.jar" location="${gate.home}/bin/gate.jar" />
+ <property name="src.dir" location="src" />
+ <property name="classes.dir" location="classes" />
+ <property name="jar.location" location="TextRazorTagger.jar" />
+ <property name="doc.dir" location="doc" />
+ <property name="javadoc.dir" location="${doc.dir}/javadoc" />
+ <property name="test.dir" location="test" />
+ <property name="test.reports.dir" location="${test.dir}/reports" />
+ <property name="test.src.dir" location="${test.dir}/src" />
+ <property name="test.classes.dir" location="${test.dir}/classes" />
+
+ <!-- Path to compile - includes gate.jar and GATE/lib/*.jar -->
+ <path id="compile.classpath">
+ <pathelement location="${gate.jar}" />
+ <fileset dir="${gate.lib}">
+ <include name="**/*.jar" />
+ <include name="**/*.zip" />
+ </fileset>
+ <fileset dir="${lib}">
+ <include name="**/*.jar" />
+ </fileset>
+ </path>
+
+ <!-- create build directory structure -->
+ <target name="prepare">
+ <mkdir dir="${classes.dir}" />
+ </target>
+
+ <!-- compile the source -->
+ <target name="compile" depends="prepare">
+ <javac classpathref="compile.classpath" srcdir="${src.dir}" destdir="${classes.dir}" debug="true" debuglevel="lines,source" source="1.5" />
+ </target>
+
+ <!-- create the JAR file -->
+ <target name="jar" depends="compile">
+ <jar destfile="${jar.location}" update="false" basedir="${classes.dir}" />
+ </target>
+
+ <!-- remove the generated .class files -->
+ <target name="clean.classes">
+ <delete dir="${classes.dir}" />
+ </target>
+
+ <!-- Clean up - remove .class and .jar files -->
+ <target name="clean" depends="clean.classes">
+ <delete file="${jar.location}" />
+ </target>
+
+ <!-- Targets used by the main GATE build file:
+ build: build the plugin - just calls "jar" target
+ test : run the unit tests - there aren't any
+ distro.prepare: remove intermediate files that shouldn't be in the
+ distribution
+ -->
+
+ <!-- Build JavaDoc documentation -->
+ <target name="doc.prepare">
+ <mkdir dir="${javadoc.dir}" />
+ </target>
+
+ <target name="javadoc" depends="doc.prepare">
+ <javadoc destdir="${javadoc.dir}" packagenames="*" classpathref="compile.classpath" encoding="UTF-8" windowtitle="TextRazor Tagger JavaDoc" source="1.6" public="true">
+ <sourcepath>
+ <pathelement location="${src.dir}" />
+ </sourcepath>
+ <link href="http://docs.oracle.com/javase/6/docs/api/" />
+ <link href="http://gate.ac.uk/gate/doc/javadoc/" />
+ </javadoc>
+ </target>
+
+ <target name="build" depends="jar" />
+
+ <!-- Remove JUnit test results -->
+ <target name="distro.prepare" depends="clean.classes">
+ <delete>
+ <fileset dir="." includes="TEST*.xml" />
+ </delete>
+ </target>
+
+ <target name="test" />
+</project>
Property changes on: gate/trunk/plugins/Tagger_TextRazor/build.xml
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+text/xml
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: gate/trunk/plugins/Tagger_TextRazor/creole.xml
===================================================================
--- gate/trunk/plugins/Tagger_TextRazor/creole.xml (rev 0)
+++ gate/trunk/plugins/Tagger_TextRazor/creole.xml 2014-03-21 15:06:40 UTC (rev 17725)
@@ -0,0 +1,5 @@
+<?xml version="1.0"?>
+<CREOLE-DIRECTORY>
+ <JAR scan="true">TextRazorTagger.jar</JAR>
+ <JAR>lib/textrazor-1.0.0.jar</JAR>
+</CREOLE-DIRECTORY>
Added: gate/trunk/plugins/Tagger_TextRazor/lib/textrazor-1.0.0.jar
===================================================================
(Binary files differ)
Index: gate/trunk/plugins/Tagger_TextRazor/lib/textrazor-1.0.0.jar
===================================================================
--- gate/trunk/plugins/Tagger_TextRazor/lib/textrazor-1.0.0.jar 2014-03-21 15:00:22 UTC (rev 17724)
+++ gate/trunk/plugins/Tagger_TextRazor/lib/textrazor-1.0.0.jar 2014-03-21 15:06:40 UTC (rev 17725)
Property changes on: gate/trunk/plugins/Tagger_TextRazor/lib/textrazor-1.0.0.jar
___________________________________________________________________
Added: svn:mime-type
## -0,0 +1 ##
+application/zip
\ No newline at end of property
Added: gate/trunk/plugins/Tagger_TextRazor/resources/jape/TextRazor-to-ANNIE.jape
===================================================================
--- gate/trunk/plugins/Tagger_TextRazor/resources/jape/TextRazor-to-ANNIE.jape (rev 0)
+++ gate/trunk/plugins/Tagger_TextRazor/resources/jape/TextRazor-to-ANNIE.jape 2014-03-21 15:06:40 UTC (rev 17725)
@@ -0,0 +1,50 @@
+Imports: {
+ import static gate.Utils.*;
+}
+
+Phase: TextRazorToANNIE
+Options: control = all
+Input: TREntity
+
+/*
+ * Basic heuristic rules to convert specific types of TextRazor entities into
+ * the equivalent ANNIE-style annotation types. So far we simply turn
+ *
+ * - DBpedia Person into ANNIE Person
+ * - DBpedia Place into ANNIE Location
+ * - DBpedia Company and Freebase organization into ANNIE Organization
+ */
+Rule: EntityToANNIE
+(
+ {TREntity}
+):tr
+-->
+:tr {
+ Annotation ent = getOnlyAnn(trAnnots);
+ List<String> types = (List<String>)ent.getFeatures().get("type");
+ if(types == null) { types = Collections.emptyList(); }
+ List<String> freebaseTypes = (List<String>)ent.getFeatures().get("freebaseTypes");
+ if(freebaseTypes == null) { freebaseTypes = Collections.emptyList(); }
+ String ent_id = (String)ent.getFeatures().get("ent_id");
+ String link = (String)ent.getFeatures().get("link");
+
+ if(types.contains("Place")) {
+ addAnn(outputAS, ent, "Location", featureMap(
+ "ent_id", ent_id,
+ "link", link));
+ }
+
+ if(types.contains("Person")) {
+ addAnn(outputAS, ent, "Person", featureMap(
+ "ent_id", ent_id,
+ "link", link));
+ }
+
+ if(types.contains("Company") ||
+ freebaseTypes.contains("/organization/organization")) {
+ addAnn(outputAS, ent, "Organization", featureMap(
+ "ent_id", ent_id,
+ "link", link));
+ }
+
+}
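For readers without a GATE installation, the heuristic in the grammar above can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical and not part of this commit, and the two list arguments stand in for the "type" and "freebaseTypes" features of a TREntity annotation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Plain-Java sketch of the TextRazor-to-ANNIE heuristic; no GATE classes needed.
 * Class and method names are hypothetical and not part of the commit.
 */
class TextRazorTypeMappingSketch {

  /**
   * Returns the ANNIE annotation types the rule would create for one entity,
   * given its "type" and "freebaseTypes" feature values.
   * Either argument may be null, as in the JAPE rule.
   */
  static List<String> annieTypes(List<String> types, List<String> freebaseTypes) {
    if (types == null) types = Collections.emptyList();
    if (freebaseTypes == null) freebaseTypes = Collections.emptyList();

    List<String> result = new ArrayList<String>();
    // The three checks are independent, so one entity can yield several types.
    if (types.contains("Place")) result.add("Location");
    if (types.contains("Person")) result.add("Person");
    if (types.contains("Company")
        || freebaseTypes.contains("/organization/organization")) {
      result.add("Organization");
    }
    return result;
  }

  public static void main(String[] args) {
    // prints [Location]
    System.out.println(annieTypes(Arrays.asList("Place"), null));
    // prints [Organization]
    System.out.println(annieTypes(null, Arrays.asList("/organization/organization")));
    // prints [Person, Organization]
    System.out.println(annieTypes(Arrays.asList("Agent", "Person", "Company"), null));
  }
}
```

Because the rule uses three separate `if` statements rather than an `else if` chain, an entity typed as both Person and Company receives two ANNIE annotations over the same span, and an entity matching none of the checks receives none.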
Added: gate/trunk/plugins/Tagger_TextRazor/src/gate/textrazor/TextRazorServicePR.java
===================================================================
--- gate/trunk/plugins/Tagger_TextRazor/src/gate/textrazor/TextRazorServicePR.java (rev 0)
+++ gate/trunk/plugins/Tagger_TextRazor/src/gate/textrazor/TextRazorServicePR.java 2014-03-21 15:06:40 UTC (rev 17725)
@@ -0,0 +1,170 @@
+/*
+ * Copyright (c) 2009-2014, The University of Sheffield.
+ *
+ * This file is part of GATE (see http://gate.ac.uk/), and is free software,
+ * Licensed under the GNU Library General Public License, Version 3, June 2007
+ * (in the distribution as file licence.html, and also available at
+ * http://gate.ac.uk/gate/licence.html).
+ */
+package gate.textrazor;
+
+import gate.AnnotationSet;
+import gate.Resource;
+import gate.Utils;
+import gate.creole.ExecutionException;
+import gate.creole.ResourceInstantiationException;
+import gate.creole.metadata.CreoleParameter;
+import gate.creole.metadata.CreoleResource;
+import gate.creole.metadata.Optional;
+import gate.creole.metadata.RunTime;
+import gate.event.ProgressListener;
+import gate.util.InvalidOffsetException;
+
+import java.text.NumberFormat;
+import java.util.Arrays;
+
+import com.textrazor.AnalysisException;
+import com.textrazor.NetworkException;
+import com.textrazor.TextRazor;
+import com.textrazor.annotations.AnalyzedText;
+import com.textrazor.annotations.Entity;
+
+/**
+ * This PR uses the TextRazor online service to annotate documents.
+ *
+ * @author Ian Roberts
+ */
+@CreoleResource(name = "TextRazor Service PR",
+ comment = "Runs the TextRazor annotation service (http://textrazor.com) on a GATE document")
+public class TextRazorServicePR extends gate.creole.AbstractLanguageAnalyser
+ implements ProgressListener {
+
+ private static final long serialVersionUID = 6295675573632131139L;
+
+ /**
+ * API key. One has to obtain this from TextRazor by creating an account
+ * online
+ */
+ private String apiKey;
+
+ /**
+ * Name of the annotation set where new annotations should be created.
+ */
+ private String outputASName;
+
+ /**
+ * TextRazor service
+ */
+ private TextRazor client = null;
+
+ /** Initialise this resource, and return it. */
+ public Resource init() throws ResourceInstantiationException {
+ if(getApiKey() == null || getApiKey().isEmpty()) { throw new ResourceInstantiationException(
+ "Invalid API key. Please visit TextRazor web site for more information"); }
+ // initiate the service
+ client = new TextRazor(getApiKey());
+ client.setExtractors(Arrays.asList("words", "entities"));
+ client.setCleanupHTML(false);
+ return this;
+ }
+
+ /* this method is called to reinitialize the resource */
+ public void reInit() throws ResourceInstantiationException {
+ // reinitialization code
+ init();
+ }
+
+ /**
+ * Should be called to execute this PR on a document.
+ */
+ public void execute() throws ExecutionException {
+ fireStatusChanged("Checking runtime parameters");
+ progressChanged(0);
+ // if no document provided
+ if(document == null) { throw new ExecutionException("Document is null!"); }
+ // start time
+ long startTime;
+ try {
+ // obtain the content
+ String documentContent = document.getContent().toString();
+ if(documentContent.trim().length() == 0) return;
+ // annotation set to use
+ AnnotationSet set =
+ outputASName == null || outputASName.trim().length() == 0 ? document
+ .getAnnotations() : document.getAnnotations(outputASName);
+ startTime = System.currentTimeMillis();
+ // now process the text
+ // post the content to a service and obtain output
+ // what we get back is the matching text with URIs in it
+ AnalyzedText result = client.analyze(documentContent);
+ fireStatusChanged("Copying annotations on the document");
+
+ if(result.getResponse().getEntities() == null) {
+ System.out.println("No entities found");
+ } else {
+ for(Entity ent : result.getResponse().getEntities()) {
+ set.add((long)ent.getStartingPos(), (long)ent.getEndingPos(), "TREntity", Utils.featureMap(
+ "type", ent.getType(),
+ "freebaseTypes", ent.getFreebaseTypes(),
+ "confidence", ent.getConfidenceScore(),
+ "ann_id", ent.getId(),
+ "ent_id", ent.getEntityId(),
+ "link", ent.getWikiLink()));
+ }
+ }
+ } catch(NetworkException e) {
+ throw new ExecutionException(e);
+ } catch(InvalidOffsetException e) {
+ throw new ExecutionException(e);
+ } catch(AnalysisException e) {
+ throw new ExecutionException(e);
+ }
+ // progress
+ progressChanged(100);
+ fireProcessFinished();
+ // let everyone who is interested know that we have now finished
+ fireStatusChanged(document.getName() +
+ " tagged with TextRazorServicePR in " +
+ NumberFormat.getInstance().format(
+ (double)(System.currentTimeMillis() - startTime) / 1000) + " seconds!");
+ }
+
+
+ public String getOutputASName() {
+ return outputASName;
+ }
+
+ @RunTime
+ @CreoleParameter
+ @Optional
+ public void setOutputASName(String outputASName) {
+ this.outputASName = outputASName;
+ }
+
+ /**
+ * API key. One has to obtain this from TextRazor by creating an account
+ * online
+ */
+ public String getApiKey() {
+ return apiKey;
+ }
+
+ /**
+ * API key. One has to obtain this from TextRazor by creating an account
+ * online
+ */
+ @CreoleParameter(comment = "API key. One has to obtain this from TextRazor by creating an account online")
+ public void setApiKey(String apiKey) {
+ this.apiKey = apiKey;
+ }
+
+ @Override
+ public void progressChanged(int i) {
+ fireProgressChanged(i);
+ }
+
+ @Override
+ public void processFinished() {
+ fireProcessFinished();
+ }
+} // class
Property changes on: gate/trunk/plugins/Tagger_TextRazor/src/gate/textrazor/TextRazorServicePR.java
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Id
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property