[gate-cvs] SF.net SVN: gate:[18099] userguide/trunk

leondz Tue, 17 Jun 2014 03:45:16 -0700

Revision: 18099
          http://sourceforge.net/p/gate/code/18099
Author:   leondz
Date:     2014-06-17 10:44:51 +0000 (Tue, 17 Jun 2014)
Log Message:
-----------
update to reflect Stanford_CoreNLP plugin


Modified Paths:
--------------
    userguide/trunk/misc-creole.tex
    userguide/trunk/parsers.tex
    userguide/trunk/plugin-name-map.tex
    userguide/trunk/social-media.tex
    userguide/trunk/tao_main.tex

Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex     2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/misc-creole.tex     2014-06-17 10:44:51 UTC (rev 18099)
@@ -2854,11 +2854,19 @@
 %%
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\sect[sec:misc:creole:stanford]{Stanford Part-of-Speech Tagger}
+\sect[sec:misc:creole:stanford]{Stanford CoreNLP}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
+GATE supports some of the NLP tools from Stanford, collectively known as 
+Stanford CoreNLP. It currently supports named entity recognition, 
+part-of-speech tagging, and parsing. Note that Stanford CoreNLP model files are
+often not compatible across different versions of the software.
+
+
+\subsect[sec:misc:creole:stanford:pos]{Stanford Tagger}
+This tool is a machine-learning PoS tagger based on a cyclic dependency network~\cite{Toutanova2003a}.
 To use the Stanford Part-of-Speech tagger\footnote{\url{http://www-nlp.stanford.edu/software/tagger.shtml}}
-within GATE you need first to load the \verb|Tagger_Stanford| plugin.
+within GATE you need first to load the \verb|Stanford_CoreNLP| plugin.
 
 The PR is configured using the following initialization time parameters:
 
@@ -2895,10 +2903,52 @@
   will be preserved.  Furthermore, the pre-existing tags for these tokens will
   be fed through to the tagger and may influence the tags of their
   surrounding context by constraining the possible sequences of tags for the
-  sentence as a whole.  If false, existing category features are ignored and
-  overwritten with the output of the tagger.  Defaults to true.
+  sentence as a whole (see also~\cite{Derczynski2013c}).  If false, existing
+  category features are ignored and overwritten with the output of the tagger.
+  Defaults to true.
 \end{itemize}
 
+\subsect[sec:misc:creole:stanford:parser]{Stanford Parser}
+
+The GATE interface to the Stanford Parser is detailed in Section~\ref{sec:parsers:stanford}.
+
+\subsect[sec:misc:creole:stanford:ner]{Stanford Named Entity Recognition}
+
+Stanford NER provides a CRF-based approach to finding named entity chunks~\cite{Finkel2005StanfordNER},
+based on an externally-learned model file.
+
+
+The PR is configured using the following initialization time parameters:
+
+\begin{itemize}
+\item \textbf{modelFile:} the URL to the named entity recognition model. This defaults to a
+  fast English model but further models for other languages are available from downloads on the
+  \htlink{http://nlp.stanford.edu/software/CRF-NER.shtml}{Stanford NER homepage}.
+\end{itemize}
+
+Further configuration of the NER tool is via the following runtime parameters:
+
+\begin{itemize}
+\item \textbf{baseSentenceAnnotationType:} the input annotation type which
+  represents sentences; defaults to Sentence.
+\item \textbf{baseTokenAnnotationType:} the input annotation type which
+  represents tokens; defaults to Token.
+\item \textbf{failOnMissingInputAnnotations:} if true and no annotations of
+  the types specified in the previous two options are found then an
+  exception will be thrown halting any further processing. If false, a warning
+  will be printed instead and processing will continue. Defaults to true to help
+  quickly catch misconfiguration during application development.
+\item \textbf{inputASName:} the name of the annotation set that serves as input
+  to the NER tool (i.e. where it will look for sentences and tokens to
+  process); defaults to the default unnamed annotation set.
+\item \textbf{outputASName:} the name of the annotation set into which the
+  results of running the NER tool will be stored; defaults to the default unnamed
+  annotation set.
+\item \textbf{outsideLabel:} the label assigned to tokens outside of an entity;
+  e.g., the ``O'' in a BIO labelling scheme; defaults to \verb|O|.
+\end{itemize}
+
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \sect[sec:misc-creole:boilerpipe]{Content Detection Using Boilerpipe}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
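
Reviewer note on the outsideLabel parameter documented in the hunk above: the following is a minimal Python sketch of how per-token labels with an outside label can be grouped into entity spans. It is only an illustration of the BIO idea, not the plugin's implementation; the function name bio_to_spans and the sample labels are hypothetical, and it assumes labels are either the outside label, a prefixed label such as B-PER or I-PER, or a bare class label without a hyphen such as PERSON.

```python
from typing import List, Optional, Tuple


def bio_to_spans(labels: List[str],
                 outside_label: str = "O") -> List[Tuple[int, int, str]]:
    """Group per-token labels into (start, end_exclusive, type) spans.

    Hypothetical helper for illustration only. Tokens carrying
    `outside_label` belong to no entity; a "B-" prefix always starts a
    new span; a bare label (no hyphen) is treated like an "I-" label.
    """
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None      # index of the first token of the open span
    current: Optional[str] = None    # entity type of the open span

    for i, label in enumerate(labels):
        if label == outside_label:
            prefix, etype = None, None
        elif "-" in label:
            prefix, etype = label.split("-", 1)
        else:
            prefix, etype = "I", label

        # Close the open span when the type changes, an outside token is
        # reached, or an explicit "B-" marks the start of a new entity.
        if current is not None and (etype != current or prefix == "B"):
            spans.append((start, i, current))
            start, current = None, None

        # Open a new span on the first token of an entity.
        if etype is not None and current is None:
            start, current = i, etype

    # Flush a span that runs to the end of the sentence.
    if current is not None:
        spans.append((start, len(labels), current))
    return spans
```

With the default outside label, bio_to_spans(["B-PER", "I-PER", "O", "B-LOC", "O"]) yields one PER span over tokens 0 to 1 and one LOC span over token 3. Passing a different outside_label mirrors changing the outsideLabel runtime parameter when a model uses another background symbol.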

Modified: userguide/trunk/parsers.tex
===================================================================
--- userguide/trunk/parsers.tex 2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/parsers.tex 2014-06-17 10:44:51 UTC (rev 18099)
@@ -559,9 +559,8 @@
 German.
 
 
-This plugin, `Parser\_Stanford', developed by the GATE team, provides a PR
-(\texttt{gate.stanford.Parser}) that acts as a wrapper around the Stanford
-Parser (version 2.0.4) and translates GATE annotations to and from the data
+This PR (\texttt{gate.stanford.Parser}) acts as a wrapper around the Stanford
+Parser and translates GATE annotations to and from the data
 structures of the parser itself.  The plugin is supplied with the unmodified
 \texttt{jar} file and one English data file obtained from Stanford.  Stanford's
 software itself is subject to the full GPL.
@@ -570,10 +569,7 @@
 The parser itself can be trained on other corpora and languages, as documented
 on the \htlink{http://nlp.stanford.edu/software/lex-parser.shtml}{website}, but
 this plugin does not provide a means of doing so.  Trained data files are not
-necessarily compatible between different versions of the parser; in particular
-files from versions before 2.0 are probably incompatible with the current
-software.  (GATE switched from 1.6 to 1.6.1 at build 3120 in January 2009, to
-1.6.5 in December 2010, to 1.6.8 in August 2011, and to 2.0.1 in March 2012.)
+necessarily compatible between different versions of the parser.
 
 The current versions of the Stanford parser and this PR are threadsafe.
 Multiple instances of the PR with the same or different model files can be used

Modified: userguide/trunk/plugin-name-map.tex
===================================================================
--- userguide/trunk/plugin-name-map.tex 2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/plugin-name-map.tex 2014-06-17 10:44:51 UTC (rev 18099)
@@ -40,7 +40,7 @@
 openNLP & OpenNLP\\
 rasp & Parser\_RASP\\
 romanian & Lang\_Romanian\\
-Stanford & Parser\_Stanford\\
+Stanford & Stanford\_CoreNLP\\
 Stemmer & Stemmer\_Snowball\\
 SUPPLE & Parser\_SUPPLE\\
 TaggerFramework & Tagger\_Framework\\

Modified: userguide/trunk/social-media.tex
===================================================================
--- userguide/trunk/social-media.tex    2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/social-media.tex    2014-06-17 10:44:51 UTC (rev 18099)
@@ -30,7 +30,7 @@
 \sect[sec:social:twitter]{Tools for Twitter}
 
 The \verb!Twitter! plugin contains several tools useful for processing tweets.
-This plugin depends on the \verb!Tagger_Stanford! plugin, which must be loaded
+This plugin depends on the \verb!Stanford_CoreNLP! plugin, which must be loaded
 first.  This includes tools to load documents into GATE from the JSON format
 provided by the Twitter APIs, a tokeniser and POS tagger tuned specifically for
 Tweets, a tool to split up multi-word hashtags, and an example named entity

Modified: userguide/trunk/tao_main.tex
===================================================================
--- userguide/trunk/tao_main.tex        2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/tao_main.tex        2014-06-17 10:44:51 UTC (rev 18099)
@@ -85,6 +85,7 @@
       Johann Petrak
       Yaoyong Li
       Wim Peters
+      Leon Derczynski
       et al
     },
     pdfsubject={GATE, Language Processing},
