Revision: 18099
http://sourceforge.net/p/gate/code/18099
Author: leondz
Date: 2014-06-17 10:44:51 +0000 (Tue, 17 Jun 2014)
Log Message:
-----------
update to reflect Stanford_CoreNLP plugin
Modified Paths:
--------------
userguide/trunk/misc-creole.tex
userguide/trunk/parsers.tex
userguide/trunk/plugin-name-map.tex
userguide/trunk/social-media.tex
userguide/trunk/tao_main.tex
Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex 2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/misc-creole.tex 2014-06-17 10:44:51 UTC (rev 18099)
@@ -2854,11 +2854,19 @@
%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\sect[sec:misc:creole:stanford]{Stanford Part-of-Speech Tagger}
+\sect[sec:misc:creole:stanford]{Stanford CoreNLP}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+GATE supports some of the NLP tools from Stanford, collectively known as
+Stanford CoreNLP. It currently supports named entity recognition,
+part-of-speech tagging, and parsing. Note that model files are often not
+compatible between different versions of Stanford CoreNLP.
+
+
+\subsect[sec:misc:creole:stanford:pos]{Stanford Tagger}
+This tool is a machine-learning PoS tagger based on a cyclic dependency network~\cite{Toutanova2003a}.
To use the Stanford Part-of-Speech
tagger\footnote{\url{http://www-nlp.stanford.edu/software/tagger.shtml}}
-within GATE you need first to load the \verb|Tagger_Stanford| plugin.
+within GATE you need first to load the \verb|Stanford_CoreNLP| plugin.
The PR is configured using the following initialization time parameters:
@@ -2895,10 +2903,52 @@
will be preserved. Furthermore, the pre-existing tags for these tokens will
be fed through to the tagger and may influence the tags of their
surrounding context by constraining the possible sequences of tags for the
- sentence as a whole. If false, existing category features are ignored and
- overwritten with the output of the tagger. Defaults to true.
+ sentence as a whole (see also~\cite{Derczynski2013c}). If false, existing
+ category features are ignored and overwritten with the output of the tagger.
+ Defaults to true.
\end{itemize}
+\subsect[sec:misc:creole:stanford:pos]{Stanford Parser}
+
+The GATE interface to the Stanford Parser is detailed in Section~\ref{sec:parsers:stanford}.
+
+\subsect[sec:misc:creole:stanford:pos]{Stanford Named Entity Recognition}
+
+Stanford NER provides a CRF-based approach to finding named entity chunks~\cite{Finkel2005StanfordNER},
+based on an externally-learned model file.
+
+
+The PR is configured using the following initialization time parameters:
+
+\begin{itemize}
+\item \textbf{modelFile:} the URL of the named entity recognition model. This defaults to a
+ fast English model, but further models for other languages are available for download from the
+ \htlink{http://nlp.stanford.edu/software/CRF-NER.shtml}{Stanford NER homepage}.
+\end{itemize}
+
+Further configuration of the NER tool is via the following runtime parameters:
+
+\begin{itemize}
+\item \textbf{baseSentenceAnnotationType:} the input annotation type which
+ represents sentences; defaults to Sentence.
+\item \textbf{baseTokenAnnotationType:} the input annotation type which
+ represents tokens; defaults to Token.
+\item \textbf{failOnMissingInputAnnotations:} if true and no annotations of
+ the types specified in the previous two options are found, then an
+ exception will be thrown, halting any further processing. If false, a warning
+ will be printed instead and processing will continue. Defaults to true to help
+ quickly catch misconfiguration during application development.
+\item \textbf{inputASName:} the name of the annotation set that serves as input
+ to the NER tool (i.e. where it will look for sentences and tokens to
+ process); defaults to the default unnamed annotation set.
+\item \textbf{outputASName:} the name of the annotation set into which the
+ results of running the NER tool will be stored; defaults to the default unnamed
+ annotation set.
+\item \textbf{outsideLabel:} the label assigned to tokens outside of an entity;
+ e.g., the ``O'' in a BIO labelling scheme; defaults to \verb|O|.
+\end{itemize}
+
+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\sect[sec:misc-creole:boilerpipe]{Content Detection Using Boilerpipe}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Modified: userguide/trunk/parsers.tex
===================================================================
--- userguide/trunk/parsers.tex 2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/parsers.tex 2014-06-17 10:44:51 UTC (rev 18099)
@@ -559,9 +559,8 @@
German.
-This plugin, `Parser\_Stanford', developed by the GATE team, provides a PR
-(\texttt{gate.stanford.Parser}) that acts as a wrapper around the Stanford
-Parser (version 2.0.4) and translates GATE annotations to and from the data
+This PR (\texttt{gate.stanford.Parser}) acts as a wrapper around the Stanford
+Parser and translates GATE annotations to and from the data
structures of the parser itself. The plugin is supplied with the unmodified
\texttt{jar} file and one English data file obtained from Stanford. Stanford's
software itself is subject to the full GPL.
@@ -570,10 +569,7 @@
The parser itself can be trained on other corpora and languages, as documented
on the \htlink{http://nlp.stanford.edu/software/lex-parser.shtml}{website}, but
this plugin does not provide a means of doing so. Trained data files are not
-necessarily compatible between different versions of the parser; in particular
-files from versions before 2.0 are probably incompatible with the current
-software. (GATE switched from 1.6 to 1.6.1 at build 3120 in January 2009, to
-1.6.5 in December 2010, to 1.6.8 in August 2011, and to 2.0.1 in March 2012.)
+necessarily compatible between different versions of the parser.
The current versions of the Stanford parser and this PR are threadsafe.
Multiple instances of the PR with the same or different model files can be used
Modified: userguide/trunk/plugin-name-map.tex
===================================================================
--- userguide/trunk/plugin-name-map.tex 2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/plugin-name-map.tex 2014-06-17 10:44:51 UTC (rev 18099)
@@ -40,7 +40,7 @@
openNLP & OpenNLP\\
rasp & Parser\_RASP\\
romanian & Lang\_Romanian\\
-Stanford & Parser\_Stanford\\
+Stanford & Stanford\_CoreNLP\\
Stemmer & Stemmer\_Snowball\\
SUPPLE & Parser\_SUPPLE\\
TaggerFramework & Tagger\_Framework\\
Modified: userguide/trunk/social-media.tex
===================================================================
--- userguide/trunk/social-media.tex 2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/social-media.tex 2014-06-17 10:44:51 UTC (rev 18099)
@@ -30,7 +30,7 @@
\sect[sec:social:twitter]{Tools for Twitter}
The \verb!Twitter! plugin contains several tools useful for processing tweets.
-This plugin depends on the \verb!Tagger_Stanford! plugin, which must be loaded
+This plugin depends on the \verb!Stanford_CoreNLP! plugin, which must be loaded
first. This includes tools to load documents into GATE from the JSON format
provided by the Twitter APIs, a tokeniser and POS tagger tuned specifically for
Tweets, a tool to split up multi-word hashtags, and an example named entity
Modified: userguide/trunk/tao_main.tex
===================================================================
--- userguide/trunk/tao_main.tex 2014-06-17 01:19:44 UTC (rev 18098)
+++ userguide/trunk/tao_main.tex 2014-06-17 10:44:51 UTC (rev 18099)
@@ -85,6 +85,7 @@
Johann Petrak
Yaoyong Li
Wim Peters
+ Leon Derczynski
et al
},
pdfsubject={GATE, Language Processing},