[gate-cvs] SF.net SVN: gate:[19404] userguide/trunk

ian_roberts Tue, 07 Jun 2016 03:39:13 -0700

Revision: 19404
          http://sourceforge.net/p/gate/code/19404
Author:   ian_roberts
Date:     2016-06-07 10:38:09 +0000 (Tue, 07 Jun 2016)
Log Message:
-----------
Merged branch changes back to trunk, with 8.3-SNAPSHOT version number


Modified Paths:
--------------
    userguide/trunk/language-creole.tex
    userguide/trunk/misc-creole.tex
    userguide/trunk/recent-changes.tex
    userguide/trunk/tao_main.tex

Property Changed:
----------------
    userguide/trunk/
    userguide/trunk/recent-changes.tex

Index: userguide/trunk
===================================================================
--- userguide/trunk     2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk     2016-06-07 10:38:09 UTC (rev 19404)

Property changes on: userguide/trunk
___________________________________________________________________
Modified: svn:mergeinfo
## -2,5 +2,6 ##
 /userguide/branches/release-7.0:15332-15399
 /userguide/branches/release-8.0:17948
 /userguide/branches/release-8.1:18738
+/userguide/branches/release-8.2:19338-19355,19357-19360
 /userguide/tags/release-7.0:15400-15404
 /userguide/trunk:10614-10900
\ No newline at end of property
Modified: userguide/trunk/language-creole.tex
===================================================================
--- userguide/trunk/language-creole.tex 2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk/language-creole.tex 2016-06-07 10:38:09 UTC (rev 19404)
@@ -6,11 +6,11 @@
 \nnormalsize
 
 There are plugins available for processing the following languages:
-French, German, Italian, Chinese, Arabic, Romanian, Hindi, Russian, and
-Cebuano. Some of the applications are quite basic and just contain
-some useful processing resources to get you started when developing a
-full application. Others (Cebuano and Hindi) are more like toy systems
-built as part of an exercise in language portability.
+French, German, Italian, Danish, Chinese, Arabic, Romanian, Hindi, Russian,
+Welsh and Cebuano. Some of the applications are quite basic and just contain
+some useful processing resources to get you started when developing a full
+application. Others (Cebuano and Hindi) are more like toy systems built as
+part of an exercise in language portability.
 
 Note that if you wish to use individual language processing resources
 without loading the whole application, you will need to load the
@@ -326,3 +326,25 @@
 into GATE. Currently no other Bulgarian specific PRs are available so
 the stemmer should be used with the Unicode tokenizer and a sentence splitter
 to process Bulgarian language documents.
+
+\sect[sec:misc-creole:language-plugins:danish]{Danish Plugin}
+The Danish plugin (\verb|Lang_Danish|) contains resources for a Danish IE
+application. As well as a set of tokeniser rules and gazetteer lists tuned for
+Danish, the plugin includes models for the Stanford CoreNLP POS tagger and
+named entity recogniser trained on the Danish PAROLE corpus and the Copenhagen
+Dependency Treebank respectively.  Full details can be found in the EACL 2014
+paper \cite{Derczynski2014d}.
+
+The Java code in this plugin (the tokeniser and gazetteer) is released under
+the same LGPL licence as GATE itself, but the POS tagger and NER models are
+subject to the full GPL as this is the licence of the data used for training.
+
+\sect[sec:misc-creole:language-plugins:welsh]{Welsh Plugin}
+The Welsh plugin (\verb|Lang_Welsh|) is the result of the Welsh Natural
+Language Toolkit
+project\footnote{\htlinkplain{http://hypermedia.research.southwales.ac.uk/kos/wnlt/}},
+funded by the Welsh Government.  It contains a set of resources that mirror the
+English-language ANNIE application, but adapted to the Welsh language.  The
+plugin includes a tokeniser, sentence splitter, POS tagger, morphological
+analyser, gazetteers and named entity JAPE grammars, with a ready-made
+application called \emph{CYMRIE} to combine them all.

Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex     2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk/misc-creole.tex     2016-06-07 10:38:09 UTC (rev 19404)
@@ -3646,3 +3646,98 @@
 application, as it is not possible to provide this in an easily portable way.
 See Section \ref{sec:misc-creole:wn} for details of how to load WordNet into
 GATE.
+
+%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\sect[sec:misc-creole:gate-time]{GATE-Time}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%
+This plugin provides a number of components and applications for annotating time
+related information and events within documents.
+
+\subsect{DCTParser}
+When processing news-style or colloquial documents, it is important
+that later components (based around HeidelTime) know the document creation time
+(DCT) of the documents.
+
+Note that it is not the time when the documents have been loaded into GATE.
+Instead, it is the time when the document was written, e.g., when a news
+document was published. To provide the DCT of a document / all documents in the
+corpus, the DCTParser can be used. It can be used in two ways:
+
+\begin{itemize}
+\item to parse the DCT out of TimeML-style xml documents, e.g., the corpora
+TempEval-3 TimeBank, TempEval-3 Aquaint, and TempEval-3 platinum
+contain DCT information in this format. (cf. very last section)
+\item to manually set the DCT for a document or a full corpus.
+\end{itemize}
+
+%It might make sense to add further parsing formats to DCTParser, e.g., 
+%that the dct can be parsed out of a document's name (e.g., if the documents
+%in a corpus are named like ``NYT-20100910-article1.txt'' and ``NYT-20100911-
+%article1.txt'').
+It is important to note that the documents in a corpus typically have
+differing DCTs. Currently, the DCT can only be parsed if it is available in
+TimeML-style format; otherwise it must be provided manually for the document
+or the full corpus. If HeidelTime processes news documents with wrong DCT
+information, relative and underspecified expressions will be normalized
+incorrectly.
+If the documents that are to be processed are narrative documents (e.g.,
+Wikipedia documents), no document creation time is required. The HeidelTime
+GATE wrapper can handle this automatically if the domain of the HeidelTime
+component is set to ``narratives'' (see next section).
+
+The DCTParser is configured through the following runtime parameters:
+\begin{itemize}
+\item[dctParsingFormat] timeml or manualdate
+\item[inputASName] name of the annotation set where DCT is stored
+\item[manuallySetDct] if format is set to ``manualdate'', the user can set a date
+manually and this date is stored as DCT by DCTParser
+\item[outputASName] name of annotation set for output
+\end{itemize}
+
+\subsect{HeidelTime}
+HeidelTime can be used for many languages and four domains (in particular news
+and narrative, but also colloquial and autonomic for English -- see the
+HeidelTime Standalone Manual). Note that HeidelTime can perform linguistic
+preprocessing for all the languages if respective tools are installed correctly
+and configured correctly in the \verb!config.props! file. 
+
+When HeidelTime processes narrative-style documents, DCT information is not
+required. If news-style (or colloquial) documents are processed, DCT
+information is crucial and processing fails if none is available. In that
+case, \verb!creationDateAnnotationType! has to be set to the type of the DCT
+annotation (see above).
+
+HeidelTime can be used in such a way that the linguistic preprocessing is
+performed internally. For this, further tools have to be set up and the
+parameter doPreprocessing has to be set to \verb!true!. In this case, some
+other parameters are ignored (about Sentence, Token, POS).  If other
+preprocessing annotations shall be used (e.g., those of ANNIE) then
+doPreprocessing has to be set to \verb!false! and the other parameters (about
+Sentence, Token, POS) have to be provided correctly.
+
+HeidelTime is configured via three init parameters, since different models
+have to be loaded depending on language and domain:
+
+\begin{itemize}
+\item[configFile] the location of the config.props file
+\item[documentType] narratives, news, colloquial, or scientific
+\item[language] english, german, dutch, .......
+\end{itemize}
+
+and the following runtime parameters:
+
+\begin{itemize}
+\item[creationDateAnnotationType] if DCTParser is used to set the DCT, then the value is ``DCT''
+\item[doPreprocessing] set to false to use existing annotations, true if you want HeidelTime to pre-process the document
+\item[inputASName] name of annotation set, where token, sentence, pos information are stored (if any)
+\item[outputASName] name of annotation set for output
+\item[posAnnotationNameAsTokenAttribute] name of the part-of-speech feature of the Token annotations (if using ANNIE, this is \verb!category!)
+\item[sentenceAnnotationType] type of the sentence annotation (if using ANNIE, this is \verb!Sentence!)
+\item[tokenAnnotationType] type of the token annotation (if using ANNIE, this is \verb!Token!)
+\end{itemize}
+
+\subsect{TimeML Event Detection}
+
+The plugin also contains a ``Ready Made'' application for detecting TimeML based events.
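
As an illustration of the GATE-Time parameters documented in the hunk above,
the following Java sketch collects them into plain maps. It is not the
plugin's API: the class GateTimeParams, its method names, the annotation set
name "Time" and the format of the date string are all assumptions made for
the example. Only the parameter names and the values manualdate, DCT, Token,
Sentence and category come from the text. The sketch also assumes, for
simplicity, that DCTParser and HeidelTime share a single annotation set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: plain maps standing in for the runtime parameter sets
// described in the user guide text. Nothing here calls GATE itself.
final class GateTimeParams {

    // Runtime parameters for DCTParser when the DCT is supplied by hand.
    // The date string format is an assumption; the text does not specify it.
    static Map<String, Object> dctParserManual(String date, String asName) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("dctParsingFormat", "manualdate");
        p.put("manuallySetDct", date);
        p.put("inputASName", asName);
        p.put("outputASName", asName);
        return p;
    }

    // Runtime parameters for HeidelTime. When doPreprocessing is false,
    // the ANNIE annotation names quoted in the text are supplied.
    static Map<String, Object> heidelTimeRuntime(boolean doPreprocessing,
                                                 String asName) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("creationDateAnnotationType", "DCT");
        p.put("doPreprocessing", doPreprocessing);
        p.put("outputASName", asName);
        if (!doPreprocessing) {
            p.put("inputASName", asName);
            p.put("tokenAnnotationType", "Token");
            p.put("sentenceAnnotationType", "Sentence");
            p.put("posAnnotationNameAsTokenAttribute", "category");
        }
        return p;
    }

    public static void main(String[] args) {
        // "Time" is a hypothetical annotation set name.
        System.out.println(dctParserManual("2016-06-07", "Time"));
        System.out.println(heidelTimeRuntime(false, "Time"));
    }
}
```

When doPreprocessing is true, the sketch leaves out the Sentence, Token and
POS parameters, since the text states that they are ignored in that case.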

Modified: userguide/trunk/recent-changes.tex
===================================================================
--- userguide/trunk/recent-changes.tex  2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk/recent-changes.tex  2016-06-07 10:38:09 UTC (rev 19404)
@@ -19,16 +19,40 @@
   \def\rcSubsubsect#1{\subsubsect{#1}}
 \fi
 
-\rcSect[next]{Next Release}
+\rcSect[8.2]{Version 8.2 (May 2016)}
 
-\rcSubsect{July 2015}
+GATE Developer and Embedded 8.2 is mainly a bug fix release -- there are a few
+new plugins but the emphasis is on bug fixing and library updates.  This will
+be the final release of GATE before major re-structuring of the codebase and
+the plugin system for version 9.0.
 
-Added support to the \verb!Format_CSV! plugin for outputting results from GCP into a CSV file.
+\begin{itemize}
+\item New tools for temporal expression and event detection, including a wrapper
+  for the HeidelTime tagger (section~\ref{sec:misc-creole:gate-time}).
+\item New language plugins for Danish and Welsh named entity recognition.
+\item Performance improvements in the ANNIE NER system, in particular to deal
+  better with hyphenated names and titles.
+\item Improvements to TermRaider to support GATE documents that contain many
+  independent sections (e.g. web forums, lists of tweets).
+\item Bug fixes in the handling of Twitter JSON data -- GATE now has full
+  round-trip support for Twitter JSON, tweets can be loaded, annotated, and
+  saved back to the same format accurately.  The JSON format parser has been
+  separated from the rest of the Twitter plugin, making it easier to add JSON
+  support to non-TwitIE applications.
+\item Updated dependencies -- the \verb!Stanford_CoreNLP! plugin now uses
+  version 3.6.0 of Stanford CoreNLP, and the Groovy plugin uses Groovy version
+  2.4.4.
+\item GCP input and output handlers added to the \verb!Format_CSV! plugin.
+\end{itemize}
 
-\rcSubsect{July 2015}
+Plus the usual suite of miscellaneous smaller bug fixes.
 
-Fixed a bug in the ANNIE Orthomatcher that meant the last annotation of a given type (Person, Location, or Organization) was ignored when trying to corefer Unknown annotations.
+\rcSubsect{Java compatibility}
 
-Added support to the \verb!Format_CSV! plugin for loading CSV files into GCP.
+For GATE 8.2 we recommend the use of the latest Java 8 from Oracle.  If you are
+still restricted to Java 7, most components will still work with the exception
+of the Stanford CoreNLP tools.  Future versions of GATE will require Java 8 as
+a minimum.
 
+
 % vim:ft=tex


Property changes on: userguide/trunk/recent-changes.tex
___________________________________________________________________
Modified: svn:mergeinfo
## -3,5 +3,6 ##
 /userguide/branches/release-7.0/recent-changes.tex:15332-15399
 /userguide/branches/release-7.1/recent-changes.tex:16356-16357
 /userguide/branches/release-8.1/recent-changes.tex:18738
+/userguide/branches/release-8.2/recent-changes.tex:19338-19355,19357-19360
 /userguide/tags/release-7.0/recent-changes.tex:15400-15404
 /userguide/trunk/recent-changes.tex:10614-10900,17951-17952
\ No newline at end of property
Modified: userguide/trunk/tao_main.tex
===================================================================
--- userguide/trunk/tao_main.tex        2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk/tao_main.tex        2016-06-07 10:38:09 UTC (rev 19404)
@@ -19,7 +19,7 @@
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 % GATE version this manual is for.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\newcommand{\gateversion}{8.2}
+\newcommand{\gateversion}{8.3}
 
 \newif\ifgaterelease
 
@@ -407,7 +407,7 @@
       to the documentation for that release instead, which is accessible via
       the Help menu in GATE Developer.\\
       The latest release is
-      \htlink{https://gate.ac.uk/userguide?gateVersion=8.1}{version 8.1}
+      \htlink{https://gate.ac.uk/userguide?gateVersion=8.2}{version 8.2}
       \end{center}
     \end{minipage}}
   \fi
@@ -565,7 +565,7 @@
   \textbf{Developing Language Processing Components with GATE Version 8}
 \fi
 
-\textbf{\copyright 2015 The University of Sheffield, Department of Computer Science}
+\textbf{\copyright 2016 The University of Sheffield, Department of Computer Science}
 
 The University of Sheffield, Department of Computer Science\\
 Regent Court\\
