Revision: 19404
http://sourceforge.net/p/gate/code/19404
Author: ian_roberts
Date: 2016-06-07 10:38:09 +0000 (Tue, 07 Jun 2016)
Log Message:
-----------
Merged branch changes back to trunk, with 8.3-SNAPSHOT version number
Modified Paths:
--------------
userguide/trunk/language-creole.tex
userguide/trunk/misc-creole.tex
userguide/trunk/recent-changes.tex
userguide/trunk/tao_main.tex
Property Changed:
----------------
userguide/trunk/
userguide/trunk/recent-changes.tex
Index: userguide/trunk
===================================================================
--- userguide/trunk 2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk 2016-06-07 10:38:09 UTC (rev 19404)
Property changes on: userguide/trunk
___________________________________________________________________
Modified: svn:mergeinfo
## -2,5 +2,6 ##
/userguide/branches/release-7.0:15332-15399
/userguide/branches/release-8.0:17948
/userguide/branches/release-8.1:18738
+/userguide/branches/release-8.2:19338-19355,19357-19360
/userguide/tags/release-7.0:15400-15404
/userguide/trunk:10614-10900
\ No newline at end of property
Modified: userguide/trunk/language-creole.tex
===================================================================
--- userguide/trunk/language-creole.tex 2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk/language-creole.tex 2016-06-07 10:38:09 UTC (rev 19404)
@@ -6,11 +6,11 @@
\nnormalsize
There are plugins available for processing the following languages:
-French, German, Italian, Chinese, Arabic, Romanian, Hindi, Russian, and
-Cebuano. Some of the applications are quite basic and just contain
-some useful processing resources to get you started when developing a
-full application. Others (Cebuano and Hindi) are more like toy systems
-built as part of an exercise in language portability.
+French, German, Italian, Danish, Chinese, Arabic, Romanian, Hindi, Russian,
+Welsh and Cebuano. Some of the applications are quite basic and just contain
+some useful processing resources to get you started when developing a full
+application. Others (Cebuano and Hindi) are more like toy systems built as
+part of an exercise in language portability.
Note that if you wish to use individual language processing resources
without loading the whole application, you will need to load the
@@ -326,3 +326,25 @@
into GATE. Currently no other Bulgarian specific PRs are available so
the stemmer should be used with the Unicode tokenizer and a sentence splitter
to process Bulgarian language documents.
+
+\sect[sec:misc-creole:language-plugins:danish]{Danish Plugin}
+The Danish plugin (\verb|Lang_Danish|) contains resources for a Danish IE
+application. As well as a set of tokeniser rules and gazetteer lists tuned for
+Danish, the plugin includes models for the Stanford CoreNLP POS tagger and
+named entity recogniser trained on the Danish PAROLE corpus and the Copenhagen
+Dependency Treebank respectively. Full details can be found in the EACL 2014
+paper \cite{Derczynski2014d}.
+
+The Java code in this plugin (the tokeniser and gazetteer) is released under
+the same LGPL licence as GATE itself, but the POS tagger and NER models are
+subject to the full GPL as this is the licence of the data used for training.
+
+\sect[sec:misc-creole:language-plugins:welsh]{Welsh Plugin}
+The Welsh plugin (\verb|Lang_Welsh|) is the result of the Welsh Natural
+Language Toolkit
+project\footnote{\htlinkplain{http://hypermedia.research.southwales.ac.uk/kos/wnlt/}},
+funded by the Welsh Government. It contains a set of resources that mirror the
+English-language ANNIE application, but adapted to the Welsh language. The
+plugin includes a tokeniser, sentence splitter, POS tagger, morphological
+analyser, gazetteers and named entity JAPE grammars, with a ready-made
+application called \emph{CYMRIE} to combine them all.
Modified: userguide/trunk/misc-creole.tex
===================================================================
--- userguide/trunk/misc-creole.tex 2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk/misc-creole.tex 2016-06-07 10:38:09 UTC (rev 19404)
@@ -3646,3 +3646,98 @@
application, as it is not possible to provide this in an easily portable way.
See Section \ref{sec:misc-creole:wn} for details of how to load WordNet into
GATE.
+
+%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\sect[sec:misc-creole:gate-time]{GATE-Time}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%
+This plugin provides a number of components and applications for annotating
+time-related information and events within documents.
+
+\subsect{DCTParser}
+If processing news (news-style and also colloquial) documents, it is important
+that later components (based around HeidelTime) know the document creation time
+(DCT) of the documents.
+
+Note that the DCT is not the time at which a document was loaded into GATE.
+Instead, it is the time when the document was written, e.g., when a news
+document was published. To provide the DCT of a single document or of all
+documents in a corpus, the DCTParser can be used. It can be used in two ways:
+
+\begin{itemize}
+\item to parse the DCT out of TimeML-style XML documents, e.g., the corpora
+TempEval-3 TimeBank, TempEval-3 Aquaint, and TempEval-3 platinum
+contain DCT information in this format. (cf. very last section)
+\item to manually set the DCT for a document or a full corpus.
+\end{itemize}
+
+%It might make sense to add further parsing formats to DCTParser, e.g.,
+%that the dct can be parsed out of a document's name (e.g., if the documents
+%in a corpus are named like ``NYT-20100910-article1.txt'' and ``NYT-20100911-
+%article1.txt'').
+Bear in mind that the documents in a corpus typically have differing DCTs.
+Currently, the DCT can only be parsed if it is available in TimeML-style
+format; otherwise it has to be provided manually for the document or the full
+corpus. If HeidelTime processes news documents with wrong DCT information,
+relative and underspecified expressions will, of course, be normalized
+incorrectly.
+If the documents that are to be processed are narrative documents (e.g.,
+Wikipedia documents), no document creation time is required. The HeidelTime
+GATE wrapper can handle this automatically if the domain of the HeidelTime
+component is set to ``narratives'' (see next section).
+
+The DCTParser is configured through the following runtime parameters:
+\begin{itemize}
+\item[dctParsingFormat] timeml or manualdate
+\item[inputASName] name of the annotation set where DCT is stored
+\item[manuallySetDct] if format is set to ``manualdate'', the user can set a date
+manually and this date is stored as DCT by DCTParser
+\item[outputASName] name of annotation set for output
+\end{itemize}
+
+\subsect{HeidelTime}
+HeidelTime can be used for many languages and four domains (in particular news
+and narrative, but also colloquial and autonomic for English -- see the
+HeidelTime standalone manual). Note that HeidelTime can perform linguistic
+preprocessing for all the languages if the respective tools are installed
+and configured correctly in the \verb!config.props! file.
+
+When HeidelTime processes narrative-style documents, DCT information is not
+required. When news-style (or colloquial) documents are processed, however,
+DCT information is crucial, and processing fails if none is available. In
+that case, \verb!creationDateAnnotationType! has to identify the DCT
+annotation (see above).
+
+HeidelTime can be used in such a way that the linguistic preprocessing is
+performed internally. For this, further tools have to be set up and the
+parameter \verb!doPreprocessing! has to be set to \verb!true!. In this case,
+the other preprocessing parameters (about Sentence, Token, POS) are ignored. If
+existing preprocessing annotations are to be used (e.g., those of ANNIE), then
+\verb!doPreprocessing! has to be set to \verb!false! and the other parameters
+(about Sentence, Token, POS) have to be provided correctly.
+
+HeidelTime is configured via three init parameters, because different models
+have to be loaded depending on language and domain:
+
+\begin{itemize}
+\item[configFile] the location of the config.props file
+\item[documentType] narratives, news, colloquial, or scientific
+\item[language] english, german, dutch, .......
+\end{itemize}
+
+and the following runtime parameters:
+
+\begin{itemize}
+\item[creationDateAnnotationType] if DCTParser is used to set the DCT, then the value is ``DCT''
+\item[doPreprocessing] set to false to use existing annotations, true if you want HeidelTime to pre-process the document
+\item[inputASName] name of annotation set, where token, sentence, pos information are stored (if any)
+\item[outputASName] name of annotation set for output
+\item[posAnnotationNameAsTokenAttribute] name of the part-of-speech feature of the Token annotations (if using ANNIE, this is \verb!category!)
+\item[sentenceAnnotationType] type of the sentence annotation (if using ANNIE, this is \verb!Sentence!)
+\item[tokenAnnotationType] type of the token annotation (if using ANNIE, this is \verb!Token!)
+\end{itemize}
+
+\subsect{TimeML Event Detection}
+
+The plugin also contains a ``Ready Made'' application for detecting TimeML based events.
Modified: userguide/trunk/recent-changes.tex
===================================================================
--- userguide/trunk/recent-changes.tex 2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk/recent-changes.tex 2016-06-07 10:38:09 UTC (rev 19404)
@@ -19,16 +19,40 @@
\def\rcSubsubsect#1{\subsubsect{#1}}
\fi
-\rcSect[next]{Next Release}
+\rcSect[8.2]{Version 8.2 (May 2016)}
-\rcSubsect{July 2015}
+GATE Developer and Embedded 8.2 is mainly a bug fix release -- there are a few
+new plugins but the emphasis is on bug fixing and library updates. This will
+be the final release of GATE before major re-structuring of the codebase and
+the plugin system for version 9.0.
-Added support to the \verb!Format_CSV! plugin for outputting results from GCP into a CSV file.
+\begin{itemize}
+\item New tools for temporal expression and event detection, including a wrapper
+ for the HeidelTime tagger (section~\ref{sec:misc-creole:gate-time}).
+\item New language plugins for Danish and Welsh named entity recognition.
+\item Performance improvements in the ANNIE NER system, in particular to deal
+ better with hyphenated names and titles.
+\item Improvements to TermRaider to support GATE documents that contain many
+ independent sections (e.g. web forums, lists of tweets).
+\item Bug fixes in the handling of Twitter JSON data -- GATE now has full
+ round-trip support for Twitter JSON: tweets can be loaded, annotated, and
+ saved back to the same format accurately. The JSON format parser has been
+ separated from the rest of the Twitter plugin, making it easier to add JSON
+ support to non-TwitIE applications.
+\item Updated dependencies -- the \verb!Stanford_CoreNLP! plugin now uses
+ version 3.6.0 of Stanford CoreNLP, and the Groovy plugin uses Groovy version
+ 2.4.4.
+\item GCP input and output handlers added to the \verb!Format_CSV! plugin.
+\end{itemize}
-\rcSubsect{July 2015}
+Plus the usual suite of miscellaneous smaller bug fixes.
-Fixed a bug in the ANNIE Orthomatcher that meant the last annotation of a given type (Person, Location, or Organization) was ignored when trying to corefer Unknown annotations.
+\rcSubsect{Java compatibility}
-Added support to the \verb!Format_CSV! plugin for loading CSV files into GCP.
+For GATE 8.2 we recommend the use of the latest Java 8 from Oracle. If you are
+still restricted to Java 7, most components will still work with the exception
+of the Stanford CoreNLP tools. Future versions of GATE will require Java 8 as
+a minimum.
+
% vim:ft=tex
Property changes on: userguide/trunk/recent-changes.tex
___________________________________________________________________
Modified: svn:mergeinfo
## -3,5 +3,6 ##
/userguide/branches/release-7.0/recent-changes.tex:15332-15399
/userguide/branches/release-7.1/recent-changes.tex:16356-16357
/userguide/branches/release-8.1/recent-changes.tex:18738
+/userguide/branches/release-8.2/recent-changes.tex:19338-19355,19357-19360
/userguide/tags/release-7.0/recent-changes.tex:15400-15404
/userguide/trunk/recent-changes.tex:10614-10900,17951-17952
\ No newline at end of property
Modified: userguide/trunk/tao_main.tex
===================================================================
--- userguide/trunk/tao_main.tex 2016-06-07 01:22:36 UTC (rev 19403)
+++ userguide/trunk/tao_main.tex 2016-06-07 10:38:09 UTC (rev 19404)
@@ -19,7 +19,7 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% GATE version this manual is for.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\newcommand{\gateversion}{8.2}
+\newcommand{\gateversion}{8.3}
\newif\ifgaterelease
@@ -407,7 +407,7 @@
to the documentation for that release instead, which is accessible via
the Help menu in GATE Developer.\\
The latest release is
- \htlink{https://gate.ac.uk/userguide?gateVersion=8.1}{version 8.1}
+ \htlink{https://gate.ac.uk/userguide?gateVersion=8.2}{version 8.2}
\end{center}
\end{minipage}}
\fi
@@ -565,7 +565,7 @@
\textbf{Developing Language Processing Components with GATE Version 8}
\fi
-\textbf{\copyright 2015 The University of Sheffield, Department of Computer Science}
+\textbf{\copyright 2016 The University of Sheffield, Department of Computer Science}
The University of Sheffield, Department of Computer Science\\
Regent Court\\