Revision: 17998
          http://sourceforge.net/p/gate/code/17998
Author:   ian_roberts
Date:     2014-05-15 16:59:08 +0000 (Thu, 15 May 2014)
Log Message:
-----------
2.4 changelog

Modified Paths:
--------------
    gcp/trunk/doc/batch-def.tex
    gcp/trunk/doc/introduction.tex

Modified: gcp/trunk/doc/batch-def.tex
===================================================================
--- gcp/trunk/doc/batch-def.tex 2014-05-15 16:36:54 UTC (rev 17997)
+++ gcp/trunk/doc/batch-def.tex 2014-05-15 16:59:08 UTC (rev 17998)
@@ -172,6 +172,7 @@
 assumes that the document ID is the path of an entry in the ZIP file.
 
 \subsection{The {\tt ARCInputHandler} and {\tt WARCInputHandler}}
+\label{sec:batch-def:arc}
 
 These two input handlers read documents out of ARC- and WARC format web archive
 files as produced by the Heritrix web crawler and other similar tools.  They

Modified: gcp/trunk/doc/introduction.tex
===================================================================
--- gcp/trunk/doc/introduction.tex      2014-05-15 16:36:54 UTC (rev 17997)
+++ gcp/trunk/doc/introduction.tex      2014-05-15 16:59:08 UTC (rev 17998)
@@ -134,6 +134,26 @@
 
 This section summarises the main changes between releases of GCP
 
+\subsection{2.4 (May 2014)}
+
+\bit
+\item Now depends on GATE Embedded 8.0
+\item Added input handler for WARC format archives, to complement the existing
+  ARC handler (section~\ref{sec:batch-def:arc}).
+\item ARC and WARC handlers can optionally load individual records from
+  remotely hosted archives using HTTP requests with a ``Range'' header.  This
+  facility can be used with publicly-hosted data sets such as Common
+  Crawl\footnote{\url{http://www.commoncrawl.org}}.  To support this
+  functionality, document identifiers in a batch definition can now take XML
+  attributes as well as the actual string identifier (exactly how such
+  attributes are used is up to the handler implementations).
+\item Added output handler to save documents in a JSON format modelled on that
+  used by Twitter to represent ``entities'' (e.g. username mentions) in Tweets.
+\item Efficiency improvements in the M\'{i}mir output handler, to send
+  documents to the server in batches rather than opening a new HTTP connection
+  for every document.
+\eit
+
 \subsection{2.3 (November 2012)}
 
 \bit
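
Note on the "Range" header item in the 2.4 changelog above: the idea of
fetching a single record from a remotely hosted archive can be sketched as
follows. This is only an illustration in Python, not GCP's own (Java)
implementation. The function names, the `offset`/`length` parameters and the
example.org URL are all assumptions made for the sketch; the commit does not
say which attributes the ARC and WARC handlers actually use. The request is
built but deliberately not sent.

```python
import urllib.request


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header selecting `length` bytes from `offset`.

    `offset` and `length` are hypothetical names for the position and size
    of one record inside a remote archive file.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends, hence the "- 1".
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def build_record_request(archive_url: str, offset: int, length: int):
    """Prepare (but do not send) a request for one record of the archive."""
    return urllib.request.Request(archive_url, headers=range_header(offset, length))


# Placeholder URL; a real one would point at a hosted ARC or WARC file.
request = build_record_request("http://example.org/crawl/segment.warc.gz", 1024, 512)
print(request.get_header("Range"))  # bytes=1024-1535
```

A server that honours the header answers with status 206 (Partial Content)
and only the requested bytes, so one record can be read without downloading
the whole archive.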
