Revision: 18757
          http://sourceforge.net/p/gate/code/18757
Author:   ian_roberts
Date:     2015-06-05 15:56:46 +0000 (Fri, 05 Jun 2015)
Log Message:
-----------
Updated GCP documentation to include gcp-direct script and a changelog for 2.5

Modified Paths:
--------------
    gcp/trunk/doc/batch-def.tex
    gcp/trunk/doc/gcp-guide.pdf
    gcp/trunk/doc/install-and-run.tex
    gcp/trunk/doc/introduction.tex

Modified: gcp/trunk/doc/batch-def.tex
===================================================================
--- gcp/trunk/doc/batch-def.tex 2015-06-05 01:20:22 UTC (rev 18756)
+++ gcp/trunk/doc/batch-def.tex 2015-06-05 15:56:46 UTC (rev 18757)
@@ -270,11 +270,16 @@
   the specified path will be ignored.
 \item[compression] (optional) the compression format used by the
   \verb!srcFile!, if any.  If the value is ``none'' (the default) then the file
-  is assumed not to be compressed, if the value is ``gzip'' then Java's native
-  GZIP decompression utilities will be used, otherwise the value is taken to be
+  is assumed not to be compressed, if the value is one of the compression formats
+  supported by Apache Commons Compress (``gz''\footnote{For backwards
+  compatibility, ``gzip'' is treated as an alias for ``gz''.}, ``bzip2'',
+  ``xz'', ``lzma'', ``snappy-raw'', ``snappy-framed'', ``pack200'', ``z'') then
+  it will be unpacked using that library.  If the value is ``any'' then the
+  handler uses the auto-detection capabilities of Commons Compress to attempt
+  to detect the appropriate compression format.  Any other value is taken to be
   the command line for a native decompression program that expects compressed
   data on stdin and will produce decompressed data on stdout, for example
-  \verb!"lzop -dc"! or \verb!"bunzip2"!.
+  \verb!"lzop -dc"!.
 \item[mimeType] (optional but highly recommended) the value to pass as the
   ``mimeType'' parameter when creating a GATE Document from the JSON string.
   This will be used by GATE to select an appropriate document format parser, so
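
Note on the native-command fallback in the batch-def.tex hunk above: any
unrecognised "compression" value is run as a command that must read compressed
data on stdin and write decompressed data on stdout.  The following is a
minimal sketch for checking that a candidate command honours that contract
before it goes into a batch definition.  It uses "gzip -dc" purely as a
stand-in because it is widely installed; the variable names and the sample
JSON line are illustrative assumptions, not part of GCP, and this diff does not
say how GCP itself tokenises the command line.

```shell
#!/usr/bin/env bash
# Sketch: verify that a candidate decompression command works as a
# stdin -> stdout filter. All names below are hypothetical.

# Candidate command line; substitute e.g. "lzop -dc" if lzop is installed.
decompress_cmd="gzip -dc"

# Sample payload standing in for one line of a JSON input file.
original='{"text":"hello"}'

# Compress the payload, then pipe it through the candidate command.
# $decompress_cmd is left unquoted on purpose so that it splits into
# the program name and its arguments.
roundtrip=$(printf '%s\n' "$original" | gzip -c | $decompress_cmd)

if [ "$roundtrip" = "$original" ]; then
  echo "ok: command decompresses stdin to stdout"
else
  echo "FAILED: command did not reproduce the original data" >&2
fi
```

A command that needs a file name argument, or that writes its result to a file
rather than stdout, would fail this check and so would not fit the contract
described above.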

Modified: gcp/trunk/doc/gcp-guide.pdf
===================================================================
(Binary files differ)

Modified: gcp/trunk/doc/install-and-run.tex
===================================================================
--- gcp/trunk/doc/install-and-run.tex   2015-06-05 01:20:22 UTC (rev 18756)
+++ gcp/trunk/doc/install-and-run.tex   2015-06-05 15:56:46 UTC (rev 18757)
@@ -20,11 +20,22 @@
 
 \section{Running GCP}
 
-Once GCP is installed you can run it using the \verb!gcp-cli.jar! executable
+Once GCP is installed you can run it in one of two ways:
+\bit
+\item using the \verb!gcp-cli.jar! executable
 JAR file in the installation directory (or the \verb!gcp.sh! bash script, which
-simply calls \verb!java -jar gcp-cli.jar!).  This tool takes a number of
-optional arguments:
+simply calls \verb!java -jar gcp-cli.jar!)
+\item using the \verb!gcp-direct.sh! bash script.
+\eit
 
+\subsection{Using {\tt gcp-cli.jar}}
+
+The usual way to run GCP is to write one or more {\em batch definition} XML
+files (see chapter~\ref{chap:batch-def} for details) defining the application
+you want to run, the documents to process, and the output formats to produce.
+You then pass these batch definitions to \verb!gcp-cli.jar! for processing.
+The CLI tool takes a number of optional arguments:
+
 \bde
 \item[-m] Specifies the maximum Java heap size, in the format expected by the
   usual \verb!-Xmx! Java option, e.g. \verb!-m 10G! for a 10GB heap limit.  The
@@ -41,6 +52,12 @@
   \verb!-Djava.io.tmpdir=/home/bigtmp!.  \verb!-D! options specified before the
   \verb!-jar! apply to the virtual machine running the CLI, those specified
   after \verb!-jar gcp-cli.jar! will be passed to the batch runner processes.
+  If you have an installed copy of GATE Developer you may wish to set
+  \verb!-Dgate.home=...! to point to your installation.  This is required if
+  your saved GATE application refers to standard GATE plugins (using
+  \verb!$gatehome$! paths in the xgapp), but is optional if the application is
+  self-contained -- GCP includes its own copy of GATE Embedded and does not
+  require a separate installed copy of the core libraries.
 \ede
 
 The tool will determine the location of where GCP is installed in the 
@@ -99,4 +116,45 @@
 the script to exit at the end of the batch it is currently processing (or
 immediately if it is currently idle).
 
+
+\subsection{Using {\tt gcp-direct.sh}}
+\label{sec:running:gcp-direct}
+
+The \verb!gcp-direct.sh! script can be used for simple cases where you want to
+process all the files under one particular directory and output the resulting
+annotations in GATE XML or FastInfoset format.  For this specific case it is
+not necessary to write an XML batch descriptor; you can specify the required
+parameters using command line options to \verb!gcp-direct.sh!:
+
+\bde
+\item[-t] the number of parallel threads to use.
+\item[-x] the path to the saved GATE application that you want to run.
+\item[-f] the output format to use for saving results; must be either ``xml''
+  (GATE XML format) or ``finf'' (FastInfoset format).  To use FastInfoset the
+  GATE \verb!Format_FastInfoset! plugin must be loaded by the saved
+  application.
+\item[-i] the directory in which to look for the input files.  All files in
+  this directory and any subdirectories will be processed (except for standard
+  backup and temporary file name patterns and source control metadata -- see
+  \url{http://ant.apache.org/manual/dirtasks.html#defaultexcludes} for
+  details).
+\item[-o] the directory in which to place the output files.  Each input file
+  will generate an output file with the same name in the output directory.
+\ede
+
+Additionally, you can specify \verb!-D! and \verb!-X! options, which will be
+passed through to the Java VM.  For example, you can set the maximum amount of
+heap memory that the JVM can use with an option like \verb!-Xmx2G!.
+
+The \verb!gcp-direct.sh! script is deliberately opinionated, in order to reduce
+the number of different options that need to be set, and it has a number of
+hard-coded assumptions.  It assumes that your input documents use the UTF-8
+character encoding, that the correct document format parser to use can be
+determined from the file extension, and that you always want to save \emph{all}
+the annotations that your application generates.  If you need to process
+documents in a different encoding, have more complex output requirements
+(XCES, JSON, M\'{i}mir, \ldots), or want to output only a subset of the GATE
+annotations from each document, then you should write a batch definition in XML
+and use \verb!gcp-cli.jar! as discussed above.
+
 % vim:ft=tex
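
For illustration, the gcp-direct.sh options documented in the hunk above could
be combined as in the sketch below.  Only the flags themselves (-t, -x, -f,
-i, -o and the pass-through -X option) come from the documentation; the
install location, application path and data directories are hypothetical, and
the position of -Xmx2G at the end of the command is an assumption, since the
text does not say where pass-through options must appear.

```shell
#!/usr/bin/env bash
# Sketch: assemble a gcp-direct.sh invocation from the documented options.
# All paths are placeholders; adjust them to your own installation and data.

GCP_HOME="${GCP_HOME:-/opt/gcp}"   # assumed install directory

cmd=(
  "$GCP_HOME/gcp-direct.sh"
  -t 4                         # number of parallel threads
  -x /data/apps/my-app.xgapp   # saved GATE application to run
  -f xml                       # output format: "xml" or "finf"
  -i /data/input               # input directory, processed recursively
  -o /data/output              # directory for the output files
  -Xmx2G                       # passed through to the Java VM
)

# Show the command that would be run.
printf '%q ' "${cmd[@]}"; printf '\n'

# Run it only if the script is actually present and executable.
if [ -x "${cmd[0]}" ]; then
  "${cmd[@]}"
else
  echo "gcp-direct.sh not found under $GCP_HOME; nothing was run" >&2
fi
```

Choosing "-f finf" instead would additionally require the saved application to
load the Format_FastInfoset plugin, as noted in the option list.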

Modified: gcp/trunk/doc/introduction.tex
===================================================================
--- gcp/trunk/doc/introduction.tex      2015-06-05 01:20:22 UTC (rev 18756)
+++ gcp/trunk/doc/introduction.tex      2015-06-05 15:56:46 UTC (rev 18757)
@@ -134,6 +134,24 @@
 
 This section summarises the main changes between releases of GCP
 
+\subsection{2.5 (June 2015)}
+
+\bit
+\item Now depends on GATE Embedded 8.1.
+\item Introduced ``streaming'' style input and output handlers for JSON
+  data (e.g. from Twitter), which can read a series of documents from
+  a single JSON input file, and write JSON results to a single concatenated
+  output file (sections~\ref{sec:batch-def:json-input} and
+  \ref{sec:batch-def:file-output-handlers}).
+\item Introduced the \verb!gcp-direct.sh! script to cover simple invocations
+  of GCP without the need to write a batch definition XML file
+  (section~\ref{sec:running:gcp-direct}).
+\item For ``controller-aware''
+  PRs\footnote{\url{http://gate.ac.uk/gate/doc/javadoc/gate/creole/ControllerAwarePR.html}},
+  the various callbacks are now invoked just once per batch rather than before
+  and after every single document.
+\eit
+
 \subsection{2.4 (May 2014)}
 
 \bit



------------------------------------------------------------------------------
_______________________________________________
GATE-cvs mailing list
https://lists.sourceforge.net/lists/listinfo/gate-cvs
