Revision: 17217
http://sourceforge.net/p/gate/code/17217
Author: ian_roberts
Date: 2014-01-07 14:18:04 +0000 (Tue, 07 Jan 2014)
Log Message:
-----------
Documentation for the JSON output handler.
Modified Paths:
--------------
gcp/trunk/doc/batch-def.tex
gcp/trunk/doc/gcp-guide.pdf
Modified: gcp/trunk/doc/batch-def.tex
===================================================================
--- gcp/trunk/doc/batch-def.tex 2014-01-07 09:37:25 UTC (rev 17216)
+++ gcp/trunk/doc/batch-def.tex 2014-01-07 14:18:04 UTC (rev 17217)
@@ -245,7 +245,7 @@
\subsection{File-based Output
Handlers}\label{sec:batch-def:file-output-handlers}
-GCP provides a set of five standard file-based output handlers to save data to
+GCP provides a set of six standard file-based output handlers to save data to
files on the filesystem in various formats.
\bit
@@ -260,6 +260,8 @@
\item \verb!gate.cloud.io.xces.XCESOutputHandler! to save annotations in the
XCES standoff format. Annotation offsets in XCES refer to the plain text as
saved by a \verb!PlainTextOutputHandler!.
+\item \verb!gate.cloud.io.file.JSONOutputHandler! to save documents in a JSON
+ format modelled on that used by Twitter to represent ``entities'' in Tweets.
\item \verb!gate.cloud.io.file.SerializedObjectOutputHandler! to save documents
using Java's built-in \emph{object serialization} protocol (with optional
compression). This handler ignores annotation filters, and always writes
@@ -267,7 +269,7 @@
\verb!SerialDataStore!.
\eit
-The five handlers share the following \verb!<output>! attributes:
+The handlers share the following \verb!<output>! attributes:
\bde
\item[encoding] (optional, not applicable to
@@ -325,6 +327,72 @@
otherwise (including if the attribute is omitted) it will save just the tags
with no attributes.
+The \verb!JSONOutputHandler! saves the document in a JSON format modelled on
+that used by Twitter to represent entities in Tweets. This is a JSON object
+with two properties, ``text'' holding the plain text of the document and
+``entities'' holding the annotations. The ``entities'' value is itself an
+object mapping a ``label'' to an array of annotations.
+%
+\begin{verbatim}
+{
+ "text":"The text of the document",
+ "entities":{
+ "Person":[
+ {
+ "indices":[start,end],
+ "feature1":"value1",
+ "feature2":"value2"
+ },
+ {
+ "indices":[start,end],
+ "feature1":"value1",
+ "feature2":"value2"
+ }
+ ]
+ }
+}
+\end{verbatim}
+
+For each annotation the ``indices'' property gives the start and end offsets of
+the annotation as character offsets into the ``text'', and the other properties
+of the object represent the features of the annotation.
+
+This handler supports a number of additional \verb!<output>! attributes to
+control the format.
+
+\begin{description}
+\item[groupEntitiesBy] controls how the annotations are grouped under the
+ ``entities'' object. Permitted values are ``type'' (the default) or ``set''.
+ Grouping by ``type'' produces output like the example above, with one entry
+ under ``entities'' for each annotation type containing all annotations of
+ that type from across all annotation sets that were selected by the
+ \verb!<annotationSet>! filters. Conversely, grouping by ``set'' creates one
+ entry under ``entities'' for each annotation set name (with the name
+ ``default'' used for the default annotation set -- technically JSON
+ permits the empty string as a property name but this is likely to cause
+ problems for some consumer libraries), containing all the annotations in
+ that set that were selected by the filters, regardless of type. Grouping by
+ ``set'' will often be used in combination with the ``annotationTypeProperty''
+ attribute.
+
+\item[annotationTypeProperty] if set, the type of each annotation is added to
+ the output as this property (i.e. treated as if it were an additional feature
+ of the annotation). This is useful in combination with
+ \verb!groupEntitiesBy="set"! when different types of annotation are grouped
+ under a single label.
+
+\item[documentAnnotationASName] the annotation set in which to search for a
+ \emph{document annotation} (see below). If omitted, the default set is used.
+\item[documentAnnotationType] if specified, the output handler will look for a
+ single annotation of this type within the specified annotation set and assume
+ that this annotation spans the ``interesting'' portion of the document. Only
+ the text and annotations covered by this annotation will be output, and
+ furthermore the features of the document annotation will be added as
+ top-level properties (alongside ``text'' and ``entities'') of the generated
+ JSON object. This option is intended to support round-trip processing of
+ documents that were originally loaded from JSON by GATE's Twitter support.
+\end{description}
+
\subsection{The M\'{i}mir Output Handler}
GCP also provides \verb!gate.cloud.io.mimir.MimirOutputHandler! to send
annotated documents to a M\'{i}mir server for indexing. This handler supports
the following \verb!<output>! attributes:
Modified: gcp/trunk/doc/gcp-guide.pdf
===================================================================
(Binary files differ)
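
For readers consuming the files written by the JSONOutputHandler described in the diff above, the following is a minimal Python sketch of how the documented structure could be read back. The sample document, its feature names and the helper covered_spans are hypothetical and exist only for illustration; the only things taken from the documentation are the "text" and "entities" properties and the per-annotation "indices" pair. The sketch also assumes that the offsets can be used directly as Python string indices, which may not hold for text containing characters outside the Basic Multilingual Plane.

```python
import json

# Hypothetical sample in the documented shape: "text" plus "entities",
# where "entities" maps a label to an array of annotation objects.
SAMPLE = """
{
  "text": "John Smith visited Sheffield.",
  "entities": {
    "Person": [{"indices": [0, 10], "gender": "male"}],
    "Location": [{"indices": [19, 28]}]
  }
}
"""


def covered_spans(doc):
    """Yield (label, covered text, features) for each annotation.

    Assumption: the start/end offsets index the Python string directly.
    """
    text = doc["text"]
    for label, annotations in doc["entities"].items():
        for ann in annotations:
            start, end = ann["indices"]
            # Every property other than "indices" is an annotation feature.
            features = {k: v for k, v in ann.items() if k != "indices"}
            yield label, text[start:end], features


doc = json.loads(SAMPLE)
spans = list(covered_spans(doc))
for label, covered, features in spans:
    print(label, repr(covered), features)
```

With groupEntitiesBy="set" the labels yielded here would be annotation set names rather than annotation types, so a consumer would then read the type from whichever property name was configured through annotationTypeProperty.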