Revision: 17217
          http://sourceforge.net/p/gate/code/17217
Author:   ian_roberts
Date:     2014-01-07 14:18:04 +0000 (Tue, 07 Jan 2014)
Log Message:
-----------
Documentation for the JSON output handler.

Modified Paths:
--------------
    gcp/trunk/doc/batch-def.tex
    gcp/trunk/doc/gcp-guide.pdf

Modified: gcp/trunk/doc/batch-def.tex
===================================================================
--- gcp/trunk/doc/batch-def.tex 2014-01-07 09:37:25 UTC (rev 17216)
+++ gcp/trunk/doc/batch-def.tex 2014-01-07 14:18:04 UTC (rev 17217)
@@ -245,7 +245,7 @@
 
 \subsection{File-based Output Handlers}\label{sec:batch-def:file-output-handlers}
 
-GCP provides a set of five standard file-based output handlers to save data to
+GCP provides a set of six standard file-based output handlers to save data to
 files on the filesystem in various formats.
 
 \bit
@@ -260,6 +260,8 @@
 \item \verb!gate.cloud.io.xces.XCESOutputHandler! to save annotations in the
   XCES standoff format.  Annotation offsets in XCES refer to the plain text as
   saved by a \verb!PlainTextOutputHandler!.
+\item \verb!gate.cloud.io.file.JSONOutputHandler! to save documents in a JSON
+  format modelled on that used by Twitter to represent ``entities'' in Tweets.
 \item \verb!gate.cloud.io.file.SerializedObjectOutputHandler! to save documents
   using Java's built in \emph{object serialization} protocol (with optional
   compression).  This handler ignores annotation filters, and always writes
@@ -267,7 +269,7 @@
   \verb!SerialDataStore!.
 \eit
 
-The five handlers share the following \verb!<output>! attributes:
+The handlers share the following \verb!<output>! attributes:
 
 \bde
 \item[encoding] (optional, not applicable to
@@ -325,6 +327,72 @@
 otherwise (including if the attribute is omitted) it will save just the tags
 with no attributes.
 
+The \verb!JSONOutputHandler! saves the document in a JSON format modelled on
+that used by Twitter to represent entities in Tweets.  This is a JSON object
+with two properties, ``text'' holding the plain text of the document and
+``entities'' holding the annotations.  The ``entities'' value is itself an
+object mapping a ``label'' to an array of annotations.
+%
+\begin{verbatim}
+{
+  "text":"The text of the document",
+  "entities":{
+    "Person":[
+      {
+        "indices":[start,end],
+        "feature1":"value1",
+        "feature2":"value2"
+      },
+      {
+        "indices":[start,end],
+        "feature1":"value1",
+        "feature2":"value2"
+      }
+    ]
+  }
+}
+\end{verbatim}
+
+For each annotation the ``indices'' property gives the start and end offsets of
+the annotation as character offsets into the ``text'', and the other properties
+of the object represent the features of the annotation.
+
+This handler supports a number of additional \verb!<output>! attributes to
+control the format.
+
+\begin{description}
+\item[groupEntitiesBy] controls how the annotations are grouped under the
+  ``entities'' object.  Permitted values are ``type'' (the default) or ``set''.
+  Grouping by ``type'' produces output like the example above, with one entry
+  under ``entities'' for each annotation type containing all annotations of
+  that type from across all annotation sets that were selected by the
+  \verb!<annotationSet>! filters.  Conversely, grouping by ``set'' creates one
+  entry under ``entities'' for each annotation set name (with the name
+  ``default'' used for the default annotation set -- technically JSON
+  permits the empty string as a property name but this is likely to cause
+  problems for some consumer libraries), containing all the annotations in
+  that set that were selected by the filters, regardless of type.  Grouping by
+  ``set'' will often be used in combination with the ``annotationTypeProperty''
+  attribute.
+
+\item[annotationTypeProperty] if set, the type of each annotation is added to
+  the output as this property (i.e. treated as if it were an additional feature
+  of the annotation).  This is useful in combination with
+  \verb!groupEntitiesBy="set"! when different types of annotation are grouped
+  under a single label.
+
+\item[documentAnnotationASName] the annotation set in which to search for a
+  \emph{document annotation} (see below).  If omitted, the default set is used.
+\item[documentAnnotationType] if specified, the output handler will look for a
+  single annotation of this type within the specified annotation set and assume
+  that this annotation spans the ``interesting'' portion of the document.  Only
+  the text and annotations covered by this annotation will be output, and
+  furthermore the features of the document annotation will be added as
+  top-level properties (alongside ``text'' and ``entities'') of the generated
+  JSON object.  This option is intended to support round-trip processing of
+  documents that were originally loaded from JSON by GATE's Twitter support.
+\end{description}
+
 \subsection{The M\'{i}mir Output Handler}
 
 GCP also provides \verb!gate.cloud.io.mimir.MimirOutputHandler! to send annotated documents to a M\'{i}mir server for indexing.  This handler supports the following \verb!<output>! attributes:
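
Note for readers of this change: the JSON layout documented in the hunk above can be consumed with a few lines of code. The following is a minimal sketch, not part of the commit. The sample document, its feature name ("gender") and the helper name annotated_spans are hypothetical, and it assumes the ``indices'' offsets line up with Python string indices, which may not hold for text containing characters outside the Basic Multilingual Plane.

```python
import json

# Hypothetical sample in the shape described above; not real GCP output.
sample = """
{
  "text": "John Smith lives in Sheffield.",
  "entities": {
    "Person": [{"indices": [0, 10], "gender": "male"}],
    "Location": [{"indices": [20, 29]}]
  }
}
"""


def annotated_spans(doc):
    """Yield (label, covered_text, features) for every annotation in doc."""
    text = doc["text"]
    for label, annotations in doc["entities"].items():
        for ann in annotations:
            start, end = ann["indices"]
            # Every property other than "indices" is an annotation feature.
            features = {k: v for k, v in ann.items() if k != "indices"}
            yield label, text[start:end], features


doc = json.loads(sample)
spans = list(annotated_spans(doc))
for label, covered, features in spans:
    print(label, repr(covered), features)
```

With the default grouping the label is the annotation type; with groupEntitiesBy="set" the same loop would yield annotation set names instead.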

Modified: gcp/trunk/doc/gcp-guide.pdf
===================================================================
(Binary files differ)
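
A second sketch, again not part of the commit, illustrates why annotationTypeProperty is useful together with groupEntitiesBy="set": once the type travels with each annotation, a consumer can regroup the output by type. The property name "type", the set name "Key", the sample data and the helper regroup_by_type are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical output, assuming groupEntitiesBy="set" and
# annotationTypeProperty="type" were configured on the handler.
by_set = {
    "text": "John Smith lives in Sheffield.",
    "entities": {
        "default": [
            {"indices": [0, 10], "type": "Person"},
            {"indices": [20, 29], "type": "Location"},
        ],
        "Key": [
            {"indices": [0, 10], "type": "Person"},
        ],
    },
}


def regroup_by_type(doc, type_property="type"):
    """Return a copy of doc whose entities are keyed by annotation type."""
    grouped = defaultdict(list)
    for annotations in doc["entities"].values():
        for ann in annotations:
            # Drop the type property; it becomes the grouping key instead.
            features = {k: v for k, v in ann.items() if k != type_property}
            grouped[ann[type_property]].append(features)
    return {"text": doc["text"], "entities": dict(grouped)}


by_type = regroup_by_type(by_set)
print(by_type["entities"])
```

The sketch merges annotations of one type from every set without de-duplicating them, which matches the description of grouping by ``type'' in the documentation text above.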

_______________________________________________
GATE-cvs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/gate-cvs