Revision: 15983
          http://gate.svn.sourceforge.net/gate/?rev=15983&view=rev
Author:   ian_roberts
Date:     2012-07-25 16:44:35 +0000 (Wed, 25 Jul 2012)
Log Message:
-----------
Added a "serialized object" output handler to save GATE documents as Java
serialized files, the same format as SerialDataStore.

Modified Paths:
--------------
    gcp/trunk/.classpath
    gcp/trunk/doc/batch-def.tex
    gcp/trunk/doc/gcp-guide.pdf

Added Paths:
-----------
    gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java

Modified: gcp/trunk/.classpath
===================================================================
--- gcp/trunk/.classpath        2012-07-25 10:50:14 UTC (rev 15982)
+++ gcp/trunk/.classpath        2012-07-25 16:44:35 UTC (rev 15983)
@@ -7,6 +7,6 @@
        <classpathentry kind="lib" path="lib/fastutil-5.0.3-heritrix-subset-1.0.jar"/>
        <classpathentry kind="lib" path="lib/heritrix-1.14.4.jar"/>
        <classpathentry kind="lib" path="lib/mimir-client-4.0.jar"/>
-       <classpathentry kind="con" path="org.apache.ivyde.eclipse.cpcontainer.IVYDE_CONTAINER/?project=gcp&amp;ivyXmlPath=build%2Fivy.xml&amp;confs=*"/>
+       <classpathentry kind="con" path="org.apache.ivyde.eclipse.cpcontainer.IVYDE_CONTAINER/?project=gcp&amp;ivyXmlPath=build%2Fivy.xml&amp;confs=*&amp;ivySettingsPath=%24%7Bworkspace_loc%3Agcp%2Fbuild%2Fivysettings.xml%7D&amp;loadSettingsOnDemand=false&amp;propertyFiles="/>
        <classpathentry kind="output" path="classes"/>
 </classpath>

Modified: gcp/trunk/doc/batch-def.tex
===================================================================
--- gcp/trunk/doc/batch-def.tex 2012-07-25 10:50:14 UTC (rev 15982)
+++ gcp/trunk/doc/batch-def.tex 2012-07-25 16:44:35 UTC (rev 15983)
@@ -245,7 +245,7 @@
 
 \subsection{File-based Output Handlers}
 
-GCP provides a set of four standard file-based output handlers to save data to
+GCP provides a set of five standard file-based output handlers to save data to
 files on the filesystem in various formats.
 
 \bit
@@ -260,13 +260,19 @@
 \item \verb!gate.cloud.io.xces.XCESOutputHandler! to save annotations in the
   XCES standoff format.  Annotation offsets in XCES refer to the plain text as
   saved by a \verb!PlainTextOutputHandler!.
+\item \verb!gate.cloud.io.file.SerializedObjectOutputHandler! to save documents
+  using Java's built-in \emph{object serialization} protocol (with optional
+  compression).  This handler ignores annotation filters, and always writes
+  the complete document.  This is the same mechanism used by GATE's
+  \verb!SerialDataStore!.
 \eit
 
-The four handlers share the following \verb!<output>! attributes:
+The five handlers share the following \verb!<output>! attributes:
 
 \bde
-\item[encoding] (optional) The character encoding used when writing files.  If
-  omitted, ``UTF-8'' is the default.
+\item[encoding] (optional, not applicable to
+  \verb!SerializedObjectOutputHandler!) The character encoding used when
+  writing files.  If omitted, ``UTF-8'' is the default.
 \item[compression] (optional) The compression algorithm to apply to the saved
   files.  Can be either ``none'' (no compression, the default) or ``gzip''
   (GZIP compression).

Modified: gcp/trunk/doc/gcp-guide.pdf
===================================================================
(Binary files differ)

Added: gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java
===================================================================
--- gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java (rev 0)
+++ gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java 2012-07-25 16:44:35 UTC (rev 15983)
@@ -0,0 +1,70 @@
+/*
+ *  SerializedObjectOutputHandler.java
+ *  Copyright (c) 2007-2012, The University of Sheffield.
+ *
+ *  This file is part of GCP (see http://gate.ac.uk/), and is free
+ *  software, licenced under the GNU Affero General Public License,
+ *  Version 3, November 2007.
+ *
+ *
+ *  $Id$ 
+ */
+package gate.cloud.io.file;
+
+import static gate.cloud.io.IOConstants.PARAM_ENCODING;
+import static gate.cloud.io.IOConstants.PARAM_FILE_EXTENSION;
+import gate.Document;
+import gate.cloud.io.OutputHandler;
+import gate.util.Benchmark;
+import gate.util.GateException;
+
+import java.io.IOException;
+import java.io.ObjectOutputStream;
+import java.util.Map;
+
+import org.apache.log4j.Logger;
+
+/**
+ * An {@link OutputHandler} that writes GATE Documents to files using
+ * Java serialization. The files may be optionally gzip compressed. Note
+ * that this always writes the complete document; any annotation type
+ * filters specified in the batch definition are ignored.
+ */
+public class SerializedObjectOutputHandler extends AbstractFileOutputHandler {
+
+  private static final Logger logger = Logger.getLogger(SerializedObjectOutputHandler.class);
+  
+  @Override
+  protected void configImpl(Map<String, String> configData) throws IOException,
+          GateException {
+    // make sure we default to .ser as the extension
+    if(!configData.containsKey(PARAM_FILE_EXTENSION)) {
+      configData.put(PARAM_FILE_EXTENSION, ".ser");
+    }
+    if(configData.containsKey(PARAM_ENCODING)) {
+      logger.warn(this.getClass().getName() + " does not support the "
+              + PARAM_ENCODING + " parameter - ignored");
+    }
+    super.configImpl(configData);
+  }
+
+  @Override
+  protected void outputDocumentImpl(Document document, String documentId)
+          throws IOException, GateException {
+    String baseBenchmarkID =
+            Benchmark.createBenchmarkId(document.getName(), documentId);
+
+    ObjectOutputStream outputStream =
+            new ObjectOutputStream(getFileOutputStream(documentId));
+    try {
+      String saveBID =
+              Benchmark.createBenchmarkId("saveSerialized", baseBenchmarkID);
+      long startTime = Benchmark.startPoint();
+      outputStream.writeObject(document);
+      Benchmark.checkPoint(startTime, saveBID, this, null);
+    } finally {
+      outputStream.close();
+    }
+  }
+
+}


Property changes on: gcp/trunk/src/gate/cloud/io/file/SerializedObjectOutputHandler.java
___________________________________________________________________
Added: svn:keywords
   + Id
Added: svn:eol-style
   + native
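
For readers who want to see what the handler's output format amounts to, here is a minimal sketch of a Java serialization round trip with optional gzip compression. It is not part of this commit. The class SerializedRoundTrip and its StandInDocument type are hypothetical names used so the example runs without GATE on the classpath; with GATE available, the object read back from a .ser file would presumably be cast to gate.Document instead. Note that no character encoding appears anywhere, which is consistent with the handler ignoring the encoding parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SerializedRoundTrip {

  /** Hypothetical stand-in for a GATE document; not a GATE or GCP class. */
  static class StandInDocument implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String content;

    StandInDocument(String name, String content) {
      this.name = name;
      this.content = content;
    }
  }

  /** Serializes obj to a byte array, wrapping the stream in gzip if requested. */
  static byte[] write(Serializable obj, boolean gzip) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // try-with-resources closes both streams even if writeObject fails
    try (OutputStream raw = gzip ? new GZIPOutputStream(bytes) : bytes;
         ObjectOutputStream out = new ObjectOutputStream(raw)) {
      out.writeObject(obj);
    }
    return bytes.toByteArray();
  }

  /** Reads one object back; the gzip flag must match the one used to write. */
  static Object read(byte[] data, boolean gzip)
          throws IOException, ClassNotFoundException {
    InputStream raw = new ByteArrayInputStream(data);
    try (InputStream in = gzip ? new GZIPInputStream(raw) : raw;
         ObjectInputStream ois = new ObjectInputStream(in)) {
      return ois.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    StandInDocument original = new StandInDocument("doc1", "Hello GATE");
    byte[] saved = write(original, true);
    StandInDocument restored = (StandInDocument) read(saved, true);
    System.out.println(restored.name + ": " + restored.content);
  }
}
```

The sketch uses in-memory byte arrays to stay self-contained; swapping them for file streams gives the on-disk equivalent. A reader must know whether a file was written with compression, since the serialized stream itself carries no such flag.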
