Revision: 17790
          http://sourceforge.net/p/gate/code/17790
Author:   ian_roberts
Date:     2014-04-09 16:50:36 +0000 (Wed, 09 Apr 2014)
Log Message:
-----------
Example Groovy script to construct a suitable Mention annotation for the
classification job builder from a set of overlapping Lookups.

Added Paths:
-----------
    gate/trunk/plugins/Crowd_Sourcing/resources/
    gate/trunk/plugins/Crowd_Sourcing/resources/lookupsToMention.groovy

Added: gate/trunk/plugins/Crowd_Sourcing/resources/lookupsToMention.groovy
===================================================================
--- gate/trunk/plugins/Crowd_Sourcing/resources/lookupsToMention.groovy	(rev 0)
+++ gate/trunk/plugins/Crowd_Sourcing/resources/lookupsToMention.groovy	2014-04-09 16:50:36 UTC (rev 17790)
@@ -0,0 +1,73 @@
+/*
+ *  lookupsToMention.groovy
+ *
+ *  This is an example Groovy script intended to be run by the script PR,
+ *  demonstrating how you can create annotations suitable for use by the Entity
+ *  Classification Job Builder PR.
+ *
+ *  The example scenario is one where we have Lookup annotations generated by
+ *  an ontological gazetteer (e.g. Gazetteer_LKB) with a feature "inst" linking
+ *  them to an instance in a knowledge base.  For ambiguous entities there may
+ *  be several different Lookup annotations spanning the same pair of offsets,
+ *  each with a different "inst".  Additionally, Lookups may have a feature
+ *  named "connections" containing a human-readable description of the KB
+ *  instance (these can be created using the semantic enrichment PR from the
+ *  Gazetteer_LKB plugin).
+ *
+ *  The script gathers co-extensive Lookup annotations into groups, and for
+ *  each group it generates one output Mention annotation, with the "options"
+ *  feature expected by the job builder PR - a LinkedHashMap from "inst" to
+ *  "connections" (if it exists).
+ *
+ *  Additionally, if the document has a "Key" annotation set containing a
+ *  Mention at the same location, its "inst" feature is assumed to be the
+ *  gold standard correct answer and is stored in the "correct" feature.
+ *
+ *  Copyright (c) 2014, The University of Sheffield. See the file
+ *  COPYRIGHT.txt in the software or at http://gate.ac.uk/gate/COPYRIGHT.txt
+ *
+ *  This file is part of GATE (see http://gate.ac.uk/), and is free
+ *  software, licenced under the GNU Library General Public License,
+ *  Version 3, June 2007 (in the distribution as file licence.html,
+ *  and also available at http://gate.ac.uk/gate/licence.html).
+ *  
+ *  $Id: CrowdFlowerConstants.java 17412 2014-02-24 17:30:09Z ian_roberts $
+ */
+
+import org.apache.commons.lang.StringEscapeUtils as SEU
+
+// gather up all the groups of co-extensive Lookup annotations
+inputAS["Lookup"].groupBy { [it.start(), it.end()] }.each { offsets, anns ->
+  // example of filtering - if each Lookup has a "count" feature then we can
+  // pick the top 8 (or fewer) sorted in descending order of count
+  def top8anns = anns.sort {
+    -1 * (it.features.count?.toInteger() ?: 0)
+  }.take(8)
+  
+  // randomize the order to avoid conditioning humans to always pick the
+  // first one
+  Collections.shuffle(top8anns)
+
+  def mentionFeatures = Factory.newFeatureMap()
+  def options = new LinkedHashMap()
+  top8anns.each { ann ->
+    // option is URI -> abstract if there is one
+    if(ann.features.connections) {
+      options[ann.features.inst] = ann.features.connections
+    } else {
+      // or URI -> URI-with-a-clickable-link if not
+      def htmlInst = SEU.escapeHtml(ann.features.inst)
+      options[ann.features.inst] = (htmlInst +
+        ' <a target="_blank" href="' + htmlInst + '">(details)</a>')
+    }
+  }
+  mentionFeatures.options = options
+
+  // If this is gold, pull the right answer out of the key set
+  def keyMentions = doc.getAnnotations('Key').getContainedAnnotations(anns[0], "Mention")
+  if(keyMentions) {
+    mentionFeatures.correct = keyMentions.onlyAnn.features.inst
+  }
+
+  outputAS.add(offsets[0] as Long, offsets[1] as Long, "Mention", mentionFeatures)
+}
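
The grouping step described in the script's header comment can be tried outside GATE. The following is only a minimal sketch under stated assumptions: plain maps stand in for Lookup annotations, the example.org URIs and feature values are made up for illustration, and the shuffle and HTML link fallback are left out so the result is deterministic. It is not part of the committed script.

```groovy
// Hypothetical stand-ins for Lookup annotations: plain maps, not GATE objects.
// The URIs, counts and descriptions are invented for illustration only.
def lookups = [
  [start: 0, end: 5, inst: 'http://example.org/A', count: '3', connections: 'A, a city'],
  [start: 0, end: 5, inst: 'http://example.org/B', count: '7'],
  [start: 9, end: 14, inst: 'http://example.org/C'],
]

// Group co-extensive lookups by their [start, end] pair and build one
// "mention" map per group.
def mentions = lookups.groupBy { [it.start, it.end] }.collect { offsets, anns ->
  // Highest count first; a missing count is treated as 0. sort(false)
  // returns a sorted copy instead of reordering the input list.
  def top = anns.sort(false) { -1 * (it.count?.toInteger() ?: 0) }.take(8)

  // LinkedHashMap keeps the options in insertion order.
  def options = new LinkedHashMap()
  top.each { options[it.inst] = it.connections ?: it.inst }

  [start: offsets[0], end: offsets[1], options: options]
}

mentions.each { println it }
```

Unlike the committed script, this sketch falls back to the bare "inst" value when there is no "connections" feature, and it keeps the count ordering rather than shuffling, which makes it easier to check by eye.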

_______________________________________________
GATE-cvs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/gate-cvs