Revision: 17790
http://sourceforge.net/p/gate/code/17790
Author: ian_roberts
Date: 2014-04-09 16:50:36 +0000 (Wed, 09 Apr 2014)
Log Message:
-----------
Example Groovy script to construct a suitable Mention annotation for the
classification job builder from a set of overlapping Lookups.
Added Paths:
-----------
gate/trunk/plugins/Crowd_Sourcing/resources/
gate/trunk/plugins/Crowd_Sourcing/resources/lookupsToMention.groovy
Added: gate/trunk/plugins/Crowd_Sourcing/resources/lookupsToMention.groovy
===================================================================
--- gate/trunk/plugins/Crowd_Sourcing/resources/lookupsToMention.groovy (rev 0)
+++ gate/trunk/plugins/Crowd_Sourcing/resources/lookupsToMention.groovy 2014-04-09 16:50:36 UTC (rev 17790)
@@ -0,0 +1,73 @@
+/*
+ * lookupsToMention.groovy
+ *
+ * This is an example Groovy script intended to be run by the script PR,
+ * demonstrating how you can create annotations suitable for use by the Entity
+ * Classification Job Builder PR.
+ *
+ * The example scenario is one where we have Lookup annotations generated by
+ * an ontological gazetteer (e.g. Gazetteer_LKB) with a feature "inst" linking
+ * them to an instance in a knowledge base. For ambiguous entities there may
+ * be several different Lookup annotations spanning the same pair of offsets,
+ * each with a different "inst". Additionally, Lookups may have a feature
+ * named "connections" containing a human-readable description of the KB
+ * instance (these can be created using the semantic enrichment PR from the
+ * Gazetteer_LKB plugin).
+ *
+ * The script gathers co-extensive Lookup annotations into groups, and for
+ * each group it generates one output Mention annotation, with the "options"
+ * feature expected by the job builder PR - a LinkedHashMap from "inst" to
+ * "connections" (if it exists).
+ *
+ * Additionally, if the document has a "Key" annotation set containing a
+ * Lookup at the same location, this is assumed to be the gold standard
+ * correct answer.
+ *
+ * Copyright (c) 2014, The University of Sheffield. See the file
+ * COPYRIGHT.txt in the software or at http://gate.ac.uk/gate/COPYRIGHT.txt
+ *
+ * This file is part of GATE (see http://gate.ac.uk/), and is free
+ * software, licenced under the GNU Library General Public License,
+ * Version 3, June 2007 (in the distribution as file licence.html,
+ * and also available at http://gate.ac.uk/gate/licence.html).
+ *
+ * $Id: CrowdFlowerConstants.java 17412 2014-02-24 17:30:09Z ian_roberts $
+ */
+
+import org.apache.commons.lang.StringEscapeUtils as SEU
+
+// gather up all the groups of co-extensive Lookup annotations
+inputAS["Lookup"].groupBy { [it.start(), it.end()] }.each { offsets, anns ->
+  // example of filtering - if each Lookup has a "count" feature then we can
+  // pick the top 8 (or fewer) sorted in descending order of count
+  def top8anns = anns.sort {
+    -1 * (it.features.count?.toInteger() ?: 0)
+  }.take(8)
+
+  // randomize the order to avoid conditioning humans to always pick the
+  // first one
+  Collections.shuffle(top8anns)
+
+  def mentionFeatures = Factory.newFeatureMap()
+  def options = new LinkedHashMap()
+  top8anns.each { ann ->
+    // option is URI -> abstract if there is one
+    if(ann.features.connections) {
+      options[ann.features.inst] = ann.features.connections
+    } else {
+      // or URI -> URI-with-a-clickable-link if not
+      def htmlInst = SEU.escapeHtml(ann.features.inst)
+      options[ann.features.inst] = (htmlInst +
+        ' <a target="_blank" href="' + htmlInst + '">(details)</a>')
+    }
+  }
+  mentionFeatures.options = options
+
+  // If this is gold, pull the right answer out of the key set
+  def keyMentions = doc.getAnnotations('Key').getContainedAnnotations(anns[0], "Mention")
+  if(keyMentions) {
+    mentionFeatures.correct = keyMentions.onlyAnn.features.inst
+  }
+
+  outputAS.add(offsets[0] as Long, offsets[1] as Long, "Mention", mentionFeatures)
+}
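
The grouping and option-building steps in the script above can be tried outside GATE. What follows is only a minimal sketch under stated assumptions: plain maps stand in for Lookup annotations, and the names lookups, escapeHtml and buildMentions, as well as the example.org URIs, are hypothetical and not part of the GATE API or the committed script. The shuffle step is left out so that the result is repeatable, and the "Key" set lookup is omitted.

```groovy
// Hypothetical stand-ins for Lookup annotations: plain maps, not GATE objects.
def lookups = [
  [start: 0, end: 5, inst: 'http://example.org/kb/Paris_France',
   connections: 'Capital of France', count: '120'],
  [start: 0, end: 5, inst: 'http://example.org/kb/Paris_Texas', count: '7'],
  [start: 10, end: 16, inst: 'http://example.org/kb/London',
   connections: 'Capital of the UK'],
]

// Minimal HTML escaping, used here in place of StringEscapeUtils.escapeHtml
// so that the sketch needs no extra libraries.
String escapeHtml(String s) {
  s.replace('&', '&amp;').replace('<', '&lt;')
   .replace('>', '&gt;').replace('"', '&quot;')
}

// Group co-extensive lookups and build one "mention" map per group.
List buildMentions(List anns, int limit = 8) {
  anns.groupBy { [it.start, it.end] }.collect { offsets, group ->
    // keep at most `limit` candidates, highest count first (missing count = 0)
    def top = group.sort(false) { -1 * (it.count?.toInteger() ?: 0) }.take(limit)

    // inst -> description, or inst -> escaped URI with a link if no description
    def options = new LinkedHashMap()
    top.each { ann ->
      if (ann.connections) {
        options[ann.inst] = ann.connections
      } else {
        def htmlInst = escapeHtml(ann.inst)
        options[ann.inst] = (htmlInst +
          ' <a target="_blank" href="' + htmlInst + '">(details)</a>')
      }
    }
    [start: offsets[0], end: offsets[1], options: options]
  }
}

def mentions = buildMentions(lookups)
mentions.each { println "${it.start}-${it.end}: ${it.options.keySet()}" }
```

Because the options map is a LinkedHashMap, the order in which candidates are inserted is the order in which they would be presented, which is why the committed script shuffles the list before filling the map.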