[gate-cvs] SF.net SVN: gate:[17490] mimir/trunk/doc

valyt Fri, 28 Feb 2014 08:44:11 -0800

Revision: 17490
          http://sourceforge.net/p/gate/code/17490
Author:   valyt
Date:     2014-02-28 16:43:43 +0000 (Fri, 28 Feb 2014)
Log Message:
-----------
Excised all mentions of the sesame plugin.


Modified Paths:
--------------
    mimir/trunk/doc/admin.tex
    mimir/trunk/doc/changes.tex
    mimir/trunk/doc/extending.tex
    mimir/trunk/doc/indexing.tex
    mimir/trunk/doc/mimir-guide.pdf
    mimir/trunk/doc/plugins.tex

Modified: mimir/trunk/doc/admin.tex
===================================================================
--- mimir/trunk/doc/admin.tex   2014-02-28 14:55:55 UTC (rev 17489)
+++ mimir/trunk/doc/admin.tex   2014-02-28 16:43:43 UTC (rev 17490)
@@ -13,11 +13,6 @@
 stores annotation data using H2\footnote{\url{http://h2database.com}}, an
 in-process embedded SQL database.
 
-\item[plugins/sesame] An alternative annotation storage implementation that
-stores its annotation data in a triple store using the Sesame API
-(\url{http://www.openrdf.org/}).  Annotations with an ``inst'' feature are
-treated as links into the knowledge base, supporting richer semantic queries.
-
 \item[plugins/sparql] A helper that can be layered on top of any other storage
 implementation to provide semantic querying against a separate knowledge base,
 accessible at a SPARQL endpoint.
@@ -163,15 +158,15 @@
 }
 \end{lstlisting}
 
-This section specifies the \Mimir\ plugins that should be loaded, and
-determines the kinds of annotation helpers you will be able to use in your
-indexes.  You generally need at least one of the standard {\tt db-h2} and/or
-{\tt sesame} plugins to be able to do anything useful with \Mimir, and you may
-want the {\tt measurements} plugin as well if you will be searching on
-Measurement annotations and/or the {\tt sparql} plugin if you have an external
-knowledge base.  Section~\ref{sec:indexing:helpers} has more information about
-the standard annotation helpers, and section~\ref{sec:extend:helpers} discusses
-how to implement your own custom ones.
+This section specifies the \Mimir\ plugins that should be loaded, and determines
+the kinds of annotation helpers you will be able to use in your indexes.  You
+generally need at least  the standard {\tt db-h2} plugin to be able to do
+anything useful with \Mimir, and you may want the {\tt measurements} plugin as
+well if you will be searching on Measurement annotations and/or the {\tt sparql}
+plugin if you have an external knowledge base. 
+Section~\ref{sec:indexing:helpers} has more information about the standard
+annotation helpers, and section~\ref{sec:extend:helpers} discusses how to
+implement your own custom ones.
 
 \Mimir\ uses the GATE plugin mechanism, so \Mimir\ plugins are actually
 very simple CREOLE plugins\footnote{See

Modified: mimir/trunk/doc/changes.tex
===================================================================
--- mimir/trunk/doc/changes.tex 2014-02-28 14:55:55 UTC (rev 17489)
+++ mimir/trunk/doc/changes.tex 2014-02-28 16:43:43 UTC (rev 17490)
@@ -11,6 +11,9 @@
   \item The {\em mimir-demo} example web application has been removed.
   \item The {\em mimir-cloud} has been modified to make it more suitable as a
   generic example web application.
+  \item The sesame \Mimir{} plugin has been removed. For standard annotation
+  indexing we recommend using the db-h2 plugin. For handling formal semantics,
+  we recommend using the SPARQL plugin.
   \item New query operator: {\bf MINUS} (also `-') performs the set minus
   operation on result sets (see Section~\ref{sec:minus-query}).  
   \item \Mimir{} now supports the construction of direct indexes (see

Modified: mimir/trunk/doc/extending.tex
===================================================================
--- mimir/trunk/doc/extending.tex       2014-02-28 14:55:55 UTC (rev 17489)
+++ mimir/trunk/doc/extending.tex       2014-02-28 16:43:43 UTC (rev 17490)
@@ -44,18 +44,14 @@
 
 \subsubsection*{Lifecycle Methods}
 
-The interface includes two pairs of init/close lifecycle methods, one pair
-taking an {\tt Indexer} parameter (used when the helper is indexing
-annotations) and the other pair taking a {\tt QueryEngine} parameter (used when
-the helper is searching).  Both the Indexer and QueryEngine provide access to
-an {\tt IndexConfig} object which defines the configuration of the index,
-including the location of the index files on disk, and provides a mutable
-``context'' map that can be used to share objects among the various SAH objects
-(for example the Sesame helper uses the context to share a single connection to
-the semantic repository among all the helpers associated with the index).  The
-appropriate {\tt init} method is called by \Mimir\ when the index is opened (in
-whichever mode), before any other requests are passed to the helper, and the
-corresponding {\tt close} method is called when the index is shut down.
+The interface includes init/close lifecycle methods, taking a {\tt MimirIndex}
+parameter. The MimirIndex object provides access to an {\tt IndexConfig} object
+which defines the configuration of the index, including the location of the
+index files on disk, and provides a mutable ``context'' map that can be used to
+share objects among the various SAH objects.  The {\tt init} method is called by
+\Mimir\ when the index is opened, before any other requests are passed to the
+helper, and the corresponding {\tt close} method is called when the index is
+shut down.
 
 \subsubsection*{Indexing Methods}
 

Modified: mimir/trunk/doc/indexing.tex
===================================================================
--- mimir/trunk/doc/indexing.tex        2014-02-28 14:55:55 UTC (rev 17489)
+++ mimir/trunk/doc/indexing.tex        2014-02-28 16:43:43 UTC (rev 17490)
@@ -116,12 +116,11 @@
 One could make a distinction between {\em generic} semantic annotation helper
 types, which can be configured to handle any annotation type and features, and
 {\em special-purpose} helpers that are designed to handle specific annotation
-types.  \Mimir\ supplies two generic helper implementations in the {\tt db-h2}
-and {\tt sesame} plugins that store annotation information in a relational
-database and a knowledge base respectively.  For the most standard cases, one or
-other of these default helper implementations should be sufficient.  One sample
-special-purpose helper for {\tt Measurement} annotations (as generated by the
-GATE {\tt Tagger\_Measurements} plugin) is also provided, in the {\tt
+types.  \Mimir\ supplies a generic helper implementation in the {\tt db-h2}
+plugin that stores annotation information in a relational database.  For the most
+standard cases, this default helper implementation should be sufficient.  One
+sample special-purpose helper for {\tt Measurement} annotations (as generated by
+the GATE {\tt Tagger\_Measurements} plugin) is also provided, in the {\tt
 measurements} plugin.  This is intended both to be useful in its own right and
 to serve as a template for how to implement your own helpers for other complex
 annotation types.  The {\tt sparql} plugin provides a helper that can wrap any
@@ -280,60 +279,198 @@
 
 \section{The Default Representation Scheme}\label{sec:indexing:dsah-detail}
 
-\begin{figure}[htb]
+The default generic SAH implementations try to minimise the amount of data
+stored in their underlying database or semantic repository by creating
+representation templates that are shared between all occurrences of annotations
+with the same values for the features. There are two levels of templates, the
+first defined by the values of nominal features, and the second that uses the
+values of all the other features. This is intended to reflect the typical
+scenario where most annotations are defined by a small set of nominal features,
+with a few of them having features with arbitrary values. Most annotation types
+would then only make use of level-1 templates, with a few of them employing both
+level-1, and level-2 templates.
+
+\begin{figure*}[htb]
 \begin{center}
-\includegraphics[scale=0.66]{img/dsah-model}
-\caption{Default semantic annotation helper representation schema.}
-\label{fig:dsah-model}
-\end{center}
-\end{figure}
+{\footnotesize  
+\begin{tabular}{|r|l|l|l|l|l|l|l|}
+\hline
+Document: & London & is & located & on & the & Thames & .\\
+\hline
+position: & 0 & 1 & 2 & 3 & 4 & 5 & 6\\
+\hline
+{\bf string:} & london & is & located & on & the & thames & .\\
+\hline
+{\bf root:} & london & be & locate & on & the & thames & .\\
+\hline
+{\bf part-of-speech:} & NNP & VBZ & VBN & IN & DT & NNP & .\\
+\hline
+{\bf Location:} & {\tt type=city} &  &  & &  & {\tt type=river} & \\
+\hline
+\end{tabular}
 
-The default generic SAH implementations try to minimise the amount of data stored in their underlying database or semantic repository
-by creating representation templates that are shared
-between all occurrences of annotations with the same values for the features.
-There are two levels of templates, the first defined by the values of nominal
-features, and the second that uses the values of all the other features. This
-is intended to reflect the typical scenario where most annotations are defined
-by a small set of nominal features, with a few of them having features with
-arbitrary values. Most annotation types would then only make use of level-1
-templates, with a few of them employing both level-1, and level-2 templates.
+\smallskip
 
-\begin{figure}[htb]
-\begin{center}
-\includegraphics[scale=0.66]{img/dsah-data}
-\caption{Default semantic annotation helper data example.}
-\label{fig:dsah-data}
+\begin{tabular}{ll}
+\multicolumn{2}{c}{\bf Token indexes}\\
+
+% Token root index
+\parbox[t]{5em} {
+\begin{tabular}[t]{|l|l|}
+\hline
+\multicolumn{2}{|l|}{\bf root index}\\
+\hline
+. & $0(6)$\\ 
+\hline
+be & $0(1)$\\ 
+\hline
+locate & $0(2)$\\
+\hline
+london & $0(0)$\\
+\hline
+on & $0(3)$\\
+\hline
+thames & $0(5)$\\
+\hline
+the & $0(4)$\\
+\hline
+\end{tabular}
+} &
+% Token PoS index
+\parbox[t]{6em} {
+\begin{tabular}[t]{|l|l|}
+\hline
+\multicolumn{2}{|l|}{\bf PoS index}\\
+\hline
+. & $0(6)$\\
+\hline
+DT & $0(4)$\\ 
+\hline
+IN & $0(3)$\\ 
+\hline
+NNP & $0(0, 5)$\\
+\hline
+VBN & $0(2)$\\
+\hline
+VBZ & $0(1)$\\
+\hline
+\end{tabular}
+} \\
+
+{\bf Location templates} &  {\bf Location index} \\
+% Location templates
+\parbox[t]{24em} {
+\begin{tabular}[t]{|l|l|}
+\hline
+{\bf L1 ID} & {\bf type}\\
+\hline
+1 & city\\ 
+\hline
+2 & river\\ 
+\hline
+\end{tabular}
+
+\smallskip
+
+\begin{tabular}[t]{|l|l|l|}
+\hline
+{\bf L2 ID} & {\bf L1 ID} & {\bf instURI}\\
+\hline
+1 & 1 & dbpedia.org/resource/London \\ 
+\hline
+2 & 2 & dbpedia.org/resource/Thames\_river \\ 
+\hline
+\end{tabular}
+
+\begin{tabular}[t]{|l|l|l|l|}
+\hline
+{\bf Mention ID} & {\bf L1 ID} & {\bf L2 ID} & {\bf length} \\
+\hline
+Location:1 & 1 & - & 1 \\
+\hline
+Location:2 & 1 & 1 & 1 \\ 
+\hline
+Location:3 & 2 & - & 1 \\
+\hline
+Location:4 & 2 & 2 & 1 \\
+\hline
+Location:5 & 2 & 2 & 3 \\ 
+\hline
+\end{tabular}
+} &
+
+% Location index
+\parbox[t]{14em} {
+
+\begin{tabular}[t]{|l|l|}
+\hline
+\multicolumn{2}{|l|}{\bf \{Location\} index}\\
+\hline
+Location:1 & $0(0)$\\
+\hline
+Location:2 & $0(0)$\\
+\hline
+Location:3 & $0(5)$\\ 
+\hline
+Location:4 & $0(5)$\\ 
+\hline
+\end{tabular}
+}
+\end{tabular}
+} 
+\caption[\Mimir{} index contents]{A very simple example document and the
+corresponding contents of a \Mimir{} index. We assume that the only document ID is $0$.\\
+Different {\em views} of the document text are generated by different token
+features, which are stored in separate sub-indexes. The document string has been
+down-cased prior to indexing; we do not show the {\tt string} index, as it is
+very similar to the one for the {\tt root} feature. The values used for
+Part-of-Speech (PoS) are standard tags as produced by GATE's PoS Tagger:
+DT=determiner, IN=preposition, NNP=proper noun, VBN=verb - past participle, 
+VBZ=verb - 3rd person singular present.\\
+A single annotation type ({\tt \{Location\}}) is being indexed, with two
+different occurrences, and we assume the only non-nominal feature to be the
+DBpedia instance URI. Note that ``{\tt Location:5}'' (i.e. a mention of the
+Thames that is 3-tokens long) does not actually occur in the document text, so
+it is not present in the index. We have included it here as an example of an
+annotation of length greater than $1$.}
+\label{fig:token-indexes}
 \end{center}
-\end{figure}
+\end{figure*}
 
-The representation schema used by the Sesame helper is illustrated in
-Figure~\ref{fig:dsah-model}.  Figure~\ref{fig:dsah-data} shows the data created
-for an example annotation, with the two mention URIs displayed in bold. These
-URIs will be stored in the mentions index.  The DB helper uses a similar
-strategy, with separate level 1 (nominal) and level 2 (everything else)
-database tables for each annotation type.  Annotation types that only have
-nominal features need just a level 1 table.
+For each input annotation the following IDs are retrieved (or generated on first
+occurrence):\\
+{\bf Level-1 template ID} The annotation type and the values for all its nominal
+features form a tuple. The first time each tuple configuration is seen, it is
+allocated a level-1 ID. Subsequent annotations that match an already existing
+tuple will re-use the same level-1 ID. For example, in
+Figure~\ref{fig:token-indexes} all annotations of type {\em Location} with
+feature {\em city} will use the level-1 ID `{\tt 1}'.\\
+{\bf Level-2 template ID} The level-1 template ID together with the values for
+all the remaining (i.e. non-nominal) features form a second tuple. Unique
+configurations of these tuples are allocated level-2 IDs. It should be noted
+that most NLP annotations tend to include only nominal features, so they would
+not require a level-2 ID. The \verb!{Location}! annotations shown in
+Figure~\ref{fig:token-indexes} have a non-nominal feature, so they each get a
+level-2 ID allocated to them. Note, however, that all further mentions of e.g.
+the {\em Thames} would re-use the same IDs, even when phrased differently in
+the text, e.g. ``{\em the river Thames}'', or ``{\em La Tamise}''.
+\\
+{\bf Mention ID} The level-1 ID and the annotation length (number of tokens)
+form a tuple, which is associated with a mention ID -- in
+figure~\ref{fig:token-indexes} {\em Location} annotations with feature {\em
+city} covering one token will take the mention ID ``Location:1''. If present,
+the level-2 ID and the annotation length also get a mention ID. For example, all
+mentions of ``the River Thames'' are associated with the mention ID
+``{\tt Location:5}'' (because they refer to the Thames, and are 3 tokens long).
 
-The execution flow for each annotation includes the following main steps:
-\bit
-  \item Given the annotation's nominal features, find an appropriate level-1
-  (L1) template. If none is found, create one.
-  \item For the L1 template, find a mention of appropriate length (the number
-  of tokens covered by the annotation). If none is found, create one. Add the
-  mention URI to the mentions index.
-  \item If the annotation has non-nominal features:
-  \begin{itemize}
-    \item Find an appropriate level-2 (L2) template, based on the feature
-    values. If none is found, create one.
-    \item For the L2 template, find (or create) a mention of appropriate length;
-    add the mention URI to the index.
-  \end{itemize}  
-\eit
+Finally, the one or two mention IDs associated with each annotation are added to
+an \emph{annotation index}, using the annotation start token as
+the position. 
 
-All annotations sharing the same feature values will share the same database
-entries or knowledge base entities (the resources with URIs `\verb!FooL1:1!'
-and `\verb!FooL1:1_1!'), and the same mention objects. This has the
-advantage of reducing the size of the database, which allows more
-documents to be indexed, and helps achieve better execution speeds during
-search. The downside of this is that the indexing process is somewhat slowed
-down, as pre-existing entities need to be retrieved at every step.
+We index two separate mention IDs associated with either level-1 or level-2 IDs,
+in order to speed-up searches that only make use of nominal features. For
+annotation types that have non-nominal features, the number of level-2 IDs will
+be orders of magnitude greater than that for level-1. If a search only relies on
+nominal constraints (a large proportion of searches tend to fall into this
+category), then the query can be answered much faster by only accessing the
+smaller number of posting lists for the matching level-1 IDs.
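
Reviewer note on the indexing.tex hunk above: the new text describes how level-1, level-2 and mention IDs are allocated and then posted to the annotation index. The Python sketch below is one possible reading of that description, not the code of the db-h2 helper. The class and method names (TemplateStore, mention_ids) are hypothetical, the tables are plain in-memory dictionaries rather than H2 tables, and the sequential numbering of IDs is an assumption chosen so that the result matches the figure in the hunk.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateStore:
    """Hypothetical in-memory stand-in for one annotation type's template tables."""
    ann_type: str
    level1: dict = field(default_factory=dict)    # nominal tuple -> L1 ID
    level2: dict = field(default_factory=dict)    # (L1 ID, other tuple) -> L2 ID
    mentions: dict = field(default_factory=dict)  # (L1 ID, L2 ID or None, length) -> number

    @staticmethod
    def _get_or_create(table, key):
        # Assumption: IDs are allocated sequentially from 1 on first occurrence.
        if key not in table:
            table[key] = len(table) + 1
        return table[key]

    def _mention(self, l1, l2, length):
        number = self._get_or_create(self.mentions, (l1, l2, length))
        return f"{self.ann_type}:{number}"

    def mention_ids(self, nominal, non_nominal, length):
        """Return the one or two mention IDs for a single annotation."""
        l1 = self._get_or_create(self.level1, tuple(sorted(nominal.items())))
        ids = [self._mention(l1, None, length)]
        if non_nominal:  # a level-2 template is needed only for non-nominal features
            key = (l1, tuple(sorted(non_nominal.items())))
            l2 = self._get_or_create(self.level2, key)
            ids.append(self._mention(l1, l2, length))
        return ids


store = TemplateStore("Location")
index = {}  # mention ID -> list of (document ID, start token position)

annotations = [
    # (start token, length in tokens, nominal features, non-nominal features)
    (0, 1, {"type": "city"}, {"instURI": "dbpedia.org/resource/London"}),
    (5, 1, {"type": "river"}, {"instURI": "dbpedia.org/resource/Thames_river"}),
]
for start, length, nominal, other in annotations:
    for mention_id in store.mention_ids(nominal, other, length):
        index.setdefault(mention_id, []).append((0, start))

print(index)
# {'Location:1': [(0, 0)], 'Location:2': [(0, 0)],
#  'Location:3': [(0, 5)], 'Location:4': [(0, 5)]}
```

Under these assumptions a query that constrains only the nominal feature type=river needs to read just the posting list for Location:3, which is the speed-up the last paragraph of the hunk argues for.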

Modified: mimir/trunk/doc/mimir-guide.pdf
===================================================================
(Binary files differ)

Modified: mimir/trunk/doc/plugins.tex
===================================================================
--- mimir/trunk/doc/plugins.tex 2014-02-28 14:55:55 UTC (rev 17489)
+++ mimir/trunk/doc/plugins.tex 2014-02-28 16:43:43 UTC (rev 17490)
@@ -6,12 +6,12 @@
 
 \section{The {\tt db-h2} Plugin}\label{sec:plugins:db}
 
-The {\tt db-h2} plugin is one of two plugins (the other being the {\tt sesame}
-plugin) that provides a {\em generic} semantic annotation helper implementation
-that can be configured for any annotation type with any features.  The helper
-provided by {\tt db-h2} uses an embedded relational database engine
-(\url{http://www.h2database.com/}) to store the annotation data, and generally
-provides the best performance of the standard generic helpers.
+The {\tt db-h2} plugin is a plugin that provides a {\em generic} semantic
+annotation helper implementation that can be configured for any annotation type
+with any features.  The helper provided by {\tt db-h2} uses an embedded
+relational database engine (\url{http://www.h2database.com/}) to store the
+annotation data, and generally provides the best performance of the standard
+generic helpers.
 
 \lstinline!gate.mimir.db.DBSemanticAnnotationHelper! is the helper class
 provided by the {\tt db-h2} plugin.  It has a constructor that takes a
@@ -52,53 +52,8 @@
 \ede
 
 The DB-based helper does not distinguish between text- and URI-valued features,
-indexing both types in the same way, but it accepts both kinds as arguments
-to be consistent with the Sesame helper (described below).  When using the
-{\tt import as} mechanism, switching an index template from the DB helper to
-the Sesame helper or vice versa is simply a matter of changing the name of the
-class being imported.
+indexing both types in the same way, but it accepts both kinds as arguments.
 
-\section{The {\tt sesame} Plugin}\label{sec:plugins:sesame}
-
-The {\tt sesame} plugin provides the other standard generic helper,
-\lstinline!gate.mimir.sesame.SesameSemanticAnnotationHelper!.  This helper
-stores annotation data as triples in a Sesame repository (by default using
-OWLIM as the underlying storage engine).
-
-The \lstinline!SesameSemanticAnnotationHelper! constructor takes the same named
-arguments as the DB helper for the annotation type and feature names (see the
-previous section for details).  Internally all numeric features (integer and
-float) are stored as floating point numbers, but the helper accepts both
-``integerFeatures'' and ``floatFeatures'' parameters for consistency with the
-DB helper.
-
-For advanced users it also provides a mechanism to specify the location of the
-Sesame repository configuration template file (in the standard Turtle RDF
-format), as a parameter named either ``absolutePath'' (for an absolute path
-to the file) or ``relativePath'' (for a path relative to the {\tt sesame}
-plugin directory).  The default configuration file is {\tt resources/owlim.ttl}
-under the {\tt sesame} plugin.  When specifying a custom repository
-configuration it is important to retain the repositoryID of ``owlim'' as this
-is assumed by the helper code.
-
-\subsection{Searching in the Knowledge Base}
-
-For annotations that have a URI-valued feature named ``inst'', the Sesame
-helper provides an additional ``synthetic'' feature named
-{\tt semanticConstraint} that provides a way to query for annotations based on
-information in the knowledge base.
-\begin{lstlisting}[breaklines]
-{Organization semanticConstraint = "?inst <http://proton.semanticweb.org/2005/04/protont#locatedIn> <http://example.com#Sheffield> ."}
-\end{lstlisting}
-
-The value of the semanticConstraint feature is a SPARQL fragment that has
-access to the {\tt ?inst} variable, referring to the URI in the ``inst''
-feature of the annotation.  This mechanism can only query for one specific
-feature (inst) and only against triples stored in the same semantic repository
-as the \Mimir\ annotation data.  For a more flexible approach to semantic
-queries that can make use of semantic data stored in a remote knowledge base,
-see the {\tt sparql} plugin (section~\ref{sec:plugins:sparql}).
-
 \section{The {\tt measurements} Plugin}\label{sec:plugins:measurements}
 
 The GATE {\tt Tagger\_Measurements} plugin, introduced in GATE 6.1, is able to
@@ -155,8 +110,7 @@
   of generic helper that the Measurements helper should delegate to.  This
   class must provide a 6-argument constructor taking the annotation type (a
   String) and five String arrays for the nominal, integer, float, text and URI
-  feature names respectively.  Both the DB and Sesame helpers provide this
-  constructor.
+  feature names respectively.
 \item[unitsFile] the location of the {\tt units.dat} file used to configure the
   measurements parser.  If not specified, a default file provided with the
   {\tt measurements} plugin is used.  This value can be an absolute URL

