Author: buildbot
Date: Thu Oct  3 12:41:51 2013
New Revision: 881012

Log:
Staging update by buildbot for stanbol

Added:
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-addfields.png
   (with props)
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstconfig.png
   (with props)
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png
   (with props)
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-indexlayout.png
   (with props)
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-solrcore.png
   (with props)
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config.png
   (with props)
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Oct  3 12:41:51 2013
@@ -1 +1 @@
-1513363
+1528830

Added: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-addfields.png
==============================================================================
Binary file - no diff available.

Propchange: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-addfields.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstconfig.png
==============================================================================
Binary file - no diff available.

Propchange: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstconfig.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png
==============================================================================
Binary file - no diff available.

Propchange: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-fstfolder.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-indexlayout.png
==============================================================================
Binary file - no diff available.

Propchange: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-indexlayout.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-solrcore.png
==============================================================================
Binary file - no diff available.

Propchange: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config-solrcore.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config.png
==============================================================================
Binary file - no diff available.

Propchange: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/fstengine-config.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Modified: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
 (original)
+++ 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/list.html
 Thu Oct  3 12:41:51 2013
@@ -280,6 +280,15 @@
 </ul>
 </li>
 <li>
+<p><strong><a href="lucenefstlinking">FST Linking Engine</a>:</strong></p>
+<ul>
+<li>Entity Linking Engine based on Lucene FST (Finite State Transducer) technology</li>
+<li>Links Entities indexed in a Solr index (e.g. an Entityhub Site backed by a 
SolrYard)</li>
+<li>Provides better linking performance than the <a href="entityhublinking">Entityhub Linking Engine</a></li>
+<li>Requires a lot of CPU after changes of the vocabulary to re-create the FST 
models.</li>
+</ul>
+</li>
+<li>
 <p><strong>DBpedia Spotlight Annotation Engine:</strong> Integration of the DBpedia Spotlight with the Stanbol Enhancer (see <a href="https://issues.apache.org/jira/browse/STANBOL-706">STANBOL-706</a>)</p>
 <ul>
 <li>includes NLP, Entity Linking and Disambiguation of Entities using <a href="http://dbpedia.org">DBpedia</a> as knowledge base</li>

Added: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
 (added)
+++ 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/lucenefstlinking.html
 Thu Oct  3 12:41:51 2013
@@ -0,0 +1,229 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - The FST Linking Engine: Linking NLP processed Text 
with Vocabularies indexed in a Solr index</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link title="doap" rel="meta" type="application/rdf+xml" href="/doap.rdf"/>
+  <link rel="icon" type="image/png" 
href="/images/stanbol-logo/stanbol-favicon.png"/>
+  <script type="text/javascript">
+    // Google Analytics Tracking Code
+    var _gaq = _gaq || [];
+    _gaq.push(['_setAccount', 'UA-32086816-1']);
+    _gaq.push(['_trackPageview']);
+
+    (function() {
+      var ga = document.createElement('script'); ga.type = 'text/javascript'; 
ga.async = true;
+      ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 
'http://www') + '.google-analytics.com/ga.js';
+      var s = document.getElementsByTagName('script')[0]; 
s.parentNode.insertBefore(ga, s);
+    })();
+  </script>  
+</head>
+
+<body>
+  <div id="navigation"> <!-- but auto scroll the menue -->
+    <a href="/index.html"><img alt="Apache Stanbol" width="220" height="101" 
border="0" src="/images/stanbol-logo/stanbol-2010-12-14.png"/></a><br />
+      <ul>
+<li><a href="/docs/trunk/tutorial.html">Getting Started</a></li>
+<li><a href="/docs/trunk/">Documentation</a><ul>
+<li><a href="/docs/trunk/scenarios.html">Usage Scenarios</a></li>
+<li><a href="/docs/trunk/components/">Components</a></li>
+<li><a href="/docs/trunk/production-mode/">Production Mode</a></li>
+</ul>
+</li>
+<li><a href="/development/">Development</a><ul>
+<li><a href="/development/index.html#mailing_lists">Mailing Lists</a></li>
+<li><a href="/development/index.html#issue_tracker">Issue Tracker</a></li>
+<li><a href="/development/index.html#source_code">Source Code</a></li>
+<li><a href="/development/index.html#development_practices">Development 
Practices</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/downloads/">Overview</a><ul>
+<li><a href="/downloads/releases.html">Releases</a></li>
+<li><a href="/downloads/launchers.html">Launchers</a></li>
+</ul>
+</li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/pmc/">PMC</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+<li><a href="/privacy-policy.html">Privacy Policy</a></li>
+</ul>
+<h1 id="archived-docs">Archived Docs</h1>
+<ul>
+<li><a href="/docs/0.9.0-incubating/">0.9.0-incubating</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+<p><br /><a href="/doap.rdf"><img style="margin-left: 1em;" border="0" 
alt="DOAP File" src="/images/doap.png"/></a></p>
+  </div>
+  <div id="content">
+    <div class="breadcrumbs">
+      <ul> <li><a href="/">Home</a></li> <li class="item"><a 
href="/docs/">Docs</a></li> <li class="item"><a 
href="/docs/trunk/">Trunk</a></li> <li class="item"><a 
href="/docs/trunk/components/">Components</a></li> <li class="item"><a 
href="/docs/trunk/components/enhancer/">Enhancer</a></li> <li class="item"><a 
href="/docs/trunk/components/enhancer/engines/">Engines</a></li> </ul>
+    </div>
+    <h1 class="title">The FST Linking Engine: Linking NLP processed Text with 
Vocabularies indexed in a Solr index</h1>
+    <p>The <strong>Lucene FST Linking Engine</strong> is an Entity Linking Engine based on the <a href="http://lucene.apache.org">Lucene</a> FST (Finite State Transducer) technology. FSTs provide a very efficient way to hold Entity labels in-memory. This avoids the disc IO that the other entity linking engines require for label lookups.</p>
+<p>This engine is built on top of the OpenSextant <a href="https://github.com/OpenSextant/SolrTextTagger/">Solr-Text-Tagger</a>, which implements the building of the FST models as well as the tagging of the processed text.</p>
+<h2 id="configuration">Configuration</h2>
+<p>The configuration of the FST linking engine consists of several parts explained in detail by the following sub-sections.
+Configurations can be created by using the <a href="fstengine-config.png">Configuration Dialog</a> provided by the Apache Felix Webconsole (search for "FST Linking" in the configuration tab). However, NOTE that this dialog does not include all supported configuration options. Options not included in the dialog can be configured by directly using OSGi configuration (*.config) files.</p>
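+<p>As a rough illustration of the file based option, an OSGi configuration file for this engine might contain entries like the ones below. This is only a sketch: the property keys are the ones described in the following sub-sections, while the engine name, the server and index names and the chosen encoding are placeholders made up for this example. The required file name depends on the identifier of the engine component, which is not given on this page, and multi valued properties such as the FST Tagging Configuration are left out.</p>
+<div class="codehilite"><pre># hypothetical values, adapt them to the actual setup
+stanbol.enhancer.engine.name="my-fst-linking"
+enhancer.engines.linking.lucenefst.solrcore="myserver:myindex"
+enhancer.engines.linking.lucenefst.fieldEncoding="UnderscorePrefix"
+</pre></div>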
+<h3 id="engine-name-and-service-ranking">Engine Name and Service Ranking</h3>
+<p>Like all Stanbol Enhancement Engines, this engine supports the following two properties:</p>
+<ul>
+<li><strong>Name</strong> <em>(stanbol.enhancer.engine.name)</em>: The name of the Enhancement Engine. This name is used to refer to an <a href="index.html">EnhancementEngine</a> in <a href="../chains">EnhancementChain</a>s</li>
+<li><strong>ServiceRanking</strong> <em>(service.ranking)</em>: In case multiple enhancement engines use the same name, only the one with the highest ranking will be used.</li>
+</ul>
+<h3 id="configuration-of-the-solr-index">Configuration of the Solr Index</h3>
+<p><img alt="SolrCore configuration" src="fstengine-config-solrcore.png" 
title="The configuration option used to configure the SolrCore" /></p>
+<p>The Solr index is configured by using the <code>enhancer.engines.linking.lucenefst.solrcore</code> configuration property of the Engine. This property needs to point to a Solr index that runs embedded in the same JVM as Apache Stanbol. The Stanbol Commons Solr modules provide two components for configuring embedded Solr indexes:</p>
+<ol>
+<li><strong><a href="/docs/trunk/utils/commons-solr#referencedsolrserver">ReferencedSolrServer</a></strong>: This component allows users to configure a directory containing a SolrServer configuration (the directory with the solr.xml file). All Solr indexes defined by the solr.xml will be initialized and published as OSGi services to Apache Stanbol. Such indexes can be configured for the engine by using {server-name}:{index-name}. {server-name} is the name of the ReferencedSolrServer as provided in the configuration. {index-name} is the name of the Solr index as defined in the solr.xml.</li>
+<li><strong><a href="/docs/trunk/utils/commons-solr#managedsolrserver">ManagedSolrServer</a></strong>: This component provides a Solr server that is fully managed by Apache Stanbol. Indexes can be installed by copying '{index-name}.solrindex.zip' files to the 'stanbol/datafiles' folder. Solr indexes initialized like that will be available under '{index-name}' and 'default:{index-name}'.</li>
+</ol>
+<p>The Solr indexes used must also conform to the requirements of the <a href="https://github.com/OpenSextant/SolrTextTagger/">SolrTextTagger</a> module. That means that fields used for FST linking MUST use field analyzers that produce consecutive positions (i.e. the position increment of each term must always be 1). As a consequence, typical field analyzers as used for searches will not work.</p>
+<p>The SolrTextTagger README provides an example of a Field Analyzer configuration that does work. To make things easier, this engine ships an <a href="fst_field_types.xml">XML file</a> containing a schema.xml fragment with FST tagging compatible configurations for most languages supported by Solr.</p>
+<h3 id="solr-index-layout-configuration">Solr Index Layout Configuration</h3>
+<p><img alt="Solr core index layout configuration" 
src="fstengine-config-indexlayout.png" title="The configuration option used to 
configure the Solr Index Layout" /></p>
+<p>This part of the configuration is used to specify the layout of the used Solr index. It specifies how Entity information is stored in the Solr index.</p>
+<h4 id="field-name-encoding">Field Name Encoding</h4>
+<p>The Field Name Encoding configuration <code>enhancer.engines.linking.lucenefst.fieldEncoding</code> specifies how Solr fields for multiple languages are encoded. As an example, a Vocabulary with labels in multiple languages might use "en_label" for the English language labels and "de_label" for the German language labels. In this case users should set this property to <code>UnderscorePrefix</code> and simply use "label" when configuring the FST field name. </p>
+<p>The Field Name Encodings work well with Solr dynamic field configurations, which allow mapping language specific FieldType specifications to prefixes and suffixes such as</p>
+<div class="codehilite"><pre>&lt;dynamicField name="en_*" type="text_en_fst" indexed="true" stored="true" multiValued="true" omitNorms="false"/&gt;
+&lt;dynamicField name="de_*" type="text_de_fst" indexed="true" stored="true" multiValued="true" omitNorms="false"/&gt;
+</pre></div>
+<p>This is the full list of supported Field encodings:</p>
+<ul>
+<li>SolrYard: This supports the encoding used by the Stanbol Entityhub SolrYard implementation to encode RDF data types and language literals. If you configure the FST Linking Engine for a Solr index built for the SolrYard you need to use this encoding</li>
+<li>MinusPrefix: {lang}-{field} (e.g. "en-name")</li>
+<li>UnderscorePrefix: {lang}_{field} (e.g. "en_name")</li>
+<li>AtPrefix: {lang}@{field} (e.g. "en@name")</li>
+<li>MinusSuffix: {field}-{lang} (e.g. "name-en")</li>
+<li>UnderscoreSuffix: {field}_{lang} (e.g. "name_en")</li>
+<li>AtSuffix: {field}@{lang} (e.g. "name@en")</li>
+<li>None: In this case no prefix/suffix rewriting of configured 
<code>field</code> and <code>store</code> values is done. This means that the 
FST Configuration MUST define the exact field names in the Solr index for every 
configured language.</li>
+</ul>
+<h4 id="fst-tagging-configuration">FST Tagging Configuration</h4>
+<p><img alt="FST configuration" src="fstengine-config-fstconfig.png" title="The configuration used to configure the languages and fields FST models are built for" /></p>
+<p>The FST Tagging Configuration 
<code>enhancer.engines.linking.lucenefst.fstconfig</code> defines several 
things:</p>
+<ol>
+<li>for what languages FST models should be built. This configuration is basically a list of language codes but also supports wildcards '*' and exclusions '!{language}' (e.g. '!en')</li>
+<li>what fields in the Solr Index are used to build FST models. Two fields per language are required: a) an 'Indexed Field' (<em>field</em> parameter) and b) a 'Stored Field' (<em>stored</em> parameter). Both the indexed and stored field might refer to the same field in the Solr index. In that case this field needs to use <code>indexed="true" stored="true"</code>.</li>
+<li>if FST models can be built by the Engine at runtime as well as the name of the serialized models.</li>
+</ol>
+<p>This configuration is line based (multi valued) and uses the following 
generic syntax:</p>
+<div class="codehilite"><pre><span class="p">{</span><span 
class="n">language</span><span class="p">};{</span><span 
class="n">param</span><span class="p">}={</span><span 
class="n">value</span><span class="p">};{</span><span 
class="n">param1</span><span class="p">}={</span><span 
class="n">value1</span><span class="p">};</span>
+<span class="sx">!{language}</span>
+</pre></div>
+
+
+<p><code>{language}</code> is either the name of the language (e.g. 'en'), '*' 
for all languages or '' (empty string) for defining default parameter values 
without including all languages. Lines that do start with '!' do explicitly 
exclude a language. Those lines do not allow parameters.</p>
+<p>The following parameters are supported by the Engine:</p>
+<ul>
+<li><strong>field</strong>: The indexed field in the configured Solr index. In 
multilingual scenarios this might be the 'base name' of the field that is 
extended by a prefix or suffix to get the actual field name in the Solr index 
(see also the field encoding configuration)</li>
+<li><strong>stored</strong> (default: <em>field</em> value) : The field in the Solr index with the stored label information. This parameter is optional. If not present <code>stored</code> is assumed to be equal to <code>field</code>.</li>
+<li><strong>fst</strong> (default based on <em>field</em> value): Allows manually specifying the base file name of the FST models. Those files are expected within the data directory of the configured Solr index under <code>fst/{fst}.{lang}.fst</code>. By default the configured <code>field</code> name is used (with non alpha-numeric chars replaced by '_'). If runtime creation is enabled those files will be created if not present.</li>
+<li><strong>generate</strong> (default: false): If enabled the Engine will generate missing FST models and will also be able to update FST models after changes to the Solr Index. <strong>NOTE</strong> that the creation of FST models is an expensive operation (both CPU and memory wise), which is why the default is <code>false</code>. The FST engine uses a pool of low priority threads to create FST models. The size of the pool can be configured by using the <code>enhancer.engines.linking.lucenefst.fstThreadPoolSize</code> parameter.</li>
+</ul>
+<p>A more advanced Configuration might look like:</p>
+<div class="codehilite"><pre><span class="p">;</span><span 
class="n">field</span><span class="p">=</span><span class="n">fise</span><span 
class="p">:</span><span class="n">fstTagging</span><span 
class="p">;</span><span class="n">stored</span><span class="p">=</span><span 
class="n">rdfs</span><span class="p">:</span><span class="n">label</span><span 
class="p">;</span><span class="n">generate</span><span class="p">=</span><span 
class="n">true</span>
+<span class="n">en</span>
+<span class="n">de</span>
+<span class="n">es</span>
+<span class="n">fr</span>
+<span class="n">it</span>
+</pre></div>
+
+
+<p>This would set the indexed field to "fise:fstTagging", the stored field to "rdfs:label" and allow runtime generation. It would also enable the processing of English, German, Spanish, French and Italian texts. A similar configuration that would build FST models for all languages would look as follows:</p>
+<div class="codehilite"><pre><span class="o">*</span><span 
class="p">;</span><span class="n">field</span><span class="p">=</span><span 
class="n">fise</span><span class="p">:</span><span 
class="n">fstTagging</span><span class="p">;</span><span 
class="n">stored</span><span class="p">=</span><span class="n">rdfs</span><span 
class="p">:</span><span class="n">label</span><span class="p">;</span><span 
class="n">generate</span><span class="p">=</span><span class="n">true</span>
+</pre></div>
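+
+
+<p>The wildcard can also be combined with exclusion lines. As a sketch that follows the generic syntax given above (the field name "label" is only a placeholder for this example), the following configuration would build FST models for all languages except German. With the Field Name Encoding set to <code>UnderscorePrefix</code>, the English model would then be expected to read its labels from the Solr field "en_label".</p>
+<div class="codehilite"><pre>*;field=label;generate=true
+!de
+</pre></div>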
+
+
+<h4 id="additional-entity-information">Additional Entity Information</h4>
+<p><img alt="Additional Fields config" src="fstengine-config-addfields.png" 
title="Fields the types and rankings of entities are read from" /></p>
+<p>In addition to the URI and the labels of Entities the EntityLinking process 
also uses entity type and ranking information.</p>
+<ul>
+<li><strong>Entity Type Field</strong> <em>(enhancer.engines.linking.lucenefst.typeField)</em>: This field specifies the Solr field name holding the type information of Entities. In case 'SolrYard' is used as <em>Field Name Encoding</em> one can use the QName of the property (typically 'rdf:type'). Otherwise the value must be the exact field name holding the type information. Values are expected to be URIs.</li>
+<li><strong>Entity Ranking Field</strong> 
<em>(enhancer.engines.linking.lucenefst.rankingField)</em>: This is an 
<strong>ADDITIONAL</strong> property used to configure the name of the Field 
storing the floating point value of the ranking for the Entity. Entities with 
higher ranking will get a slightly better <code>fise:confidence</code> value if 
labels of several Entities do match the text.</li>
+</ul>
+<p>NOTE that type and ranking information are optional.</p>
+<h3 id="runtime-fst-generation-thread-pool">Runtime FST generation Thread 
Pool</h3>
+<p>The <code>enhancer.engines.linking.lucenefst.fstThreadPoolSize</code> 
parameter can be used to configure the size of the thread pool used for the 
runtime generation of FST models. The default size of the thread pool is 
<code>1</code>. Threads do use the lowest possible priority to reduce the 
performance impact on enhancements as much as possible.</p>
+<p>When configuring the size of the thread pool users need to be aware that the generation of FST models needs a lot more memory than the resulting model. So having too many parallel threads might require increasing the memory settings of the JVM. On typical machines FST creation threads will consume 100% CPU. That means that the number of threads should be configured to the number of CPU cores that can be spared for FST generation.</p>
+<p><em>NOTE</em> that the <code>generate</code> parameter of the FST Tagging 
Configuration needs to be set to <code>true</code> to enable runtime 
generation.</p>
+<h3 id="fst-storage-location">FST storage location</h3>
+<p><img alt="FST folder" src="fstengine-config-fstfolder.png" title="Configuration of the storage location for FST models" /></p>
+<p>FST models are not only kept in memory but also serialized to disc. This 
avoids rebuilding the model after a restart of the Stanbol Server. By default 
the models are stored within the data folder of the SolrCore. However in some 
scenarios users might want to store FST models in a different location. This 
can be achieved by using the 
<code>enhancer.engines.linking.lucenefst.fstfolder</code> property.</p>
+<p>This configuration option supports property substitution with OSGi and System properties. In addition it supports the following additional properties (all relative to the configured SolrCore):</p>
+<ul>
+<li><code>solr-data-dir</code> : the data directory of the SolrCore</li>
+<li><code>solr-index-dir</code>: the index directory of the SolrCore</li>
+<li><code>solr-server-name</code>: the name of the <a href="/docs/trunk/utils/commons-solr#referencedsolrserver">ReferencedSolrServer</a> or <a href="/docs/trunk/utils/commons-solr#managedsolrserver">ManagedSolrServer</a> holding the SolrCore (see also <a href="#configuration-of-the-solr-index">Configuration of the Solr Index</a>)</li>
+<li><code>solr-core-name</code> : the name of the SolrCore</li>
+</ul>
+<p>The default value of this property is <code>${solr-data-dir}/fst</code>. To manage FST models within the Stanbol folder you can use e.g. <code>${sling.home}/fst/${solr-server-name}/${solr-core-name}</code>.</p>
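+<p>As a worked example, assume a SolrCore named "dbpedia" that is held by a server named "default" and a Stanbol home directory of "stanbol" (all three values are made up for illustration). The default pattern and a Stanbol folder pattern with every property written in <code>${...}</code> form would then be expected to resolve roughly as follows:</p>
+<div class="codehilite"><pre>${solr-data-dir}/fst                                     -&gt;  {data directory of the "dbpedia" core}/fst
+${sling.home}/fst/${solr-server-name}/${solr-core-name}  -&gt;  stanbol/fst/default/dbpedia
+</pre></div>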
+<h3 id="entity-cache-configuration">Entity Cache Configuration</h3>
+<p>While FST tagging is fully done in-memory, the FST linking engine needs to read information about matching Entities from the Solr index. This requires disc IO and is typically the part of the process that consumes the most time. The Entity Cache tries to prevent such disc level IO by caching SolrDocuments containing only the fields required for the linking process (labels, types and, if available, entity rankings). To further reduce memory requirements only labels in languages requested by processed ContentItems are stored in the cache. The cache uses LRU semantics and is based on the Solr cache implementation.</p>
+<p>The size of the cache can be configured by using the 
<code>enhancer.engines.linking.lucenefst.entityCacheSize</code> parameter. The 
default size is ~65k entities. Increasing the maximum size of the cache will 
improve performance. For small and medium sized vocabularies the cache can be 
configured </p>
+<h3 id="text-processing-configuration">Text Processing Configuration</h3>
+<p>With the extension of the SolrTextTagger with a <a href="https://github.com/OpenSextant/SolrTextTagger/pull/7">TaggingAttribute</a> the FST linking engine can support the exact same text processing functionality as the other Entity Linking Engine.</p>
+<p>For the configuration please see the <a 
href="entitylinking#text-processing-configuration">Text Processing 
configuration</a> section of the Entity Linking Engine.</p>
+<h3 id="entity-linking-configuration">Entity Linking Configuration</h3>
+<p>The Entity Linking Configuration of this Engine is very similar to the one for the <a href="http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking#entity-linker-configuration">EntityLinking engine</a>. The configuration uses the exact same keys, but it does not support all properties and some have a slightly different meaning. In the following only the differences are described. For everything else please refer to the linked section of the documentation of the EntityLinking engine.</p>
+<ul>
+<li><s><strong>Label Field</strong> <em>(enhancer.engines.linking.labelField)</em></s>: The label field is <strong>IGNORED</strong> as the field holding the labels is already provided by the <a href="#fst-tagging-configuration">FST Tagging Configuration</a>. That means that the field defined by the <em>stored</em> parameter is used. If the <em>stored</em> parameter is not present it falls back to the <em>field</em> parameter.</li>
+<li><s><strong>Type Field</strong> <em>(enhancer.engines.linking.typeField)</em></s>: This configuration gets <strong>IGNORED</strong> in favor of the <code>enhancer.engines.linking.lucenefst.typeField</code>. See the <a href="#additional-entity-information">Additional Entity Information</a> section for details. </li>
+<li><s><strong>Redirect Field</strong> <em>(enhancer.engines.linking.redirectField)</em></s>: Not implemented. <strong>NOTE</strong> This might not be possible to implement efficiently, because redirects would already need to be considered when building the FST models.</li>
+<li><s><strong>Use EntityRankings</strong> <em>(enhancer.engines.linking.useEntityRankings)</em></s>: This configuration gets <strong>IGNORED</strong>. EntityRanking based sorting is enabled as soon as the <em>Entity Ranking Field</em> is configured.</li>
+<li><s><strong>Lemma based Matching</strong> <em>(enhancer.engines.linking.lemmaMatching)</em></s>: Not yet implemented</li>
+<li><s><strong>Min Match Score</strong> <em>(enhancer.engines.linking.minMatchScore)</em></s>: Not yet implemented. Currently all linked Entities are added regardless of their score. However the way the Tagging is done makes it very unlikely to have suggestions with <code>fise:confidence</code> values less than 0.5.</li>
+</ul>
+<p>In addition the following properties are <strong>IGNORED</strong> as they 
are not relevant for the FST Linking Engine:</p>
+<ul>
+<li><s><strong>Max Search Token Distance</strong> 
<em>(enhancer.engines.linking.maxSearchTokenDistance)</em></s></li>
+<li><s><strong>Max Search Tokens</strong> 
<em>(enhancer.engines.linking.maxSearchTokens)</em></s></li>
+<li><s><strong>Min Matched Tokens</strong> 
<em>(enhancer.engines.linking.minFoundTokens)</em></s></li>
+<li><s><strong>Min Text Score</strong> 
<em>(enhancer.engines.linking.minTextScore)</em></s></li>
+</ul>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are 
trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>
+

