Author: schor
Date: Fri Nov 22 15:29:22 2019
New Revision: 1870165

URL: http://svn.apache.org/viewvc?rev=1870165&view=rev
Log:
no Jira, add a 

Added:
    uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html
    uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml

Added: uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html
URL: 
http://svn.apache.org/viewvc/uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html?rev=1870165&view=auto
==============================================================================
--- uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html (added)
+++ uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html Fri Nov 22 
15:29:22 2019
@@ -0,0 +1,536 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
"https://www.w3.org/TR/html4/loose.dtd">
+
+
+    <!-- 
====================================================================== -->
+    <!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! -->
+    <!-- 
====================================================================== -->
+    <html>
+        <head>
+            <meta http-equiv="Content-Type" content="text/html; 
charset=iso-8859-1"/>
+            <style type="text/css">@import "stylesheets/base.css";</style>
+                                          <meta name="author" value="
+                       Apache UIMA Team
+               ">
+  <meta name="email" value="d...@uima.apache.org">
+                        
+            
+                        
+                        <title>Apache UIMA - Cookbook: addressing some typical 
use-cases</title>
+            
+            <!-- Begin Cookie Consent plugin by Silktide - 
https://silktide.com/cookieconsent -->
+            <!-- Commented out because implied consent is not compatible with 
GDPR -->
+            <!--
+            <script type="text/javascript">
+                window.cookieconsent_options = {"message":"This website uses 
cookies to ensure you get the best experience on our website","dismiss":"Got 
it!","learnMore":"More 
info","link":"https://uima.apache.org/privacy-policy.html","theme":"dark-bottom"};
+            </script>
+            
+            <script type="text/javascript" 
src="/cookieconsent2/cookieconsent.min.js"></script>
+            -->
+            <!-- End Cookie Consent plugin -->
+            
+            <!-- Begin Google Analytics -->
+            <!-- Commented out because GA requires consent according to GDPR 
-->
+            <!--
+            <script>
+              
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+              (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+              
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+              
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+            
+              ga('create', 'UA-70846351-1', 'auto');
+              ga('set', 'anonymizeIp', true);
+              ga('send', 'pageview');
+            
+            </script>
+            -->
+            <!-- End Google Analytics -->
+        </head>
+
+        <body>
+          <div class="topLogos">        
+            <table border="0" width="100%" cellspacing="0">
+                <!-- TOP IMAGE -->
+                <tr>
+                    <td align='LEFT'>
+                      <a href="index.html">
+                                    <img style="border: 1px solid black;" 
src="./images/UIMA_banner2tlpTm.png" alt="UIMA project logo" border="0"/>
+                            </a>
+                    </td>
+                    <td align='CENTER'>
+                          <div class="pageBanner">Cookbook: addressing some 
typical use-cases</div>
+                    </td>
+                    <td align='RIGHT'>
+                                  <a href="https://www.apache.org">
+        <img src="./images/asf-logo-on-white-smallTm.png" alt="Apache UIMA" 
border="0"/>
+      </a>
+                          </td>
+                </tr>
+            </table>
+            <hr noshade="" size="1"/>
+            </div>
+            <table border="0" width="100%" cellspacing="4">
+              <tr>
+                <td align='RIGHT' colspan="2">
+                  <form method="get" action="https://www.google.com/search">
+                    Search the site
+                    <input type="text"   name="q" size="25" maxlength="255" 
value="" />
+                    <input type="hidden" name="sitesearch" 
value="https://uima.apache.org/" />
+                    <input name="Search" value="Search Site" type="submit"/>
+                  </form>
+                </td>
+              </tr>
+              <tr> <!-- LEFT SIDE NAVIGATION -->
+                <td width="20%" valign="top">
+
+
+
+
+
+
+                   <!-- regular menu -->
+                      <div class="navBar">
+                  <br/>
+            <div class="navBarItem">      <div 
class="navPartHeading">General</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a href="./index.html">Home</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./downloads.cgi">Downloads</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./documentation.html">Documentation</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./news.html">News</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./publications.html">Publications</a>
+                </div>
+                    <br style="line-height: .5em"/>
+                          <div class="navBarItem">      <a 
href="https://issues.apache.org/jira/browse/uima" target="_blank" 
rel="noopener">Issue tracker <img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a 
href="https://cwiki.apache.org/confluence/display/UIMA/" target="_blank" 
rel="noopener">Wiki <img src="images/offsitelink.png"/></a>
+                </div>
+                    <br style="line-height: .5em"/>
+                          <div class="navBarItem">      <a 
href="https://cwiki.apache.org/confluence/display/UIMA/Powered+by+Apache+UIMA" 
target="_blank" rel="noopener">Powered By UIMA <img 
src="images/offsitelink.png"/></a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div 
class="navPartHeading">Community</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a 
href="./get-involved.html">Get Involved</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./mail-lists.html">Mailing Lists</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./contribution-policy.html">Contribution Policies</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./faq.html">FAQ</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./project-guidelines.html">Project Guidelines</a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div class="navPartHeading">Scaleout 
Frameworks</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a 
href="./doc-uimaas-what.html">UIMA-AS</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./doc-uimaducc-whatitam.html">UIMA-DUCC</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./doc-uimaducc-demo.html">..Demo Page</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="http://uima-ducc-demo.apache.org:42133" target="_blank" 
rel="noopener">..Demo Live <img src="images/offsitelink.png"/></a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div 
class="navPartHeading">Components & Tools</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a 
href="./sandbox.html#uima-addons-annotators">Annotators</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./toolsServers.html">Tools & Servers</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./sandbox.html">Addons and Sandbox</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./ruta.html">UIMA Ruta</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./uimafit.html">uimaFIT</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./external-resources.html">External Resources</a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div 
class="navPartHeading">Development</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a 
href="./dev-quick.html">Quick Start: building</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./building-uima.html">Building from Source</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./one-time-setup.html">One-time setups</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./svn.html">Source Code</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./distribution.html">Creating a Distribution</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./release.html">Doing a UIMA release</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="https://www.apache.org/security/committers.html" target="_blank" 
rel="noopener">Doing a CVE (Apache) <img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./eclipse-update-site.html">Eclipse Update Sites</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./git.html">GIT</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./codeConventions.html">Code Conventions</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./uima-specification.html">UIMA Specification (OASIS)</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./team-list.html">Project Team</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./maven-design.html">Maven Use</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./updating-website.html">Updating this Website</a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div class="navPartHeading">Events 
and Conferences</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a 
href="./coling14.html">COLING 2014</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./gscl13.html">GSCL 2013</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./iks09.html">IKS 2009</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./gscl09.html">GSCL 2009</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./lsm09.html">LSM 2009</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./lrec08.html">LREC 2008</a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./gldv07.html">GLDV 2007</a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div class="navPartHeading">ASF</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a 
href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License 
<img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a 
href="https://www.apache.org/foundation/thanks.html" target="_blank" 
rel="noopener">ASF Sponsors <img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a 
href="https://www.apache.org/foundation/sponsorship.html" target="_blank" 
rel="noopener">ASF Sponsorship <img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a 
href="./security_report">Security</a>
+                </div>
+            </div>
+        </div>
+                </td>
+                <td width="80%" align="left" valign="top">
+                                                          <div 
class="sectionTable">
+      <table class="sectionTable">
+        <tr><td>
+        <a name="Working with Feature Structures"><h1><img 
src="images/UIMA_4sq50tightCropSolid.png"/>&nbsp;Working with Feature 
Structures</h1></a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="sectionBody">
+                                    <p>These techniques work with all kinds of Feature Structures, both Annotations and non-Annotations.</p>
+                                                      <table 
class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Remove all Feature Structures of a particular type">
+            <h2>Remove all Feature Structures of a particular type
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>There are built-in methods to do this, 
over all indexes in a particular view.  There are 2 variations:
+          <ul><li>remove all including the subtypes of the type
+               <pre>myJCasView.removeAllIncludingSubtypes(Foo.type)</pre>
+          </li>
+          <li>remove all excluding the subtypes of the type
+              
<pre>myJCasView.removeAllExcludingSubtypes(Foo.type)</pre></li></ul>
+        </p>
+                                                <p>Both of these are much 
faster than iterating over the Feature Structures; they directly clear the 
associated indexes.</p>
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table 
class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="General suggestions: working with iterators">
+            <h2>General suggestions: working with iterators
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>Code often iterates over all instances of a type but acts on only a subset.
+         Frequently, the iteration can be cut short by starting near the spot of interest and stopping as soon
+         as it can be determined that no further iteration will find interesting Annotations.</p>
+                                                <p>Example: Let's say you have 
a "token" annotation, and want to find the "sentence" that contains it.
+         You could write an iterator over all sentences.  
+      </p>
+                                                <h3>Stop early</h3>
+                                                <p>
+         When you find the first sentence that overlaps the token, you can use extra knowledge that you might have
+         (for example, that each token belongs to only one sentence) to conclude that no further iteration
+         is needed, and stop.
+       </p>
+                                                <p>Furthermore, if the token 
appears outside of any sentence, you can similarly stop the iteration, and 
return
+          an "empty" result, as soon as the test sentence begins after the 
token's "begin".
+          This is because, at that point, due to the sorting of the returned 
values, no future sentences could
+          start before or equal to the token's begin.
+       </p>
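+<p>The two stopping rules above can be sketched in plain Java. This is only a toy model, not UIMA code:
+<code>Span</code>, <code>StopEarly</code> and <code>coveringSentence</code> are hypothetical names, and the list
+is assumed to be sorted by begin (ascending), with sentences that do not overlap each other.</p>

```java
import java.util.List;

/** Toy model of early termination; Span and StopEarly are hypothetical names, not UIMA classes. */
class StopEarly {

    /** A minimal stand-in for an annotation: just begin and end offsets. */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /**
     * Finds the sentence covering the token, or null if there is none.
     * Assumes the sentences are sorted by begin (ascending) and that
     * sentences do not overlap each other.
     */
    static Span coveringSentence(List<Span> sortedSentences, Span token) {
        for (Span s : sortedSentences) {
            if (s.begin > token.begin) {
                return null;   // every later sentence starts even further right: stop
            }
            if (s.end >= token.end) {
                return s;      // found it; at most one sentence per token: stop
            }
        }
        return null;           // ran out of sentences
    }
}
```

+<p>With real annotations the same two tests would be applied inside a loop over an annotation iterator.</p>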
+                                                <h3>Begin closer to the right 
spot, maybe iterate backwards</h3>
+                                                <p>But you can do better.</p>
+                                                <p>Instead of starting at the beginning, you can start the iteration at the position of the token and iterate backwards.
+       Iterators have a moveTo() method which takes a feature structure 
argument, so you can moveTo(the-token), 
+       and then perhaps with some edge adjustment for equality, start 
iterating backwards, looking for the sentence at that
+       position that covers the token.  
+       </p>
+                                                <p>If you are iterating backwards, looking for a "covering" annotation, and you know the largest span for that
+       covering type, then you can stop iterating as soon as the start position you reach, plus the largest span, is less than
+       the start of the annotation you're trying to cover.</p>
+                                                <p 
style="margin-left:1rem">This is used internally in version 3's 
+           <a target="_blank" rel="noopener" 
href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select.annot.subselect">select
 framework</a>
+       to speed up 
+       the <code>covering</code> kind of iteration.</p>
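+<p>A minimal sketch of the backwards search, again as a toy model rather than UIMA code. A binary search over a
+sorted list stands in for <code>moveTo()</code> plus the edge adjustment; <code>Span</code>,
+<code>BackwardSearch</code>, <code>findCovering</code> and the <code>maxSpan</code> parameter are hypothetical
+names introduced for illustration.</p>

```java
import java.util.List;

/** Toy model of backwards iteration with a span bound; names are hypothetical, not UIMA API. */
class BackwardSearch {

    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /**
     * Returns the covering span that starts closest to the target, or null.
     * Assumes 'sorted' is ordered by begin (ascending) and that no span in it
     * is longer than maxSpan.
     */
    static Span findCovering(List<Span> sorted, Span target, int maxSpan) {
        // Binary search for the last span with begin <= target.begin;
        // this plays the role of moveTo(target) plus the edge adjustment.
        int lo = 0, hi = sorted.size() - 1, start = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin <= target.begin) {
                start = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Walk backwards from there.
        for (int i = start; i >= 0; i--) {
            Span s = sorted.get(i);
            if (s.begin + maxSpan < target.begin) {
                break;       // too far left: nothing from here on can reach the target
            }
            if (s.end >= target.end) {
                return s;    // s.begin <= target.begin already holds, so s covers target
            }
        }
        return null;
    }
}
```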
+                                                <p>There are many other 
examples, but the principle is the same: start the iteration "close to" the 
right spot, 
+          perhaps moving backwards instead of forwards, and end the iteration 
as soon as you can logically say that
+          no more suitable feature structures would be found. </p>
+                                                <h3>Use UIMA Version 3's 
select framework</h3>
+                                                <p>The <a target="_blank" 
rel="noopener" 
href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select
 framework</a> 
+           incorporates many of the popular use cases for doing iterations 
that we've seen, into a Java friendly approach that
+           automatically uses optimized iterators and can produce Java 
Streams, as well.</p>
+                            </blockquote>
+        </td></tr>
+    </table>
+                            </blockquote>
+        </p>
+      </td></tr>
+    </table>
+                                        <div class="sectionTable">
+      <table class="sectionTable">
+        <tr><td>
+        <a name="Working with Annotations"><h1><img 
src="images/UIMA_4sq50tightCropSolid.png"/>&nbsp;Working with 
Annotations</h1></a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="sectionBody">
+                                          <ul>
+          <li><a href='#Watch out for type-priorites'>
+                  Watch out for type-priorities
+        
+                </a></li>
+          <li><a href='#Annotation containment'>
+                  Annotation containment
+        
+                </a></li>
+          <li><a href='#Adjusting an existing annotation's begin and end'>
+                  Adjusting an existing annotation's begin and end
+        
+                </a></li>
+          <li><a href='#Avoid where possible, copying sets of Feature 
Structures'>
+                  Avoid where possible, copying sets of Feature Structures
+        
+                </a></li>
+        </ul>
+                                                  <p>
+                               The CAS holds Feature Structures (FSs).  There 
is special support for FSs which are a subtype of Annotation;
+                               these have an associated Subject of Analysis 
(Sofa) and <code>begin</code> and <code>end</code> offsets. 
+                       </p>
+                                                <h3>Annotations are not 
required in all cases</h3>
+                                                <p>If your application deals with a different kind of unstructured data, such as images, then
+                            Annotation may not be the appropriate supertype for your types, because it is designed for
+                            things that have meaningful linear begin / end demarcations. </p>
+                                                <p>You can have your feature 
structures inherit from TOP, or from some other appropriate supertype, other
+                            than Annotation.</p>
+                                                <h3>Making use of the built-in 
Annotation index</h3>
+                                                <p>Annotations are special in 
UIMA in that there is a "built-in" index, the AnnotationIndex, which can be used
+                          to rapidly access these in a sorted order.  The 
ordering is by <code>begin</code> (ascending), then by
+                          <code>end</code> (descending), and then by 
type-priorities.</p>
+                                                <p 
style="margin-left:1rem"><i>This is really a set of indexes, one for each 
subtype of Annotation.</i></p>
+                                                <p 
style="margin-left:1rem"><i>Although the index has type-priorities, in UIMA v3, 
the <code>select-framework</code>
+                         by default ignores these; this behavior can be 
overridden.</i></p>
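+<p>The ordering described above can be modelled with an ordinary Java <code>Comparator</code>. This is a sketch
+under stated assumptions, not the framework's implementation: <code>Ann</code> and <code>AnnotationOrder</code>
+are hypothetical names, type priorities are represented as a plain list of type names (earlier means higher
+priority), and every type in use is assumed to appear in that list.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of the annotation ordering; Ann and AnnotationOrder are hypothetical names. */
class AnnotationOrder {

    static final class Ann {
        final String type;
        final int begin;
        final int end;
        Ann(String type, int begin, int end) { this.type = type; this.begin = begin; this.end = end; }
    }

    /**
     * Builds a comparator: begin ascending, then end descending, then position
     * of the type name in the priority list (earlier means higher priority).
     */
    static Comparator<Ann> order(List<String> typePriorities) {
        return Comparator.<Ann>comparingInt(a -> a.begin)
                .thenComparing(Comparator.<Ann>comparingInt(a -> a.end).reversed())
                .thenComparingInt(a -> typePriorities.indexOf(a.type));
    }

    /** Returns the type names of the annotations in sorted order. */
    static List<String> sortedTypes(List<Ann> anns, List<String> typePriorities) {
        List<Ann> copy = new ArrayList<>(anns);
        copy.sort(order(typePriorities));
        List<String> names = new ArrayList<>();
        for (Ann a : copy) {
            names.add(a.type);
        }
        return names;
    }
}
```

+<p>With priorities Sentence, Foo, Token, a Sentence and a Foo covering the same span sort with the Sentence first,
+and a shorter Token starting at the same offset sorts after both.</p>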
+                                                      <table 
class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Watch out for type-priorites">
+            <h2>Watch out for type-priorities
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>When two annotations have the same start and end, but different types, then one comes before the other,
+                            according to type priorities.  This lets you declare, for a Sentence annotation and a
+                            Foo annotation that both cover the same span, that the Sentence logically contains the Foo, and not the
+                            other way around.</p>
+                                                <p>To make this work, you need 
to specify the type priorities. This is a global setting for your application.
+                            See 
+                            <a target="_blank" rel="noopener" 
href="http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive">
+                              type priorities</a> (scroll down to find it) for 
how to specify this.</p>
+                                                <h3>Avoiding type 
priorities</h3>
+                                                <p>Often, the use of type 
priorities gets in the way.  With UIMA Version 3, the 
+                            <a target="_blank" rel="noopener" 
href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select
 framework</a>
+           by default ignores type priorities when doing its operations, but this can be overridden as needed.</p>
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table 
class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Annotation containment">
+            <h2>Annotation containment
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <h3>a contains b</h3>
+                                                <ul><li>Ignoring type 
priorities:</li></ul>
+                                                <pre>a != null &amp;&amp; b != null &amp;&amp;       // null check
+a.getBegin() &lt;= b.getBegin() &amp;&amp; // a starts before (or equal to) b 
+a.getEnd() &gt;= b.getEnd()        // a ends after (or equal to) b</pre>
+                                                <h3>a and b overlap (have at 
least one char in common)</h3>
+                                                <pre>
+// (null checks omitted)
+if (a.getBegin() &lt;= b.getBegin()) { // if a starts before (or equal to) b
+  return a.getEnd() &gt; b.getBegin(); // then it overlaps if a's end is after b's begin
+} else {                            // otherwise, b's begin is before a's begin
+  return b.getEnd() &gt; a.getBegin(); // so it overlaps if b's end is after a's begin
+}
+</pre>
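+<p>For experimenting with edge cases, the two tests can be wrapped as plain Java methods. This is a toy model:
+<code>Span</code> and <code>SpanTests</code> are hypothetical names standing in for annotations, with
+<code>begin</code> and <code>end</code> fields in place of <code>getBegin()</code> and <code>getEnd()</code>.</p>

```java
/** Toy model of the two span tests; Span and SpanTests are hypothetical names, not UIMA classes. */
class SpanTests {

    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** True if a contains b (type priorities ignored). */
    static boolean contains(Span a, Span b) {
        return a != null && b != null
                && a.begin <= b.begin   // a starts before (or equal to) b
                && a.end >= b.end;      // a ends after (or equal to) b
    }

    /** True if a and b share at least one character. */
    static boolean overlaps(Span a, Span b) {
        if (a == null || b == null) {
            return false;
        }
        if (a.begin <= b.begin) {
            return a.end > b.begin;     // a starts first: overlap if a reaches past b's begin
        }
        return b.end > a.begin;         // b starts first: overlap if b reaches past a's begin
    }
}
```

+<p>Because an end offset points just past the last covered character, spans 0..5 and 5..9 merely touch and do
+not overlap, while a span always contains another span with identical offsets.</p>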
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table 
class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Adjusting an existing annotation's begin and end">
+            <h2>Adjusting an existing annotation's begin and end
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>Sometimes, your code may need to adjust an annotation's begin and end values.
+                            If the annotation is not indexed, there's no issue 
- just change the value.
+                            But if it is indexed, it's in index(es) in a 
position determined by its begin and end position, so if you 
+                            change these, the item needs to be reindexed (in 
all the indexes holding it).  Typically, only one index
+                            (the Annotation Index for a particular CAS View) 
is involved, but in general, there could be multiple
+                            indexes involved.</p>
+                                                <p>If you are using UIMA 
version 2.7.0 or later, the UIMA 
+                            <a target="_blank" rel="noopener" 
href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">framework</a>
 
+                               detects updates that would need this 
re-indexing, and
+                               automatically removes the Feature Structure 
from all involved index(es), updates the Feature, and then adds the Feature 
Structure back to the index(es).
+                          </p>
+                                                <p>If you are updating, say, both the begin and end values of an annotation, you can improve
+                             efficiency by doing these steps yourself, in your code:
+                            <ul><li>Removing the item from the index(es)</li>
+                                <li>Doing both updates</li>
+                                <li>Adding the item back into the index(es)</li></ul>
+               More details <a target="_blank" rel="noopener" href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">here</a>.
+                                </p>
+                                                <p>Example: if you know a 
particular annotation is only indexed in one view, 
+                           then you can update its begin and end features using
+                           <pre>a.<b>removeFsFromIndexes</b>();
+                           
+  a.setBegin(new_value_begin);
+  a.setEnd(new_value_end);
+  
+a.<b>addToIndexes</b>();</pre>
+This is the most efficient way to do this.
+                           </p>
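+<p>Why the remove / update / add order matters can be seen with any sorted collection. The sketch below is a toy
+model in which a <code>java.util.TreeSet</code> stands in for a UIMA index; <code>Span</code>,
+<code>ReindexSketch</code>, <code>newIndex</code> and <code>move</code> are hypothetical names, not framework API.</p>

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Toy model of a sorted index; TreeSet stands in for a UIMA index, and all names are hypothetical. */
class ReindexSketch {

    static final class Span {
        final int id;      // tie-breaker so spans with equal offsets stay distinct entries
        int begin;
        int end;
        Span(int id, int begin, int end) { this.id = id; this.begin = begin; this.end = end; }
    }

    /** Creates an empty index ordered by begin (ascending), end (descending), then id. */
    static TreeSet<Span> newIndex() {
        return new TreeSet<>(Comparator.<Span>comparingInt(s -> s.begin)
                .thenComparing(Comparator.<Span>comparingInt(s -> s.end).reversed())
                .thenComparingInt(s -> s.id));
    }

    /** Changes both key fields with a single remove and a single add. */
    static void move(TreeSet<Span> index, Span s, int newBegin, int newEnd) {
        boolean wasIndexed = index.remove(s);  // remove while the old keys still locate it
        s.begin = newBegin;
        s.end = newEnd;
        if (wasIndexed) {
            index.add(s);                      // re-insert at the position for the new keys
        }
    }
}
```

+<p>Changing <code>begin</code> or <code>end</code> while the entry is still in the set would leave it filed under
+its old position, which is the situation the remove-first order avoids.</p>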
+                                                <p>There are a couple of special forms you can use to protect indexes while you're updating features used as keys.
+                          This is useful when you're not sure what feature 
values might be used as keys in some index.
+                          <pre>
+try (AutoCloseable ac = my_cas.<b>protectIndexes</b>()) {
+   // ...  arbitrary user code which updates features 
+   //      which may be "keys" in one or more indexes, e.g.
+   
+   a.setBegin(new_value_begin);
+   a.setEnd(new_value_end); 
+}</pre>
+or
+<pre>
+my_cas.<b>protectIndexes</b>(() -&gt; {
+   // ... arbitrary user code updating "key" features, 
+   //     but no checked exceptions are permitted
+   //     (because inside a lambda)
+   
+   a.setBegin(new_value_begin);
+   a.setEnd(new_value_end);
+   });</pre>
+   These use the framework's automatic detection mechanism and remove Feature Structures from all involved indexes
+   if needed, but delay adding them back until the end of the protected section.
+                              </p>
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table 
class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Avoid where possible, copying sets of Feature Structures">
+            <h2>Avoid where possible, copying sets of Feature Structures
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>Operations which iterate over Feature 
Structures, and put them into a Collection or List, and then 
+                            iterate over that list to do some other 
operations, can often be done directly on the Feature Structures in the CAS,
+                            omitting the first copying of them into a list.
+                         </p>
+                                                <p>A frequent speedup can 
happen when the particular logic can detect when no further items in a (sorted) 
index
+                            are needed, and the iteration can be stopped 
early.</p>
+                                                <p>For example, you might have code which iterates over all feature structures of a particular type and puts these into a list,
+                            and then goes through the list, picks out certain ones, and puts those into another list, which is then returned.
+                         </p>
+                                                <p>The first copying can be 
omitted, by moving the logic of what to include into the first iteration, and 
producing the second
+                            list directly.</p>
+                                                <p>In UIMA Version 3, you can 
make use of the <a target="_blank" rel="noopener" 
href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select
 framework</a>.
+                            It already accounts for many of the use cases in which you might want to choose where an iteration starts or exits.
+                            You can also use its ability to produce streams, and combine that with Java's takeWhile method (Java 9 and later), to exit a stream early.
+                         </p>
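+<p>A minimal sketch of the single-pass idea using ordinary Java streams. It is a toy model rather than UIMA code:
+a sorted <code>List</code> stands in for the index, and <code>Span</code>, <code>NoCopySketch</code> and
+<code>within</code> are hypothetical names. <code>takeWhile</code> stops the stream at the first element that can
+no longer qualify, and only the result list is ever built.</p>

```java
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of filtering without an intermediate list; names are hypothetical, not UIMA API. */
class NoCopySketch {

    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /**
     * Returns the spans lying inside the window from windowBegin to windowEnd.
     * Assumes 'sorted' is ordered by begin (ascending), so the stream can stop
     * at the first span that starts at or after windowEnd.
     */
    static List<Span> within(List<Span> sorted, int windowBegin, int windowEnd) {
        return sorted.stream()
                .takeWhile(s -> s.begin < windowEnd)                        // exit early (Java 9+)
                .filter(s -> s.begin >= windowBegin && s.end <= windowEnd)  // selection logic
                .collect(Collectors.toList());                              // the only list built
    }
}
```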
+                            </blockquote>
+        </td></tr>
+    </table>
+                            </blockquote>
+        </p>
+      </td></tr>
+    </table>
+                                  </td>
+                </tr>
+                <!-- FOOTER -->
+                <tr><td colspan="2">
+                  <hr noshade="" size="1"/>
+                </td></tr>
+                <tr><td colspan="2"> 
+                  <table class="pageFooter">
+                    <tr>
+                      <td><a href="index.html">Home</a></td>
+                      <td><a href="privacy-policy.html">Privacy Policy</a></td>
+                      <td style="font-size:75%">
+                Copyright &#169; 2006-2013, The Apache Software 
Foundation.<br/>
+                Apache UIMA, UIMA, the Apache UIMA logo and the Apache Feather 
logo are trademarks of The Apache Software Foundation.<br/>
+                All other marks mentioned may be trademarks or registered 
trademarks of their respective owners.
+                      </td>
+                      <td><a href="mailto:d...@uima.apache.org">Contact us</a></td>
+                    </tr>
+                  </table>                    
+                </td></tr>
+            </table>
+        </body>
+    </html>
+

Added: uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml
URL: 
http://svn.apache.org/viewvc/uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml?rev=1870165&view=auto
==============================================================================
--- uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml (added)
+++ uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml Fri Nov 22 
15:29:22 2019
@@ -0,0 +1,252 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+
+<!--
+       Licensed to the Apache Software Foundation (ASF) under one
+       or more contributor license agreements.  See the NOTICE file
+       distributed with this work for additional information
+       regarding copyright ownership.  The ASF licenses this file
+       to you under the Apache License, Version 2.0 (the
+       "License"); you may not use this file except in compliance
+       with the License.  You may obtain a copy of the License at
+       
+       https://www.apache.org/licenses/LICENSE-2.0
+       
+       Unless required by applicable law or agreed to in writing,
+       software distributed under the License is distributed on an
+       "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+       KIND, either express or implied.  See the License for the
+       specific language governing permissions and limitations
+       under the License.
+-->
+
+<document>
+
+       <properties>
+               <title>Cookbook: addressing some typical use-cases</title>
+               <author email="d...@uima.apache.org">
+                       Apache UIMA Team
+               </author>
+       </properties>
+
+       <body>
+       
+         <section name="Working with Feature Structures">
+           <p>These techniques work with all kinds of Feature Structures, both Annotations and non-Annotations.</p>
+          
+           <subsection name="Remove all Feature Structures of a particular 
type">
+        <p>There are built-in methods to do this, over all indexes in a 
particular view.  There are 2 variations:
+          <ul><li>remove all including the subtypes of the type
+               <pre>myJCasView.removeAllIncludingSubtypes(Foo.type)</pre>
+          </li>
+          <li>remove all excluding the subtypes of the type
+              
<pre>myJCasView.removeAllExcludingSubtypes(Foo.type)</pre></li></ul>
+        </p>
+        <p>Both of these are much faster than iterating over the Feature 
Structures; they directly clear the associated indexes.</p>
+      </subsection>
+      
+      <subsection name="General suggestions: working with iterators">
+
+            
+      <p>Many times code will iterate over all instances of a type, and only 
do something with a subset.
+         Frequently, the iteration can be cut short, by starting near the spot 
of interest and stopping as soon
+         as it can be determined that no further iteration will find 
interesting Annotations.</p>
+         
+      <p>Example: Let's say you have a "token" annotation, and want to find 
the "sentence" that contains it.
+         You could write an iterator over all sentences.  
+      </p>
+      <h3>Stop early</h3>
+        <p>
+         When you find the first sentence that overlaps the token, you can use extra knowledge that you might have,
+         such as the fact that each token belongs to only one sentence, to conclude that no further iteration
+         is needed, and stop. 
+       </p>
+       
+       <p>Furthermore, if the token appears outside of any sentence, you can similarly stop the iteration, and return
+          an "empty" result, as soon as the sentence being tested begins after the token's "begin".
+          This is because, at that point, due to the sorting of the returned values, no later sentence could
+          start at or before the token's begin.
+       </p>
+       
+       <h3>Begin closer to the right spot, maybe iterate backwards</h3>
+       
+       <p>But you can do better.</p>
+       <p>You can start the iteration, instead of at the beginning, at the 
position of the token, and iterate backwards.
+       Iterators have a moveTo() method which takes a feature structure 
argument, so you can moveTo(the-token), 
+       and then perhaps with some edge adjustment for equality, start 
iterating backwards, looking for the sentence at that
+       position that covers the token.  
+       </p>
+       
+       <p>If you are iterating backwards, looking for a "covering" annotation, and you know the largest span for that
+       covering type, then you can stop iterating as soon as the begin position you have reached, plus the largest span, is less than
+       the begin of the annotation you're trying to cover.</p>
+       
+       <p style="margin-left:1rem">This is used internally in version 3's 
+           <a target="_blank" rel="noopener"
+           href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select.annot.subselect">select framework</a>
+       to speed up 
+       the <code>covering</code> kind of iteration.</p>
+       
+       <p>There are many other examples, but the principle is the same: start 
the iteration "close to" the right spot, 
+          perhaps moving backwards instead of forwards, and end the iteration 
as soon as you can logically say that
+          no more suitable feature structures would be found. </p>
+      
+       <h3>Use UIMA Version 3's select framework</h3>
+       <p>The <a target="_blank" rel="noopener"
+           href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a> 
+           incorporates many of the popular iteration use cases that we've seen into a Java-friendly approach that
+           automatically uses optimized iterators and can also produce Java Streams.</p>    
+      </subsection>
+      
+         </section>
+               
+               <section
+                       name="Working with Annotations">
+                       
+                           <subsectionToc/>
+                       <p>
+                               The CAS holds Feature Structures (FSs).  There 
is special support for FSs which are a subtype of Annotation;
+                               these have an associated Subject of Analysis 
(Sofa) and <code>begin</code> and <code>end</code> offsets. 
+                       </p>
+                       
+                       <h3>Annotations are not required in all cases</h3>
+                         <p>If your application deals with a different kind of unstructured data, say, for instance, images, then
+                            Annotations may not be the appropriate supertype for your types, because they're designed for 
+                            things that have meaningful linear begin / end demarcations. </p>
+                         <p>You can have your feature structures inherit from 
TOP, or from some other appropriate supertype, other
+                            than Annotation.</p>
+                       
+                       <h3>Making use of the built-in Annotation index</h3>    
    
+                       <p>Annotations are special in UIMA in that there is a 
"built-in" index, the AnnotationIndex, which can be used
+                          to rapidly access these in a sorted order.  The 
ordering is by <code>begin</code> (ascending), then by
+                          <code>end</code> (descending), and then by 
type-priorities.</p>
+                       <p style="margin-left:1rem"><i>This is really a set of 
indexes, one for each subtype of Annotation.</i></p>
+                       <p style="margin-left:1rem"><i>Although the index has 
type-priorities, in UIMA v3, the <code>select-framework</code>
+                         by default ignores these; this behavior can be 
overridden.</i></p>  
+                         
+                       <subsection name="Watch out for type-priorities">
+                         <p>When 2 annotations have the same begin and end, but different types, then one comes before the other,
+                            according to type priorities.  This is intended to let you declare, if you have a Sentence annotation and a 
+                            Foo annotation both covering the same span, that the Sentence logically contains Foo, and not the 
+                            other way around.</p>
+                            
+                         <p>To make this work, you need to specify the type priorities. This is a global setting for your application.
+                            See 
+                            <a target="_blank" rel="noopener"
+                              href="http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive">
+                              type priorities</a> (scroll down to find it) for how to specify this.</p>
+                              
+                         <h3>Avoiding type priorities</h3>
+                         <p>Often, the use of type priorities gets in the way.  With UIMA Version 3, the 
+                            <a target="_blank" rel="noopener"
+           href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>
+           by default ignores type priorities when doing its operations; but this can be overridden as needed.</p>        
+                       </subsection>  
+     
+      <subsection name="Annotation containment">
+        <h3>a contains b</h3>
+        <ul><li>Ignoring type priorities:</li></ul>
+        <pre>a != null &amp;&amp; b != null &amp;&amp;       // null check
+a.getBegin() &lt;= b.getBegin() &amp;&amp; // a starts before (or equal to) b 
+a.getEnd() &gt;= b.getEnd()        // a ends after (or equal to) b</pre>
+        
+        <h3>a and b overlap (have at least one char in common)</h3>
+        <pre>
+                                    // (check for non-null omitted)
+if (a.getBegin() &lt;= b.getBegin()) { // if a starts before (or equal to) b
+  return a.getEnd() &gt; b.getBegin(); // then it overlaps if a's end is after b's begin
+} else {                            // otherwise, b's begin is before a's begin
+  return b.getEnd() &gt; a.getBegin(); // so it overlaps if b's end is after a's begin.
+}
+</pre>    
+                       </subsection>
+                       
+                       
+                       <subsection name="Adjusting an existing annotation's 
begin and end">
+                         <p>Sometimes, your code may want to adjust an annotation's begin and end values.
+                            If the annotation is not indexed, there's no issue 
- just change the value.
+                            But if it is indexed, it's in index(es) in a 
position determined by its begin and end position, so if you 
+                            change these, the item needs to be reindexed (in 
all the indexes holding it).  Typically, only one index
+                            (the Annotation Index for a particular CAS View) 
is involved, but in general, there could be multiple
+                            indexes involved.</p>
+                            
+                         <p>If you are using UIMA version 2.7.0 or later, the UIMA 
+                            <a target="_blank" rel="noopener"
+                               href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">framework</a> 
+                               detects updates that would need this re-indexing, and
+                               automatically removes the Feature Structure from all involved index(es), updates the Feature, and then adds the Feature Structure back to the index(es).
+                          </p> 
+                          
+                          <p>You can improve the efficiency of this, if you are updating, say, both the begin and end value of an annotation, by
+                             doing this yourself, in your code:
+                            <ul><li>Removing the item from the index(es)</li>
+                                <li>Doing both updates</li>
+                                <li>Adding the item back into the index(es)</li></ul>
+               More details <a target="_blank" rel="noopener"
+              href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">here</a>.
+                                </p> 
+                                
+                        <p>Example: if you know a particular annotation is only indexed in one view, 
+                           then you can update its begin and end features using
+                           <pre>a.<b>removeFromIndexes</b>();
+                           
+  a.setBegin(new_value_begin);
+  a.setEnd(new_value_end);
+  
+a.<b>addToIndexes</b>();</pre>
+This is the most efficient way to do this.
+                           </p>         
+                                
+                          <p>There are a couple of special forms you can use to protect indexes while you're updating features used as keys.
+                          This is useful when you're not sure what feature values might be used as keys in some index.
+                          <pre>
+try (AutoCloseable ac = my_cas.<b>protectIndexes</b>()) {
+   // ...  arbitrary user code which updates features 
+   //      which may be "keys" in one or more indexes, e.g.
+   
+   a.setBegin(new_value_begin);
+   a.setEnd(new_value_end); 
+}</pre>
+or
+<pre>
+my_cas.<b>protectIndexes</b>(() -> {
+   // ... arbitrary user code updating "key" features, 
+   //     but no checked exceptions are permitted
+   //     (because inside a lambda)
+   
+   a.setBegin(new_value_begin);
+   a.setEnd(new_value_end);
+   });</pre>
+   These use the framework's automatic detection mechanism, and remove Feature Structures from all involved indexes
+   if needed, but delay adding them back until the end of the protected section.
+                              </p>      
+                             
+                       </subsection>
+                       
+                       <subsection name="Avoid where possible, copying sets of 
Feature Structures">
+                       
+                         <p>Operations which iterate over Feature Structures, 
and put them into a Collection or List, and then 
+                            iterate over that list to do some other 
operations, can often be done directly on the Feature Structures in the CAS,
+                            omitting the first copying of them into a list.
+                         </p>
+                         
+                         <p>A frequent speedup can happen when the particular 
logic can detect when no further items in a (sorted) index
+                            are needed, and the iteration can be stopped 
early.</p>
+                         
+                         <p>For example, you might have code which iterates over all feature structures of a particular type and puts these into a list,
+                            and then goes through the list, picks out certain ones, and puts those into another list, which is then returned.
+                         </p>
+                         
+                         <p>The first copying can be omitted, by moving the 
logic of what to include into the first iteration, and producing the second
+                            list directly.</p>
+                            
+                         <p>In UIMA Version 3, you can make use of the <a target="_blank" rel="noopener"
+           href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>.
+                            It already accounts for many of the use cases where you might want to start or exit an iteration at a particular point.
+                            You can also use its ability to produce streams, and combine that with Java's takeWhile method (Java 9 and later), to exit a stream early.
+                         </p>     
+                            
+                       </subsection>
+               </section>
+       </body>
+
+</document>
+
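
Illustration (not part of the committed files): the "a contains b" and "a and b overlap" checks from the cookbook's "Annotation containment" subsection can be tried out without a CAS. The sketch below is only a stand-in under stated assumptions: `ContainmentSketch` and its nested `Span` class are hypothetical names, not UIMA classes, and plain int fields replace `getBegin()` / `getEnd()`.

```java
// Hypothetical stand-in for UIMA annotations; nothing here is UIMA API.
final class ContainmentSketch {

    /** Minimal span with begin/end offsets, standing in for an Annotation. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** a contains b, ignoring type priorities. */
    static boolean contains(Span a, Span b) {
        return a != null && b != null   // null check
            && a.begin <= b.begin       // a starts before (or equal to) b
            && a.end >= b.end;          // a ends after (or equal to) b
    }

    /** a and b overlap, i.e. have at least one character in common. */
    static boolean overlaps(Span a, Span b) {
        if (a == null || b == null) {
            return false;
        }
        if (a.begin <= b.begin) {
            return a.end > b.begin;     // a starts first: overlap if a ends after b begins
        } else {
            return b.end > a.begin;     // b starts first: overlap if b ends after a begins
        }
    }

    public static void main(String[] args) {
        Span sentence = new Span(0, 20);
        Span token = new Span(5, 9);
        System.out.println("sentence contains token: " + contains(sentence, token));
        System.out.println("token overlaps sentence: " + overlaps(token, sentence));
    }
}
```

Note that two spans which merely touch (one ends where the other begins) do not overlap under this definition, since they share no character.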

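
Illustration (not part of the committed files): the "stop early" advice and the takeWhile remark in the cookbook can be sketched on a plain Java list. This assumes the sentences are already sorted by begin, as the Annotation index would deliver them; `EarlyStopSketch`, `Ann` and `findCovering` are hypothetical names, and a real annotator would obtain the stream from the CAS instead of from a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; Ann stands in for a UIMA Annotation.
final class EarlyStopSketch {

    static final class Ann {
        final int begin;
        final int end;

        Ann(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Finds the first sentence covering the token.
     * Assumes sortedByBegin is ordered by ascending begin, so once a sentence
     * begins after the token's begin, no later sentence can cover the token.
     */
    static Optional<Ann> findCovering(List<Ann> sortedByBegin, Ann token) {
        return sortedByBegin.stream()
                .takeWhile(s -> s.begin <= token.begin) // stop early (Java 9+)
                .filter(s -> s.end >= token.end)        // keep only covering sentences
                .findFirst();                           // one sentence per token is enough
    }

    public static void main(String[] args) {
        List<Ann> sentences = Arrays.asList(new Ann(0, 10), new Ann(20, 30), new Ann(30, 45));
        Optional<Ann> hit = findCovering(sentences, new Ann(22, 26));
        System.out.println(hit.isPresent() ? hit.get().begin + "-" + hit.get().end : "none");
    }
}
```

No intermediate list is built: the selection logic runs inside the single pass, and the pass ends as soon as either a covering sentence is found or the sort order rules one out.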

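
Illustration (not part of the committed files): the remove / update / add-back sequence from the "Adjusting an existing annotation's begin and end" subsection is needed because a sorted index locates items by their key values. The sketch below imitates this with a `java.util.TreeSet`; it is an analogy under stated assumptions, not how UIMA implements its indexes. `ReindexSketch`, `Ann`, `ORDER` and `move` are hypothetical names, and the `label` tie-breaker merely stands in for whatever distinguishes annotations with equal offsets.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical analogy: a TreeSet plays the role of a sorted annotation index.
final class ReindexSketch {

    static final class Ann {
        int begin;
        int end;
        final String label;

        Ann(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }
    }

    /** begin ascending, then end descending, then label as a tie-breaker. */
    static final Comparator<Ann> ORDER = (x, y) -> {
        if (x.begin != y.begin) {
            return Integer.compare(x.begin, y.begin);
        }
        if (x.end != y.end) {
            return Integer.compare(y.end, x.end);
        }
        return x.label.compareTo(y.label);
    };

    /** Updates both keys with a single remove and a single add. */
    static void move(TreeSet<Ann> index, Ann a, int newBegin, int newEnd) {
        boolean wasIndexed = index.remove(a); // remove while the old keys are still valid
        a.begin = newBegin;                   // do both updates
        a.end = newEnd;
        if (wasIndexed) {
            index.add(a);                     // add back at the new position
        }
    }

    public static void main(String[] args) {
        TreeSet<Ann> index = new TreeSet<>(ORDER);
        Ann a = new Ann(0, 5, "a");
        index.add(a);
        index.add(new Ann(3, 8, "b"));
        move(index, a, 7, 10);
        System.out.println("first is now: " + index.first().label);
    }
}
```

Changing `begin` or `end` while the item is still in the set would leave it at a position that no longer matches its keys, which is the kind of index corruption the framework's automatic detection guards against.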