docs...

buildbot Mon, 04 Apr 2016 02:24:13 -0700

Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook.html (added)
+++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook.html Mon Apr  4 09:23:29 2016
@@ -0,0 +1,140 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Cookbook &#8211; Apache Lucy Documentation</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy&#8482;"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo;&nbsp;<a href="/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/">Perl</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/Lucy/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/perl/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Cookbook - Apache Lucy recipes</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>The Cookbook provides thematic documentation covering some of Apache 
Lucy&#8217;s more sophisticated features.
+For a step-by-step introduction to Lucy,
+see <a href="../../Lucy/Docs/Tutorial.html" class="podlinkpod"
+>Tutorial</a>.</p>
+
+<h3><a class='u'
+name="Chapters"
+>Chapters</a></h3>
+
+<ul>
+<li><a href="../../Lucy/Docs/Cookbook/FastUpdates.html" class="podlinkpod"
+>FastUpdates</a> - While index updates are fast on average,
+worst-case update performance may be significantly slower.
+To make index updates consistently quick,
+we must manually intervene to control the process of index segment 
consolidation.</li>
+
+<li><a href="../../Lucy/Docs/Cookbook/CustomQuery.html" class="podlinkpod"
+>CustomQuery</a> - Explore Lucy&#8217;s support for custom query types by 
creating a &#8220;PrefixQuery&#8221; class to handle trailing wildcards.</li>
+
+<li><a href="../../Lucy/Docs/Cookbook/CustomQueryParser.html" 
class="podlinkpod"
+>CustomQueryParser</a> - Define your own custom search query syntax using <a 
href="../../Lucy/Search/QueryParser.html" class="podlinkpod"
+>QueryParser</a> and Parse::RecDescent.</li>
+</ul>
+
+<h3><a class='u'
+name="Materials"
+>Materials</a></h3>
+
+<p>Some of the recipes in the Cookbook reference the completed <a 
href="../../Lucy/Docs/Tutorial.html" class="podlinkpod"
+>Tutorial</a> application.
+These materials can be found in the <code>sample</code> directory at the root 
of the Lucy distribution:</p>
+
+<pre>sample/indexer.pl        # indexing app
+sample/search.cgi        # search app
+sample/us_constitution   # corpus</pre>
+
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed 
under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache 
Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be 
trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>


Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQuery.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQuery.html (added)
+++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQuery.html Mon Apr  4 09:23:29 2016
@@ -0,0 +1,409 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Cookbook::CustomQuery &#8211; Apache Lucy Documentation</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy&#8482;"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo;&nbsp;<a href="/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/">Perl</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/Lucy/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/Lucy/Docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/Lucy/Docs/Cookbook/">Cookbook</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/perl/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Cookbook::CustomQuery - Sample subclass of Query</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Explore Apache Lucy&#8217;s support for custom query types by creating a 
&#8220;PrefixQuery&#8221; class to handle trailing wildcards.</p>
+
+<pre>my $prefix_query = PrefixQuery-&#62;new(
+    field        =&#62; &#39;content&#39;,
+    query_string =&#62; &#39;foo*&#39;,
+);
+my $hits = $searcher-&#62;hits( query =&#62; $prefix_query );
+...</pre>
+
+<h3><a class='u'
+name="Query,_Compiler,_and_Matcher"
+>Query,
+Compiler,
+and Matcher</a></h3>
+
+<p>To add support for a new query type,
+we need three classes: a Query,
+a Compiler,
+and a Matcher.</p>
+
+<ul>
+<li>PrefixQuery - a subclass of <a href="../../../Lucy/Search/Query.html" 
class="podlinkpod"
+>Query</a>,
+and the only class that client code will deal with directly.</li>
+
+<li>PrefixCompiler - a subclass of <a 
href="../../../Lucy/Search/Compiler.html" class="podlinkpod"
+>Compiler</a>,
+whose primary role is to compile a PrefixQuery to a PrefixMatcher.</li>
+
+<li>PrefixMatcher - a subclass of <a href="../../../Lucy/Search/Matcher.html" 
class="podlinkpod"
+>Matcher</a>,
+which does the heavy lifting: it applies the query to individual documents and 
assigns a score to each match.</li>
+</ul>
+
+<p>The PrefixQuery class on its own isn&#8217;t enough because a Query 
object&#8217;s role is limited to expressing an abstract specification for the 
search.
+A Query is basically nothing but metadata; execution is left to the 
Query&#8217;s companion Compiler and Matcher.</p>
+
+<p>Here&#8217;s a simplified sketch illustrating how a Searcher&#8217;s hits() 
method ties together the three classes.</p>
+
+<pre>sub hits {
+    my ( $self, $query ) = @_;
+    my $compiler = $query-&#62;make_compiler(
+        searcher =&#62; $self,
+        boost    =&#62; $query-&#62;get_boost,
+    );
+    my $matcher = $compiler-&#62;make_matcher(
+        reader     =&#62; $self-&#62;get_reader,
+        need_score =&#62; 1,
+    );
+    my @hits = $matcher-&#62;capture_hits;
+    return \@hits;
+}</pre>
+
+<h4><a class='u'
+name="PrefixQuery"
+>PrefixQuery</a></h4>
+
+<p>Our PrefixQuery class will have two attributes: a query string and a field 
name.</p>
+
+<pre>package PrefixQuery;
+use base qw( Lucy::Search::Query );
+use Carp;
+use Scalar::Util qw( blessed );
+
+# Inside-out member vars and hand-rolled accessors.
+my %query_string;
+my %field;
+sub get_query_string { my $self = shift; return $query_string{$$self} }
+sub get_field        { my $self = shift; return $field{$$self} }</pre>
+
+<p>PrefixQuery&#8217;s constructor collects and validates the attributes.</p>
+
+<pre>sub new {
+    my ( $class, %args ) = @_;
+    my $query_string = delete $args{query_string};
+    my $field        = delete $args{field};
+    my $self         = $class-&#62;SUPER::new(%args);
+    confess(&#34;&#39;query_string&#39; param is required&#34;)
+        unless defined $query_string;
+    confess(&#34;Invalid query_string: &#39;$query_string&#39;&#34;)
+        unless $query_string =~ /\*\s*$/;
+    confess(&#34;&#39;field&#39; param is required&#34;)
+        unless defined $field;
+    $query_string{$$self} = $query_string;
+    $field{$$self}        = $field;
+    return $self;
+}</pre>
+
+<p>Since this is an inside-out class,
+we&#8217;ll need a destructor:</p>
+
+<pre>sub DESTROY {
+    my $self = shift;
+    delete $query_string{$$self};
+    delete $field{$$self};
+    $self-&#62;SUPER::DESTROY;
+}</pre>
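+<p>The reason for the destructor can be seen in a minimal plain-Perl sketch of the inside-out pattern. The class name <code>InsideOutDemo</code> and its <code>live_count()</code> helper are hypothetical, and the sketch keys its storage on <code>refaddr</code> rather than on <code>$$self</code>, since <code>$$self</code> as a key assumes a Lucy object:</p>

```perl
use strict;
use warnings;

package InsideOutDemo;    # hypothetical class, not part of Lucy
use Scalar::Util qw( refaddr );

# Attribute storage lives outside the object, keyed by the object's address.
my %label;

sub new {
    my ( $class, %args ) = @_;
    my $scalar;
    my $self = bless \$scalar, $class;
    $label{ refaddr($self) } = $args{label};
    return $self;
}

sub get_label { my $self = shift; return $label{ refaddr($self) } }

# Without this, the %label entry would outlive the object.
sub DESTROY {
    my $self = shift;
    delete $label{ refaddr($self) };
}

# Number of objects that still have an entry in the attribute hash.
sub live_count { return scalar keys %label }

package main;

{
    my $obj = InsideOutDemo->new( label => 'foo*' );
    print( $obj->get_label, "\n" );              # foo*
    print( InsideOutDemo::live_count(), "\n" );  # 1
}
print( InsideOutDemo::live_count(), "\n" );      # 0, $obj has been destroyed
```

+<p>Because the attribute hash lives outside the object, nothing removes its entry automatically when the object goes away; DESTROY has to do it.</p>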
+
+<p>The equals() method determines whether two Queries are logically 
equivalent:</p>
+
+<pre>sub equals {
+    my ( $self, $other ) = @_;
+    return 0 unless blessed($other);
+    return 0 unless $other-&#62;isa(&#34;PrefixQuery&#34;);
+    return 0 unless $field{$$self} eq $field{$$other};
+    return 0 unless $query_string{$$self} eq $query_string{$$other};
+    return 1;
+}</pre>
+
+<p>The last thing we&#8217;ll need is a make_compiler() factory method which 
kicks out a subclass of <a href="../../../Lucy/Search/Compiler.html" 
class="podlinkpod"
+>Compiler</a>.</p>
+
+<pre>sub make_compiler {
+    my ( $self, %args ) = @_;
+    my $subordinate = delete $args{subordinate};
+    my $compiler = PrefixCompiler-&#62;new( %args, parent =&#62; $self );
+    $compiler-&#62;normalize unless $subordinate;
+    return $compiler;
+}</pre>
+
+<h4><a class='u'
+name="PrefixCompiler"
+>PrefixCompiler</a></h4>
+
+<p>PrefixQuery&#8217;s make_compiler() method will be called internally at 
search-time by objects which subclass <a 
href="../../../Lucy/Search/Searcher.html" class="podlinkpod"
+>Searcher</a> &#8211; such as <a 
href="../../../Lucy/Search/IndexSearcher.html" class="podlinkpod"
+>IndexSearchers</a>.</p>
+
+<p>A Searcher is associated with a particular collection of documents.
+These documents may all reside in one index,
+as with IndexSearcher,
+or they may be spread out across multiple indexes on one or more machines,
+as with LucyX::Remote::ClusterSearcher.</p>
+
+<p>Searcher objects have access to certain statistical information about the 
collections they represent; for instance,
+a Searcher can tell you how many documents are in the collection&#8230;</p>
+
+<pre>my $maximum_number_of_docs_in_collection = $searcher-&#62;doc_max;</pre>
+
+<p>&#8230; or how many documents a specific term appears in:</p>
+
+<pre>my $term_appears_in_this_many_docs = $searcher-&#62;doc_freq(
+    field =&#62; &#39;content&#39;,
+    term  =&#62; &#39;foo&#39;,
+);</pre>
+
+<p>Such information can be used by sophisticated Compiler implementations to 
assign more or less heft to individual queries or sub-queries.
+However,
+we&#8217;re not going to bother with weighting for this demo; we&#8217;ll just 
assign a fixed score of 1.0 to each matching document.</p>
+
+<p>We don&#8217;t need to write a constructor,
+as it will suffice to inherit new() from Lucy::Search::Compiler.
+The only method we need to implement for PrefixCompiler is make_matcher().</p>
+
+<pre>package PrefixCompiler;
+use base qw( Lucy::Search::Compiler );
+
+sub make_matcher {
+    my ( $self, %args ) = @_;
+    my $seg_reader = $args{reader};
+
+    # Retrieve low-level components LexiconReader and PostingListReader.
+    my $lex_reader
+        = $seg_reader-&#62;obtain(&#34;Lucy::Index::LexiconReader&#34;);
+    my $plist_reader
+        = $seg_reader-&#62;obtain(&#34;Lucy::Index::PostingListReader&#34;);
+
+    # Acquire a Lexicon and seek it to our query string.
+    my $substring = $self-&#62;get_parent-&#62;get_query_string;
+    $substring =~ s/\*\s*$//;
+    my $field = $self-&#62;get_parent-&#62;get_field;
+    my $lexicon = $lex_reader-&#62;lexicon( field =&#62; $field );
+    return unless $lexicon;
+    $lexicon-&#62;seek($substring);
+
+    # Accumulate PostingLists for each matching term.
+    my @posting_lists;
+    while ( defined( my $term = $lexicon-&#62;get_term ) ) {
+        last unless $term =~ /^\Q$substring/;
+        my $posting_list = $plist_reader-&#62;posting_list(
+            field =&#62; $field,
+            term  =&#62; $term,
+        );
+        if ($posting_list) {
+            push @posting_lists, $posting_list;
+        }
+        last unless $lexicon-&#62;next;
+    }
+    return unless @posting_lists;
+
+    return PrefixMatcher-&#62;new( posting_lists =&#62; \@posting_lists );
+}</pre>
+
+<p>PrefixCompiler gets access to a <a 
href="../../../Lucy/Index/SegReader.html" class="podlinkpod"
+>SegReader</a> object when make_matcher() gets called.
+From the SegReader and its sub-components <a 
href="../../../Lucy/Index/LexiconReader.html" class="podlinkpod"
+>LexiconReader</a> and <a href="../../../Lucy/Index/PostingListReader.html" 
class="podlinkpod"
+>PostingListReader</a>,
+we acquire a <a href="../../../Lucy/Index/Lexicon.html" class="podlinkpod"
+>Lexicon</a>,
+scan through the Lexicon&#8217;s unique terms,
+and acquire a <a href="../../../Lucy/Index/PostingList.html" class="podlinkpod"
+>PostingList</a> for each term that matches our prefix.</p>
+
+<p>Each of these PostingList objects represents a set of documents which match 
the query.</p>
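+<p>The seek-then-scan idea behind make_matcher() can be illustrated without an index. In this sketch a sorted Perl array stands in for the Lexicon, and the hypothetical helper <code>terms_with_prefix()</code> stands in for the seek() and next() calls; it approximates the idea and is not Lucy&#8217;s implementation:</p>

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Lexicon: unique terms in sorted order.
my @terms = sort qw( bar food foo fool for zebra );

sub terms_with_prefix {
    my ( $terms, $query_string ) = @_;
    ( my $prefix = $query_string ) =~ s/\*\s*$//;    # 'foo*' becomes 'foo'

    # "Seek": skip ahead to the first term sorting at or after the prefix.
    my $i = 0;
    $i++ while $i < @$terms && $terms->[$i] lt $prefix;

    # Collect terms until one no longer starts with the prefix.
    my @matches;
    for ( ; $i < @$terms; $i++ ) {
        last unless $terms->[$i] =~ /^\Q$prefix/;
        push @matches, $terms->[$i];
    }
    return \@matches;
}

print join( ', ', @{ terms_with_prefix( \@terms, 'foo*' ) } ), "\n";
# foo, food, fool
```

+<p>Because the terms are sorted, every match is contiguous, so the scan can stop at the first term that fails the prefix test (here, <code>for</code>).</p>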
+
+<h4><a class='u'
+name="PrefixMatcher"
+>PrefixMatcher</a></h4>
+
+<p>The Matcher subclass is the most involved.</p>
+
+<pre>package PrefixMatcher;
+use base qw( Lucy::Search::Matcher );
+
+# Inside-out member vars.
+my %doc_ids;
+my %tick;
+
+sub new {
+    my ( $class, %args ) = @_;
+    my $posting_lists = delete $args{posting_lists};
+    my $self          = $class-&#62;SUPER::new(%args);
+
+    # Cheesy but simple way of interleaving PostingList doc sets.
+    my %all_doc_ids;
+    for my $posting_list (@$posting_lists) {
+        while ( my $doc_id = $posting_list-&#62;next ) {
+            $all_doc_ids{$doc_id} = undef;
+        }
+    }
+    my @doc_ids = sort { $a &#60;=&#62; $b } keys %all_doc_ids;
+    $doc_ids{$$self} = \@doc_ids;
+
+    # Track our position within the array of doc ids.
+    $tick{$$self} = -1;
+
+    return $self;
+}
+
+sub DESTROY {
+    my $self = shift;
+    delete $doc_ids{$$self};
+    delete $tick{$$self};
+    $self-&#62;SUPER::DESTROY;
+}</pre>
+
+<p>The doc ids must be in order,
+or some will be ignored; hence the <code>sort</code> above.</p>
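+<p>As a small illustration of the merge step, the following plain-Perl sketch unions several doc id lists and sorts the result numerically. The nested arrays and the <code>merge_doc_ids()</code> helper are hypothetical stand-ins for PostingList objects and the constructor code above:</p>

```perl
use strict;
use warnings;

# Hypothetical doc id lists, one per matching term.
my @posting_lists = ( [ 3, 9, 12 ], [ 1, 9 ], [ 7, 12, 20 ] );

sub merge_doc_ids {
    my ($lists) = @_;
    my %seen;
    for my $list (@$lists) {
        $seen{$_} = undef for @$list;    # hash keys de-duplicate
    }
    # Numeric sort: a default string sort would place 12 before 3.
    return [ sort { $a <=> $b } keys %seen ];
}

my $doc_ids = merge_doc_ids( \@posting_lists );
print "@$doc_ids\n";    # 1 3 7 9 12 20
```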
+
+<p>In addition to the constructor and destructor,
+there are three methods that must be overridden.</p>
+
+<p>next() advances the Matcher to the next valid matching doc.</p>
+
+<pre>sub next {
+    my $self    = shift;
+    my $doc_ids = $doc_ids{$$self};
+    my $tick    = ++$tick{$$self};
+    return 0 if $tick &#62;= scalar @$doc_ids;
+    return $doc_ids-&#62;[$tick];
+}</pre>
+
+<p>get_doc_id() returns the current document id,
+or 0 if the Matcher is exhausted.
+(<a href="../../../Lucy/Docs/DocIDs.html" class="podlinkpod"
+>Document numbers</a> start at 1,
+so 0 is a sentinel.)</p>
+
+<pre>sub get_doc_id {
+    my $self    = shift;
+    my $tick    = $tick{$$self};
+    my $doc_ids = $doc_ids{$$self};
+    return $tick &#60; scalar @$doc_ids ? $doc_ids-&#62;[$tick] : 0;
+}</pre>
+
+<p>score() conveys the relevance score of the current match.
+We&#8217;ll just return a fixed score of 1.0:</p>
+
+<pre>sub score { 1.0 }</pre>
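+<p>To see how the three methods cooperate, here is a plain-Perl sketch of a caller draining a matcher. <code>ArrayMatcherDemo</code> is a hypothetical hash-based class that only mimics the contract described above; real Matcher subclasses are driven internally by Lucy at search-time:</p>

```perl
use strict;
use warnings;

package ArrayMatcherDemo;    # hypothetical; mimics next()/get_doc_id()/score()

sub new {
    my ( $class, @doc_ids ) = @_;
    return bless { doc_ids => [@doc_ids], tick => -1 }, $class;
}

# Advance to the next doc id; 0 signals exhaustion.
sub next {
    my $self = shift;
    my $tick = ++$self->{tick};
    return 0 if $tick >= @{ $self->{doc_ids} };
    return $self->{doc_ids}[$tick];
}

# Report the current doc id without advancing.
sub get_doc_id {
    my $self = shift;
    my $tick = $self->{tick};
    return 0 if $tick < 0 || $tick >= @{ $self->{doc_ids} };
    return $self->{doc_ids}[$tick];
}

sub score { 1.0 }

package main;

my $matcher = ArrayMatcherDemo->new( 2, 5, 11 );
my @hits;
while ( my $doc_id = $matcher->next ) {    # the sentinel 0 ends the loop
    push @hits, [ $doc_id, $matcher->score ];
}
printf "doc %d scored %.1f\n", @$_ for @hits;
```

+<p>The loop relies on doc ids starting at 1, which is what makes 0 safe to use as the end-of-matches sentinel.</p>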
+
+<h3><a class='u'
+name="Usage"
+>Usage</a></h3>
+
+<p>To get a basic feel for PrefixQuery,
+insert the FlatQueryParser module described in <a 
href="../../../Lucy/Docs/Cookbook/CustomQueryParser.html" class="podlinkpod"
+>CustomQueryParser</a> (which supports PrefixQuery) into the search.cgi sample 
app.</p>
+
+<pre>my $parser = FlatQueryParser-&#62;new( schema =&#62; $searcher-&#62;get_schema );
+my $query  = $parser-&#62;parse($q);</pre>
+
+<p>If you&#8217;re planning on using PrefixQuery in earnest,
+though,
+you may want to change up analyzers to avoid stemming,
+because stemming &#8211; another approach to prefix conflation &#8211; is not 
perfectly compatible with prefix searches.</p>
+
+<pre># PolyAnalyzer with no SnowballStemmer.
+my $analyzer = Lucy::Analysis::PolyAnalyzer-&#62;new(
+    analyzers =&#62; [
+        Lucy::Analysis::StandardTokenizer-&#62;new,
+        Lucy::Analysis::Normalizer-&#62;new,
+    ],
+);</pre>
+
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed 
under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache 
Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be 
trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQueryParser.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQueryParser.html (added)
+++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQueryParser.html Mon Apr  4 09:23:29 2016
@@ -0,0 +1,327 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Cookbook::CustomQueryParser &#8211; Apache Lucy Documentation</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy&#8482;"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo;&nbsp;<a href="/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/">Perl</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/Lucy/">Lucy</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/Lucy/Docs/">Docs</a>&nbsp;&raquo;&nbsp;<a href="/docs/perl/Lucy/Docs/Cookbook/">Cookbook</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/perl/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Cookbook::CustomQueryParser - Sample subclass of 
QueryParser.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Implement a custom search query language using a subclass of <a 
href="../../../Lucy/Search/QueryParser.html" class="podlinkpod"
+>QueryParser</a>.</p>
+
+<h3><a class='u'
+name="The_language"
+>The language</a></h3>
+
+<p>At first,
+our query language will support only simple term queries and phrases delimited 
by double quotes.
+For simplicity&#8217;s sake,
+it will not support parenthetical groupings,
+boolean operators,
+or prepended plus/minus.
+The results for all subqueries will be unioned together &#8211; i.e.
+joined using an OR &#8211; which is usually the best approach for 
small-to-medium-sized document collections.</p>
+
+<p>Later,
+we&#8217;ll add support for trailing wildcards.</p>
+
+<h3><a class='u'
+name="Single-field_parser"
+>Single-field parser</a></h3>
+
+<p>Our initial parser implementation will generate queries against a single fixed field,
+&#8220;content&#8221;,
+and it will analyze text using a fixed choice of English EasyAnalyzer.
+We won&#8217;t subclass Lucy::Search::QueryParser just yet.</p>
+
+<pre>package FlatQueryParser;
+use Lucy::Search::TermQuery;
+use Lucy::Search::PhraseQuery;
+use Lucy::Search::ORQuery;
+use Carp;
+
+sub new { 
+    my $analyzer = Lucy::Analysis::EasyAnalyzer-&#62;new(
+        language =&#62; &#39;en&#39;,
+    );
+    return bless { 
+        field    =&#62; &#39;content&#39;,
+        analyzer =&#62; $analyzer,
+    }, __PACKAGE__;
+}</pre>
+
+<p>Some private helper subs for creating TermQuery and PhraseQuery objects 
will help keep the size of our main parse() subroutine down:</p>
+
+<pre>sub _make_term_query {
+    my ( $self, $term ) = @_;
+    return Lucy::Search::TermQuery-&#62;new(
+        field =&#62; $self-&#62;{field},
+        term  =&#62; $term,
+    );
+}
+
+sub _make_phrase_query {
+    my ( $self, $terms ) = @_;
+    return Lucy::Search::PhraseQuery-&#62;new(
+        field =&#62; $self-&#62;{field},
+        terms =&#62; $terms,
+    );
+}</pre>
+
+<p>Our private _tokenize() method treats double-quote delimited material as a 
single token and splits on whitespace everywhere else.</p>
+
+<pre>sub _tokenize {
+    my ( $self, $query_string ) = @_;
+    my @tokens;
+    while ( length $query_string ) {
+        if ( $query_string =~ s/^\s+// ) {
+            next;    # skip whitespace
+        }
+        elsif ( $query_string =~ s/^(&#34;[^&#34;]*(?:&#34;|$))// ) {
+            push @tokens, $1;    # double-quoted phrase
+        }
+        else {
+            $query_string =~ s/(\S+)//;
+            push @tokens, $1;    # single word
+        }
+    }
+    return \@tokens;
+}</pre>
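+<p>To make the tokenizing rules concrete, here is a standalone copy of the same logic, written as a plain function named <code>tokenize()</code> (a hypothetical name, used so the sketch runs without the FlatQueryParser class), applied to two sample query strings:</p>

```perl
use strict;
use warnings;

# Standalone copy of the tokenizing logic, without the $self argument.
sub tokenize {
    my ($query_string) = @_;
    my @tokens;
    while ( length $query_string ) {
        if ( $query_string =~ s/^\s+// ) {
            next;    # skip whitespace
        }
        elsif ( $query_string =~ s/^("[^"]*(?:"|$))// ) {
            push @tokens, $1;    # double-quoted phrase
        }
        else {
            $query_string =~ s/(\S+)//;
            push @tokens, $1;    # single word
        }
    }
    return \@tokens;
}

print "[$_]\n" for @{ tokenize('foo "bar baz" qux') };
# [foo]
# ["bar baz"]
# [qux]

print "[$_]\n" for @{ tokenize('"unterminated phrase') };
# ["unterminated phrase]
```

+<p>Note that quoted tokens keep their double quotes, and that an unterminated quote swallows the rest of the string as a single phrase token.</p>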
+
+<p>The main parsing routine creates an array of tokens by calling _tokenize(),
+runs the tokens through the EasyAnalyzer,
+creates TermQuery or PhraseQuery objects according to how many tokens emerge 
from the EasyAnalyzer&#8217;s split() method,
+and adds each of the sub-queries to the primary ORQuery.</p>
+
+<pre>sub parse {
+    my ( $self, $query_string ) = @_;
+    my $tokens   = $self-&#62;_tokenize($query_string);
+    my $analyzer = $self-&#62;{analyzer};
+    my $or_query = Lucy::Search::ORQuery-&#62;new;
+
+    for my $token (@$tokens) {
+        if ( $token =~ s/^&#34;// ) {
+            $token =~ s/&#34;$//;
+            my $terms = $analyzer-&#62;split($token);
+            my $query = $self-&#62;_make_phrase_query($terms);
+            $or_query-&#62;add_child($query);
+        }
+        else {
+            my $terms = $analyzer-&#62;split($token);
+            if ( @$terms == 1 ) {
+                my $query = $self-&#62;_make_term_query( $terms-&#62;[0] );
+                $or_query-&#62;add_child($query);
+            }
+            elsif ( @$terms &#62; 1 ) {
+                my $query = $self-&#62;_make_phrase_query($terms);
+                $or_query-&#62;add_child($query);
+            }
+        }
+    }
+
+    return $or_query;
+}</pre>
+
+<h3><a class='u'
+name="Multi-field_parser"
+>Multi-field parser</a></h3>
+
+<p>Most often,
+the end user will want their search query to match not only a single 
&#8216;content&#8217; field,
+but also &#8216;title&#8217; and so on.
+To make that happen,
+we have to turn queries such as this&#8230;</p>
+
+<pre>foo AND NOT bar</pre>
+
+<p>&#8230; into the logical equivalent of this:</p>
+
+<pre>(title:foo OR content:foo) AND NOT (title:bar OR content:bar)</pre>
+
+<p>Rather than continue with our own from-scratch parser class and write the 
routines to accomplish that expansion,
+we&#8217;re now going to subclass Lucy::Search::QueryParser and take advantage 
of some of its existing methods.</p>
+
+<p>Our first parser implementation had the &#8220;content&#8221; field name 
and the choice of English EasyAnalyzer hard-coded for simplicity,
+but we don&#8217;t need to do that once we subclass Lucy::Search::QueryParser.
+QueryParser&#8217;s constructor &#8211; which we will inherit,
+allowing us to eliminate our own constructor &#8211; requires a Schema which 
conveys field and Analyzer information,
+so we can just defer to that.</p>
+
+<pre>package FlatQueryParser;
+use base qw( Lucy::Search::QueryParser );
+use Lucy::Search::TermQuery;
+use Lucy::Search::PhraseQuery;
+use Lucy::Search::ORQuery;
+use PrefixQuery;
+use Carp;
+
+# Inherit new()</pre>
+
+<p>We&#8217;re also going to jettison our _make_term_query() and 
_make_phrase_query() helper subs and chop our parse() subroutine way down.
+Our revised parse() routine will generate Lucy::Search::LeafQuery objects 
instead of TermQueries and PhraseQueries:</p>
+
+<pre>sub parse {
+    my ( $self, $query_string ) = @_;
+    my $tokens = $self-&#62;_tokenize($query_string);
+    my $or_query = Lucy::Search::ORQuery-&#62;new;
+    for my $token (@$tokens) {
+        my $leaf_query = Lucy::Search::LeafQuery-&#62;new( text =&#62; $token 
);
+        $or_query-&#62;add_child($leaf_query);
+    }
+    return $self-&#62;expand($or_query);
+}</pre>
+
+<p>The magic happens in QueryParser&#8217;s expand() method,
+which walks the ORQuery object we supply to it looking for LeafQuery objects,
+and calls expand_leaf() for each one it finds.
+expand_leaf() performs field-specific analysis,
+decides whether each query should be a TermQuery or a PhraseQuery,
+and if multiple fields are required,
+creates an ORQuery which multiplies out e.g.
+<code>foo</code> into <code>(title:foo OR content:foo)</code>.</p>
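+
+<p>The multiplying-out step can be pictured at the string level.
+The sketch below is not how QueryParser implements expand_leaf(); <code>expand_term</code> is a hypothetical helper that only builds the display form of one term spread across a list of fields.</p>

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lucy: build the display form of a term
# expanded across one or more fields.
sub expand_term {
    my ( $term, @fields ) = @_;
    my @clauses = map { "$_:$term" } @fields;
    # A single field needs no enclosing OR group.
    return @clauses == 1 ? $clauses[0] : '(' . join( ' OR ', @clauses ) . ')';
}

my @fields   = qw( title content );
my $expanded = expand_term( 'foo', @fields ) . ' AND NOT '
    . expand_term( 'bar', @fields );
print "$expanded\n";
```

+<p>For the fields <code>title</code> and <code>content</code>,
+this reproduces the expanded form of <code>foo AND NOT bar</code> shown earlier in this section.</p>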
+
+<h3><a class='u'
+name="Extending_the_query_language"
+>Extending the query language</a></h3>
+
+<p>To add support for trailing wildcards to our query language,
+we need to override expand_leaf() to accommodate PrefixQuery,
+while deferring to the parent class implementation on TermQuery and 
PhraseQuery.</p>
+
+<pre>sub expand_leaf {
+    my ( $self, $leaf_query ) = @_;
+    my $text = $leaf_query-&#62;get_text;
+    if ( $text =~ /\*$/ ) {
+        my $or_query = Lucy::Search::ORQuery-&#62;new;
+        for my $field ( @{ $self-&#62;get_fields } ) {
+            my $prefix_query = PrefixQuery-&#62;new(
+                field        =&#62; $field,
+                query_string =&#62; $text,
+            );
+            $or_query-&#62;add_child($prefix_query);
+        }
+        return $or_query;
+    }
+    else {
+        return $self-&#62;SUPER::expand_leaf($leaf_query);
+    }
+}</pre>
+
+<p>Ordinarily,
+those asterisks would have been stripped when running tokens through the 
EasyAnalyzer &#8211; query strings containing &#8220;foo*&#8221; would produce 
TermQueries for the term &#8220;foo&#8221;.
+Our override intercepts tokens with trailing asterisks and processes them as 
PrefixQueries before <code>SUPER::expand_leaf</code> can discard them,
+so that a search for &#8220;foo*&#8221; can match &#8220;food&#8221;,
+&#8220;foosball&#8221;,
+and so on.</p>
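+
+<p>What a trailing wildcard is expected to match can be shown with plain string operations.
+<code>prefix_matches</code> is a hypothetical helper,
+not a Lucy API,
+and it says nothing about how PrefixQuery walks the index; it only strips the trailing asterisk and keeps the terms that begin with what is left.</p>

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lucy: return the terms that start with
# the prefix named by a trailing-wildcard query string such as "foo*".
sub prefix_matches {
    my ( $query_string, @terms ) = @_;
    ( my $prefix = $query_string ) =~ s/\*+$//;
    return grep { index( $_, $prefix ) == 0 } @terms;
}

my @matches = prefix_matches( 'foo*', qw( bar foo food foosball fob ) );
print "@matches\n";
```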
+
+<h3><a class='u'
+name="Usage"
+>Usage</a></h3>
+
+<p>Insert our custom parser into the search.cgi sample app to get a feel for 
how it behaves:</p>
+
+<pre>my $parser = FlatQueryParser-&#62;new( schema =&#62; 
$searcher-&#62;get_schema );
+my $query  = $parser-&#62;parse( decode( &#39;UTF-8&#39;, 
$cgi-&#62;param(&#39;q&#39;) || &#39;&#39; ) );
+my $hits   = $searcher-&#62;hits(
+    query      =&#62; $query,
+    offset     =&#62; $offset,
+    num_wanted =&#62; $page_size,
+);
+...</pre>
+
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed 
under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache 
Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be 
trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: 
websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook/FastUpdates.html
==============================================================================
--- 
websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook/FastUpdates.html
 (added)
+++ 
websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/Cookbook/FastUpdates.html
 Mon Apr  4 09:23:29 2016
@@ -0,0 +1,258 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::Cookbook::FastUpdates – Apache Lucy 
Documentation</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache 
Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software 
Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" 
title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" 
title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" 
title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " 
title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a 
href="/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/">Perl</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/Docs/">Docs</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/Docs/Cookbook/">Cookbook</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" 
action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/perl/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue 
Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a 
href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Cookbook::FastUpdates - Near real-time index updates</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>While index updates are fast on average,
+worst-case update performance may be significantly slower.
+To make index updates consistently quick,
+we must manually intervene to control the process of index segment 
consolidation.</p>
+
+<h3><a class='u'
+name="The_problem"
+>The problem</a></h3>
+
+<p>Ordinarily,
+modifying an index is cheap.
+New data is added to new segments,
+and the time to write a new segment scales more or less linearly with the 
number of documents added during the indexing session.</p>
+
+<p>Deletions are also cheap most of the time,
+because we don&#8217;t remove documents immediately but instead mark them as 
deleted,
+and adding the deletion mark is cheap.</p>
+
+<p>However,
+as new segments are added and the deletion rate for existing segments 
increases,
+search-time performance slowly begins to degrade.
+At some point,
+it becomes necessary to consolidate existing segments,
+rewriting their data into a new segment.</p>
+
+<p>If the recycled segments are small,
+the time it takes to rewrite them may not be significant.
+Every once in a while,
+though,
+a large amount of data must be rewritten.</p>
+
+<h3><a class='u'
+name="Procrastinating_and_playing_catch-up"
+>Procrastinating and playing catch-up</a></h3>
+
+<p>The simplest way to force fast index updates is to avoid rewriting 
anything.</p>
+
+<p>Indexer relies upon <a href="../../../Lucy/Index/IndexManager.html" 
class="podlinkpod"
+>IndexManager</a>&#8217;s <a 
href="../../../Lucy/Index/IndexManager.html#recycle" class="podlinkpod"
+>recycle()</a> method to tell it which segments should be consolidated.
+If we subclass IndexManager and override the method so that it always returns 
an empty array,
+we get consistently quick performance:</p>
+
+<pre>package NoMergeManager;
+use base qw( Lucy::Index::IndexManager );
+sub recycle { [] }
+
+package main;
+my $indexer = Lucy::Index::Indexer-&#62;new(
+    index =&#62; &#39;/path/to/index&#39;,
+    manager =&#62; NoMergeManager-&#62;new,
+);
+...
+$indexer-&#62;commit;</pre>
+
+<p>However,
+we can&#8217;t procrastinate forever.
+Eventually,
+we&#8217;ll have to run an ordinary,
+uncontrolled indexing session,
+potentially triggering a large rewrite of lots of small and/or degraded 
segments:</p>
+
+<pre>my $indexer = Lucy::Index::Indexer-&#62;new( 
+    index =&#62; &#39;/path/to/index&#39;, 
+    # manager =&#62; NoMergeManager-&#62;new,
+);
+...
+$indexer-&#62;commit;</pre>
+
+<h3><a class='u'
+name="Acceptable_worst-case_update_time,_slower_degradation"
+>Acceptable worst-case update time,
+slower degradation</a></h3>
+
+<p>Never merging anything at all in the main indexing process is probably 
overkill.
+Small segments are relatively cheap to merge; we just need to guard against 
the big rewrites.</p>
+
+<p>Setting a ceiling on the number of documents in the segments to be recycled 
allows us to avoid a mass proliferation of tiny,
+single-document segments,
+while still offering decent worst-case update speed:</p>
+
+<pre>package LightMergeManager;
+use base qw( Lucy::Index::IndexManager );
+
+sub recycle {
+    my $self = shift;
+    my $seg_readers = $self-&#62;SUPER::recycle(@_);
+    @$seg_readers = grep { $_-&#62;doc_max &#60; 10 } @$seg_readers;
+    return $seg_readers;
+}</pre>
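+
+<p>The filter itself can be exercised without an index.
+In the sketch below,
+<code>MockSegReader</code> is a stand-in that models only doc_max() plus a name for display (both assumptions made for illustration),
+and <code>light_candidates</code> is a hypothetical helper that takes the ceiling as a parameter instead of the hard-coded 10.</p>

```perl
use strict;
use warnings;

# Stand-in for a segment reader; only doc_max() is modeled, plus a name
# used purely for display in this sketch.
package MockSegReader;
sub new     { my ( $class, %args ) = @_; return bless { %args }, $class }
sub doc_max { $_[0]{doc_max} }
sub name    { $_[0]{name} }

package main;

# Same filter as LightMergeManager's recycle(), with the ceiling passed in.
sub light_candidates {
    my ( $seg_readers, $ceiling ) = @_;
    return [ grep { $_->doc_max < $ceiling } @$seg_readers ];
}

my @sizes = ( [ seg_1 => 250_000 ], [ seg_2 => 7 ], [ seg_3 => 3 ], [ seg_4 => 10 ] );
my @segs;
for my $pair (@sizes) {
    push @segs, MockSegReader->new( name => $pair->[0], doc_max => $pair->[1] );
}

my $keep = light_candidates( \@segs, 10 );
print join( ', ', map { $_->name } @$keep ), "\n";
```

+<p>Only the segments holding fewer than 10 documents survive the filter; the large segment and the one sitting exactly at the ceiling are left alone.</p>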
+
+<p>However,
+we still have to consolidate every once in a while,
+and while that happens content updates will be locked out.</p>
+
+<h3><a class='u'
+name="Background_merging"
+>Background merging</a></h3>
+
+<p>If it&#8217;s not acceptable to lock out updates while the index 
consolidation process runs,
+the alternative is to move the consolidation process out of band,
+using <a href="../../../Lucy/Index/BackgroundMerger.html" class="podlinkpod"
+>BackgroundMerger</a>.</p>
+
+<p>It&#8217;s never safe to have more than one Indexer attempting to modify 
the content of an index at the same time,
+but a BackgroundMerger and an Indexer can operate simultaneously:</p>
+
+<pre># Indexing process.
+use Scalar::Util qw( blessed );
+my $retries = 0;
+while (1) {
+    eval {
+        my $indexer = Lucy::Index::Indexer-&#62;new(
+            index   =&#62; &#39;/path/to/index&#39;,
+            manager =&#62; LightMergeManager-&#62;new,
+        );
+        $indexer-&#62;add_doc($doc);
+        $indexer-&#62;commit;
+    };
+    last unless $@;
+    if ( blessed($@) and $@-&#62;isa(&#34;Lucy::Store::LockErr&#34;) ) {
+        # Catch LockErr.
+        warn &#34;Couldn&#39;t get lock ($retries retries)&#34;;
+        $retries++;
+    }
+    else {
+        die &#34;Write failed: $@&#34;;
+    }
+}
+
+# Background merge process.
+my $manager = Lucy::Index::IndexManager-&#62;new;
+$manager-&#62;set_write_lock_timeout(60_000);
+my $bg_merger = Lucy::Index::BackgroundMerger-&#62;new(
+    index   =&#62; &#39;/path/to/index&#39;,
+    manager =&#62; $manager,
+);
+$bg_merger-&#62;commit;</pre>
+
+<p>The exception handling code becomes useful once you have more than one 
index modification process happening simultaneously.
+By default,
+Indexer tries several times to acquire a write lock over the span of one 
second,
+then holds it until <a href="../../../Lucy/Index/Indexer.html#commit" 
class="podlinkpod"
+>commit()</a> completes.
+BackgroundMerger handles most of its work without the write lock,
+but it does need it briefly once at the beginning and once again near the end.
+Under normal loads,
+the internal retry logic will resolve conflicts,
+but if it&#8217;s not acceptable to miss an insert,
+you probably want to catch <a href="../../../Lucy/Store/LockErr.html" 
class="podlinkpod"
+>LockErr</a> exceptions thrown by Indexer.
+In contrast,
+a LockErr from BackgroundMerger probably just needs to be logged.</p>
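+
+<p>The indexing loop above retries without limit.
+A bounded variant might look like the following sketch.
+<code>with_lock_retries</code> is a hypothetical helper,
+and the simulated writer blesses an empty hash into Lucy::Store::LockErr purely as a stand-in so that the example runs without Lucy installed.</p>

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical helper: run $attempt until it succeeds, retrying only on
# LockErr and giving up after $max_tries attempts.  Returns the number of
# attempts used.
sub with_lock_retries {
    my ( $attempt, $max_tries ) = @_;
    for my $try ( 1 .. $max_tries ) {
        return $try if eval { $attempt->(); 1 };
        my $err = $@;
        # Anything other than a LockErr is treated as fatal.
        die "Write failed: $err"
            unless blessed($err) and $err->isa('Lucy::Store::LockErr');
        warn "Couldn't get lock (attempt $try of $max_tries)\n";
    }
    die "Gave up after $max_tries attempts\n";
}

# Simulated writer: fails twice with a stand-in LockErr, then succeeds.
my $calls = 0;
my $tries = with_lock_retries(
    sub {
        $calls++;
        die bless( {}, 'Lucy::Store::LockErr' ) if $calls < 3;
    },
    5,
);
print "succeeded after $tries attempts\n";
```

+<p>In a real indexing process the code reference would create the Indexer,
+add the document and commit,
+as in the loop above.</p>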
+
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed 
under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache 
Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be 
trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/DevGuide.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/DevGuide.html 
(added)
+++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/DevGuide.html Mon 
Apr  4 09:23:29 2016
@@ -0,0 +1,142 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::DevGuide – Apache Lucy Documentation</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache 
Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software 
Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" 
title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" 
title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" 
title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " 
title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a 
href="/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/">Perl</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" 
action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/perl/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue 
Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a 
href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::DevGuide - Quick-start guide to hacking on Apache Lucy.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>The Apache Lucy code base is organized into roughly four layers:</p>
+
+<ul>
+<li>Charmonizer - compiler and OS configuration probing.</li>
+
+<li>Clownfish - header files.</li>
+
+<li>C - implementation files.</li>
+
+<li>Host - binding language.</li>
+</ul>
+
+<p>Charmonizer is a configuration prober which writes a single header file,
+&#8220;charmony.h&#8221;,
+describing the build environment and facilitating cross-platform development.
+It&#8217;s similar to Autoconf or Metaconfig,
+but written in pure C.</p>
+
+<p>The &#8220;.cfh&#8221; files within the Lucy core are Clownfish header 
files.
+Clownfish is a purpose-built,
+declaration-only language which superimposes a single-inheritance object model 
on top of C which is specifically designed to co-exist happily with a variety of 
&#8220;host&#8221; languages and to allow limited run-time dynamic subclassing.
+For more information see the Clownfish docs,
+but if there&#8217;s one thing you should know about Clownfish OO before you 
start hacking,
+it&#8217;s that method calls are differentiated from functions by 
capitalization:</p>
+
+<pre>Indexer_Add_Doc   &#60;-- Method, typically uses dynamic dispatch.
+Indexer_add_doc   &#60;-- Function, always a direct invocation.</pre>
+
+<p>The C files within the Lucy core are where most of Lucy&#8217;s low-level 
functionality lies.
+They implement the interface defined by the Clownfish header files.</p>
+
+<p>The C core is intentionally left incomplete,
+however; to be usable,
+it must be bound to a &#8220;host&#8221; language.
+(In this context,
+even C is considered a &#8220;host&#8221; which must implement the missing 
pieces and be &#8220;bound&#8221; to the core.) Some of the binding code is 
autogenerated by Clownfish based on a spec customized for each language.
+Other pieces are hand-coded in either C (using the host&#8217;s C API) or the 
host language itself.</p>
+
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed 
under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache 
Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be 
trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/DocIDs.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/DocIDs.html (added)
+++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/DocIDs.html Mon Apr 
 4 09:23:29 2016
@@ -0,0 +1,135 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::DocIDs – Apache Lucy Documentation</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache 
Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software 
Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" 
title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" 
title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" 
title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " 
title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a 
href="/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/">Perl</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" 
action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/perl/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue 
Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a 
href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::DocIDs - Characteristics of Apache Lucy document ids.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<h3><a class='u'
+name="Document_ids_are_signed_32-bit_integers"
+>Document ids are signed 32-bit integers</a></h3>
+
+<p>Document ids in Apache Lucy start at 1.
+Because 0 is never a valid doc id,
+we can use it as a sentinel value:</p>
+
+<pre>while ( my $doc_id = $posting_list-&#62;next ) {
+    ...
+}</pre>
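+
+<p>The sentinel idiom can be seen in isolation with a mock iterator.
+<code>MockPostingList</code> below is a hypothetical stand-in,
+not a Lucy class: its next() hands back each doc id in turn and then 0,
+which ends the loop because 0 is false in Perl.</p>

```perl
use strict;
use warnings;

# Hypothetical stand-in for a posting list: next() returns each doc id in
# turn, then 0 once the list is exhausted.
package MockPostingList;
sub new { my ( $class, @ids ) = @_; return bless { ids => [@ids] }, $class }

sub next {
    my $self = shift;
    return shift( @{ $self->{ids} } ) || 0;
}

package main;

my $posting_list = MockPostingList->new( 1, 4, 9 );
my @seen;
while ( my $doc_id = $posting_list->next ) {
    push @seen, $doc_id;
}
print "@seen\n";
```

+<p>The idiom is only safe because doc ids start at 1; an iterator that could legitimately return 0 would need an explicit end-of-list check.</p>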
+
+<h3><a class='u'
+name="Document_ids_are_ephemeral"
+>Document ids are ephemeral</a></h3>
+
+<p>The document ids used by Lucy are associated with a single index snapshot.
+The moment an index is updated,
+the mapping of document ids to documents is subject to change.</p>
+
+<p>Since IndexReader objects represent a point-in-time view of an index,
+document ids are guaranteed to remain static for the life of the reader.
+However,
+because they are not permanent,
+Lucy document ids cannot be used as foreign keys to locate records in external 
data sources.
+If you truly need a primary key field,
+you must define it and populate it yourself.</p>
+
+<p>Furthermore,
+the order of document ids does not tell you anything about the sequence in 
which documents were added to the index.</p>
+
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed 
under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache 
Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be 
trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/FileFormat.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/FileFormat.html 
(added)
+++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/FileFormat.html Mon 
Apr  4 09:23:29 2016
@@ -0,0 +1,358 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::FileFormat – Apache Lucy Documentation</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache 
Lucy™"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software 
Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" 
title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" 
title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" 
title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/ " 
title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a 
href="/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/">Perl</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" 
action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/perl/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue 
Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a 
href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::FileFormat - Overview of index file format</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>It is not necessary to understand the current implementation details of the 
index file format in order to use Apache Lucy effectively,
+but it may be helpful if you are interested in tweaking for high performance,
+exotic usage,
+or debugging and development.</p>
+
+<p>On a file system,
+an index is a directory.
+The files inside have a hierarchical relationship: an index is made up of 
&#8220;segments&#8221;,
+each of which is an independent inverted index with its own subdirectory; each 
segment is made up of several component parts.</p>
+
+<pre>[index]--|
+         |--snapshot_XXX.json
+         |--schema_XXX.json
+         |--write.lock
+         |
+         |--seg_1--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--seg_2--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--[...]--| </pre>
+
+<h3><a class='u'
+name="Write-once_philosophy"
+>Write-once philosophy</a></h3>
+
+<p>All segment directory names consist of the string &#8220;seg_&#8221; 
followed by a number in base 36: seg_1,
+seg_5m,
+seg_p9s2 and so on,
+with higher numbers indicating more recent segments.
+Once a segment is finished and committed,
+its name is never re-used and its files are never modified.</p>
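+
+<p>As a rough illustration of the naming scheme, the following Python sketch decodes the base-36 number in a segment name and sorts segments from oldest to newest. It is not part of Lucy; the helper name <code>segment_number</code> is hypothetical.</p>

```python
def segment_number(name):
    """Decode the base-36 number in a segment directory name such as 'seg_5m'."""
    prefix = "seg_"
    if not name.startswith(prefix):
        raise ValueError(f"not a segment directory name: {name!r}")
    # int() accepts bases up to 36, using the digits 0-9 followed by a-z.
    return int(name[len(prefix):], 36)


names = ["seg_p9s2", "seg_1", "seg_5m"]
# Higher numbers indicate more recent segments, so sort numerically, not as strings.
oldest_first = sorted(names, key=segment_number)
```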
+
+<p>Old segments become obsolete and can be removed when their data has been 
consolidated into new segments during the process of segment merging and 
optimization.
+A fully-optimized index has only one segment.</p>
+
+<h3><a class='u'
+name="Top-level_entries"
+>Top-level entries</a></h3>
+
+<p>There are a handful of &#8220;top-level&#8221; files and directories which 
belong to the entire index rather than to a particular segment.</p>
+
+<h4><a class='u'
+name="snapshot_XXX.json"
+>snapshot_XXX.json</a></h4>
+
+<p>A &#8220;snapshot&#8221; file,
+e.g.
+<code>snapshot_m7p.json</code>,
+is a list of index files and directories.
+Because index files,
+once written,
+are never modified,
+the list of entries in a snapshot defines a point-in-time view of the data in 
an index.</p>
+
+<p>Like segment directories,
+snapshot files also utilize the unique-base-36-number naming convention; the 
higher the number,
+the more recent the file.
+The appearance of a new snapshot file within the index directory constitutes 
an index update.
+While a new segment is being written, new files may be added to the index 
directory,
+but until a new snapshot file gets written,
+a Searcher opening the index for reading won&#8217;t know about them.</p>
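+
+<p>The visibility rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than Lucy's own code: the directory listing is made up, and the <code>entries</code> key used for the snapshot contents is an assumed layout.</p>

```python
import re

SNAPSHOT_RE = re.compile(r"^snapshot_([0-9a-z]+)\.json$")


def latest_snapshot(filenames):
    """Return the snapshot file name with the highest base-36 number, or None."""
    best_name, best_num = None, -1
    for name in filenames:
        match = SNAPSHOT_RE.match(name)
        if match:
            num = int(match.group(1), 36)
            if num > best_num:
                best_name, best_num = name, num
    return best_name


# Hypothetical listing: seg_3 is on disk but no snapshot mentions it yet.
listing = ["snapshot_1.json", "snapshot_2.json", "schema_1.json",
           "seg_1", "seg_2", "seg_3"]
# Hypothetical snapshot contents (the "entries" layout is an assumption).
snapshots = {
    "snapshot_1.json": {"entries": ["schema_1.json", "seg_1"]},
    "snapshot_2.json": {"entries": ["schema_1.json", "seg_1", "seg_2"]},
}

current = latest_snapshot(listing)
# A reader only sees what the chosen snapshot lists, so seg_3 stays invisible.
visible = snapshots[current]["entries"]
```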
+
+<h4><a class='u'
+name="schema_XXX.json"
+>schema_XXX.json</a></h4>
+
+<p>The schema file is a Schema object describing the index&#8217;s format,
+serialized as JSON.
+It,
+too,
+is versioned,
+and a given snapshot file will reference one and only one schema file.</p>
+
+<h4><a class='u'
+name="locks"
+>locks</a></h4>
+
+<p>By default,
+only one indexing process may safely modify the index at any given time.
+Processes reserve an index by laying claim to the <code>write.lock</code> file 
within the <code>locks/</code> directory.
+A smattering of other lock files may be used from time to time,
+as well.</p>
+
+<h3><a class='u'
+name="A_segment(8217)s_component_parts"
+>A segment&#8217;s component parts</a></h3>
+
+<p>By default,
+each segment has up to five logical components: lexicon,
+postings,
+document storage,
+highlight data,
+and deletions.
+Binary data from these components gets stored in virtual files within the 
&#8220;cf.dat&#8221; compound file; metadata is stored in a shared 
&#8220;segmeta.json&#8221; file.</p>
+
+<h4><a class='u'
+name="segmeta.json"
+>segmeta.json</a></h4>
+
+<p>The segmeta.json file is a central repository for segment metadata.
+In addition to information such as document counts and field numbers,
+it also warehouses arbitrary metadata on behalf of individual index 
components.</p>
+
+<h4><a class='u'
+name="Lexicon"
+>Lexicon</a></h4>
+
+<p>Each indexed field gets its own lexicon in each segment.
+The exact files involved depend on the field&#8217;s type,
+but generally speaking there will be two parts.
+First,
+there&#8217;s a primary <code>lexicon-XXX.dat</code> file which houses a 
complete term list associating terms with corpus frequency statistics,
+postings file locations,
+etc.
+Second,
+one or more &#8220;lexicon index&#8221; files may be present which contain 
periodic samples from the primary lexicon file to facilitate fast lookups.</p>
+
+<h4><a class='u'
+name="Postings"
+>Postings</a></h4>
+
+<p>&#8220;Posting&#8221; is a technical term from the field of <a 
href="../../Lucy/Docs/IRTheory.html" class="podlinkpod"
+>information retrieval</a>,
+defined as a single instance of one term indexing one document.
+If you are looking at the index in the back of a book,
+and you see that &#8220;freedom&#8221; is referenced on pages 8,
+86,
+and 240,
+that would be three postings,
+which taken together form a &#8220;posting list&#8221;.
+The same terminology applies to an index in electronic form.</p>
+
+<p>Each segment has one postings file per indexed field.
+When a search is performed for a single term,
+first that term is looked up in the lexicon.
+If the term exists in the segment,
+the record in the lexicon will contain information about which postings file 
to look at and where to look.</p>
+
+<p>The first thing any posting record tells you is a document id.
+By iterating over all the postings associated with a term,
+you can find all the documents that match that term,
+a process which is analogous to looking up page numbers in a book&#8217;s 
index.
+However,
+each posting record typically contains other information in addition to 
document id,
+e.g.
+the positions at which the term occurs within the field.</p>
+
+<h4><a class='u'
+name="Documents"
+>Documents</a></h4>
+
+<p>The document storage section is a simple database,
+organized into two files:</p>
+
+<ul>
+<li><b>documents.dat</b> - Serialized documents.</li>
+
+<li><b>documents.ix</b> - Document storage index,
+a solid array of 64-bit integers where each integer location corresponds to a 
document id,
+and the value at that location points at a file position in the documents.dat 
file.</li>
+</ul>
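+
+<p>The two-file layout can be mimicked in a few lines of Python. This is only a sketch of the idea: the big-endian byte order, the zero-based slots, and the rule that a record ends where the next one begins are all assumptions made for the example, not a description of Lucy's actual encoding.</p>

```python
import struct


def build_storage(docs):
    """Pack byte strings into a data blob plus an index of 64-bit start offsets."""
    dat = bytearray()
    ix = bytearray()
    for doc in docs:
        ix += struct.pack(">q", len(dat))  # where this record starts in dat
        dat += doc
    return bytes(dat), bytes(ix)


def fetch(dat, ix, slot):
    """Read the record at position `slot` using fixed-width index lookups."""
    (start,) = struct.unpack_from(">q", ix, slot * 8)
    if (slot + 1) * 8 < len(ix):
        (end,) = struct.unpack_from(">q", ix, (slot + 1) * 8)
    else:
        end = len(dat)  # the last record runs to the end of the data
    return dat[start:end]


dat, ix = build_storage([b"first doc", b"second", b"third document"])
```

+<p>Because every index entry is exactly eight bytes wide, finding a record costs one multiplication and one seek, however many documents the segment holds.</p>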
+
+<h4><a class='u'
+name="Highlight_data"
+>Highlight data</a></h4>
+
+<p>The files which store data used for excerpting and highlighting are 
organized similarly to the files used to store documents.</p>
+
+<ul>
+<li><b>highlight.dat</b> - Chunks of serialized highlight data,
+one per doc id.</li>
+
+<li><b>highlight.ix</b> - Highlight data index &#8211; as with the 
<code>documents.ix</code> file,
+a solid array of 64-bit file pointers.</li>
+</ul>
+
+<h4><a class='u'
+name="Deletions"
+>Deletions</a></h4>
+
+<p>When a document is &#8220;deleted&#8221; from a segment,
+it is not actually purged right away; it is merely marked as 
&#8220;deleted&#8221; via a deletions file.
+Deletions files contain bit vectors with one bit for each document in the 
segment; if bit #254 is set then document 254 is deleted,
+and if that document turns up in a search it will be masked out.</p>
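+
+<p>A minimal Python sketch of the masking step, assuming least-significant-bit-first ordering within each byte (an assumption for this example; the class name <code>DeletionsBitVector</code> is hypothetical):</p>

```python
class DeletionsBitVector:
    """One bit per document; a set bit marks the document as deleted."""

    def __init__(self, doc_count):
        self.bits = bytearray((doc_count + 7) // 8)

    def delete(self, doc_id):
        # Byte index is doc_id / 8; bit index within the byte is doc_id % 8.
        self.bits[doc_id >> 3] |= 1 << (doc_id & 7)

    def is_deleted(self, doc_id):
        return bool(self.bits[doc_id >> 3] & (1 << (doc_id & 7)))


deletions = DeletionsBitVector(doc_count=300)
deletions.delete(254)

hits = [12, 254, 255]
# Deleted documents are filtered from results; their data stays on disk.
live_hits = [doc_id for doc_id in hits if not deletions.is_deleted(doc_id)]
```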
+
+<p>It is only when a segment&#8217;s contents are rewritten to a new segment 
during the segment-merging process that deleted documents truly go away.</p>
+
+<h3><a class='u'
+name="Compound_Files"
+>Compound Files</a></h3>
+
+<p>If you peer inside an index directory,
+you won&#8217;t actually find any files named &#8220;documents.dat&#8221;,
+&#8220;highlight.ix&#8221;,
+etc.
+unless there is an indexing process underway.
+What you will find instead is one &#8220;cf.dat&#8221; and one 
&#8220;cfmeta.json&#8221; file per segment.</p>
+
+<p>To minimize the need for file descriptors at search-time,
+all per-segment binary data files are concatenated together in 
&#8220;cf.dat&#8221; at the close of each indexing session.
+Information about where each file begins and ends is stored in 
<code>cfmeta.json</code>.
+When the segment is opened for reading,
+a single file descriptor per &#8220;cf.dat&#8221; file can be shared among 
several readers.</p>
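+
+<p>The idea can be sketched in Python as below. The JSON layout (a <code>files</code> map holding an <code>offset</code> and <code>length</code> per virtual file) is an assumption made for the example, and both function names are hypothetical.</p>

```python
import json


def build_compound(files):
    """Concatenate named byte strings and record where each one lives."""
    blob = bytearray()
    meta = {"files": {}}
    for name, data in files.items():
        meta["files"][name] = {"offset": len(blob), "length": len(data)}
        blob += data
    return bytes(blob), json.dumps(meta)


def read_virtual_file(blob, meta_json, name):
    """Slice one virtual file back out of the compound blob."""
    entry = json.loads(meta_json)["files"][name]
    start = entry["offset"]
    return blob[start:start + entry["length"]]


cf_dat, cfmeta = build_compound({
    "documents.dat": b"DOCS",
    "documents.ix": b"\x00" * 8,
    "highlight.dat": b"HL",
})
```

+<p>Every virtual file is reached through the same blob, which is why a single open descriptor can serve several readers.</p>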
+
+<h3><a class='u'
+name="A_Typical_Search"
+>A Typical Search</a></h3>
+
+<p>Here&#8217;s a simplified narrative,
+dramatizing how a search for &#8220;freedom&#8221; against a given segment 
plays out:</p>
+
+<ul>
+<li>The searcher asks the relevant Lexicon Index,
+&#8220;Do you know anything about &#8216;freedom&#8217;?&#8221; Lexicon Index 
replies,
+&#8220;Can&#8217;t say for sure,
+but if the main Lexicon file does,
+&#8216;freedom&#8217; is probably somewhere around byte 21008&#8221;.</li>
+
+<li>The main Lexicon tells the searcher &#8220;One moment,
+let me scan our records&#8230; Yes,
+we have 2 documents which contain &#8216;freedom&#8217;.
+You&#8217;ll find them in seg_6/postings-4.dat starting at byte 
66991.&#8221;</li>
+
+<li>The Postings file says &#8220;Yep,
+we have &#8216;freedom&#8217;,
+all right!
+Document id 40 has 1 &#8216;freedom&#8217;,
+and document 44 has 8.
+If you need to know more,
+like if any &#8216;freedom&#8217; is part of the phrase &#8216;freedom of 
speech&#8217;,
+ask me about positions!&#8221;</li>
+
+<li>If the searcher is only looking for &#8216;freedom&#8217; in isolation,
+that&#8217;s where it stops.
+It now knows enough to assign the documents scores against 
&#8220;freedom&#8221;,
+with the 8-freedom document likely ranking higher than the single-freedom 
document.</li>
+</ul>
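+
+<p>The conversation above can be approximated with in-memory stand-ins. In this Python sketch the data, the sampling interval, and the scoring by raw term frequency are all simplifying assumptions; it only shows the order of the lookups, not how Lucy stores or scores anything.</p>

```python
import bisect

# Hypothetical lexicon for one field: sorted (term, doc_freq, postings_key).
lexicon = [
    ("apple", 3, "p0"), ("banana", 1, "p1"), ("freedom", 2, "p2"),
    ("liberty", 5, "p3"), ("zebra", 1, "p4"),
]
# Hypothetical postings: (doc_id, occurrences of the term in that doc).
postings = {"p2": [(40, 1), (44, 8)]}

SAMPLE_EVERY = 2
# The "lexicon index": every Nth term plus its position in the full lexicon.
lex_index = [(lexicon[i][0], i) for i in range(0, len(lexicon), SAMPLE_EVERY)]


def lookup(term):
    sampled_terms = [t for t, _ in lex_index]
    # Find the last sample <= term; the term, if present, lies at or after it.
    slot = bisect.bisect_right(sampled_terms, term) - 1
    if slot < 0:
        return None
    start = lex_index[slot][1]
    # Scan only the short run of the main lexicon between two samples.
    for entry in lexicon[start:start + SAMPLE_EVERY]:
        if entry[0] == term:
            return entry
    return None


entry = lookup("freedom")
matches = postings[entry[2]] if entry else []
# Simplified scoring: more occurrences rank higher.
ranked = sorted(matches, key=lambda m: m[1], reverse=True)
```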
+
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed 
under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache 
Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be 
trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/FileLocking.html
==============================================================================
--- websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/FileLocking.html 
(added)
+++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Docs/FileLocking.html 
Mon Apr  4 09:23:29 2016
@@ -0,0 +1,181 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html lang="en">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    <title>Lucy::Docs::FileLocking &#8211; Apache Lucy Documentation</title>
+    <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css">
+  </head>
+
+  <body>
+
+    <div id="lucy-rigid_wrapper">
+
+      <div id="lucy-top" class="container_16 lucy-white_box_3d">
+
+        <div id="lucy-logo_box" class="grid_8">
+          <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache 
Lucy&#8482;"></a>
+        </div> <!-- lucy-logo_box -->
+
+        <div #id="lucy-top_nav_box" class="grid_8">
+          <div id="lucy-top_nav_bar" class="container_8">
+            <ul>
+              <li><a href="http://www.apache.org/" title="Apache Software 
Foundation">Apache Software Foundation</a></li>
+              <li><a href="http://www.apache.org/licenses/" 
title="License">License</a></li>
+              <li><a href="http://www.apache.org/foundation/sponsorship.html" 
title="Sponsorship">Sponsorship</a></li>
+              <li><a href="http://www.apache.org/foundation/thanks.html" 
title="Thanks">Thanks</a></li>
+              <li><a href="http://www.apache.org/security/" 
title="Security">Security</a></li>
+            </ul>
+          </div> <!-- lucy-top_nav_bar -->
+          <p><a href="http://www.apache.org/">Apache</a>&nbsp;&raquo&nbsp;<a 
href="/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/">Docs</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/">Perl</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/">Lucy</a>&nbsp;&raquo&nbsp;<a 
href="/docs/perl/Lucy/Docs/">Docs</a></p>
+          <form name="lucy-top_search_box" id="lucy-top_search_box" 
action="http://www.google.com/search" method="get">
+            <input value="*.apache.org" name="sitesearch" type="hidden"/>
+            <input type="text" name="q" id="query" style="width:85%">
+            <input type="submit" id="submit" value="Search">
+          </form>
+        </div> <!-- lucy-top_nav_box -->
+
+        <div class="clear"></div>
+
+      </div> <!-- lucy-top -->
+
+      <div id="lucy-main_content" class="container_16 lucy-white_box_3d">
+
+        <div class="grid_4" id="lucy-left_nav_box">
+          <h6>About</h6>
+            <ul>
+              <li><a href="/">Welcome</a></li>
+              <li><a href="/clownfish.html">Clownfish</a></li>
+              <li><a href="/faq.html">FAQ</a></li>
+              <li><a href="/people.html">People</a></li>
+            </ul>
+          <h6>Resources</h6>
+            <ul>
+              <li><a href="/download.html">Download</a></li>
+              <li><a href="/mailing_lists.html">Mailing Lists</a></li>
+              <li><a href="/docs/perl/">Documentation</a></li>
+              <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue 
Tracker</a></li>
+              <li><a href="/version_control.html">Version Control</a></li>
+            </ul>
+          <h6>Related Projects</h6>
+            <ul>
+              <li><a href="http://lucene.apache.org/core/">Lucene</a></li>
+              <li><a href="http://dezi.org/">Dezi</a></li>
+              <li><a href="http://lucene.apache.org/solr/">Solr</a></li>
+              <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li>
+              <li><a 
href="http://lucene.apache.org/pylucene/">PyLucene</a></li>
+            </ul>
+        </div> <!-- lucy-left_nav_box -->
+
+        <div id="lucy-main_content_box" class="grid_9">
+          <div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::FileLocking - Manage indexes on shared volumes.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>Normally,
+index locking is an invisible process.
+Exclusive write access is controlled via lockfiles within the index directory 
and problems only arise if multiple processes attempt to acquire the write lock 
simultaneously; search-time processes do not ordinarily require locking at 
all.</p>
+
+<p>On shared volumes,
+however,
+the default locking mechanism fails,
+and manual intervention becomes necessary.</p>
+
+<p>Both read and write applications accessing an index on a shared volume need 
to identify themselves with a unique <code>host</code> id,
+e.g.
+hostname or IP address.
+Knowing the host id makes it possible to tell which lockfiles belong to other 
machines and therefore must not be removed when the lockfile&#8217;s pid number 
appears not to correspond to an active process.</p>
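+
+<p>The decision this enables can be sketched in Python. The lockfile representation (a dict with <code>host</code> and <code>pid</code> keys) and the function name are hypothetical; the point is only that a local pid check is meaningful for locks created on the same host.</p>

```python
def may_clear_lockfile(lock, my_host, pid_is_alive):
    """Decide whether a lockfile found in the index directory is safe to remove.

    `lock` is a dict with hypothetical "host" and "pid" keys; `pid_is_alive`
    is a callable reporting whether a pid belongs to a running local process.
    """
    if lock["host"] != my_host:
        # Another machine owns it: a local pid lookup says nothing about it.
        return False
    return not pid_is_alive(lock["pid"])


running = {4242}  # pids assumed to be alive on host "alpha"
locks = [
    {"host": "alpha", "pid": 4242},  # ours, process still running
    {"host": "alpha", "pid": 999},   # ours, process gone: stale
    {"host": "beta", "pid": 999},    # foreign: must be left alone
]
clearable = [
    lock for lock in locks
    if may_clear_lockfile(lock, "alpha", running.__contains__)
]
```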
+
+<p>At index-time,
+the danger is that multiple indexing processes from different machines which 
fail to specify a unique <code>host</code> id can delete each other&#8217;s 
lockfiles and then attempt to modify the index at the same time,
+causing index corruption.
+The search-time problem is more complex.</p>
+
+<p>Once an index file is no longer listed in the most recent snapshot,
+Indexer attempts to delete it as part of a post-<code>commit()</code> cleanup routine.
+It is possible that at the moment an Indexer is deleting files which it 
believes are no longer needed,
+a Searcher referencing an earlier snapshot is in fact using them.
+The more often that an index is either updated or searched,
+the more likely it is that this conflict will arise from time to time.</p>
+
+<p>Ordinarily,
+the deletion attempts are not a problem.
+On a typical Unix volume,
+the files will be deleted in name only: any process which holds an open 
filehandle against a given file will continue to have access,
+and the file won&#8217;t actually get vaporized until the last filehandle is 
cleared.
+Thanks to &#8220;delete on last close semantics&#8221;,
+an Indexer can&#8217;t truly delete the file out from underneath an active 
Searcher.
+On Windows,
+where file deletion fails whenever any process holds an open handle,
+the situation is different but still workable: Indexer just keeps retrying 
after each commit until deletion finally succeeds.</p>
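+
+<p>The Unix behaviour is easy to observe directly. This Python sketch assumes a local POSIX filesystem; on Windows the <code>os.remove</code> call would fail while the file is open, and on NFS the later read is the step that can fail.</p>

```python
import os
import tempfile

# Create a scratch file standing in for a segment file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as writer:
    writer.write("segment data")

reader = open(path)       # a "Searcher" holds the file open
os.remove(path)           # the "Indexer" deletes it by name
name_gone = not os.path.exists(path)
contents = reader.read()  # the data is still reachable through the open handle
reader.close()            # storage is released only after this last close
```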
+
+<p>On NFS,
+however,
+the system breaks,
+because NFS allows files to be deleted out from underneath active processes.
+Should this happen,
+the unlucky read process will crash with a &#8220;Stale NFS filehandle&#8221; 
exception.</p>
+
+<p>Under normal circumstances,
+it is neither necessary nor desirable for IndexReaders to secure read locks 
against an index,
+but for NFS we have to make an exception.
+LockFactory&#8217;s <code>make_shared_lock()</code> method exists for this reason; 
supplying an IndexManager instance to IndexReader&#8217;s constructor activates 
an internal locking mechanism using <code>make_shared_lock()</code>, which prevents concurrent indexing 
processes from deleting files that are needed by active readers.</p>
+
+<pre>use Sys::Hostname qw( hostname );
+my $hostname = hostname() or die &#34;Can&#39;t get unique hostname&#34;;
+my $manager = Lucy::Index::IndexManager-&#62;new( host =&#62; $hostname );
+
+# Index time:
+my $indexer = Lucy::Index::Indexer-&#62;new(
+    index   =&#62; &#39;/path/to/index&#39;,
+    manager =&#62; $manager,
+);
+
+# Search time:
+my $reader = Lucy::Index::IndexReader-&#62;open(
+    index   =&#62; &#39;/path/to/index&#39;,
+    manager =&#62; $manager,
+);
+my $searcher = Lucy::Search::IndexSearcher-&#62;new( index =&#62; $reader 
);</pre>
+
+<p>Since shared locks are implemented using lockfiles located in the index 
directory (as are exclusive locks),
+reader applications must have write access for read locking to work.
+Stale lock files from crashed processes are ordinarily cleared away the next 
time the same machine &#8211; as identified by the <code>host</code> parameter 
&#8211; opens another IndexReader.
+(The classic technique of timing out lock files is not feasible because search 
processes may lie dormant indefinitely.) However,
+please be aware that if the last thing a given machine does is crash,
+lock files belonging to it may persist,
+preventing deletion of obsolete index data.</p>
+
+</div>
+
+        </div> <!-- lucy-main_content_box --> 
+        <div class="clear"></div>
+
+      </div> <!-- lucy-main_content -->
+
+      <div id="lucy-copyright" class="container_16">
+        <p>Copyright &#169; 2010-2015 The Apache Software Foundation, Licensed 
under the 
+           <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache 
License, Version 2.0</a>.
+           <br/>
+           Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache 
Lucy project logo are trademarks of The
+           Apache Software Foundation.  All other marks mentioned may be 
trademarks or registered trademarks of their
+           respective owners.
+        </p>
+      </div> <!-- lucy-copyright -->
+
+    </div> <!-- lucy-rigid_wrapper -->
+
+  </body>
+</html>

svn commit: r984656 [2/9] - in /websites/staging/lucy/trunk/content: ./ docs/perl/ docs/perl/Lucy/ docs/perl/Lucy/Analysis/ docs/perl/Lucy/Docs/ docs/perl/Lucy/Docs/Cookbook/ docs/perl/Lucy/Document/ docs/perl/Lucy/Highlight/ docs/perl/Lucy/Index/ docs...
