Added: lucy/site/trunk/content/docs/perl/Lucy.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,243 @@ +Title: Lucy â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy - Apache Lucy search engine library.</p> + +<h2><a class='u' +name="VERSION" +>VERSION</a></h2> + +<p>0.5.0</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<p>First, +plan out your index structure, +create the index, +and add documents:</p> + +<pre># indexer.pl + +use Lucy::Index::Indexer; +use Lucy::Plan::Schema; +use Lucy::Analysis::EasyAnalyzer; +use Lucy::Plan::FullTextType; + +# Create a Schema which defines index fields. +my $schema = Lucy::Plan::Schema->new; +my $easyanalyzer = Lucy::Analysis::EasyAnalyzer->new( + language => 'en', +); +my $type = Lucy::Plan::FullTextType->new( + analyzer => $easyanalyzer, +); +$schema->spec_field( name => 'title', type => $type ); +$schema->spec_field( name => 'content', type => $type ); + +# Create the index and add documents. +my $indexer = Lucy::Index::Indexer->new( + schema => $schema, + index => '/path/to/index', + create => 1, +); +while ( my ( $title, $content ) = each %source_docs ) { + $indexer->add_doc({ + title => $title, + content => $content, + }); +} +$indexer->commit;</pre> + +<p>Then, +search the index:</p> + +<pre># search.pl + +use Lucy::Search::IndexSearcher; + +my $searcher = Lucy::Search::IndexSearcher->new( + index => '/path/to/index' +); +my $hits = $searcher->hits( query => "foo bar" ); +while ( my $hit = $hits->next ) { + print "$hit->{title}\n"; +}</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>The Apache Lucy search engine library delivers high-performance, +modular full-text search.</p> + +<h3><a class='u' +name="Features" +>Features</a></h3> + +<ul> +<li>Extremely fast. 
+A single machine can handle millions of documents.</li> + +<li>Scalable to multiple machines.</li> + +<li>Incremental indexing (addition/deletion of documents to/from an existing index).</li> + +<li>Configurable near-real-time index updates.</li> + +<li>Unicode support.</li> + +<li>Support for boolean operators AND, +OR, +and AND NOT; parenthetical groupings; prepended +plus and -minus.</li> + +<li>Algorithmic selection of relevant excerpts and highlighting of search terms within excerpts.</li> + +<li>Highly customizable query and indexing APIs.</li> + +<li>Customizable sorting.</li> + +<li>Phrase matching.</li> + +<li>Stemming.</li> + +<li>Stoplists.</li> +</ul> + +<h3><a class='u' +name="Getting_Started" +>Getting Started</a></h3> + +<p><a href="./Lucy/Simple.html" class="podlinkpod" +>Lucy::Simple</a> provides a stripped down API which may suffice for many tasks.</p> + +<p><a href="./Lucy/Docs/Tutorial.html" class="podlinkpod" +>Lucy::Docs::Tutorial</a> demonstrates how to build a basic CGI search application.</p> + +<p>The tutorial spends most of its time on these five classes:</p> + +<ul> +<li><a href="./Lucy/Plan/Schema.html" class="podlinkpod" +>Lucy::Plan::Schema</a> - Plan out your index.</li> + +<li><a href="./Lucy/Plan/FieldType.html" class="podlinkpod" +>Lucy::Plan::FieldType</a> - Define index fields.</li> + +<li><a href="./Lucy/Index/Indexer.html" class="podlinkpod" +>Lucy::Index::Indexer</a> - Manipulate index content.</li> + +<li><a href="./Lucy/Search/IndexSearcher.html" class="podlinkpod" +>Lucy::Search::IndexSearcher</a> - Search an index.</li> + +<li><a href="./Lucy/Analysis/EasyAnalyzer.html" class="podlinkpod" +>Lucy::Analysis::EasyAnalyzer</a> - A one-size-fits-all parser/tokenizer.</li> +</ul> + +<h3><a class='u' +name="Delving_Deeper" +>Delving Deeper</a></h3> + +<p><a href="./Lucy/Docs/Cookbook.html" class="podlinkpod" +>Lucy::Docs::Cookbook</a> augments the tutorial with more advanced recipes.</p> + +<p>For creating complex queries, +see <a href="./Lucy/Search/Query.html" class="podlinkpod" +>Lucy::Search::Query</a> and its subclasses <a href="./Lucy/Search/TermQuery.html" class="podlinkpod" +>TermQuery</a>, +<a href="./Lucy/Search/PhraseQuery.html" class="podlinkpod" +>PhraseQuery</a>, +<a href="./Lucy/Search/ANDQuery.html" class="podlinkpod" +>ANDQuery</a>, +<a href="./Lucy/Search/ORQuery.html" class="podlinkpod" +>ORQuery</a>, +<a href="./Lucy/Search/NOTQuery.html" class="podlinkpod" +>NOTQuery</a>, +<a href="./Lucy/Search/RequiredOptionalQuery.html" class="podlinkpod" +>RequiredOptionalQuery</a>, +<a href="./Lucy/Search/MatchAllQuery.html" class="podlinkpod" +>MatchAllQuery</a>, +and <a href="./Lucy/Search/NoMatchQuery.html" class="podlinkpod" +>NoMatchQuery</a>, +plus <a href="./Lucy/Search/QueryParser.html" class="podlinkpod" +>Lucy::Search::QueryParser</a>.</p> + +<p>For distributed searching, +see <a href="./LucyX/Remote/SearchServer.html" class="podlinkpod" +>LucyX::Remote::SearchServer</a>, +<a href="./LucyX/Remote/SearchClient.html" class="podlinkpod" +>LucyX::Remote::SearchClient</a>, +and <a href="./LucyX/Remote/ClusterSearcher.html" class="podlinkpod" +>LucyX::Remote::ClusterSearcher</a>.</p> + +<h3><a class='u' +name="Backwards_Compatibility_Policy" +>Backwards Compatibility Policy</a></h3> + +<p>Lucy will spin off stable forks into new namespaces periodically. +The first will be named "Lucy1". 
+Users who require strong backwards compatibility should use a stable fork.</p> + +<p>The main namespace, +"Lucy", +is an API-unstable development branch (as hinted at by its 0.x.x version number). +Superficial interface changes happen frequently. +Hard file format compatibility breaks which require reindexing are rare, +as we generally try to provide continuity across multiple releases, +but we reserve the right to make such changes.</p> + +<h2><a class='u' +name="CLASS_METHODS" +>CLASS METHODS</a></h2> + +<p>The Lucy module itself does not have a large interface, +providing only a single public class method.</p> + +<h3><a class='u' +name="error" +>error</a></h3> + +<pre>my $instream = $folder->open_in( file => 'foo' ) or die Clownfish->error;</pre> + +<p>Access a shared variable which is set by some routines on failure. +It will always be either a <a href="./Clownfish/Err.html" class="podlinkpod" +>Clownfish::Err</a> object or undef.</p> + +<h2><a class='u' +name="SUPPORT" +>SUPPORT</a></h2> + +<p>The Apache Lucy homepage, +where you'll find links to our mailing lists and so on, +is <a href="http://lucy.apache.org" class="podlinkurl" +>http://lucy.apache.org</a>. +Please direct support questions to the Lucy users mailing list.</p> + +<h2><a class='u' +name="BUGS" +>BUGS</a></h2> + +<p>Not thread-safe.</p> + +<p>Some exceptions leak memory.</p> + +<p>If you find a bug, +please inquire on the Lucy users mailing list about it, +then report it on the Lucy issue tracker once it has been confirmed: <a href="https://issues.apache.org/jira/browse/LUCY" class="podlinkurl" +>https://issues.apache.org/jira/browse/LUCY</a>.</p> + +<h2><a class='u' +name="COPYRIGHT" +>COPYRIGHT</a></h2> + +<p>Apache Lucy is distributed under the Apache License, +Version 2.0, +as described in the file <code>LICENSE</code> included with the distribution.</p> + +</div>
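<p>One note on the SYNOPSIS above: it refers to a <code>%source_docs</code> hash without defining it. A minimal, self-contained sketch of that same indexer might look like the following; the sample documents and the <code>/tmp/lucy_demo_index</code> path are illustrative assumptions only, and every API call is taken directly from the SYNOPSIS.</p>

<pre># index_sample.pl - a self-contained sketch of the SYNOPSIS indexer.
use strict;
use warnings;
use Lucy::Index::Indexer;
use Lucy::Plan::Schema;
use Lucy::Analysis::EasyAnalyzer;
use Lucy::Plan::FullTextType;

# Illustrative sample data standing in for %source_docs.
my %source_docs = (
    'Hamlet' => 'To be, or not to be: that is the question.',
    'Walrus' => 'I am the walrus.',
);

# Define the index fields, as in the SYNOPSIS.
my $schema   = Lucy::Plan::Schema->new;
my $analyzer = Lucy::Analysis::EasyAnalyzer->new( language => 'en' );
my $type     = Lucy::Plan::FullTextType->new( analyzer => $analyzer );
$schema->spec_field( name => 'title',   type => $type );
$schema->spec_field( name => 'content', type => $type );

# Create the index and add the sample documents.
my $indexer = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => '/tmp/lucy_demo_index',    # illustrative path
    create => 1,
);
while ( my ( $title, $content ) = each %source_docs ) {
    $indexer->add_doc( { title => $title, content => $content } );
}
$indexer->commit;</pre>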
Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/Analyzer.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/Analyzer.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/Analyzer.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/Analyzer.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,143 @@ +Title: Lucy::Analysis::Analyzer â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::Analyzer - Tokenize/modify/filter text.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre># Abstract base class.</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>An Analyzer is a filter which processes text, +transforming it from one form into another. +For instance, +an analyzer might break up a long text into smaller pieces (<a href="../../Lucy/Analysis/RegexTokenizer.html" class="podlinkpod" +>RegexTokenizer</a>), +or it might perform case folding to facilitate case-insensitive search (<a href="../../Lucy/Analysis/Normalizer.html" class="podlinkpod" +>Normalizer</a>).</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>package MyAnalyzer; +use base qw( Lucy::Analysis::Analyzer ); +our %foo; +sub new { + my $self = shift->SUPER::new; + my %args = @_; + $foo{$$self} = $args{foo}; + return $self; +}</pre> + +<p>Abstract constructor. +Takes no arguments.</p> + +<h2><a class='u' +name="ABSTRACT_METHODS" +>ABSTRACT METHODS</a></h2> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $analyzer->transform($inversion);</pre> + +<p>Take a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="transform_text" +>transform_text</a></h3> + +<pre>my $inversion = $analyzer->transform_text($text);</pre> + +<p>Kick off an analysis chain, +creating an Inversion from string input. +The default implementation simply creates an initial Inversion with a single Token, +then calls <a href="#transform" class="podlinkpod" +>transform()</a>, +but occasionally subclasses will provide an optimized implementation which minimizes string copies.</p> + +<ul> +<li><b>text</b> - A string.</li> +</ul> + +<h3><a class='u' +name="split" +>split</a></h3> + +<pre>my $arrayref = $analyzer->split($text);</pre> + +<p>Analyze text and return an array of token texts.</p> + +<ul> +<li><b>text</b> - A string.</li> +</ul> + +<h3><a class='u' +name="dump" +>dump</a></h3> + +<pre>my $obj = $analyzer->dump();</pre> + +<p>Dump the analyzer as hash.</p> + +<p>Subclasses should call <a href="#dump" class="podlinkpod" +>dump()</a> on the superclass. +The returned object is a hash which should be populated with parameters of the analyzer.</p> + +<p>Returns: A hash containing a description of the analyzer.</p> + +<h3><a class='u' +name="load" +>load</a></h3> + +<pre>my $obj = $analyzer->load($dump);</pre> + +<p>Reconstruct an analyzer from a dump.</p> + +<p>Subclasses should first call <a href="#load" class="podlinkpod" +>load()</a> on the superclass. 
+The returned object is an analyzer which should be reconstructed by setting the dumped parameters from the hash contained in <code>dump</code>.</p> + +<p>Note that the invocant analyzer is unused.</p> + +<ul> +<li><b>dump</b> - A hash.</li> +</ul> + +<p>Returns: An analyzer.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::Analyzer isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/CaseFolder.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/CaseFolder.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/CaseFolder.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/CaseFolder.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,73 @@ +Title: Lucy::Analysis::CaseFolder â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::CaseFolder - Normalize case, +facilitating case-insensitive search.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $case_folder = Lucy::Analysis::CaseFolder->new; + +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $case_folder, $stemmer ], +);</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>CaseFolder is DEPRECATED. +Use <a href="../../Lucy/Analysis/Normalizer.html" class="podlinkpod" +>Normalizer</a> instead.</p> + +<p>CaseFolder normalizes text according to Unicode case-folding rules, +so that searches will be case-insensitive.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $case_folder = Lucy::Analysis::CaseFolder->new;</pre> + +<p>Constructor. 
+Takes no arguments.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $case_folder->transform($inversion);</pre> + +<p>Take a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::CaseFolder isa <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Lucy::Analysis::Analyzer</a> isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/EasyAnalyzer.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/EasyAnalyzer.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/EasyAnalyzer.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/EasyAnalyzer.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,99 @@ +Title: Lucy::Analysis::EasyAnalyzer â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::EasyAnalyzer - A simple analyzer chain.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $schema = Lucy::Plan::Schema->new; +my $analyzer = Lucy::Analysis::EasyAnalyzer->new( + language => 'en', +); +my $type = Lucy::Plan::FullTextType->new( + analyzer => $analyzer, +); +$schema->spec_field( name => 'title', type => $type ); +$schema->spec_field( name => 'content', type => $type );</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>EasyAnalyzer is an analyzer chain consisting of a <a href="../../Lucy/Analysis/StandardTokenizer.html" class="podlinkpod" +>StandardTokenizer</a>, +a <a href="../../Lucy/Analysis/Normalizer.html" class="podlinkpod" +>Normalizer</a>, +and a <a href="../../Lucy/Analysis/SnowballStemmer.html" class="podlinkpod" +>SnowballStemmer</a>.</p> + +<p>Supported languages:</p> + +<pre>en => English, +da => Danish, +de => German, +es => Spanish, +fi => Finnish, +fr => French, +hu => Hungarian, +it => Italian, +nl => Dutch, +no => Norwegian, +pt => Portuguese, +ro => Romanian, +ru => Russian, +sv => Swedish, +tr => Turkish,</pre> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $analyzer = Lucy::Analysis::EasyAnalyzer->new( + language => 'es', +);</pre> + +<p>Create a new EasyAnalyzer.</p> + +<ul> +<li><b>language</b> - An ISO code from the list of supported languages.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $easy_analyzer->transform($inversion);</pre> + +<p>Take a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::EasyAnalyzer isa <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Lucy::Analysis::Analyzer</a> isa Clownfish::Obj.</p> + +</div> Added: 
lucy/site/trunk/content/docs/perl/Lucy/Analysis/Inversion.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/Inversion.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/Inversion.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/Inversion.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,87 @@ +Title: Lucy::Analysis::Inversion â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::Inversion - A collection of Tokens.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $result = Lucy::Analysis::Inversion->new; + +while (my $token = $inversion->next) { + $result->append($token); +}</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>An Inversion is a collection of Token objects which you can add to, +then iterate over.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $inversion = Lucy::Analysis::Inversion->new( + $seed, # optional +);</pre> + +<p>Create a new Inversion.</p> + +<ul> +<li><b>seed</b> - An initial Token to start things off, +which may be undef.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="append" +>append</a></h3> + +<pre>$inversion->append($token);</pre> + +<p>Tack a token onto the end of the Inversion.</p> + +<ul> +<li><b>token</b> - A Token.</li> +</ul> + +<h3><a class='u' +name="next" +>next</a></h3> + +<pre>my $token = $inversion->next();</pre> + +<p>Return the next token in the Inversion until out of tokens.</p> + +<h3><a class='u' +name="reset" +>reset</a></h3> + +<pre>$inversion->reset();</pre> + +<p>Reset the Inversion’s iterator, +so that the next call to next() returns the first Token in the inversion.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::Inversion isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/Normalizer.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/Normalizer.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/Normalizer.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/Normalizer.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,92 @@ +Title: Lucy::Analysis::Normalizer â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::Normalizer - Unicode normalization, +case folding and accent stripping.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $normalizer = Lucy::Analysis::Normalizer->new; + +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stemmer ], +);</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Normalizer is an <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Analyzer</a> which normalizes tokens to one of the Unicode normalization forms. 
+Optionally, +it performs Unicode case folding and converts accented characters to their base character.</p> + +<p>If you use highlighting, +Normalizer should be run after tokenization because it might add or remove characters.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $normalizer = Lucy::Analysis::Normalizer->new( + normalization_form => 'NFKC', + case_fold => 1, + strip_accents => 0, +);</pre> + +<p>Create a new Normalizer.</p> + +<ul> +<li><b>normalization_form</b> - Unicode normalization form, +can be one of ‘NFC’, +‘NFKC’, +‘NFD’, +‘NFKD’. +Defaults to ‘NFKC’.</li> + +<li><b>case_fold</b> - Perform case folding, +default is true.</li> + +<li><b>strip_accents</b> - Strip accents, +default is false.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $normalizer->transform($inversion);</pre> + +<p>Take a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::Normalizer isa <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Lucy::Analysis::Analyzer</a> isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/PolyAnalyzer.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/PolyAnalyzer.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/PolyAnalyzer.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/PolyAnalyzer.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,134 @@ +Title: Lucy::Analysis::PolyAnalyzer â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::PolyAnalyzer - Multiple Analyzers in series.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $schema = Lucy::Plan::Schema->new; +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => \@analyzers, +); +my $type = Lucy::Plan::FullTextType->new( + analyzer => $polyanalyzer, +); +$schema->spec_field( name => 'title', type => $type ); +$schema->spec_field( name => 'content', type => $type );</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>A PolyAnalyzer is a series of <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Analyzers</a>, +each of which will be called upon to “analyze” text in turn. +You can either provide the Analyzers yourself, +or you can specify a supported language, +in which case a PolyAnalyzer consisting of a <a href="../../Lucy/Analysis/CaseFolder.html" class="podlinkpod" +>CaseFolder</a>, +a <a href="../../Lucy/Analysis/RegexTokenizer.html" class="podlinkpod" +>RegexTokenizer</a>, +and a <a href="../../Lucy/Analysis/SnowballStemmer.html" class="podlinkpod" +>SnowballStemmer</a> will be generated for you.</p> + +<p>The language parameter is DEPRECATED. 
+Use <a href="../../Lucy/Analysis/EasyAnalyzer.html" class="podlinkpod" +>EasyAnalyzer</a> instead.</p> + +<p>Supported languages:</p> + +<pre>en => English, +da => Danish, +de => German, +es => Spanish, +fi => Finnish, +fr => French, +hu => Hungarian, +it => Italian, +nl => Dutch, +no => Norwegian, +pt => Portuguese, +ro => Romanian, +ru => Russian, +sv => Swedish, +tr => Turkish,</pre> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer->new; +my $normalizer = Lucy::Analysis::Normalizer->new; +my $stemmer = Lucy::Analysis::SnowballStemmer->new( language => 'en' ); +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stemmer, ], );</pre> + +<p>Create a new PolyAnalyzer.</p> + +<ul> +<li><b>language</b> - An ISO code from the list of supported languages. +DEPRECATED, +use <a href="../../Lucy/Analysis/EasyAnalyzer.html" class="podlinkpod" +>EasyAnalyzer</a> instead.</li> + +<li><b>analyzers</b> - An array of Analyzers. +The order of the analyzers matters. +Don’t put a SnowballStemmer before a RegexTokenizer (can’t stem whole documents or paragraphs – just individual words), +or a SnowballStopFilter after a SnowballStemmer (stemmed words, +e.g. +“themselv”, +will not appear in a stoplist). +In general, +the sequence should be: tokenize, +normalize, +stopalize, +stem.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="get_analyzers" +>get_analyzers</a></h3> + +<pre>my $arrayref = $poly_analyzer->get_analyzers();</pre> + +<p>Getter for “analyzers” member.</p> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $poly_analyzer->transform($inversion);</pre> + +<p>Take a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::PolyAnalyzer isa <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Lucy::Analysis::Analyzer</a> isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/RegexTokenizer.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/RegexTokenizer.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/RegexTokenizer.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/RegexTokenizer.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,108 @@ +Title: Lucy::Analysis::RegexTokenizer â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::RegexTokenizer - Split a string into tokens.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $whitespace_tokenizer + = Lucy::Analysis::RegexTokenizer->new( pattern => '\S+' ); + +# or... +my $word_char_tokenizer + = Lucy::Analysis::RegexTokenizer->new( pattern => '\w+' ); + +# or... +my $apostrophising_tokenizer = Lucy::Analysis::RegexTokenizer->new; + +# Then... 
once you have a tokenizer, put it into a PolyAnalyzer: +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $word_char_tokenizer, $normalizer, $stemmer ], );</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Generically, +“tokenizing” is a process of breaking up a string into an array of “tokens”. +For instance, +the string “three blind mice” might be tokenized into “three”, +“blind”, +“mice”.</p> + +<p>Lucy::Analysis::RegexTokenizer decides where it should break up the text based on a regular expression compiled from a supplied <code>pattern</code> matching one token. +If our source string is…</p> + +<pre>"Eats, Shoots and Leaves."</pre> + +<p>… then a “whitespace tokenizer” with a <code>pattern</code> of <code>"\\S+"</code> produces…</p> + +<pre>Eats, +Shoots +and +Leaves.</pre> + +<p>… while a “word character tokenizer” with a <code>pattern</code> of <code>"\\w+"</code> produces…</p> + +<pre>Eats +Shoots +and +Leaves</pre> + +<p>… the difference being that the word character tokenizer skips over punctuation as well as whitespace when determining token boundaries.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $word_char_tokenizer = Lucy::Analysis::RegexTokenizer->new( + pattern => '\w+', # required +);</pre> + +<p>Create a new RegexTokenizer.</p> + +<ul> +<li><b>pattern</b> - A string specifying a Perl-syntax regular expression which should match one token. +The default value is <code>\w+(?:[\x{2019}']\w+)*</code>, +which matches “it’s” as well as “it” and “O’Henry’s” as well as “Henry”.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $regex_tokenizer->transform($inversion);</pre> + +<p>Take a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::RegexTokenizer isa <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Lucy::Analysis::Analyzer</a> isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/SnowballStemmer.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/SnowballStemmer.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/SnowballStemmer.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/SnowballStemmer.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,78 @@ +Title: Lucy::Analysis::SnowballStemmer â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::SnowballStemmer - Reduce related words to a shared root.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $stemmer = Lucy::Analysis::SnowballStemmer->new( language => 'es' ); + +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stemmer ], +);</pre> + +<p>This class is a wrapper around the Snowball stemming library, +so it supports the same languages.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>SnowballStemmer is an <a 
href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Analyzer</a> which reduces related words to a root form (using the “Snowball” stemming library). +For instance, +“horse”, +“horses”, +and “horsing” all become “hors” – so that a search for ‘horse’ will also match documents containing ‘horses’ and ‘horsing’.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $stemmer = Lucy::Analysis::SnowballStemmer->new( language => 'es' );</pre> + +<p>Create a new SnowballStemmer.</p> + +<ul> +<li><b>language</b> - A two-letter ISO code identifying a language supported by Snowball.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $snowball_stemmer->transform($inversion);</pre> + +<p>Take a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::SnowballStemmer isa <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Lucy::Analysis::Analyzer</a> isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/SnowballStopFilter.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/SnowballStopFilter.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/SnowballStopFilter.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/SnowballStopFilter.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,115 @@ +Title: Lucy::Analysis::SnowballStopFilter â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::SnowballStopFilter - Suppress a “stoplist” of common words.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $stopfilter = Lucy::Analysis::SnowballStopFilter->new( + language => 'fr', +); +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stopfilter, $stemmer ], +);</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>A “stoplist” is collection of “stopwords”: words which are common enough to be of little value when determining search results. 
+For example, +so many documents in English contain “the”, +“if”, +and “maybe” that it may improve both performance and relevance to block them.</p> + +<p>Before filtering stopwords:</p> + +<pre>("i", "am", "the", "walrus")</pre> + +<p>After filtering stopwords:</p> + +<pre>("walrus")</pre> + +<p>SnowballStopFilter provides default stoplists for several languages, +courtesy of the <a href="http://snowball.tartarus.org" class="podlinkurl" +>Snowball project</a>, +or you may supply your own.</p> + +<pre>|-----------------------| +| ISO CODE | LANGUAGE | +|-----------------------| +| da | Danish | +| de | German | +| en | English | +| es | Spanish | +| fi | Finnish | +| fr | French | +| hu | Hungarian | +| it | Italian | +| nl | Dutch | +| no | Norwegian | +| pt | Portuguese | +| sv | Swedish | +| ru | Russian | +|-----------------------|</pre> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $stopfilter = Lucy::Analysis::SnowballStopFilter->new( + language => 'de', +); + +# or... +my $stopfilter = Lucy::Analysis::SnowballStopFilter->new( + stoplist => \%stoplist, +);</pre> + +<p>Create a new SnowballStopFilter.</p> + +<ul> +<li><b>stoplist</b> - A hash with stopwords as the keys.</li> + +<li><b>language</b> - The ISO code for a supported language.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $snowball_stop_filter->transform($inversion);</pre> + +<p>Take a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::SnowballStopFilter isa <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Lucy::Analysis::Analyzer</a> isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/StandardTokenizer.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/StandardTokenizer.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/StandardTokenizer.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/StandardTokenizer.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,75 @@ +Title: Lucy::Analysis::StandardTokenizer â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::StandardTokenizer - Split a string into tokens.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer->new; + +# Then... once you have a tokenizer, put it into a PolyAnalyzer: +my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ $tokenizer, $normalizer, $stemmer ], );</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Generically, +“tokenizing” is a process of breaking up a string into an array of “tokens”. +For instance, +the string “three blind mice” might be tokenized into “three”, +“blind”, +“mice”.</p> + +<p>Lucy::Analysis::StandardTokenizer breaks up the text at the word boundaries defined in Unicode Standard Annex #29. 
+It then returns those words that contain alphabetic or numeric characters.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $tokenizer = Lucy::Analysis::StandardTokenizer->new;</pre> + +<p>Constructor. +Takes no arguments.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="transform" +>transform</a></h3> + +<pre>my $inversion = $standard_tokenizer->transform($inversion);</pre> + +<p>Take a single <a href="../../Lucy/Analysis/Inversion.html" class="podlinkpod" +>Inversion</a> as input and returns an Inversion, +either the same one (presumably transformed in some way), +or a new one.</p> + +<ul> +<li><b>inversion</b> - An inversion.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::StandardTokenizer isa <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Lucy::Analysis::Analyzer</a> isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Analysis/Token.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Analysis/Token.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Analysis/Token.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Analysis/Token.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,154 @@ +Title: Lucy::Analysis::Token â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Analysis::Token - Unit of text.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre> my $token = Lucy::Analysis::Token->new( + text => 'blind', + start_offset => 8, + end_offset => 13, + ); + + $token->set_text('mice');</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Token is the fundamental unit used by Apache Lucy’s Analyzer subclasses. +Each Token has 5 attributes: <code>text</code>, +<code>start_offset</code>, +<code>end_offset</code>, +<code>boost</code>, +and <code>pos_inc</code>.</p> + +<p>The <code>text</code> attribute is a Unicode string encoded as UTF-8.</p> + +<p><code>start_offset</code> is the start point of the token text, +measured in Unicode code points from the top of the stored field; <code>end_offset</code> delimits the corresponding closing boundary. +<code>start_offset</code> and <code>end_offset</code> locate the Token within a larger context, +even if the Token’s text attribute gets modified – by stemming, +for instance. +The Token for “beating” in the text “beating a dead horse” begins life with a start_offset of 0 and an end_offset of 7; after stemming, +the text is “beat”, +but the start_offset is still 0 and the end_offset is still 7. +This allows “beating” to be highlighted correctly after a search matches “beat”.</p> + +<p><code>boost</code> is a per-token weight. +Use this when you want to assign more or less importance to a particular token, +as you might for emboldened text within an HTML document, +for example. +(Note: The field this token belongs to must be spec’d to use a posting of type RichPosting.)</p> + +<p><code>pos_inc</code> is the POSition INCrement, +measured in Tokens. +This attribute, +which defaults to 1, +is a an advanced tool for manipulating phrase matching. +Ordinarily, +Tokens are assigned consecutive position numbers: 0, +1, +and 2 for <code>"three blind mice"</code>. 
+However, +if you set the position increment for “blind” to, +say, +1000, +then the three tokens will end up assigned to positions 0, +1, +and 1001 – and will no longer produce a phrase match for the query <code>"three blind mice"</code>.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $token = Lucy::Analysis::Token->new( + text => $text, # required + start_offset => $start_offset, # required + end_offset => $end_offset, # required + boost => 1.0, # optional + pos_inc => 1, # optional +);</pre> + +<ul> +<li><b>text</b> - A string.</li> + +<li><b>start_offset</b> - Start offset into the original document in Unicode code points.</li> + +<li><b>start_offset</b> - End offset into the original document in Unicode code points.</li> + +<li><b>boost</b> - Per-token weight.</li> + +<li><b>pos_inc</b> - Position increment for phrase matching.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="get_text" +>get_text</a></h3> + +<pre>my $text = $token->get_text;</pre> + +<p>Get the token's text.</p> + +<h3><a class='u' +name="set_text" +>set_text</a></h3> + +<pre>$token->set_text($text);</pre> + +<p>Set the token's text.</p> + +<h3><a class='u' +name="get_start_offset" +>get_start_offset</a></h3> + +<pre>my $int = $token->get_start_offset();</pre> + +<h3><a class='u' +name="get_end_offset" +>get_end_offset</a></h3> + +<pre>my $int = $token->get_end_offset();</pre> + +<h3><a class='u' +name="get_boost" +>get_boost</a></h3> + +<pre>my $float = $token->get_boost();</pre> + +<h3><a class='u' +name="get_pos_inc" +>get_pos_inc</a></h3> + +<pre>my $int = $token->get_pos_inc();</pre> + +<h3><a class='u' +name="get_len" +>get_len</a></h3> + +<pre>my $int = $token->get_len();</pre> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Analysis::Token isa Clownfish::Obj.</p> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,52 @@ +Title: Lucy::Docs::Cookbook â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Cookbook - Apache Lucy recipes</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>The Cookbook provides thematic documentation covering some of Apache Lucy’s more sophisticated features. +For a step-by-step introduction to Lucy, +see <a href="../../Lucy/Docs/Tutorial.html" class="podlinkpod" +>Tutorial</a>.</p> + +<h3><a class='u' +name="Chapters" +>Chapters</a></h3> + +<ul> +<li><a href="../../Lucy/Docs/Cookbook/FastUpdates.html" class="podlinkpod" +>FastUpdates</a> - While index updates are fast on average, +worst-case update performance may be significantly slower. 
+To make index updates consistently quick, +we must manually intervene to control the process of index segment consolidation.</li> + +<li><a href="../../Lucy/Docs/Cookbook/CustomQuery.html" class="podlinkpod" +>CustomQuery</a> - Explore Lucy’s support for custom query types by creating a “PrefixQuery” class to handle trailing wildcards.</li> + +<li><a href="../../Lucy/Docs/Cookbook/CustomQueryParser.html" class="podlinkpod" +>CustomQueryParser</a> - Define your own custom search query syntax using <a href="../../Lucy/Search/QueryParser.html" class="podlinkpod" +>QueryParser</a> and Parse::RecDescent.</li> +</ul> + +<h3><a class='u' +name="Materials" +>Materials</a></h3> + +<p>Some of the recipes in the Cookbook reference the completed <a href="../../Lucy/Docs/Tutorial.html" class="podlinkpod" +>Tutorial</a> application. +These materials can be found in the <code>sample</code> directory at the root of the Lucy distribution:</p> + +<pre>sample/indexer.pl # indexing app +sample/search.cgi # search app +sample/us_constitution # corpus</pre> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQuery.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQuery.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQuery.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQuery.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,321 @@ +Title: Lucy::Docs::Cookbook::CustomQuery â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Cookbook::CustomQuery - Sample subclass of Query</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Explore Apache Lucy’s support for custom query types by creating a “PrefixQuery” class to handle trailing wildcards.</p> + +<pre>my $prefix_query = PrefixQuery->new( + field => 'content', + query_string => 'foo*', +); +my $hits = $searcher->hits( query => $prefix_query ); +...</pre> + +<h3><a class='u' +name="Query,_Compiler,_and_Matcher" +>Query, +Compiler, +and Matcher</a></h3> + +<p>To add support for a new query type, +we need three classes: a Query, +a Compiler, +and a Matcher.</p> + +<ul> +<li>PrefixQuery - a subclass of <a href="../../../Lucy/Search/Query.html" class="podlinkpod" +>Query</a>, +and the only class that client code will deal with directly.</li> + +<li>PrefixCompiler - a subclass of <a href="../../../Lucy/Search/Compiler.html" class="podlinkpod" +>Compiler</a>, +whose primary role is to compile a PrefixQuery to a PrefixMatcher.</li> + +<li>PrefixMatcher - a subclass of <a href="../../../Lucy/Search/Matcher.html" class="podlinkpod" +>Matcher</a>, +which does the heavy lifting: it applies the query to individual documents and assigns a score to each match.</li> +</ul> + +<p>The PrefixQuery class on its own isn’t enough because a Query object’s role is limited to expressing an abstract specification for the search. 
+A Query is basically nothing but metadata; execution is left to the Query’s companion Compiler and Matcher.</p> + +<p>Here’s a simplified sketch illustrating how a Searcher’s hits() method ties together the three classes.</p> + +<pre>sub hits { + my ( $self, $query ) = @_; + my $compiler = $query->make_compiler( + searcher => $self, + boost => $query->get_boost, + ); + my $matcher = $compiler->make_matcher( + reader => $self->get_reader, + need_score => 1, + ); + my @hits = $matcher->capture_hits; + return \@hits; +}</pre> + +<h4><a class='u' +name="PrefixQuery" +>PrefixQuery</a></h4> + +<p>Our PrefixQuery class will have two attributes: a query string and a field name.</p> + +<pre>package PrefixQuery; +use base qw( Lucy::Search::Query ); +use Carp; +use Scalar::Util qw( blessed ); + +# Inside-out member vars and hand-rolled accessors. +my %query_string; +my %field; +sub get_query_string { my $self = shift; return $query_string{$$self} } +sub get_field { my $self = shift; return $field{$$self} }</pre> + +<p>PrefixQuery’s constructor collects and validates the attributes.</p> + +<pre>sub new { + my ( $class, %args ) = @_; + my $query_string = delete $args{query_string}; + my $field = delete $args{field}; + my $self = $class->SUPER::new(%args); + confess("'query_string' param is required") + unless defined $query_string; + confess("Invalid query_string: '$query_string'") + unless $query_string =~ /\*\s*$/; + confess("'field' param is required") + unless defined $field; + $query_string{$$self} = $query_string; + $field{$$self} = $field; + return $self; +}</pre> + +<p>Since this is an inside-out class, +we’ll need a destructor:</p> + +<pre>sub DESTROY { + my $self = shift; + delete $query_string{$$self}; + delete $field{$$self}; + $self->SUPER::DESTROY; +}</pre> + +<p>The equals() method determines whether two Queries are logically equivalent:</p> + +<pre>sub equals { + my ( $self, $other ) = @_; + return 0 unless blessed($other); + return 0 unless $other->isa("PrefixQuery"); + return 0 unless $field{$$self} eq $field{$$other}; + return 0 unless $query_string{$$self} eq $query_string{$$other}; + return 1; +}</pre> + +<p>The last thing we’ll need is a make_compiler() factory method which kicks out a subclass of <a href="../../../Lucy/Search/Compiler.html" class="podlinkpod" +>Compiler</a>.</p> + +<pre>sub make_compiler { + my ( $self, %args ) = @_; + my $subordinate = delete $args{subordinate}; + my $compiler = PrefixCompiler->new( %args, parent => $self ); + $compiler->normalize unless $subordinate; + return $compiler; +}</pre> + +<h4><a class='u' +name="PrefixCompiler" +>PrefixCompiler</a></h4> + +<p>PrefixQuery’s make_compiler() method will be called internally at search-time by objects which subclass <a href="../../../Lucy/Search/Searcher.html" class="podlinkpod" +>Searcher</a> – such as <a href="../../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearchers</a>.</p> + +<p>A Searcher is associated with a particular collection of documents. 
+These documents may all reside in one index, +as with IndexSearcher, +or they may be spread out across multiple indexes on one or more machines, +as with LucyX::Remote::ClusterSearcher.</p> + +<p>Searcher objects have access to certain statistical information about the collections they represent; for instance, +a Searcher can tell you how many documents are in the collection…</p> + +<pre>my $maximum_number_of_docs_in_collection = $searcher->doc_max;</pre> + +<p>… or how many documents a specific term appears in:</p> + +<pre>my $term_appears_in_this_many_docs = $searcher->doc_freq( + field => 'content', + term => 'foo', +);</pre> + +<p>Such information can be used by sophisticated Compiler implementations to assign more or less heft to individual queries or sub-queries. +However, +we’re not going to bother with weighting for this demo; we’ll just assign a fixed score of 1.0 to each matching document.</p> + +<p>We don’t need to write a constructor, +as it will suffice to inherit new() from Lucy::Search::Compiler. +The only method we need to implement for PrefixCompiler is make_matcher().</p> + +<pre>package PrefixCompiler; +use base qw( Lucy::Search::Compiler ); + +sub make_matcher { + my ( $self, %args ) = @_; + my $seg_reader = $args{reader}; + + # Retrieve low-level components LexiconReader and PostingListReader. + my $lex_reader + = $seg_reader->obtain("Lucy::Index::LexiconReader"); + my $plist_reader + = $seg_reader->obtain("Lucy::Index::PostingListReader"); + + # Acquire a Lexicon and seek it to our query string. + my $substring = $self->get_parent->get_query_string; + $substring =~ s/\*.\s*$//; + my $field = $self->get_parent->get_field; + my $lexicon = $lex_reader->lexicon( field => $field ); + return unless $lexicon; + $lexicon->seek($substring); + + # Accumulate PostingLists for each matching term. + my @posting_lists; + while ( defined( my $term = $lexicon->get_term ) ) { + last unless $term =~ /^\Q$substring/; + my $posting_list = $plist_reader->posting_list( + field => $field, + term => $term, + ); + if ($posting_list) { + push @posting_lists, $posting_list; + } + last unless $lexicon->next; + } + return unless @posting_lists; + + return PrefixMatcher->new( posting_lists => \@posting_lists ); +}</pre> + +<p>PrefixCompiler gets access to a <a href="../../../Lucy/Index/SegReader.html" class="podlinkpod" +>SegReader</a> object when make_matcher() gets called. +From the SegReader and its sub-components <a href="../../../Lucy/Index/LexiconReader.html" class="podlinkpod" +>LexiconReader</a> and <a href="../../../Lucy/Index/PostingListReader.html" class="podlinkpod" +>PostingListReader</a>, +we acquire a <a href="../../../Lucy/Index/Lexicon.html" class="podlinkpod" +>Lexicon</a>, +scan through the Lexicon’s unique terms, +and acquire a <a href="../../../Lucy/Index/PostingList.html" class="podlinkpod" +>PostingList</a> for each term that matches our prefix.</p> + +<p>Each of these PostingList objects represents a set of documents which match the query.</p> + +<h4><a class='u' +name="PrefixMatcher" +>PrefixMatcher</a></h4> + +<p>The Matcher subclass is the most involved.</p> + +<pre>package PrefixMatcher; +use base qw( Lucy::Search::Matcher ); + +# Inside-out member vars. +my %doc_ids; +my %tick; + +sub new { + my ( $class, %args ) = @_; + my $posting_lists = delete $args{posting_lists}; + my $self = $class->SUPER::new(%args); + + # Cheesy but simple way of interleaving PostingList doc sets. 
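+    # Collecting the IDs as hash keys de-duplicates documents that appear
+    # in more than one PostingList; sorting the keys numerically below
+    # yields the ascending doc-id order that next() walks through.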
+ my %all_doc_ids; + for my $posting_list (@$posting_lists) { + while ( my $doc_id = $posting_list->next ) { + $all_doc_ids{$doc_id} = undef; + } + } + my @doc_ids = sort { $a <=> $b } keys %all_doc_ids; + $doc_ids{$$self} = \@doc_ids; + + # Track our position within the array of doc ids. + $tick{$$self} = -1; + + return $self; +} + +sub DESTROY { + my $self = shift; + delete $doc_ids{$$self}; + delete $tick{$$self}; + $self->SUPER::DESTROY; +}</pre> + +<p>The doc ids must be in order, +or some will be ignored; hence the <code>sort</code> above.</p> + +<p>In addition to the constructor and destructor, +there are three methods that must be overridden.</p> + +<p>next() advances the Matcher to the next valid matching doc.</p> + +<pre>sub next { + my $self = shift; + my $doc_ids = $doc_ids{$$self}; + my $tick = ++$tick{$$self}; + return 0 if $tick >= scalar @$doc_ids; + return $doc_ids->[$tick]; +}</pre> + +<p>get_doc_id() returns the current document id, +or 0 if the Matcher is exhausted. +(<a href="../../../Lucy/Docs/DocIDs.html" class="podlinkpod" +>Document numbers</a> start at 1, +so 0 is a sentinel.)</p> + +<pre>sub get_doc_id { + my $self = shift; + my $tick = $tick{$$self}; + my $doc_ids = $doc_ids{$$self}; + return $tick < scalar @$doc_ids ? $doc_ids->[$tick] : 0; +}</pre> + +<p>score() conveys the relevance score of the current match. +We’ll just return a fixed score of 1.0:</p> + +<pre>sub score { 1.0 }</pre> + +<h3><a class='u' +name="Usage" +>Usage</a></h3> + +<p>To get a basic feel for PrefixQuery, +insert the FlatQueryParser module described in <a href="../../../Lucy/Docs/Cookbook/CustomQueryParser.html" class="podlinkpod" +>CustomQueryParser</a> (which supports PrefixQuery) into the search.cgi sample app.</p> + +<pre>my $parser = FlatQueryParser->new( schema => $searcher->get_schema ); +my $query = $parser->parse($q);</pre> + +<p>If you’re planning on using PrefixQuery in earnest, +though, +you may want to change up analyzers to avoid stemming, +because stemming – another approach to prefix conflation – is not perfectly compatible with prefix searches.</p> + +<pre># Polyanalyzer with no SnowballStemmer. +my $analyzer = Lucy::Analysis::PolyAnalyzer->new( + analyzers => [ + Lucy::Analysis::StandardTokenizer->new, + Lucy::Analysis::Normalizer->new, + ], +);</pre> + +</div> Added: lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQueryParser.mdtext URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQueryParser.mdtext?rev=1737642&view=auto ============================================================================== --- lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQueryParser.mdtext (added) +++ lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/CustomQueryParser.mdtext Mon Apr 4 09:22:30 2016 @@ -0,0 +1,239 @@ +Title: Lucy::Docs::Cookbook::CustomQueryParser â Apache Lucy Documentation + +<div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Docs::Cookbook::CustomQueryParser - Sample subclass of QueryParser.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Implement a custom search query language using a subclass of <a href="../../../Lucy/Search/QueryParser.html" class="podlinkpod" +>QueryParser</a>.</p> + +<h3><a class='u' +name="The_language" +>The language</a></h3> + +<p>At first, +our query language will support only simple term queries and phrases delimited by double quotes. 
+For simplicity’s sake,
+it will not support parenthetical groupings,
+boolean operators,
+or prepended plus/minus.
+The results for all subqueries will be unioned together – i.e.
+joined using an OR – which is usually the best approach for small-to-medium-sized document collections.</p>
+
+<p>Later,
+we’ll add support for trailing wildcards.</p>
+
+<h3><a class='u'
+name="Single-field_parser"
+>Single-field parser</a></h3>
+
+<p>Our initial parser implementation will generate queries against a single fixed field,
+“content”,
+and it will analyze text using a fixed choice of English EasyAnalyzer.
+We won’t subclass Lucy::Search::QueryParser just yet.</p>
+
+<pre>package FlatQueryParser;
+use Lucy::Search::TermQuery;
+use Lucy::Search::PhraseQuery;
+use Lucy::Search::ORQuery;
+use Carp;
+
+sub new {
+    my $analyzer = Lucy::Analysis::EasyAnalyzer->new(
+        language => 'en',
+    );
+    return bless {
+        field    => 'content',
+        analyzer => $analyzer,
+    }, __PACKAGE__;
+}</pre>
+
+<p>Some private helper subs for creating TermQuery and PhraseQuery objects will help keep the size of our main parse() subroutine down:</p>
+
+<pre>sub _make_term_query {
+    my ( $self, $term ) = @_;
+    return Lucy::Search::TermQuery->new(
+        field => $self->{field},
+        term  => $term,
+    );
+}
+
+sub _make_phrase_query {
+    my ( $self, $terms ) = @_;
+    return Lucy::Search::PhraseQuery->new(
+        field => $self->{field},
+        terms => $terms,
+    );
+}</pre>
+
+<p>Our private _tokenize() method treats double-quote delimited material as a single token and splits on whitespace everywhere else.</p>
+
+<pre>sub _tokenize {
+    my ( $self, $query_string ) = @_;
+    my @tokens;
+    while ( length $query_string ) {
+        if ( $query_string =~ s/^\s+// ) {
+            next;    # skip whitespace
+        }
+        elsif ( $query_string =~ s/^("[^"]*(?:"|$))// ) {
+            push @tokens, $1;    # double-quoted phrase
+        }
+        else {
+            $query_string =~ s/(\S+)//;
+            push @tokens, $1;    # single word
+        }
+    }
+    return \@tokens;
+}</pre>
+
+<p>The main parsing routine creates an array of tokens by calling _tokenize(),
+runs the tokens through the EasyAnalyzer,
+creates TermQuery or PhraseQuery objects according to how many tokens emerge from the EasyAnalyzer’s split() method,
+and adds each of the sub-queries to the primary ORQuery.</p>
+
+<pre>sub parse {
+    my ( $self, $query_string ) = @_;
+    my $tokens   = $self->_tokenize($query_string);
+    my $analyzer = $self->{analyzer};
+    my $or_query = Lucy::Search::ORQuery->new;
+
+    for my $token (@$tokens) {
+        if ( $token =~ s/^"// ) {
+            $token =~ s/"$//;
+            my $terms = $analyzer->split($token);
+            my $query = $self->_make_phrase_query($terms);
+            $or_query->add_child($query);
+        }
+        else {
+            my $terms = $analyzer->split($token);
+            if ( @$terms == 1 ) {
+                my $query = $self->_make_term_query( $terms->[0] );
+                $or_query->add_child($query);
+            }
+            elsif ( @$terms > 1 ) {
+                my $query = $self->_make_phrase_query($terms);
+                $or_query->add_child($query);
+            }
+        }
+    }
+
+    return $or_query;
+}</pre>
+
+<h3><a class='u'
+name="Multi-field_parser"
+>Multi-field parser</a></h3>
+
+<p>Most often,
+the end user will want their search query to match not only a single ‘content’ field,
+but also ‘title’ and so on.
+
+<h3><a class='u'
+name="Multi-field_parser"
+>Multi-field parser</a></h3>
+
+<p>Most often,
+the end user will want their search query to match not only a single ‘content’ field,
+but also ‘title’ and so on.
+To make that happen,
+we have to turn queries such as this…</p>
+
+<pre>foo AND NOT bar</pre>
+
+<p>… into the logical equivalent of this:</p>
+
+<pre>(title:foo OR content:foo) AND NOT (title:bar OR content:bar)</pre>
+
+<p>Rather than continue with our own from-scratch parser class and write the routines to accomplish that expansion,
+we’re now going to subclass Lucy::Search::QueryParser and take advantage of some of its existing methods.</p>
+
+<p>Our first parser implementation had the “content” field name and the choice of English EasyAnalyzer hard-coded for simplicity,
+but we don’t need to do that once we subclass Lucy::Search::QueryParser.
+QueryParser’s constructor – which we will inherit,
+allowing us to eliminate our own constructor – requires a Schema which conveys field and Analyzer information,
+so we can just defer to that.</p>
+
+<pre>package FlatQueryParser;
+use base qw( Lucy::Search::QueryParser );
+use Lucy::Search::TermQuery;
+use Lucy::Search::PhraseQuery;
+use Lucy::Search::ORQuery;
+use PrefixQuery;
+use Carp;
+
+# Inherit new()</pre>
+
+<p>We’re also going to jettison our _make_term_query() and _make_phrase_query() helper subs and chop our parse() subroutine way down.
+Our revised parse() routine will generate Lucy::Search::LeafQuery objects instead of TermQueries and PhraseQueries:</p>
+
+<pre>sub parse {
+    my ( $self, $query_string ) = @_;
+    my $tokens   = $self->_tokenize($query_string);
+    my $or_query = Lucy::Search::ORQuery->new;
+    for my $token (@$tokens) {
+        my $leaf_query = Lucy::Search::LeafQuery->new( text => $token );
+        $or_query->add_child($leaf_query);
+    }
+    return $self->expand($or_query);
+}</pre>
+
+<p>The magic happens in QueryParser’s expand() method,
+which walks the ORQuery object we supply to it looking for LeafQuery objects,
+and calls expand_leaf() for each one it finds.
+expand_leaf() performs field-specific analysis,
+decides whether each query should be a TermQuery or a PhraseQuery,
+and if multiple fields are required,
+creates an ORQuery which expands e.g.
+<code>foo</code> into <code>(title:foo OR content:foo)</code>.</p>
+
+<h3><a class='u'
+name="Extending_the_query_language"
+>Extending the query language</a></h3>
+
+<p>To add support for trailing wildcards to our query language,
+we need to override expand_leaf() to accommodate PrefixQuery,
+while deferring to the parent class implementation on TermQuery and PhraseQuery.</p>
+
+<pre>sub expand_leaf {
+    my ( $self, $leaf_query ) = @_;
+    my $text = $leaf_query->get_text;
+    if ( $text =~ /\*$/ ) {
+        my $or_query = Lucy::Search::ORQuery->new;
+        for my $field ( @{ $self->get_fields } ) {
+            my $prefix_query = PrefixQuery->new(
+                field        => $field,
+                query_string => $text,
+            );
+            $or_query->add_child($prefix_query);
+        }
+        return $or_query;
+    }
+    else {
+        return $self->SUPER::expand_leaf($leaf_query);
+    }
+}</pre>
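+
+<p>A quick way to sanity-check the override (a sketch, assuming a Schema with
+‘title’ and ‘content’ fields) is to parse a wildcard query and dump the result:</p>
+
+<pre>my $parser = FlatQueryParser->new( schema => $searcher->get_schema );
+my $query  = $parser->parse('fo*');
+print $query->to_string, "\n";    # inspect how the wildcard expanded across fields</pre>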
+
+<p>Ordinarily,
+those asterisks would have been stripped when running tokens through the EasyAnalyzer – query strings containing “foo*” would produce TermQueries for the term “foo”.
+Our override intercepts tokens with trailing asterisks and processes them as PrefixQueries before <code>SUPER::expand_leaf</code> can discard them,
+so that a search for “foo*” can match “food”,
+“foosball”,
+and so on.</p>
+
+<h3><a class='u'
+name="Usage"
+>Usage</a></h3>
+
+<p>Insert our custom parser into the search.cgi sample app to get a feel for how it behaves:</p>
+
+<pre>my $parser = FlatQueryParser->new( schema => $searcher->get_schema );
+my $query  = $parser->parse( decode( 'UTF-8', $cgi->param('q') || '' ) );
+my $hits   = $searcher->hits(
+    query      => $query,
+    offset     => $offset,
+    num_wanted => $page_size,
+);
+...</pre>
+
+</div>

Added: lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/FastUpdates.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/FastUpdates.mdtext?rev=1737642&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/FastUpdates.mdtext (added)
+++ lucy/site/trunk/content/docs/perl/Lucy/Docs/Cookbook/FastUpdates.mdtext Mon Apr 4 09:22:30 2016
@@ -0,0 +1,170 @@
+Title: Lucy::Docs::Cookbook::FastUpdates – Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::Cookbook::FastUpdates - Near real-time index updates</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>While index updates are fast on average,
+worst-case update performance may be significantly slower.
+To make index updates consistently quick,
+we must manually intervene to control the process of index segment consolidation.</p>
+
+<h3><a class='u'
+name="The_problem"
+>The problem</a></h3>
+
+<p>Ordinarily,
+modifying an index is cheap.
+New data is added to new segments,
+and the time to write a new segment scales more or less linearly with the number of documents added during the indexing session.</p>
+
+<p>Deletions are also cheap most of the time,
+because we don’t remove documents immediately but instead mark them as deleted,
+and adding the deletion mark is cheap.</p>
+
+<p>However,
+as new segments are added and the deletion rate for existing segments increases,
+search-time performance slowly begins to degrade.
+At some point,
+it becomes necessary to consolidate existing segments,
+rewriting their data into a new segment.</p>
+
+<p>If the recycled segments are small,
+the time it takes to rewrite them may not be significant.
+Every once in a while,
+though,
+a large amount of data must be rewritten.</p>
+
+<h3><a class='u'
+name="Procrastinating_and_playing_catch-up"
+>Procrastinating and playing catch-up</a></h3>
+
+<p>The simplest way to force fast index updates is to avoid rewriting anything.</p>
+
+<p>Indexer relies upon <a href="../../../Lucy/Index/IndexManager.html" class="podlinkpod"
+>IndexManager</a>’s <a href="../../../Lucy/Index/IndexManager.html#recycle" class="podlinkpod"
+>recycle()</a> method to tell it which segments should be consolidated.
+If we subclass IndexManager and override the method so that it always returns an empty array,
+we get consistently quick performance:</p>
+
+<pre>package NoMergeManager;
+use base qw( Lucy::Index::IndexManager );
+sub recycle { [] }
+
+package main;
+my $indexer = Lucy::Index::Indexer->new(
+    index   => '/path/to/index',
+    manager => NoMergeManager->new,
+);
+...
+$indexer->commit;</pre>
+
+<p>However,
+we can’t procrastinate forever.
+Eventually,
+we’ll have to run an ordinary,
+uncontrolled indexing session,
+potentially triggering a large rewrite of lots of small and/or degraded segments:</p>
+
+<pre>my $indexer = Lucy::Index::Indexer->new(
+    index => '/path/to/index',
+    # manager => NoMergeManager->new,
+);
+...
+$indexer->commit;</pre>
+
+<h3><a class='u'
+name="Acceptable_worst-case_update_time,_slower_degradation"
+>Acceptable worst-case update time,
+slower degradation</a></h3>
+
+<p>Never merging anything at all in the main indexing process is probably overkill.
+Small segments are relatively cheap to merge; we just need to guard against the big rewrites.</p>
+
+<p>Setting a ceiling on the number of documents in the segments to be recycled allows us to avoid a mass proliferation of tiny,
+single-document segments,
+while still offering decent worst-case update speed:</p>
+
+<pre>package LightMergeManager;
+use base qw( Lucy::Index::IndexManager );
+
+sub recycle {
+    my $self = shift;
+    my $seg_readers = $self->SUPER::recycle(@_);
+    @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers;
+    return $seg_readers;
+}</pre>
+
+<p>However,
+we still have to consolidate every once in a while,
+and while that happens content updates will be locked out.</p>
+
+<h3><a class='u'
+name="Background_merging"
+>Background merging</a></h3>
+
+<p>If it’s not acceptable to lock out updates while the index consolidation process runs,
+the alternative is to move the consolidation process out of band,
+using <a href="../../../Lucy/Index/BackgroundMerger.html" class="podlinkpod"
+>BackgroundMerger</a>.</p>
+
+<p>It’s never safe to have more than one Indexer attempting to modify the content of an index at the same time,
+but a BackgroundMerger and an Indexer can operate simultaneously:</p>
+
+<pre># Indexing process.
+use Scalar::Util qw( blessed );
+my $retries = 0;
+while (1) {
+    eval {
+        my $indexer = Lucy::Index::Indexer->new(
+            index   => '/path/to/index',
+            manager => LightMergeManager->new,
+        );
+        $indexer->add_doc($doc);
+        $indexer->commit;
+    };
+    last unless $@;
+    if ( blessed($@) and $@->isa("Lucy::Store::LockErr") ) {
+        # Catch LockErr.
+        warn "Couldn't get lock ($retries retries)";
+        $retries++;
+    }
+    else {
+        die "Write failed: $@";
+    }
+}
+
+# Background merge process.
+my $manager = Lucy::Index::IndexManager->new;
+$manager->set_write_lock_timeout(60_000);
+my $bg_merger = Lucy::Index::BackgroundMerger->new(
+    index   => '/path/to/index',
+    manager => $manager,
+);
+$bg_merger->commit;</pre>
+
+<p>The exception handling code becomes useful once you have more than one index modification process happening simultaneously.
+By default,
+Indexer tries several times to acquire a write lock over the span of one second,
+then holds it until <a href="../../../Lucy/Index/Indexer.html#commit" class="podlinkpod"
+>commit()</a> completes.
+BackgroundMerger handles most of its work without the write lock,
+but it does need it briefly once at the beginning and once again near the end.
+Under normal loads,
+the internal retry logic will resolve conflicts,
+but if it’s not acceptable to miss an insert,
+you probably want to catch <a href="../../../Lucy/Store/LockErr.html" class="podlinkpod"
+>LockErr</a> exceptions thrown by Indexer.
+In contrast,
+a LockErr from BackgroundMerger probably just needs to be logged.</p>
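+
+<p>For example (a sketch only),
+the background merge process could log a LockErr and treat anything else as fatal:</p>
+
+<pre>use Scalar::Util qw( blessed );
+eval {
+    my $bg_merger = Lucy::Index::BackgroundMerger->new(
+        index   => '/path/to/index',
+        manager => $manager,
+    );
+    $bg_merger->commit;
+};
+if ( blessed($@) and $@->isa("Lucy::Store::LockErr") ) {
+    warn "Skipping background merge: couldn't get lock";    # log and retry later
+}
+elsif ($@) {
+    die "Background merge failed: $@";
+}</pre>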
+
+</div>

Added: lucy/site/trunk/content/docs/perl/Lucy/Docs/DevGuide.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Docs/DevGuide.mdtext?rev=1737642&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/perl/Lucy/Docs/DevGuide.mdtext (added)
+++ lucy/site/trunk/content/docs/perl/Lucy/Docs/DevGuide.mdtext Mon Apr 4 09:22:30 2016
@@ -0,0 +1,54 @@
+Title: Lucy::Docs::DevGuide – Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::DevGuide - Quick-start guide to hacking on Apache Lucy.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>The Apache Lucy code base is organized into roughly four layers:</p>
+
+<ul>
+<li>Charmonizer - compiler and OS configuration probing.</li>
+
+<li>Clownfish - header files.</li>
+
+<li>C - implementation files.</li>
+
+<li>Host - binding language.</li>
+</ul>
+
+<p>Charmonizer is a configuration prober which writes a single header file,
+“charmony.h”,
+describing the build environment and facilitating cross-platform development.
+It’s similar to Autoconf or Metaconfig,
+but written in pure C.</p>
+
+<p>The “.cfh” files within the Lucy core are Clownfish header files.
+Clownfish is a purpose-built,
+declaration-only language which superimposes a single-inheritance object model on top of C.
+It is specifically designed to co-exist happily with a variety of “host” languages and to allow limited run-time dynamic subclassing.
+For more information see the Clownfish docs,
+but if there’s one thing you should know about Clownfish OO before you start hacking,
+it’s that method calls are differentiated from functions by capitalization:</p>
+
+<pre>Indexer_Add_Doc <-- Method, typically uses dynamic dispatch.
+Indexer_add_doc <-- Function, always a direct invocation.</pre>
+
+<p>The C files within the Lucy core are where most of Lucy’s low-level functionality lies.
+They implement the interface defined by the Clownfish header files.</p>
+
+<p>The C core is intentionally left incomplete,
+however; to be usable,
+it must be bound to a “host” language.
+(In this context,
+even C is considered a “host” which must implement the missing pieces and be “bound” to the core.)
+Some of the binding code is autogenerated by Clownfish on a spec customized for each language.
+Other pieces are hand-coded in either C (using the host’s C API) or the host language itself.</p>
+
+</div>

Added: lucy/site/trunk/content/docs/perl/Lucy/Docs/DocIDs.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Docs/DocIDs.mdtext?rev=1737642&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/perl/Lucy/Docs/DocIDs.mdtext (added)
+++ lucy/site/trunk/content/docs/perl/Lucy/Docs/DocIDs.mdtext Mon Apr 4 09:22:30 2016
@@ -0,0 +1,47 @@
+Title: Lucy::Docs::DocIDs – Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::DocIDs - Characteristics of Apache Lucy document ids.</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<h3><a class='u'
+name="Document_ids_are_signed_32-bit_integers"
+>Document ids are signed 32-bit integers</a></h3>
+
+<p>Document ids in Apache Lucy start at 1.
+Because 0 is never a valid doc id,
+we can use it as a sentinel value:</p>
+
+<pre>while ( my $doc_id = $posting_list->next ) {
+    ...
+}</pre>
+
+<h3><a class='u'
+name="Document_ids_are_ephemeral"
+>Document ids are ephemeral</a></h3>
+
+<p>The document ids used by Lucy are associated with a single index snapshot.
+The moment an index is updated,
+the mapping of document ids to documents is subject to change.</p>
+
+<p>Since IndexReader objects represent a point-in-time view of an index,
+document ids are guaranteed to remain static for the life of the reader.
+However,
+because they are not permanent,
+Lucy document ids cannot be used as foreign keys to locate records in external data sources.
+If you truly need a primary key field,
+you must define it and populate it yourself.</p>
+
+<p>Furthermore,
+the order of document ids does not tell you anything about the sequence in which documents were added to the index.</p>
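+
+<p>As noted above, if you need a permanent key you have to store one yourself.
+A minimal sketch (field, table, and variable names are only illustrative):</p>
+
+<pre># At index time, store your own key as an ordinary field.
+my $id_type = Lucy::Plan::StringType->new;
+$schema->spec_field( name => 'db_id', type => $id_type );
+...
+$indexer->add_doc({ db_id => $row->{id}, content => $row->{body} });
+
+# At search time, join against external data via that field,
+# never via the ephemeral Lucy doc id.
+while ( my $hit = $hits->next ) {
+    my $record = $dbh->selectrow_hashref(
+        'SELECT * FROM articles WHERE id = ?',
+        undef, $hit->{db_id},
+    );
+}</pre>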
+
+</div>

Added: lucy/site/trunk/content/docs/perl/Lucy/Docs/FileFormat.mdtext
URL: http://svn.apache.org/viewvc/lucy/site/trunk/content/docs/perl/Lucy/Docs/FileFormat.mdtext?rev=1737642&view=auto
==============================================================================
--- lucy/site/trunk/content/docs/perl/Lucy/Docs/FileFormat.mdtext (added)
+++ lucy/site/trunk/content/docs/perl/Lucy/Docs/FileFormat.mdtext Mon Apr 4 09:22:30 2016
@@ -0,0 +1,270 @@
+Title: Lucy::Docs::FileFormat – Apache Lucy Documentation
+
+<div>
+<a name='___top' class='dummyTopAnchor' ></a>
+
+<h2><a class='u'
+name="NAME"
+>NAME</a></h2>
+
+<p>Lucy::Docs::FileFormat - Overview of index file format</p>
+
+<h2><a class='u'
+name="DESCRIPTION"
+>DESCRIPTION</a></h2>
+
+<p>It is not necessary to understand the current implementation details of the index file format in order to use Apache Lucy effectively,
+but it may be helpful if you are interested in tweaking for high performance,
+exotic usage,
+or debugging and development.</p>
+
+<p>On a file system,
+an index is a directory.
+The files inside have a hierarchical relationship: an index is made up of “segments”,
+each of which is an independent inverted index with its own subdirectory; each segment is made up of several component parts.</p>
+
+<pre>[index]--|
+         |--snapshot_XXX.json
+         |--schema_XXX.json
+         |--write.lock
+         |
+         |--seg_1--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--seg_2--|
+         |         |--segmeta.json
+         |         |--cfmeta.json
+         |         |--cf.dat-------|
+         |                         |--[lexicon]
+         |                         |--[postings]
+         |                         |--[documents]
+         |                         |--[highlight]
+         |                         |--[deletions]
+         |
+         |--[...]--|</pre>
+
+<h3><a class='u'
+name="Write-once_philosophy"
+>Write-once philosophy</a></h3>
+
+<p>All segment directory names consist of the string “seg_” followed by a number in base 36: seg_1,
+seg_5m,
+seg_p9s2 and so on,
+with higher numbers indicating more recent segments.
+Once a segment is finished and committed,
+its name is never re-used and its files are never modified.</p>
+
+<p>Old segments become obsolete and can be removed when their data has been consolidated into new segments during the process of segment merging and optimization.
+A fully-optimized index has only one segment.</p>
+
+<h3><a class='u'
+name="Top-level_entries"
+>Top-level entries</a></h3>
+
+<p>There are a handful of “top-level” files and directories which belong to the entire index rather than to a particular segment.</p>
+
+<h4><a class='u'
+name="snapshot_XXX.json"
+>snapshot_XXX.json</a></h4>
+
+<p>A “snapshot” file,
+e.g.
+<code>snapshot_m7p.json</code>,
+is a list of index files and directories.
+Because index files,
+once written,
+are never modified,
+the list of entries in a snapshot defines a point-in-time view of the data in an index.</p>
+
+<p>Like segment directories,
+snapshot files also utilize the unique-base-36-number naming convention; the higher the number,
+the more recent the file.
+The appearance of a new snapshot file within the index directory constitutes an index update.
+While a new segment is being written, new files may be added to the index directory,
+but until a new snapshot file gets written,
+a Searcher opening the index for reading won’t know about them.</p>
+
+<h4><a class='u'
+name="schema_XXX.json"
+>schema_XXX.json</a></h4>
+
+<p>The schema file is a Schema object describing the index’s format,
+serialized as JSON.
+It,
+too,
+is versioned,
+and a given snapshot file will reference one and only one schema file.</p>
+
+<h4><a class='u'
+name="locks"
+>locks</a></h4>
+
+<p>By default,
+only one indexing process may safely modify the index at any given time.
+Processes reserve an index by laying claim to the <code>write.lock</code> file within the <code>locks/</code> directory.
+A smattering of other lock files may be used from time to time,
+as well.</p>
+
+<h3><a class='u'
+name="A_segment(8217)s_component_parts"
+>A segment’s component parts</a></h3>
+
+<p>By default,
+each segment has up to five logical components: lexicon,
+postings,
+document storage,
+highlight data,
+and deletions.
+Binary data from these components gets stored in virtual files within the “cf.dat” compound file; metadata is stored in a shared “segmeta.json” file.</p>
+
+<h4><a class='u'
+name="segmeta.json"
+>segmeta.json</a></h4>
+
+<p>The segmeta.json file is a central repository for segment metadata.
+In addition to information such as document counts and field numbers,
+it also warehouses arbitrary metadata on behalf of individual index components.</p>
+
+<h4><a class='u'
+name="Lexicon"
+>Lexicon</a></h4>
+
+<p>Each indexed field gets its own lexicon in each segment.
+The exact files involved depend on the field’s type,
+but generally speaking there will be two parts.
+First,
+there’s a primary <code>lexicon-XXX.dat</code> file which houses a complete term list associating terms with corpus frequency statistics,
+postings file locations,
+etc.
+Second,
+one or more “lexicon index” files may be present which contain periodic samples from the primary lexicon file to facilitate fast lookups.</p>
+
+<h4><a class='u'
+name="Postings"
+>Postings</a></h4>
+
+<p>“Posting” is a technical term from the field of <a href="../../Lucy/Docs/IRTheory.html" class="podlinkpod"
+>information retrieval</a>,
+defined as a single instance of one term indexing one document.
+If you are looking at the index in the back of a book,
+and you see that “freedom” is referenced on pages 8,
+86,
+and 240,
+that would be three postings,
+which taken together form a “posting list”.
+The same terminology applies to an index in electronic form.</p>
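+
+<p>In the Perl API, you can walk a field’s posting list yourself through the segment
+readers. A sketch (assuming an IndexSearcher named $searcher and an indexed field
+named ‘content’):</p>
+
+<pre>for my $seg_reader ( @{ $searcher->get_reader->seg_readers } ) {
+    my $plist_reader = $seg_reader->obtain('Lucy::Index::PostingListReader');
+    my $posting_list = $plist_reader->posting_list(
+        field => 'content',
+        term  => 'freedom',
+    );
+    next unless $posting_list;
+    while ( my $doc_id = $posting_list->next ) {
+        # $doc_id is local to this segment's point-in-time view.
+    }
+}</pre>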
+
+<p>Each segment has one postings file per indexed field.
+When a search is performed for a single term,
+first that term is looked up in the lexicon.
+If the term exists in the segment,
+the record in the lexicon will contain information about which postings file to look at and where to look.</p>
+
+<p>The first thing any posting record tells you is a document id.
+By iterating over all the postings associated with a term,
+you can find all the documents that match that term,
+a process which is analogous to looking up page numbers in a book’s index.
+However,
+each posting record typically contains other information in addition to document id,
+e.g.
+the positions at which the term occurs within the field.</p>
+
+<h4><a class='u'
+name="Documents"
+>Documents</a></h4>
+
+<p>The document storage section is a simple database,
+organized into two files:</p>
+
+<ul>
+<li><b>documents.dat</b> - Serialized documents.</li>
+
+<li><b>documents.ix</b> - Document storage index,
+a solid array of 64-bit integers where each integer location corresponds to a document id,
+and the value at that location points at a file position in the documents.dat file.</li>
+</ul>
+
+<h4><a class='u'
+name="Highlight_data"
+>Highlight data</a></h4>
+
+<p>The files which store data used for excerpting and highlighting are organized similarly to the files used to store documents.</p>
+
+<ul>
+<li><b>highlight.dat</b> - Chunks of serialized highlight data,
+one per doc id.</li>
+
+<li><b>highlight.ix</b> - Highlight data index – as with the <code>documents.ix</code> file,
+a solid array of 64-bit file pointers.</li>
+</ul>
+
+<h4><a class='u'
+name="Deletions"
+>Deletions</a></h4>
+
+<p>When a document is “deleted” from a segment,
+it is not actually purged right away; it is merely marked as “deleted” via a deletions file.
+Deletions files contain bit vectors with one bit for each document in the segment; if bit #254 is set then document 254 is deleted,
+and if that document turns up in a search it will be masked out.</p>
+
+<p>It is only when a segment’s contents are rewritten to a new segment during the segment-merging process that deleted documents truly go away.</p>
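+
+<p>Conceptually, the check is a single bit lookup per doc id. A rough sketch in Perl
+(illustration only; the real deletions files carry additional framing):</p>
+
+<pre># $bits holds the segment's raw deletions bit vector.
+my $is_deleted = vec( $bits, $doc_id, 1 );    # true if the doc is masked out</pre>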
+
+<h3><a class='u'
+name="Compound_Files"
+>Compound Files</a></h3>
+
+<p>If you peer inside an index directory,
+you won’t actually find any files named “documents.dat”,
+“highlight.ix”,
+etc.
+unless there is an indexing process underway.
+What you will find instead is one “cf.dat” and one “cfmeta.json” file per segment.</p>
+
+<p>To minimize the need for file descriptors at search-time,
+all per-segment binary data files are concatenated together in “cf.dat” at the close of each indexing session.
+Information about where each file begins and ends is stored in <code>cfmeta.json</code>.
+When the segment is opened for reading,
+a single file descriptor per “cf.dat” file can be shared among several readers.</p>
+
+<h3><a class='u'
+name="A_Typical_Search"
+>A Typical Search</a></h3>
+
+<p>Here’s a simplified narrative,
+dramatizing how a search for “freedom” against a given segment plays out:</p>
+
+<ul>
+<li>The searcher asks the relevant Lexicon Index,
+“Do you know anything about ‘freedom’?” Lexicon Index replies,
+“Can’t say for sure,
+but if the main Lexicon file does,
+‘freedom’ is probably somewhere around byte 21008”.</li>
+
+<li>The main Lexicon tells the searcher “One moment,
+let me scan our records… Yes,
+we have 2 documents which contain ‘freedom’.
+You’ll find them in seg_6/postings-4.dat starting at byte 66991.”</li>
+
+<li>The Postings file says “Yep,
+we have ‘freedom’,
+all right!
+Document id 40 has 1 ‘freedom’,
+and document 44 has 8.
+If you need to know more,
+like if any ‘freedom’ is part of the phrase ‘freedom of speech’,
+ask me about positions!”</li>
+
+<li>If the searcher is only looking for ‘freedom’ in isolation,
+that’s where it stops.
+It now knows enough to assign the documents scores against “freedom”,
+with the 8-freedom document likely ranking higher than the single-freedom document.</li>
+</ul>
+
+</div>