Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/IndexReader.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/IndexReader.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/IndexReader.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,252 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::IndexReader - Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" 
id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::IndexReader - Read from an inverted index.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $reader = Lucy::Index::IndexReader->open( + index => '/path/to/index', +); +my $seg_readers = $reader->seg_readers; +for my $seg_reader (@$seg_readers) { + my $seg_name = $seg_reader->get_segment->get_name; + my $num_docs = $seg_reader->doc_max; + print "Segment $seg_name ($num_docs documents):\n"; + my $doc_reader = $seg_reader->obtain("Lucy::Index::DocReader"); + for my $doc_id ( 1 .. 
$num_docs ) { + my $doc = $doc_reader->fetch_doc($doc_id); + print " $doc_id: $doc->{title}\n"; + } +}</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>IndexReader is the interface through which <a href="../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearcher</a> objects access the content of an index.</p> + +<p>IndexReader objects always represent a point-in-time view of an index as it existed at the moment the reader was created. +If you want search results to reflect modifications to an index, +you must create a new IndexReader after the update process completes.</p> + +<p>IndexReaders are composites; most of the work is done by individual <a href="../../Lucy/Index/DataReader.html" class="podlinkpod" +>DataReader</a> sub-components, +which may be accessed via <a href="#fetch" class="podlinkpod" +>fetch()</a> and <a href="#obtain" class="podlinkpod" +>obtain()</a>. +The most efficient and powerful access to index data happens at the segment level via <a href="../../Lucy/Index/SegReader.html" class="podlinkpod" +>SegReader</a>’s sub-components.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="open" +>open</a></h3> + +<pre>my $reader = Lucy::Index::IndexReader->open( + index => '/path/to/index', # required + snapshot => $snapshot, + manager => $index_manager, +);</pre> + +<p>IndexReader is an abstract base class; open() returns the IndexReader subclass PolyReader, +which channels the output of 0 or more SegReaders.</p> + +<ul> +<li><b>index</b> - Either a string filepath or a Folder.</li> + +<li><b>snapshot</b> - A Snapshot. +If not supplied, +the most recent snapshot file will be used.</li> + +<li><b>manager</b> - An <a href="../../Lucy/Index/IndexManager.html" class="podlinkpod" +>IndexManager</a>. 
+Read-locking is off by default; supplying this argument turns it on.</li> +</ul> + +<h2><a class='u' +name="ABSTRACT_METHODS" +>ABSTRACT METHODS</a></h2> + +<h3><a class='u' +name="doc_max" +>doc_max</a></h3> + +<pre>my $int = $index_reader->doc_max();</pre> + +<p>Return the maximum number of documents available to the reader, +which is also the highest possible internal document id. +Documents which have been marked as deleted but not yet purged from the index are included in this count.</p> + +<h3><a class='u' +name="doc_count" +>doc_count</a></h3> + +<pre>my $int = $index_reader->doc_count();</pre> + +<p>Return the number of documents available to the reader, +subtracting any that are marked as deleted.</p> + +<h3><a class='u' +name="del_count" +>del_count</a></h3> + +<pre>my $int = $index_reader->del_count();</pre> + +<p>Return the number of documents which have been marked as deleted but not yet purged from the index.</p> + +<h3><a class='u' +name="offsets" +>offsets</a></h3> + +<pre>my $i32_array = $index_reader->offsets();</pre> + +<p>Return an array with one entry for each segment, +containing that segment’s doc_id start offset.</p> + +<h3><a class='u' +name="seg_readers" +>seg_readers</a></h3> + +<pre>my $arrayref = $index_reader->seg_readers();</pre> + +<p>Return an array of all the SegReaders represented within the IndexReader.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="obtain" +>obtain</a></h3> + +<pre>my $data_reader = $index_reader->obtain($api);</pre> + +<p>Fetch a component, +or throw an error if the component can’t be found.</p> + +<ul> +<li><b>api</b> - The name of the DataReader subclass that the desired component must implement.</li> +</ul> + +<h3><a class='u' +name="fetch" +>fetch</a></h3> + +<pre>my $data_reader = $index_reader->fetch($api);</pre> + +<p>Fetch a component, +or return undef if the component can’t be found.</p> + +<ul> +<li><b>api</b> - The name of the DataReader subclass that the 
desired component must implement.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::IndexReader isa <a href="../../Lucy/Index/DataReader.html" class="podlinkpod" +>Lucy::Index::DataReader</a> isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>
Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Indexer.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Indexer.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Indexer.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,341 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::Indexer - Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" 
style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::Indexer - Build inverted indexes.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $indexer = Lucy::Index::Indexer->new( + schema => $schema, + index => '/path/to/index', + create => 1, +); +while ( my ( $title, $content ) = each %source_docs ) { + $indexer->add_doc({ + title => $title, + content => $content, + }); +} +$indexer->commit;</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>The Indexer class is Apache Lucy’s primary tool for managing the content of inverted indexes, +which may 
later be searched using <a href="../../Lucy/Search/IndexSearcher.html" class="podlinkpod" +>IndexSearcher</a>.</p> + +<p>In general, +only one Indexer at a time may write to an index safely. +If a write lock cannot be secured, +new() will throw an exception.</p> + +<p>If an index is located on a shared volume, +each writer application must identify itself by supplying an <a href="../../Lucy/Index/IndexManager.html" class="podlinkpod" +>IndexManager</a> with a unique <code>host</code> id to Indexer’s constructor or index corruption will occur. +See <a href="../../Lucy/Docs/FileLocking.html" class="podlinkpod" +>FileLocking</a> for a detailed discussion.</p> + +<p>Note: at present, +<a href="#delete_by_term" class="podlinkpod" +>delete_by_term()</a> and <a href="#delete_by_query" class="podlinkpod" +>delete_by_query()</a> only affect documents which had been previously committed to the index – and not any documents added this indexing session but not yet committed. +This may change in a future update.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $indexer = Lucy::Index::Indexer->new( + schema => $schema, # required at index creation + index => '/path/to/index', # required + create => 1, # default: 0 + truncate => 1, # default: 0 + manager => $manager # default: created internally +);</pre> + +<ul> +<li><b>schema</b> - A Schema. +Required when index is being created; if not supplied, +will be extracted from the index folder.</li> + +<li><b>index</b> - Either a filepath to an index or a Folder.</li> + +<li><b>create</b> - If true and the index directory does not exist, +attempt to create it.</li> + +<li><b>truncate</b> - If true, +proceed with the intention of discarding all previous indexing data. 
+The old data will remain intact and visible until commit() succeeds.</li> + +<li><b>manager</b> - An IndexManager.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="add_doc" +>add_doc</a></h3> + +<pre>$indexer->add_doc($doc); +$indexer->add_doc( { field_name => $field_value } ); +$indexer->add_doc( + doc => { field_name => $field_value }, + boost => 2.5, # default: 1.0 +);</pre> + +<p>Add a document to the index. +Accepts either a single argument or labeled params.</p> + +<ul> +<li><b>doc</b> - Either a Lucy::Document::Doc object, +or a hashref (which will be attached to a Lucy::Document::Doc object internally).</li> + +<li><b>boost</b> - A floating point weight which affects how this document scores.</li> +</ul> + +<h3><a class='u' +name="add_index" +>add_index</a></h3> + +<pre>$indexer->add_index($index);</pre> + +<p>Absorb an existing index into this one. +The two indexes must have matching Schemas.</p> + +<ul> +<li><b>index</b> - Either an index path name or a Folder.</li> +</ul> + +<h3><a class='u' +name="delete_by_term" +>delete_by_term</a></h3> + +<pre>$indexer->delete_by_term( + field => $field, # required + term => $term # required +);</pre> + +<p>Mark documents which contain the supplied term as deleted, +so that they will be excluded from search results and eventually removed altogether. +The change is not apparent to search apps until after <a href="#commit" class="podlinkpod" +>commit()</a> succeeds.</p> + +<ul> +<li><b>field</b> - The name of an indexed field. +(If it is not spec’d as <code>indexed</code>, +an error will occur.)</li> + +<li><b>term</b> - The term which identifies docs to be marked as deleted. 
+If <code>field</code> is associated with an Analyzer, +<code>term</code> will be processed automatically (so don’t pre-process it yourself).</li> +</ul> + +<h3><a class='u' +name="delete_by_query" +>delete_by_query</a></h3> + +<pre>$indexer->delete_by_query($query);</pre> + +<p>Mark documents which match the supplied Query as deleted.</p> + +<ul> +<li><b>query</b> - A <a href="../../Lucy/Search/Query.html" class="podlinkpod" +>Query</a>.</li> +</ul> + +<h3><a class='u' +name="delete_by_doc_id" +>delete_by_doc_id</a></h3> + +<pre>$indexer->delete_by_doc_id($doc_id);</pre> + +<p>Mark the document identified by the supplied document ID as deleted.</p> + +<ul> +<li><b>doc_id</b> - A <a href="../../Lucy/Docs/DocIDs.html" class="podlinkpod" +>document id</a>.</li> +</ul> + +<h3><a class='u' +name="optimize" +>optimize</a></h3> + +<pre>$indexer->optimize();</pre> + +<p>Optimize the index for search-time performance. +This may take a while, +as it can involve rewriting large amounts of data.</p> + +<p>Every Indexer session which changes index content and ends in a <a href="#commit" class="podlinkpod" +>commit()</a> creates a new segment. +Once written, +segments are never modified. +However, +they are periodically recycled by feeding their content into the segment currently being written.</p> + +<p>The <a href="#optimize" class="podlinkpod" +>optimize()</a> method causes all existing index content to be fed back into the Indexer. +When <a href="#commit" class="podlinkpod" +>commit()</a> completes after an <a href="#optimize" class="podlinkpod" +>optimize()</a>, +the index will consist of one segment. +So <a href="#optimize" class="podlinkpod" +>optimize()</a> must be called before <a href="#commit" class="podlinkpod" +>commit()</a>. +Also, +optimizing a fresh index created from scratch has no effect.</p> + +<p>Historically, +there was a significant search-time performance benefit to collapsing down to a single segment versus even two segments. 
+Now the effect of collapsing is much less significant, +and calling <a href="#optimize" class="podlinkpod" +>optimize()</a> is rarely justified.</p> + +<h3><a class='u' +name="commit" +>commit</a></h3> + +<pre>$indexer->commit();</pre> + +<p>Commit any changes made to the index. +Until this is called, +none of the changes made during an indexing session are permanent.</p> + +<p>Calling <a href="#commit" class="podlinkpod" +>commit()</a> invalidates the Indexer, +so if you want to make more changes you’ll need a new one.</p> + +<h3><a class='u' +name="prepare_commit" +>prepare_commit</a></h3> + +<pre>$indexer->prepare_commit();</pre> + +<p>Perform the expensive setup for <a href="#commit" class="podlinkpod" +>commit()</a> in advance, +so that <a href="#commit" class="podlinkpod" +>commit()</a> completes quickly. +(If <a href="#prepare_commit" class="podlinkpod" +>prepare_commit()</a> is not called explicitly by the user, +<a href="#commit" class="podlinkpod" +>commit()</a> will call it internally.)</p> + +<h3><a class='u' +name="get_schema" +>get_schema</a></h3> + +<pre>my $schema = $indexer->get_schema();</pre> + +<p>Accessor for schema.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::Indexer isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Lexicon.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Lexicon.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Lexicon.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,175 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::Lexicon - Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input 
value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::Lexicon - Iterator for a field’s terms.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $lex_reader = $seg_reader->obtain('Lucy::Index::LexiconReader'); +my $lexicon = $lex_reader->lexicon( field => 'content' ); +while ( $lexicon->next ) { + print $lexicon->get_term . 
"\n"; +}</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>A Lexicon is an iterator which provides access to all the unique terms for a given field in sorted order.</p> + +<p>If an index consists of two documents with a ‘content’ field holding “three blind mice” and “three musketeers” respectively, +then iterating through the ‘content’ field’s lexicon would produce this list:</p> + +<pre>blind +mice +musketeers +three</pre> + +<h2><a class='u' +name="ABSTRACT_METHODS" +>ABSTRACT METHODS</a></h2> + +<h3><a class='u' +name="seek" +>seek</a></h3> + +<pre>$lexicon->seek($target); +$lexicon->seek(); # default: undef</pre> + +<p>Seek the Lexicon to the first iterator state which is greater than or equal to <code>target</code>. +If <code>target</code> is undef, +reset the iterator.</p> + +<h3><a class='u' +name="next" +>next</a></h3> + +<pre>my $bool = $lexicon->next();</pre> + +<p>Proceed to the next term.</p> + +<p>Returns: true until the iterator is exhausted, +then false.</p> + +<h3><a class='u' +name="reset" +>reset</a></h3> + +<pre>$lexicon->reset();</pre> + +<p>Reset the iterator. +<a href="#next" class="podlinkpod" +>next()</a> must be called to proceed to the first element.</p> + +<h3><a class='u' +name="get_term" +>get_term</a></h3> + +<pre>my $obj = $lexicon->get_term();</pre> + +<p>Return the current term, +or undef if the iterator is not in a valid state.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::Lexicon isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/LexiconReader.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/LexiconReader.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/LexiconReader.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,175 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::LexiconReader - Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » 
<a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::LexiconReader - Read Lexicon data.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $lex_reader = $seg_reader->obtain("Lucy::Index::LexiconReader"); +my 
$lexicon = $lex_reader->lexicon( field => 'title' );</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>LexiconReader reads term dictionary information.</p> + +<h2><a class='u' +name="ABSTRACT_METHODS" +>ABSTRACT METHODS</a></h2> + +<h3><a class='u' +name="lexicon" +>lexicon</a></h3> + +<pre>my $lexicon = $lexicon_reader->lexicon( + field => $field, # required + term => $term # default: undef +);</pre> + +<p>Return a new Lexicon for the given <code>field</code>. +Will return undef if either the field is not indexed, +or if no documents contain a value for the field.</p> + +<ul> +<li><b>field</b> - Field name.</li> + +<li><b>term</b> - Pre-locate the Lexicon to this term.</li> +</ul> + +<h3><a class='u' +name="doc_freq" +>doc_freq</a></h3> + +<pre>my $int = $lexicon_reader->doc_freq( + field => $field, # required + term => $term # required +);</pre> + +<p>Return the number of documents where the specified term is present.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="aggregator" +>aggregator</a></h3> + +<pre>my $result = $lexicon_reader->aggregator( + readers => $readers, # required + offsets => $offsets # required +);</pre> + +<p>Return a LexiconReader which merges the output of other LexiconReaders.</p> + +<ul> +<li><b>readers</b> - An array of LexiconReaders.</li> + +<li><b>offsets</b> - Doc id start offsets for each reader.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::LexiconReader isa <a href="../../Lucy/Index/DataReader.html" class="podlinkpod" +>Lucy::Index::DataReader</a> isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. 
+ <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/PolyReader.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/PolyReader.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/PolyReader.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,183 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::PolyReader - Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a 
href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::PolyReader - Multi-segment implementation of IndexReader.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $polyreader = Lucy::Index::IndexReader->open( + index 
=> '/path/to/index', +); +my $doc_reader = $polyreader->obtain("Lucy::Index::DocReader"); +for my $doc_id ( 1 .. $polyreader->doc_max ) { + my $doc = $doc_reader->fetch_doc($doc_id); + print " $doc_id: $doc->{title}\n"; +}</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>PolyReader conflates index data from multiple segments. +For instance, +if an index contains three segments with 10 documents each, +PolyReader’s <a href="../../Lucy/Index/IndexReader.html#doc_max" class="podlinkpod" +>doc_max()</a> method will return 30.</p> + +<p>Some of PolyReader’s <a href="../../Lucy/Index/DataReader.html" class="podlinkpod" +>DataReader</a> components may be less efficient or complete than the single-segment implementations accessed via <a href="../../Lucy/Index/SegReader.html" class="podlinkpod" +>SegReader</a>.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="doc_max" +>doc_max</a></h3> + +<pre>my $int = $poly_reader->doc_max();</pre> + +<p>Return the maximum number of documents available to the reader, +which is also the highest possible internal document id. 
+Documents which have been marked as deleted but not yet purged from the index are included in this count.</p> + +<h3><a class='u' +name="doc_count" +>doc_count</a></h3> + +<pre>my $int = $poly_reader->doc_count();</pre> + +<p>Return the number of documents available to the reader, +subtracting any that are marked as deleted.</p> + +<h3><a class='u' +name="del_count" +>del_count</a></h3> + +<pre>my $int = $poly_reader->del_count();</pre> + +<p>Return the number of documents which have been marked as deleted but not yet purged from the index.</p> + +<h3><a class='u' +name="offsets" +>offsets</a></h3> + +<pre>my $i32_array = $poly_reader->offsets();</pre> + +<p>Return an array with one entry for each segment, +holding that segment’s starting doc_id offset.</p> + +<h3><a class='u' +name="seg_readers" +>seg_readers</a></h3> + +<pre>my $arrayref = $poly_reader->seg_readers();</pre> + +<p>Return an array of all the SegReaders represented within the IndexReader.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::PolyReader isa <a href="../../Lucy/Index/IndexReader.html" class="podlinkpod" +>Lucy::Index::IndexReader</a> isa <a href="../../Lucy/Index/DataReader.html" class="podlinkpod" +>Lucy::Index::DataReader</a> isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners.
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/PostingList.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/PostingList.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/PostingList.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,158 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::PostingList – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search"
method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::PostingList - Term-Document pairings.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $posting_list_reader + = $seg_reader->obtain("Lucy::Index::PostingListReader"); +my $posting_list = $posting_list_reader->posting_list( + field => 'content', + term => 'foo', +); +while ( my $doc_id = $posting_list->next ) { + say "Matching doc id: $doc_id"; +}</pre> + +<h2><a class='u' +name="DESCRIPTION" 
+>DESCRIPTION</a></h2> + +<p>PostingList is an iterator which supplies a list of document ids that match a given term.</p> + +<p>See <a href="../../Lucy/Docs/IRTheory.html" class="podlinkpod" +>IRTheory</a> for definitions of “posting” and “posting list”.</p> + +<h2><a class='u' +name="ABSTRACT_METHODS" +>ABSTRACT METHODS</a></h2> + +<h3><a class='u' +name="get_doc_freq" +>get_doc_freq</a></h3> + +<pre>my $int = $posting_list->get_doc_freq();</pre> + +<p>Return the number of documents that the PostingList contains. +(This number will include any documents which have been marked as deleted but not yet purged.)</p> + +<h3><a class='u' +name="seek" +>seek</a></h3> + +<pre>$posting_list->seek($target); +$posting_list->seek(); # default: undef</pre> + +<p>Prepare the PostingList object to iterate over matches for documents that match <code>target</code>.</p> + +<ul> +<li><b>target</b> - The term to match. +If undef, +the iterator will be empty.</li> +</ul> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::PostingList isa <a href="../../Lucy/Search/Matcher.html" class="podlinkpod" +>Lucy::Search::Matcher</a> isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. 
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/PostingListReader.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/PostingListReader.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/PostingListReader.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,164 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::PostingListReader – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box"
action="http://www.google.com/search" method="get"> + <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::PostingListReader - Read postings data.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $posting_list_reader + = $seg_reader->obtain("Lucy::Index::PostingListReader"); +my $posting_list = $posting_list_reader->posting_list( + field => 'title', + term => 'foo', +);</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>PostingListReaders 
produce <a href="../../Lucy/Index/PostingList.html" class="podlinkpod" +>PostingList</a> objects which convey document matching information.</p> + +<h2><a class='u' +name="ABSTRACT_METHODS" +>ABSTRACT METHODS</a></h2> + +<h3><a class='u' +name="posting_list" +>posting_list</a></h3> + +<pre>my $posting_list = $posting_list_reader->posting_list( + field => $field, # default: undef + term => $term, # default: undef +);</pre> + +<p>Returns a PostingList, +or undef if either <code>field</code> is undef or <code>field</code> is not present in any documents.</p> + +<ul> +<li><b>field</b> - A field name.</li> + +<li><b>term</b> - If supplied, +the PostingList will be pre-located to this term using <a href="../../Lucy/Index/PostingList.html#seek" class="podlinkpod" +>seek()</a>.</li> +</ul> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="aggregator" +>aggregator</a></h3> + +<pre>my $result = $posting_list_reader->aggregator( + readers => $readers, # required + offsets => $offsets, # required +);</pre> + +<p>Returns undef, since PostingLists can only be iterated at the segment level.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::PostingListReader isa <a href="../../Lucy/Index/DataReader.html" class="podlinkpod" +>Lucy::Index::DataReader</a> isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners.
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/SegReader.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/SegReader.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/SegReader.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,203 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::SegReader – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> +
<input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::SegReader - Single-segment IndexReader.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>my $polyreader = Lucy::Index::IndexReader->open( + index => '/path/to/index', +); +my $seg_readers = $polyreader->seg_readers; +for my $seg_reader (@$seg_readers) { + my $seg_name = $seg_reader->get_seg_name; + my $num_docs = $seg_reader->doc_max; + print "Segment $seg_name ($num_docs documents):\n"; + my $doc_reader = 
$seg_reader->obtain("Lucy::Index::DocReader"); + for my $doc_id ( 1 .. $num_docs ) { + my $doc = $doc_reader->fetch_doc($doc_id); + print " $doc_id: $doc->{title}\n"; + } +}</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>SegReader interprets the data within a single segment of an index.</p> + +<p>Generally speaking, +only advanced users writing subclasses which manipulate data at the segment level need to deal with the SegReader API directly.</p> + +<p>Nearly all of SegReader’s functionality is implemented by pluggable components spawned by <a href="../../Lucy/Plan/Architecture.html" class="podlinkpod" +>Architecture</a>’s factory methods.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="get_seg_name" +>get_seg_name</a></h3> + +<pre>my $string = $seg_reader->get_seg_name();</pre> + +<p>Return the name of the segment.</p> + +<h3><a class='u' +name="get_seg_num" +>get_seg_num</a></h3> + +<pre>my $int = $seg_reader->get_seg_num();</pre> + +<p>Return the number of the segment.</p> + +<h3><a class='u' +name="del_count" +>del_count</a></h3> + +<pre>my $int = $seg_reader->del_count();</pre> + +<p>Return the number of documents which have been marked as deleted but not yet purged from the index.</p> + +<h3><a class='u' +name="doc_max" +>doc_max</a></h3> + +<pre>my $int = $seg_reader->doc_max();</pre> + +<p>Return the maximum number of documents available to the reader, +which is also the highest possible internal document id. 
+Documents which have been marked as deleted but not yet purged from the index are included in this count.</p> + +<h3><a class='u' +name="doc_count" +>doc_count</a></h3> + +<pre>my $int = $seg_reader->doc_count();</pre> + +<p>Return the number of documents available to the reader, +subtracting any that are marked as deleted.</p> + +<h3><a class='u' +name="_offsets" +>_offsets</a></h3> + +<pre>my $i32_array = $seg_reader->_offsets();</pre> + +<p>Return an array with one entry for each segment, +holding that segment’s starting doc_id offset.</p> + +<h3><a class='u' +name="seg_readers" +>seg_readers</a></h3> + +<pre>my $arrayref = $seg_reader->seg_readers();</pre> + +<p>Return an array of all the SegReaders represented within the IndexReader.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::SegReader isa <a href="../../Lucy/Index/IndexReader.html" class="podlinkpod" +>Lucy::Index::IndexReader</a> isa <a href="../../Lucy/Index/DataReader.html" class="podlinkpod" +>Lucy::Index::DataReader</a> isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners.
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/SegWriter.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/SegWriter.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/SegWriter.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,214 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::SegWriter – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> +
<input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::SegWriter - Write one segment of an index.</p> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>SegWriter is a conduit through which information fed to Indexer passes. 
+It manages <a href="../../Lucy/Index/Segment.html" class="podlinkpod" +>Segment</a> and Inverter, +invokes the <a href="../../Lucy/Analysis/Analyzer.html" class="podlinkpod" +>Analyzer</a> chain, +and feeds low-level <a href="../../Lucy/Index/DataWriter.html" class="podlinkpod" +>DataWriters</a> such as PostingListWriter and DocWriter.</p> + +<p>The sub-components of a SegWriter are determined by <a href="../../Lucy/Plan/Architecture.html" class="podlinkpod" +>Architecture</a>. +DataWriter components which are added to the stack of writers via <a href="#add_writer" class="podlinkpod" +>add_writer()</a> have Add_Inverted_Doc() invoked for each document supplied to SegWriter’s <a href="#add_doc" class="podlinkpod" +>add_doc()</a>.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="register" +>register</a></h3> + +<pre>$seg_writer->register( + api => $api, # required + component => $component, # required +);</pre> + +<p>Register a DataWriter component with the SegWriter. +(Note that registration simply makes the writer available via <a href="#fetch" class="podlinkpod" +>fetch()</a>, +so you may also want to call <a href="#add_writer" class="podlinkpod" +>add_writer()</a>.)</p> + +<ul> +<li><b>api</b> - The name of the DataWriter API which <code>component</code> implements.</li> + +<li><b>component</b> - A DataWriter.</li> +</ul> + +<h3><a class='u' +name="fetch" +>fetch</a></h3> + +<pre>my $obj = $seg_writer->fetch($api);</pre> + +<p>Retrieve a registered component.</p> + +<ul> +<li><b>api</b> - The name of the DataWriter API which the component implements.</li> +</ul> + +<h3><a class='u' +name="add_writer" +>add_writer</a></h3> + +<pre>$seg_writer->add_writer($writer);</pre> + +<p>Add a DataWriter to the SegWriter’s stack of writers.</p> + +<h3><a class='u' +name="add_doc" +>add_doc</a></h3> + +<pre>$seg_writer->add_doc( + doc => $doc, # required + boost => $boost, # default: 1.0 +);</pre> + +<p>Add a document to the segment.
+Inverts <code>doc</code>, +increments the Segment’s internal document id, +then calls Add_Inverted_Doc(), +feeding all sub-writers.</p> + +<h3><a class='u' +name="add_segment" +>add_segment</a></h3> + +<pre>$seg_writer->add_segment( + reader => $reader, # required + doc_map => $doc_map, # default: undef +);</pre> + +<p>Add content from an existing segment into the one currently being written.</p> + +<ul> +<li><b>reader</b> - The SegReader containing content to add.</li> + +<li><b>doc_map</b> - An array of integers mapping old document ids to new ones. +Deleted documents are mapped to 0, +indicating that they should be skipped.</li> +</ul> + +<h3><a class='u' +name="finish" +>finish</a></h3> + +<pre>$seg_writer->finish();</pre> + +<p>Complete the segment: close all streams, +store metadata, +etc.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::SegWriter isa <a href="../../Lucy/Index/DataWriter.html" class="podlinkpod" +>Lucy::Index::DataWriter</a> isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners.
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Segment.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Segment.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Segment.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,270 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::Segment – Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get"> + <input
value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::Segment - Warehouse for information about one segment of an inverted index.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre># Index-time. 
+package MyDataWriter; +use base qw( Lucy::Index::DataWriter ); + +sub finish { + my $self = shift; + my $segment = $self->get_segment; + my $metadata = $self->SUPER::metadata(); + $metadata->{foo} = $self->get_foo; + $segment->store_metadata( + key => 'my_component', + metadata => $metadata + ); +} + +# Search-time. +package MyDataReader; +use base qw( Lucy::Index::DataReader ); + +sub new { + my $self = shift->SUPER::new(@_); + my $segment = $self->get_segment; + my $metadata = $segment->fetch_metadata('my_component'); + if ($metadata) { + $self->set_foo( $metadata->{foo} ); + ... + } + return $self; +}</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>Apache Lucy’s indexes are made up of individual “segments”, +each of which is an independent inverted index. +On the file system, +each segment is a directory within the main index directory whose name starts with “seg_”: “seg_2”, +“seg_5a”, +etc.</p> + +<p>Each Segment object keeps track of information about an index segment: its fields, +document count, +and so on. +The Segment object itself writes one file, +<code>segmeta.json</code>; besides storing info needed by Segment itself, +the “segmeta” file serves as a central repository for metadata generated by other index components – relieving them of the burden of storing metadata themselves.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="add_field" +>add_field</a></h3> + +<pre>my $int = $segment->add_field($field);</pre> + +<p>Register a new field and assign it a field number.
+If the field was already known, +nothing happens.</p> + +<ul> +<li><b>field</b> - Field name.</li> +</ul> + +<p>Returns: the field’s field number, +which is a positive integer.</p> + +<h3><a class='u' +name="store_metadata" +>store_metadata</a></h3> + +<pre>$segment->store_metadata( + key => $key, # required + metadata => $metadata, # required +);</pre> + +<p>Store arbitrary information in the segment’s metadata hash, +to be serialized later. +Throws an error if <code>key</code> is used twice.</p> + +<ul> +<li><b>key</b> - String identifying an index component.</li> + +<li><b>metadata</b> - JSON-izable data structure.</li> +</ul> + +<h3><a class='u' +name="fetch_metadata" +>fetch_metadata</a></h3> + +<pre>my $obj = $segment->fetch_metadata($key);</pre> + +<p>Fetch a value from the Segment’s metadata hash.</p> + +<h3><a class='u' +name="field_num" +>field_num</a></h3> + +<pre>my $int = $segment->field_num($field);</pre> + +<p>Given a field name, +return its field number for this segment (which may differ from its number in other segments).
+Return 0 (an invalid field number) if the field name can’t be found.</p> + +<ul> +<li><b>field</b> - Field name.</li> +</ul> + +<h3><a class='u' +name="field_name" +>field_name</a></h3> + +<pre>my $string = $segment->field_name($field_num);</pre> + +<p>Given a field number, +return the name of its field, +or undef if the field number can’t be found.</p> + +<h3><a class='u' +name="get_name" +>get_name</a></h3> + +<pre>my $string = $segment->get_name();</pre> + +<p>Getter for the object’s segment name.</p> + +<h3><a class='u' +name="get_number" +>get_number</a></h3> + +<pre>my $int = $segment->get_number();</pre> + +<p>Getter for the segment number.</p> + +<h3><a class='u' +name="set_count" +>set_count</a></h3> + +<pre>$segment->set_count($count);</pre> + +<p>Setter for the object’s document count.</p> + +<h3><a class='u' +name="get_count" +>get_count</a></h3> + +<pre>my $int = $segment->get_count();</pre> + +<p>Getter for the object’s document count.</p> + +<h3><a class='u' +name="compare_to" +>compare_to</a></h3> + +<pre>my $int = $segment->compare_to($other);</pre> + +<p>Compare by segment number.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::Segment isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners.
+ </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html> Added: websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Similarity.html ============================================================================== --- websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Similarity.html (added) +++ websites/staging/lucy/trunk/content/docs/perl/Lucy/Index/Similarity.html Mon Apr 4 09:23:29 2016 @@ -0,0 +1,175 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<html lang="en"> + <head> + <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> + <title>Lucy::Index::Similarity - Apache Lucy Documentation</title> + <link rel="stylesheet" type="text/css" media="screen" href="/css/lucy.css"> + </head> + + <body> + + <div id="lucy-rigid_wrapper"> + + <div id="lucy-top" class="container_16 lucy-white_box_3d"> + + <div id="lucy-logo_box" class="grid_8"> + <a href="/"><img src="/images/lucy_logo_150x100.png" alt="Apache Lucy™"></a> + </div> <!-- lucy-logo_box --> + + <div #id="lucy-top_nav_box" class="grid_8"> + <div id="lucy-top_nav_bar" class="container_8"> + <ul> + <li><a href="http://www.apache.org/" title="Apache Software Foundation">Apache Software Foundation</a></li> + <li><a href="http://www.apache.org/licenses/" title="License">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsorship">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li> + <li><a href="http://www.apache.org/security/ " title="Security">Security</a></li> + </ul> + </div> <!-- lucy-top_nav_bar --> + <p><a href="http://www.apache.org/">Apache</a> » <a href="/">Lucy</a> » <a href="/docs/">Docs</a> » <a href="/docs/perl/">Perl</a> » <a href="/docs/perl/Lucy/">Lucy</a> » <a href="/docs/perl/Lucy/Index/">Index</a></p> + <form name="lucy-top_search_box" id="lucy-top_search_box" action="http://www.google.com/search" method="get">
+ <input value="*.apache.org" name="sitesearch" type="hidden"/> + <input type="text" name="q" id="query" style="width:85%"> + <input type="submit" id="submit" value="Search"> + </form> + </div> <!-- lucy-top_nav_box --> + + <div class="clear"></div> + + </div> <!-- lucy-top --> + + <div id="lucy-main_content" class="container_16 lucy-white_box_3d"> + + <div class="grid_4" id="lucy-left_nav_box"> + <h6>About</h6> + <ul> + <li><a href="/">Welcome</a></li> + <li><a href="/clownfish.html">Clownfish</a></li> + <li><a href="/faq.html">FAQ</a></li> + <li><a href="/people.html">People</a></li> + </ul> + <h6>Resources</h6> + <ul> + <li><a href="/download.html">Download</a></li> + <li><a href="/mailing_lists.html">Mailing Lists</a></li> + <li><a href="/docs/perl/">Documentation</a></li> + <li><a href="http://wiki.apache.org/lucy/">Wiki</a></li> + <li><a href="https://issues.apache.org/jira/browse/LUCY">Issue Tracker</a></li> + <li><a href="/version_control.html">Version Control</a></li> + </ul> + <h6>Related Projects</h6> + <ul> + <li><a href="http://lucene.apache.org/core/">Lucene</a></li> + <li><a href="http://dezi.org/">Dezi</a></li> + <li><a href="http://lucene.apache.org/solr/">Solr</a></li> + <li><a href="http://lucenenet.apache.org/">Lucene.NET</a></li> + <li><a href="http://lucene.apache.org/pylucene/">PyLucene</a></li> + </ul> + </div> <!-- lucy-left_nav_box --> + + <div id="lucy-main_content_box" class="grid_9"> + <div> +<a name='___top' class='dummyTopAnchor' ></a> + +<h2><a class='u' +name="NAME" +>NAME</a></h2> + +<p>Lucy::Index::Similarity - Judge how well a document matches a query.</p> + +<h2><a class='u' +name="SYNOPSIS" +>SYNOPSIS</a></h2> + +<pre>package MySimilarity; +use base qw( Lucy::Index::Similarity ); +sub length_norm { return 1.0 } # disable length normalization + +package MyFullTextType; +use base qw( Lucy::Plan::FullTextType ); + +sub make_similarity { MySimilarity->new }</pre> + +<h2><a class='u' +name="DESCRIPTION" +>DESCRIPTION</a></h2> + +<p>After determining whether a document
 matches a given query, +a score must be calculated which indicates how <i>well</i> the document matches the query. +The Similarity class is used to judge how “similar” the query and the document are to each other; the closer the resemblance, +the higher the document scores.</p> + +<p>The default implementation uses Lucene’s modified cosine similarity measure. +Subclasses might tweak the existing algorithms, +or might be used in conjunction with custom Query subclasses to implement arbitrary scoring schemes.</p> + +<p>Most of the methods operate on single fields, +but some are used to combine scores from multiple fields.</p> + +<h2><a class='u' +name="CONSTRUCTORS" +>CONSTRUCTORS</a></h2> + +<h3><a class='u' +name="new" +>new</a></h3> + +<pre>my $sim = Lucy::Index::Similarity->new;</pre> + +<p>Constructor. +Takes no arguments.</p> + +<h2><a class='u' +name="METHODS" +>METHODS</a></h2> + +<h3><a class='u' +name="length_norm" +>length_norm</a></h3> + +<pre>my $float = $similarity->length_norm($num_tokens);</pre> + +<p>Dampen the scores of long documents.</p> + +<p>After a field is broken up into terms at index-time, +each term must be assigned a weight. +One of the factors in calculating this weight is the number of tokens that the original field was broken into.</p> + +<p>Typically, +we assume that the more tokens in a field, +the less important any one of them is – so that, +e.g. +5 mentions of “Kafka” in a short article are given more heft than 5 mentions of “Kafka” in an entire book.
+The default implementation of length_norm expresses this using an inverse square root.</p> + +<p>However, +the inverse square root has a tendency to reward very short fields highly, +which isn’t always appropriate for fields you expect to have a lot of tokens on average.</p> + +<h2><a class='u' +name="INHERITANCE" +>INHERITANCE</a></h2> + +<p>Lucy::Index::Similarity isa Clownfish::Obj.</p> + +</div> + + </div> <!-- lucy-main_content_box --> + <div class="clear"></div> + + </div> <!-- lucy-main_content --> + + <div id="lucy-copyright" class="container_16"> + <p>Copyright © 2010-2015 The Apache Software Foundation, Licensed under the + <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. + <br/> + Apache Lucy, Lucy, Apache, the Apache feather logo, and the Apache Lucy project logo are trademarks of The + Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their + respective owners. + </p> + </div> <!-- lucy-copyright --> + + </div> <!-- lucy-rigid_wrapper --> + + </body> +</html>