Re: A solution to HitCollector-based searches problems

oramas martín Thu, 08 Mar 2007 11:24:09 -0800

Hello,

I have just added some search implementation samples based on this collector
solution, to easy the use and understanding or it:


   - KeywordSearch: Extract the terms (and frequency) found in a list of
fields
                    from the results of a query/filter search

   - GoogleSearch: Return an ordered search result grouped a la Google,
based
                   on the terms found in a list of fields

   - GetFieldNamesOp: Operation to mimic the getFieldNames method of
                      IndexReader but using a searcher. With it, it is
possible
                      to explore the fields of remote indexes.

See http://sourceforge.net/projects/lucollector/ for the source code (
lu-collector-src-sampleop-0.8.zip).

Regards,
José L. Oramas

On 2/26/07, oramas martín <[EMAIL PROTECTED]> wrote:



Hello,

As you probably know, the HitCollector-based search API is not meant to
work remotely, because it will generate a RPC-callback for every non-zero
score.

There is another problem with MultiSearcher-HitCollector-based search
which knows nothing about mix HitCollector based searches (not to say it has
hardcode the way to mix TopDocs for the score and for the Sort searches).
Also the ParallelMultiSearcher inherits this problems and is unable to
parallelize the HitCollector-based searcher.

A final problem with the HitCollector-based search is related to the lost
of a limit in the results, as the Hits class implements thought the
getMoreDocs() function, and lazy loading and caching of documents it does.


To solve those problems it is necessary a factory (HitCollectorSource)
able to generate collectors for single (SingleHitCollector) an multi
(MultiHitCollector) searches, and a new search method in the
Searchable interface for it. To avoid modifications to the lucene core, the
later requirement is NOT IMPLEMENTED in the library I have just created.
Instead, an ugly solution, a wrapper for those searchers
(SearcherHCSourceWrapper) and a Filter wrapper (FilterHitCollectorSource) to
carry the factory-based searches, is provided.

Each collector is based in a two steps sequence, one for collecting hits
or subsearcher results, and another for generating the final result.

Also, just in case you don't want to add a wrapper to each searcher of
your project, there is an adapted version of IndexSearcher, MultiSearcher
and ParallelMultiSearcher (only for version 2.1) modified exactly the same
way the wrapper class SearcherHCSourceWrapper does. Just put them in your
class-path (before the Lucene core jar) and you will be using the new
collector interfaces without modifying your code.

There are some unit testing (copied and adapted from the Lucene 
2.1distribution).

See http://sourceforge.net/projects/lucollector/ for the jar files and the
code.

If you find it interesting to complement the Lucene project, tell me how
to put it in the contribution area.

Regards,
José L. Oramas

Re: A solution to HitCollector-based searches problems

Reply via email to