Hello, I have just added some search implementation samples based on this collector solution, to ease its use and understanding:
- KeywordSearch: Extract the terms (and frequency) found in a list of fields from the results of a query/filter search
- GoogleSearch: Return an ordered search result grouped a la Google, based on the terms found in a list of fields
- GetFieldNamesOp: Operation to mimic the getFieldNames method of IndexReader but using a searcher. With it, it is possible to explore the fields of remote indexes.

See http://sourceforge.net/projects/lucollector/ for the source code (lu-collector-src-sampleop-0.8.zip).

Regards,
José L. Oramas

On 2/26/07, oramas martín <[EMAIL PROTECTED]> wrote:
Hello,

As you probably know, the HitCollector-based search API is not meant to work remotely, because it will generate an RPC callback for every non-zero score.

There is another problem with the MultiSearcher HitCollector-based search: MultiSearcher knows nothing about how to merge HitCollector-based searches (not to mention that the way it merges TopDocs for the score-ordered and Sort-ordered searches is hardcoded). ParallelMultiSearcher inherits these problems and is unable to parallelize the HitCollector-based search.

A final problem with the HitCollector-based search is the lack of a limit on the results, such as the one the Hits class implements through its getMoreDocs() function, along with the lazy loading and caching of documents that Hits does.

To solve those problems, two things are necessary: a factory (HitCollectorSource) able to generate collectors for single (SingleHitCollector) and multi (MultiHitCollector) searches, and a new search method in the Searchable interface that accepts it. To avoid modifications to the Lucene core, the latter requirement is NOT IMPLEMENTED in the library I have just created. Instead, an ugly solution is provided: a wrapper for those searchers (SearcherHCSourceWrapper) and a Filter wrapper (FilterHitCollectorSource) to carry the factory-based searches.

Each collector is based on a two-step sequence: one step for collecting hits or subsearcher results, and another for generating the final result.

Also, in case you don't want to add a wrapper to each searcher in your project, there are adapted versions of IndexSearcher, MultiSearcher and ParallelMultiSearcher (only for version 2.1), modified in exactly the same way as the wrapper class SearcherHCSourceWrapper does it. Just put them in your class-path (before the Lucene core jar) and you will be using the new collector interfaces without modifying your code.

There are some unit tests (copied and adapted from the Lucene 2.1 distribution).

See http://sourceforge.net/projects/lucollector/ for the jar files and the code.
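To make the idea concrete, here is a minimal, self-contained sketch of the factory and the two-step collectors described above. It does not use Lucene or the real classes of the library: CollectorSource, SingleCollector, MultiCollector and CountSource are hypothetical stand-ins with assumed signatures, and a plain float array of scores stands in for an index. The point it illustrates is that each sub-search collects its hits locally and only the partial result travels back to be merged.

```java
// Hypothetical stand-ins only: these are NOT the Lucene or lucollector APIs.
class CollectorSketch {

    /** Single search. Step 1: collect hits locally. Step 2: produce a result. */
    interface SingleCollector<R> {
        void collect(int doc, float score);
        R result();
    }

    /** Multi search. Step 1: collect each sub-searcher's result. Step 2: merge. */
    interface MultiCollector<R> {
        void collect(int searcherIndex, R partial);
        R result();
    }

    /** Factory that travels with the search and creates collectors where needed. */
    interface CollectorSource<R> {
        SingleCollector<R> newSingle();
        MultiCollector<R> newMulti();
    }

    /** Example source: counts hits per index, then sums the partial counts. */
    static class CountSource implements CollectorSource<Integer> {
        public SingleCollector<Integer> newSingle() {
            return new SingleCollector<Integer>() {
                private int count;
                public void collect(int doc, float score) { count++; }
                public Integer result() { return count; }
            };
        }
        public MultiCollector<Integer> newMulti() {
            return new MultiCollector<Integer>() {
                private int total;
                public void collect(int searcherIndex, Integer partial) { total += partial; }
                public Integer result() { return total; }
            };
        }
    }

    /** Simulated single search: scores[doc] is the score of each document. */
    static <R> R searchOne(float[] scores, CollectorSource<R> source) {
        SingleCollector<R> collector = source.newSingle();
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0f) {            // only non-zero scores are hits
                collector.collect(doc, scores[doc]);
            }
        }
        return collector.result();             // one result per index, not one call per hit
    }

    /** Simulated multi search: each index is searched on its own, then merged. */
    static <R> R searchMany(float[][] indexes, CollectorSource<R> source) {
        MultiCollector<R> merger = source.newMulti();
        for (int i = 0; i < indexes.length; i++) {
            merger.collect(i, searchOne(indexes[i], source));
        }
        return merger.result();
    }

    public static void main(String[] args) {
        float[][] indexes = { {0f, 0.5f, 1f}, {0.2f, 0f} };
        System.out.println(searchMany(indexes, new CountSource())); // prints 3
    }
}
```

In this sketch the loop in searchMany runs sequentially; since each sub-search only hands back its partial result, the same structure could in principle be run in parallel or against remote searchers.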
If you find it an interesting complement to the Lucene project, please tell me how to put it in the contribution area.

Regards,
José L. Oramas