Alexander Aristov wrote:
Hi
Thank you for Katta
But is there any built-in Nutch functionality that can do this? What
I am looking for is a way to do distributed search, as I am planning to build
a fairly large index, so it will not be possible to keep it on one
server.
What are best practices for doing this?
There is no single built-in tool in Nutch to do this. Common practice is
to create indexes per segment (without merging them), deploy each
segment together with its index to the search servers, and then do the
index merging there, on each search server. Whenever you add new
segments or remove old ones, you perform a merge of the new set of
active indexes on each search server.
This way it's easy to phase out outdated segments and their indexes and
to add new segments, while still using a merged index on each search
server for maximum performance.
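
To make the bookkeeping above a bit more concrete, here is a minimal Java sketch of what each search server would need to track. It is only an illustration under assumptions: ActiveSegments is a hypothetical helper (not a Nutch or Lucene class), the indexes/<segment> directory layout and the date-style segment names are made up, and the sketch only computes which per-segment indexes should go into the next merge. The merge itself would be done by whatever merging tool you use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of Nutch): tracks which segment/index pairs
 * are active on one search server and when the merged index is stale.
 */
public class ActiveSegments {
    // Segment names whose per-segment indexes belong in the merged index.
    // A TreeSet keeps them sorted, so the merge input order is stable.
    private final Set<String> active = new TreeSet<>();
    // True when the active set changed since the last planned merge.
    private boolean mergeNeeded = false;

    /** Register a newly deployed segment plus its index. */
    public void deploy(String segment) {
        if (active.add(segment)) {
            mergeNeeded = true;
        }
    }

    /** Phase out an outdated segment and its index. */
    public void retire(String segment) {
        if (active.remove(segment)) {
            mergeNeeded = true;
        }
    }

    public boolean isMergeNeeded() {
        return mergeNeeded;
    }

    /**
     * Lists the per-segment index directories to feed into the next merge
     * (assumed layout: indexesRoot/segmentName) and marks the merge as planned.
     */
    public List<String> planMerge(String indexesRoot) {
        List<String> inputs = new ArrayList<>();
        for (String segment : active) {
            inputs.add(indexesRoot + "/" + segment);
        }
        mergeNeeded = false;
        return inputs;
    }

    public static void main(String[] args) {
        ActiveSegments server = new ActiveSegments();
        server.deploy("20080101");
        server.deploy("20080102");
        System.out.println("first merge inputs: " + server.planMerge("indexes"));

        // Later: drop the oldest segment, add a fresh one, then re-merge.
        server.retire("20080101");
        server.deploy("20080103");
        if (server.isMergeNeeded()) {
            System.out.println("next merge inputs: " + server.planMerge("indexes"));
        }
    }
}
```

The point of keeping the per-segment indexes unmerged on disk is visible here: retiring a segment only changes the input list, and the merged index is simply rebuilt from whatever is still active.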
PS. It's possible to implement a low-level Lucene tool to split indexes,
using FilterIndexReader and IndexWriter.addIndexes(...). But that's not
very relevant if you use the strategy I explained above.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com