Dear All,
 We need to crawl a number of sites amounting to around 2
million pages and several GB of data. I know this is well within
Nutch's abilities, but I have a question about the right strategy
for crawling and indexing with the least amount of bother.

Obviously I would need a single database: create several manageably
sized segments, run the fetcher on each one, and then update the
database; no problem so far. But what would then be the right strategy
for producing an index that can be searched?

Should I:

1) Merge all the segments and then index them, or
2) Index each segment individually and then merge the indexes,
keeping the segments separate, or
3) Index each segment separately, keep both segments and
indexes separate, and search across multiple indexes (though I have
heard there are ranking issues with this)?
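For reference, here is a rough sketch of the crawl cycle and the two merge-based options as shell commands. This is only an illustration of the workflow described above, assuming Nutch's standard command-line tools; exact command names and arguments vary between Nutch versions, so check `bin/nutch` with no arguments on your install. The directory names (`crawldb`, `segments`, etc.) are placeholders.

```shell
# Repeated generate/fetch/update cycle to build manageable segments
bin/nutch generate crawldb segments          # create a new segment
bin/nutch fetch    segments/<segment-dir>    # fetch that segment
bin/nutch updatedb crawldb segments/<segment-dir>  # fold results back into the db

# Option 1: merge all segments into one, then index the merged segment
bin/nutch mergesegs merged_segments -dir segments
bin/nutch index     indexes crawldb linkdb merged_segments/*

# Option 2: index each segment individually, then merge the indexes
bin/nutch index indexes/seg1 crawldb linkdb segments/<segment-1>
bin/nutch index indexes/seg2 crawldb linkdb segments/<segment-2>
bin/nutch merge index indexes/seg1 indexes/seg2   # merge per-segment indexes
```

Option 3 would skip the final merge and point the searcher at the directory of per-segment indexes instead.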

Please let me know your views!!

Thanks a lot!!
Regards



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
