Hi all, I ran a small test in which I indexed the same set of documents with Nutch (0.9) and with Lucene. The set comes from TREC and contains 134 000+ short text documents.
With Lucene, it took 1 hour. With Nutch, using the file:/ protocol, it took 4 hours 10 minutes. Could anyone explain why there is such a difference, and whether there is some way to eliminate part of this overhead?

Regards,
-- Marc
