Hi all,
I ran a small test in which I indexed the same set of documents with
Nutch (0.9) and with Lucene.
The set comes from TREC and contains more than 134,000 short text
documents.

With Lucene, indexing took 1 h. With Nutch, using the file:/ protocol,
it took 4 h 10 min.

Could anyone explain why there is such a difference, and whether there
is some way to eliminate part of this overhead?

Regards,
--
Marc



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general