Gal Nitzan wrote:
> If I understand correctly, having one segment or a hundred is not important?
It depends. If you have hundreds of segments and try to search them all from a single JVM, you will probably run out of file handles, because each open segment keeps several files open at once.
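A rough way to see why the segment count matters is to multiply it by the number of files each open segment holds and compare the result with the per-process descriptor limit. The Java sketch below does only that arithmetic. The figure of 20 files per segment and the baseline of 50 other descriptors are hypothetical placeholders, not measured Nutch values, and 1024 is assumed as a common default for `ulimit -n` on Linux.

```java
// Hypothetical estimate of open files when one JVM searches many segments.
// filesPerSegment and baseline are illustrative assumptions, not Nutch measurements.
public class FileHandleEstimate {

    /** Total descriptors held open if every segment keeps filesPerSegment files open. */
    static long openFiles(int segments, int filesPerSegment, int baseline) {
        return (long) segments * filesPerSegment + baseline;
    }

    /** True if the estimate exceeds the per-process descriptor limit. */
    static boolean exceedsLimit(int segments, int filesPerSegment, int baseline, long limit) {
        return openFiles(segments, filesPerSegment, baseline) > limit;
    }

    public static void main(String[] args) {
        long limit = 1024;        // assumed: a common default soft limit for `ulimit -n`
        int filesPerSegment = 20; // hypothetical files held open per segment
        int baseline = 50;        // hypothetical: jars, sockets, logs, etc.
        for (int segments : new int[] {1, 10, 100}) {
            System.out.printf("%d segments -> ~%d open files, exceeds %d: %b%n",
                segments, openFiles(segments, filesPerSegment, baseline), limit,
                exceedsLimit(segments, filesPerSegment, baseline, limit));
        }
    }
}
```

Under these assumptions, 10 segments stay well under the limit while 100 segments need about 2050 descriptors, which is where a single searching JVM starts failing.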
> What happens when a page is fetched a second time? Is there something to deduplicate it?
The dedup command has not yet been implemented in the mapred branch. Coming soon.
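Until that command exists, the core idea behind deduplicating a refetched page can be sketched as keeping only the most recent copy of each URL across segments. This is a minimal Java sketch under that assumption; `FetchedPage` and `keepNewest` are hypothetical names, not Nutch classes or APIs. A complete tool would likely also collapse different URLs with identical content, for example by comparing content digests.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of URL-based dedup; not Nutch's implementation or API.
public class RefetchDedup {

    /** One fetched copy of a page, tagged with the segment it came from (hypothetical record). */
    record FetchedPage(String url, long fetchTime, String segment) {}

    /** Keeps only the most recently fetched copy of each URL, preserving first-seen URL order. */
    static List<FetchedPage> keepNewest(Collection<FetchedPage> pages) {
        Map<String, FetchedPage> newest = new LinkedHashMap<>();
        for (FetchedPage p : pages) {
            // merge keeps whichever copy has the later fetch time (existing copy wins ties)
            newest.merge(p.url(), p, (old, cur) -> old.fetchTime() >= cur.fetchTime() ? old : cur);
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        List<FetchedPage> pages = List.of(
            new FetchedPage("http://example.com/", 1000L, "seg-1"),
            new FetchedPage("http://example.com/a", 1500L, "seg-1"),
            new FetchedPage("http://example.com/", 2000L, "seg-2"));
        for (FetchedPage p : keepNewest(pages)) {
            System.out.println(p.url() + " from " + p.segment());
        }
    }
}
```

Here the second fetch of `http://example.com/` (from `seg-2`) replaces the first, so only one copy of that page would remain searchable.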
Doug

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
