[EMAIL PROTECTED] wrote:
Yes, that is the way I do my fetch/search cycles:
first round fetch text/html only, basically collect as many links as possible
second round, application/msword,
third round, application/pdf,
...
The rounds can all run in parallel, and they give better storage management:
pdf and doc files are typically much larger than html, and
you do not want to mix them with html in the same segment.
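
As a rough illustration of the per-type rounds described above, here is a minimal Python sketch. It is not Nutch code: the function name plan_rounds, the example URLs, and the assumption that each candidate URL already carries a known or guessed MIME type are all hypothetical. Only the round order (html, then msword, then pdf) comes from the message.

```python
from collections import defaultdict

# Round order taken from the message; everything else is illustrative.
ROUND_ORDER = ["text/html", "application/msword", "application/pdf"]


def plan_rounds(candidates, round_order=ROUND_ORDER):
    """Group (url, mime_type) pairs into one fetch list per MIME type.

    Returns a list of (mime_type, urls) tuples in round order. Types not
    named in round_order are appended afterwards in sorted order.
    Assumes the MIME type of each URL is already known or guessed.
    """
    by_type = defaultdict(list)
    for url, mime_type in candidates:
        by_type[mime_type].append(url)
    ordered = [t for t in round_order if t in by_type]
    ordered += sorted(t for t in by_type if t not in round_order)
    return [(t, by_type[t]) for t in ordered]


# Hypothetical input: URLs paired with their assumed content types.
candidates = [
    ("http://example.com/index.html", "text/html"),
    ("http://example.com/report.pdf", "application/pdf"),
    ("http://example.com/notes.doc", "application/msword"),
    ("http://example.com/about.html", "text/html"),
]

rounds = plan_rounds(candidates)
for mime_type, urls in rounds:
    # Each round would be fetched into its own segment.
    print(mime_type, urls)
```

Each (mime_type, urls) pair would correspond to one fetch list written to its own segment, so the large pdf and doc payloads stay out of the html segment.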
Why don't you want to mix them?
Doug
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers