The first OutOfMemoryError occurred on the fetch shown below. The page is a PHP error page that loops forever, so its size is unbounded.
How can I skip pages like this?
040602 222837 fetch of http://www.icnrd5-mongolia.mn/phprint.php failed with: java.lang.OutOfMemoryError
This looks like a bug in the chunked-encoding handler. Add the following to your nutch-site.xml and the error should go away:
<property>
  <name>http.version.1.1</name>
  <value>false</value>
</property>
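Whatever the exact bug is, an endless page can only trigger an OutOfMemoryError if the fetcher buffers the body without any limit. One way to guard against that is to enforce a byte cap while decoding the chunked transfer encoding. What follows is a minimal Java sketch under that assumption. The class `BoundedChunkedReader` and its `maxBytes` parameter are hypothetical, and this is not Nutch's actual protocol code.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: a chunked-body decoder that refuses to buffer more than maxBytes.
class BoundedChunkedReader {
    // Reads one CRLF-terminated ASCII line, without the line terminator.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') break;
            if (c != '\r') sb.append((char) c);
        }
        if (c == -1 && sb.length() == 0) throw new EOFException("unexpected end of stream");
        return sb.toString();
    }

    // Decodes a chunked body and throws once the total would exceed maxBytes.
    static byte[] decode(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');          // drop any chunk extensions
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size < 0) throw new IOException("negative chunk size");
            if (size == 0) break;                      // last-chunk reached
            // Check the declared size BEFORE allocating, so a huge or endless
            // sequence of chunks fails fast instead of exhausting the heap.
            if (out.size() + (long) size > maxBytes)
                throw new IOException("content exceeds " + maxBytes + " bytes");
            byte[] buf = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(buf, off, size - off);
                if (n < 0) throw new EOFException("truncated chunk");
                off += n;
            }
            out.write(buf, 0, size);
            readLine(in);                              // consume CRLF after chunk data
        }
        // Skip any trailer headers up to the terminating empty line.
        while (!readLine(in).isEmpty()) { }
        return out.toByteArray();
    }
}
```

The same cap could be applied to non-chunked responses too. The key design point is that the limit is checked against the declared chunk size before the buffer is allocated.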
Is there a reason for Nutch to use HTTP 1.1? HTTP 1.0 is much simpler, and all servers support it. From the server logs I've looked at, Googlebot and Yahoo Slurp appear to use only 1.0. Is there any reason Nutch shouldn't do the same? Dropping 1.1 would make the fetcher that much simpler, and hence more reliable.
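The simplicity argument can be made concrete. RFC 2616 (section 3.6) says a server MUST NOT send transfer-codings to an HTTP/1.0 client. As a result, a 1.0 fetch never has to parse chunk framing, and the body is just the bytes that follow the headers. Below is a minimal Java sketch of that side. The names `Http10Fetch`, `buildRequest`, and `readBody` are hypothetical. The `maxBytes` cap is an added assumption for coping with endless pages like the one reported above, not existing Nutch behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of an HTTP/1.0 fetch: no chunked decoding is needed.
class Http10Fetch {
    // Minimal HTTP/1.0 GET. Host is optional in 1.0 but lets
    // name-based virtual hosts serve the right site.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    // Assumes `in` is positioned just after the response headers and reads
    // until the server closes the connection (Content-Length handling omitted).
    // Stops at maxBytes, so an endless page is truncated instead of filling the heap.
    static byte[] readBody(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (out.size() < maxBytes) {
            int n = in.read(buf);
            if (n == -1) break;                        // server closed the connection
            out.write(buf, 0, Math.min(n, maxBytes - out.size()));
        }
        return out.toByteArray();
    }
}
```

Truncating silently, rather than throwing as the chunked sketch does, is a judgment call. A crawler may prefer to index the first part of an oversized page instead of discarding it.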
If there are no objections, I'll remove the HTTP 1.1 support.
Doug
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
