Massimo Miccoli wrote:
The first OutOfMemoryError occurred on the line below. That page is a looping PHP error page of unbounded size.
How can I skip pages like that?


040602 222837 fetch of http://www.icnrd5-mongolia.mn/phprint.php failed with: java.lang.OutOfMemoryError

Looks like a bug in the chunked encoding handler. Add the following to your nutch-site.xml and this should go away:


<property>
  <name>http.version.1.1</name>
  <value>false</value>
</property>
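The following is a minimal sketch, not Nutch's actual handler, showing why an endless "Transfer-Encoding: chunked" body can end in an OutOfMemoryError and how a byte cap would turn it into an ordinary fetch failure. The class name BoundedChunkedReader and the maxBytes limit are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: decodes a chunked HTTP/1.1 body but refuses to buffer
// more than maxBytes, so an endless chunk stream fails with an IOException
// instead of filling the heap.
final class BoundedChunkedReader {

    // Reads one CRLF-terminated ASCII line; rejects absurdly long lines.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != '\n') {
            if (c == -1) throw new IOException("unexpected end of stream");
            if (c != '\r') sb.append((char) c);
            if (sb.length() > 1024) throw new IOException("header line too long");
        }
        return sb.toString();
    }

    static byte[] readChunked(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');          // ignore chunk extensions
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            long size;
            try {
                size = Long.parseLong(hex, 16);
            } catch (NumberFormatException e) {
                throw new IOException("bad chunk size: " + hex);
            }
            if (size < 0) throw new IOException("negative chunk size");
            if (size == 0) break;                      // last-chunk marker
            // The cap: stop before the buffered body grows past maxBytes.
            if (body.size() + size > maxBytes) {
                throw new IOException("body exceeds " + maxBytes + " bytes");
            }
            long remaining = size;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n == -1) throw new IOException("unexpected end of stream");
                body.write(buf, 0, n);
                remaining -= n;
            }
            readLine(in);                              // CRLF after chunk data
        }
        // Skip optional trailer headers up to the terminating empty line.
        while (!readLine(in).isEmpty()) {
            // discard trailer header
        }
        return body.toByteArray();
    }
}
```

Throwing rather than silently truncating is one design choice; it lets the fetcher log the page as failed and move on, which matches how the log line above already reports a failed fetch.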

Is there a reason why Nutch should use HTTP 1.1? HTTP 1.0 is much simpler, and all servers support it. From the server logs I've found, it looks like Googlebot and Yahoo Slurp just use 1.0. Any reason Nutch shouldn't too? That would make the fetcher that much simpler, and hence more reliable.
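As a rough illustration of the simplicity argument: per RFC 2616, a server must not send transfer-codings such as chunked to an HTTP/1.0 client, so a 1.0 fetcher only needs to send a plain request and read until the connection closes. The sketch below is hypothetical (the class Http10Sketch, its methods, and the cap are not Nutch code); the Host header is optional in HTTP/1.0 but is commonly sent for virtual hosts.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of an HTTP/1.0 fetch: no chunked decoding, no
// keep-alive bookkeeping; the response ends when the server closes.
final class Http10Sketch {

    // Builds a minimal HTTP/1.0 GET request.
    static byte[] buildRequest(String host, String path, String userAgent) {
        String req = "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: " + userAgent + "\r\n"
                + "\r\n";
        return req.getBytes(StandardCharsets.US_ASCII);
    }

    // Reads the raw response until EOF, keeping at most maxBytes.
    static byte[] readUntilClose(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            int room = maxBytes - out.size();
            if (n > room) {
                out.write(buf, 0, room);   // keep what fits, drop the rest
                break;
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

Here an oversized page is truncated instead of rejected; either policy keeps a runaway page like the PHP error page above from exhausting memory.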

If there are no objections, I'll remove the HTTP 1.1 support.

Doug


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
