Which CVS version are you using?
I have been fetching for around 8 days nonstop without any trouble, at least as long as the throttling mechanism is not in use.


Stefan


On 03.06.2004 at 14:24, Massimo Miccoli wrote:

Hi,

I have tried the fetcher many times, and at some point it runs into java.lang.OutOfMemoryError,
so I am sending my Fetcher log to the list.


The first OutOfMemoryError occurred at the line below. The page is a PHP error page that loops, so its size is effectively infinite.
How can I skip pages like this?


040602 222837 fetch of http://www.icnrd5-mongolia.mn/phprint.php failed with: java.lang.OutOfMemoryError
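
One way to skip pages like this is to cap how many bytes are read per response and abandon the fetch once the cap is exceeded. The following is a minimal Java sketch of that idea, not Nutch's own code: the class BoundedFetch, the method readCapped, and ContentTooLargeException are hypothetical names introduced for illustration, and the caller is assumed to have the response body available as an InputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of Nutch): reads a response body up to a fixed cap. */
final class BoundedFetch {

    /** Signals that a body was larger than the configured cap. */
    static final class ContentTooLargeException extends IOException {
        ContentTooLargeException(String message) {
            super(message);
        }
    }

    private BoundedFetch() {}

    /**
     * Reads the whole stream if it holds at most maxBytes bytes.
     * If it holds more, reading stops one byte past the cap and
     * ContentTooLargeException is thrown, so memory use stays bounded
     * even when the server never stops sending data.
     */
    static byte[] readCapped(InputStream in, int maxBytes) throws IOException {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (true) {
            // Request at most one byte beyond the remaining budget; that extra
            // byte is how an oversized body is detected. Computed as long to
            // avoid overflow when maxBytes is close to Integer.MAX_VALUE.
            int want = (int) Math.min((long) buffer.length, (long) maxBytes - total + 1);
            int n = in.read(buffer, 0, want);
            if (n == -1) {
                break; // end of stream reached within the cap
            }
            total += n;
            if (total > maxBytes) {
                throw new ContentTooLargeException("body exceeds " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

A caller would catch ContentTooLargeException, log the URL as skipped, and continue with the next entry in the fetchlist. The cap itself is a tuning choice: large enough for ordinary pages, small enough that one runaway page cannot exhaust the heap.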

After that the fetcher continues to work, but many lines later there is another OutOfMemoryError, followed by many
errors like: key out of order: 3176574 after 3176574


In the end (see log) the Fetcher died. Any help?

Note that I use net.nutch.fetcher.Fetcher and not net.nutch.fetcher.RequestScheduler

The fetchlist contains 6000000 URLs.


040603 050121 fetching http://www.spiffysoftware.com/Default.asp
040603 050121 fetched 31634 bytes from http://www.city.osaka.jp/kyouiku/sisetu/establish03.html
040603 050121 SEVERE error writing output:java.lang.OutOfMemoryError
040603 050121 fetched 11723 bytes from http://www.naim-audio.com/
040603 050059 fetching http://www.1000grad.de/robots.txt
040603 050059 fetching http://www.lebensmittelwelt.de/Lebensmittelverpackungen/
040603 050121 fetching http://www.lebensmittelwelt.de/Lebensmittelverpackungen/
040603 050059 fetching http://www.netmedica.ro/index.php
040603 050121 found 24 outlinks in http://www.tagf.co.uk/location.shtml
040603 050121 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050059 fetching http://www.itu.int/aboutitu/overview/history.html
040603 050121 fetching http://www.itu.int/aboutitu/overview/history.html
040603 050121 fetching plain!
040603 050121 fetched 2469 bytes from http://www.butterbrot.de/butterbrot/brotzeit21.html
040603 050059 fetching http://indiaaccess.com/support/Software%20for%20Frontpage2000.asp
040603 050121 fetching http://indiaaccess.com/support/Software%20for%20Frontpage2000.asp
040603 050059 fetched 43360 bytes from http://www.nascar.com/races/tracks/mis/index.html
040603 050059 fetching http://www.zeroonerealty.com/robots.txt
040603 050121 found 24 outlinks in http://www.rootsweb.com/%7Eflafram/Slavedata.htm
040603 050059 Http: starting chunk
040603 050121 Http: starting chunk
040603 050121 fetched 47413 bytes from http://www.galatasaray.com/haberler/9430
040603 050121 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050059 found 29 outlinks in http://www.midwestdirttrackfacts.com/weekly_results.htm
040603 050121 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050100 fetching chunked!
040603 050121 Http: starting chunk
.....
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
040603 050148 fetching chunked!
040603 050148 Http: starting chunk
040603 050148 Http: starting chunk
040603 050148 fetched 312 bytes from http://www.zeroonerealty.com/
040603 050148 redirect to http://www.zerooneentertainment.org/zeroonerealty
040603 050148 fetching http://www.zerooneentertainment.org/zeroonerealty
....
040603 050121 Http: starting chunk
040603 050121 Http: starting chunk
040603 050059 found 32 outlinks in http://www.monroe.lib.in.us/childrens/earlych.html
040603 050121 fetching plain!
040603 050121 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
Exception in thread "main" java.lang.RuntimeException: SEVERE error logged. Exiting fetcher.
at net.nutch.fetcher.Fetcher.run(Fetcher.java:676)
at net.nutch.fetcher.Fetcher.main(Fetcher.java:755)
040603 050121 found 8 outlinks in http://wlapwww.gov.bc.ca/bcparks/campfire_bans_2004.htm
040603 050121 Http: starting chunk
040603 050121 Http: starting chunk
....
040603 050121 fetched 16947 bytes from http://www.itu.int/aboutitu/overview/history.html
040603 050121 Http: starting chunk
040603 050121 Http: starting chunk
040603 050121 found 24 outlinks in http://www.naim-audio.com/index.htm
040603 050121 SEVERE error writing output:java.io.IOException: key out of order: 3176574 after 3176574
.....




-------------------------------------------------------
This SF.Net email is sponsored by the new InstallShield X.
From Windows to Linux, servers to mobile, InstallShield X is the one
installation-authoring solution that does it all. Learn more and
evaluate today! http://www.installshield.com/Dev2Dev/0504
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers


---------------------------------------------------------------
open technology:   http://www.media-style.com
open source:           http://www.weta-group.net
open discussion:    http://www.text-mining.org



