Bugs item #1088877, was opened at 2004-12-21 07:30
Message generated for change (Comment added) made by joa23
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=1088877&group_id=59548

Category: tools
Group: None
>Status: Closed
>Resolution: Works For Me
Priority: 5
Submitted By: Dawid Weiss (dawidweiss)
Assigned to: Nobody/Anonymous (nobody)
Summary: Crawler ends prematurely with an IOException 

Initial Comment:

I started the crawler with the usual arguments, except that
I wanted the entire site crawled, so I raised the depth:

../bin/nutch crawl urls -dir crawl.35 -depth 35 >&crawl.log

The crawler ends after an hour and a half with an
exception. I repeated the crawl three times, and it
always ends the same way. Let me know if I can provide
any more information.

041220 183639 status: segment 20041220183513, 265 pages, 23 errors, 10445387 bytes, 76821 ms
041220 183639 status: 3.4495776 pages/s, 1062.2693 kb/s, 39416.555 bytes/page
041220 183640 Updating D:\nutch\tmp\crawl.35\db
041220 183640 Updating for D:\nutch\tmp\crawl.35\segments\20041220183513
041220 183640 Processing document 0
041220 183641 Finishing update
041220 183641 Processing pagesByURL: Sorted 1295 instructions in 0.03 seconds.
041220 183641 Processing pagesByURL: Sorted 43166.66666666667 instructions/second
java.io.IOException: already exists: D:\nutch\tmp\crawl.35\db\webdb.new\pagesByURL
        at net.nutch.io.MapFile$Writer.<init>(MapFile.java:67)
        at net.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:536)
        at net.nutch.db.WebDBWriter.close(WebDBWriter.java:1531)
        at net.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:301)
        at net.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:351)
        at net.nutch.tools.CrawlTool.main(CrawlTool.java:128)
Exception in thread "main" 

----------------------------------------------------------------------

>Comment By: Stefan Groschupf (joa23)
Date: 2005-03-10 20:14

Message:
This works on my system. If the latest source code in Subversion still
has this problem, please reopen the issue in our new bug tracker:
http://issues.apache.org/jira/browse/Nutch
Thanks.

----------------------------------------------------------------------

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
