Not a hang-up; this is normal Hadoop behavior. The fetched content is kept in your tmp dir first, and is moved into your segment dir during the reduce phase. You may find the tmp dir at something like "c:\tmp" or "/tmp" on your system.
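As a quick sanity check (a sketch, assuming a default Hadoop setup; the exact path depends on the hadoop.tmp.dir property in your conf), you can watch the temporary output grow instead of the segment dir:

    # hadoop.tmp.dir usually defaults to /tmp/hadoop-${user.name}, and
    # intermediate map output accumulates under its mapred/local subdir
    # until the reduce phase runs.
    du -hs /tmp/hadoop-*/mapred/local

If that path keeps growing while segmentdir only contains crawl_generate and stays the same size, the fetch is still making progress.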
----- Original Message -----
From: "cesar voulgaris" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, February 13, 2007 10:02 AM
Subject: fetcher hangs up?

> hi, maybe someone who has the same problem can help me:
>
> I started a crawl. At a certain depth the fetcher logs the urls,
> apparently correctly, but for two days!! it seems to be fetching the
> same site (a big one, but not that big). What disturbs me is that the
> segment directory is always the same size (du -hs segmentdir); it only
> has crawl_generate as a subdir. Does nutch have a temporary dir where
> it stores the fetches until it writes the other subdirs?... maybe it
> is hung up? It happened two times in different crawls (I did several
> crawls, so it's not too common).
