[Nutch-cvs] [Nutch Wiki] Update of "MonitoringNutchCrawls" by MikeBrzozowski

Apache Wiki Wed, 31 Jan 2007 10:31:28 -0800

The following page has been changed by MikeBrzozowski:
http://wiki.apache.org/nutch/MonitoringNutchCrawls

------------------------------------------------------------------------------
   2. Run your preferred crawl script with nohup, like this: `nohup <nutch 
crawl command or script> &`
   3. By default, this will output to nohup.out in the working directory. From 
the same directory, run: `sh monitorCrawl.sh`
  
- (Alternately, you can process hadoop.log in the logs/ directory by changing 
the three references to `nohup.out` to `hadoop.log`.)
+ (Alternately, you can process hadoop.log in the logs/ directory by changing 
the three references to `nohup.out` to `hadoop.log`. Be aware, though, that by 
default hadoop.log only contains activity from today, so your counts will reset 
to zero each night.)
  
  This will give you minute-by-minute stats on how many pages Nutch tried to 
fetch and how many failed with errors (e.g. 404, server unreachable).
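
The kind of counting described above can be sketched in a few lines of shell. This is not the contents of monitorCrawl.sh, which is not shown here; it is a minimal illustration under stated assumptions. The function name `crawl_stats` is hypothetical, and the two patterns assume the fetcher logs attempts as "fetching <url>" and errors with the phrase "failed with"; check your own nohup.out and adjust them if your log differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): count fetch attempts and
# failures in a crawl log such as nohup.out.
# ASSUMPTION: attempts are logged as "fetching <url>" and errors contain
# "failed with"; adjust both patterns to match your log.
crawl_stats() {
    log="$1"
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, so "|| true" keeps the function going.
    attempts=$(grep -c 'fetching ' "$log" || true)
    failures=$(grep -c 'failed with' "$log" || true)
    echo "attempted=$attempts failed=$failures"
}

# Example use, printing totals once a minute while the crawl runs:
#   while true; do crawl_stats nohup.out; sleep 60; done
```

Passing logs/hadoop.log instead of nohup.out works the same way, with the caveat noted above that its totals start over each day.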
  

