Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by MikeBrzozowski:
http://wiki.apache.org/nutch/MonitoringNutchCrawls

------------------------------------------------------------------------------
  Of course, the bandwidth alone doesn't tell the whole story. How many pages 
are you retrieving? How many failed?
  
  Here's a quick little shell script to do this; I'm sure people can improve on 
this--edit this page if so!
- 
+ {{{ 
  #!/bin/sh
  echo "Monitoring nohup.out crawl progress..."
  while :
@@ -24, +24 @@

    echo "Tried `grep 'fetching' nohup.out | wc -l` pages; `grep 'failed' nohup.out | wc -l` failed."
    sleep 60
  done
- 
+ }}}
  === To run this script: ===
-  1. Save this script as something like monitorCrawl.sh
+  1. Save this script as something like `monitorCrawl.sh`
-  2. Run your preferred crawl script with nohup, like this: nohup <nutch crawl command or script> &
+  2. Run your preferred crawl script with nohup, like this: `nohup <nutch crawl command or script> &`
-  3. By default, this will output to nohup.out in the working directory. From the same directory, run: sh monitorCrawl.sh
+  3. By default, this will output to nohup.out in the working directory. From the same directory, run: `sh monitorCrawl.sh`
  
  This will give you minute-by-minute stats on how many pages nutch tried to 
fetch and how many failed with errors (e.g. 404, server unreachable).
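
  Since the diff above only shows part of the script, here is a minimal sketch of the counting step on its own. It is not the page's script verbatim: the function name `crawl_stats` is made up for illustration, and it assumes (as the script does) that the crawl log contains the word 'fetching' for each attempted page and 'failed' for each error. It uses `grep -c` in place of `grep | wc -l`, which counts matching lines the same way.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print one line of crawl stats
# from a log file. Defaults to nohup.out in the current directory.
crawl_stats() {
    log="${1:-nohup.out}"

    # Bail out early if the log is not there yet.
    if [ ! -r "$log" ]; then
        echo "Cannot read $log" >&2
        return 1
    fi

    # grep -c prints the number of matching lines (0 if none).
    # Assumption: 'fetching' marks an attempt, 'failed' marks an error.
    tried=$(grep -c 'fetching' "$log")
    failed=$(grep -c 'failed' "$log")

    echo "Tried $tried pages; $failed failed."
}
```

  Calling `crawl_stats nohup.out` once per pass of the `while :` loop, followed by `sleep 60`, should give the same minute-by-minute output as described above.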
  

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
