Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by MikeBrzozowski:
http://wiki.apache.org/nutch/MonitoringNutchCrawls

------------------------------------------------------------------------------
   2. Run your preferred crawl script with nohup, like this: `nohup <nutch crawl command or script> &`
   3. By default, this will output to nohup.out in the working directory. From the same directory, run: `sh monitorCrawl.sh`
+ (Alternately, you can process hadoop.log in the logs/ directory by changing the three references to `nohup.out` to `hadoop.log`.)
+
  This will give you minute-by-minute stats on how many pages nutch tried to fetch and how many failed with errors (e.g. 404, server unreachable).

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs