I am using Nutch 0.8 (with Hadoop 0.5, to get around the Java exception that I asked about a few months ago) with a custom analyzer plugin and some modifications to NutchAnalysis.jj.
I ran "nutch crawl" over the same test site of just three HTML files after clearing the index directory. In two out of three tries, the crawl session fetched only the index page. Only one run out of the three successfully fetched all pages. All the crawl runs were done using exactly the same parameters. Has anybody experienced strange behavior like this?

-kuro

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
