[ http://issues.apache.org/jira/browse/NUTCH-361?page=all ]
Uros Gruber updated NUTCH-361:
------------------------------

    Attachment: partition.diff

Patch to check the number of reduce tasks and set it to 1 if it is set to 0.

> generator create fetchlist randomly
> -----------------------------------
>
>                 Key: NUTCH-361
>                 URL: http://issues.apache.org/jira/browse/NUTCH-361
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0
>         Environment: Java 1.5, FreeBSD 6.1
>            Reporter: Uros Gruber
>            Priority: Critical
>         Attachments: partition.diff
>
>
> I noticed problems while generating the fetchlist. I already posted some info
> to the users list. Today I checked release 0.8, and I'm certain that the
> problem occurs only in versions later than that. I have tested only 0.8 and
> today's svn.
> The problem is that the generator builds the fetchlist from the crawldb, but
> every time I run it there is a different number of URLs in the fetchlist.
> For example, I used the 6 URLs we have for testing, and in only 5 of 20 runs
> were all URLs listed in the fetchlist; sometimes there was only one. The
> config was always the same, including when testing version 0.8.
> I tried to debug what might be going wrong, but all I found is that all the
> URLs were present in /tmp but were somehow missing from crawl_generate.
> I also see entries like the following when I enable DEBUG logging, but I
> doubt they have anything to do with this problem:
> 2006-09-02 20:14:20,147 DEBUG conf.Configuration - java.io.IOException: config(config)
>         at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
>         at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:87)
>         at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:98)
>         at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:330)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:405)
>         at org.apache.nutch.util.ToolBase.doMain(ToolBase.java:145)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:372)

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
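
For readers without the attachment: the patch is described above as checking the number of reduce tasks and setting it to 1 when it is 0. The following is only a rough sketch of that kind of guard, not the contents of partition.diff. The class name ReduceTaskGuard and both method names are hypothetical, and the host-hash partitioning is an assumption made purely for illustration.

```java
// Hypothetical illustration only; this is NOT the code in partition.diff.
final class ReduceTaskGuard {

    // Returns 1 when the configured reduce-task count is 0,
    // otherwise returns the configured value unchanged.
    static int effectiveReduceTasks(int configured) {
        return configured == 0 ? 1 : configured;
    }

    // Assumed partitioning scheme: map a host name to a partition index
    // by hashing it. The guard keeps the modulus from ever being zero.
    static int partitionFor(String host, int configuredReduceTasks) {
        int n = effectiveReduceTasks(configuredReduceTasks);
        // Mask the sign bit so the dividend is never negative.
        return (host.hashCode() & Integer.MAX_VALUE) % n;
    }
}
```

With a configured count of 0, every host falls into partition 0 in this sketch, so all entries end up in a single output partition.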