Error with Hadoop-0.4.0
Hi,

I ran into a problem with the Nutch trunk version. It seems to be related to the changes for Hadoop-0.4.0 and JDK 1.5 (more precisely, since HADOOP-129 and the replacement of File by Path).

In my environment, the crawl command terminates with the following error:

2006-07-06 17:41:49,735 ERROR mapred.JobClient (JobClient.java:submitJob(273)) - Input directory /localpath/crawl/crawldb/current in local is invalid.
Exception in thread main java.io.IOException: Input directory /localpathcrawl/crawldb/current in local is invalid.
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:146)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

Looking at the Nutch code, simply changing line 145 of Injector to mergeJob.setInputPath(tempDir) (instead of mergeJob.addInputPath(tempDir)) makes everything work.

Taking a closer look at the CrawlDb code, I still don't understand why the createJob method contains the following line:

    job.addInputPath(new Path(crawlDb, CrawlDatum.DB_DIR_NAME));

Out of curiosity, could a Hadoop guru explain why there is such a regression?
Does anybody else see the same error?

Regards
Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
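To make the difference between the two calls in the message above concrete, here is a minimal sketch. It does not use the Hadoop API: SketchJob is a hypothetical stand-in, and it assumes that addInputPath appends to the job's list of input paths while setInputPath replaces that list, as the method names suggest. The path strings are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class InputPathSketch {

    /** Hypothetical stand-in for a job configuration; NOT the Hadoop JobConf class. */
    public static class SketchJob {
        private final List<String> inputPaths = new ArrayList<>();

        /** Assumed semantics: append one more input path. */
        public void addInputPath(String path) {
            inputPaths.add(path);
        }

        /** Assumed semantics: discard earlier input paths and keep only this one. */
        public void setInputPath(String path) {
            inputPaths.clear();
            inputPaths.add(path);
        }

        public List<String> getInputPaths() {
            return new ArrayList<>(inputPaths);
        }
    }

    /** Mimics the quoted createJob line: the job starts with <crawlDb>/current as input. */
    public static SketchJob createJob(String crawlDb) {
        SketchJob job = new SketchJob();
        job.addInputPath(crawlDb + "/current");
        return job;
    }

    public static void main(String[] args) {
        // addInputPath keeps crawldb/current as an input next to the temp dir.
        SketchJob added = createJob("crawl/crawldb");
        added.addInputPath("tmp/inject-temp");
        System.out.println(added.getInputPaths()); // [crawl/crawldb/current, tmp/inject-temp]

        // setInputPath leaves the temp dir as the only input.
        SketchJob replaced = createJob("crawl/crawldb");
        replaced.setInputPath("tmp/inject-temp");
        System.out.println(replaced.getInputPaths()); // [tmp/inject-temp]
    }
}
```

Under these assumptions, the add variant still hands crawldb/current to the job, which is the directory named in the error, while the set variant drops it from the input list. Whether dropping it is the right fix is a separate question: it would also mean that the contents of an existing crawldb are no longer read by the merge job.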
[jira] Resolved: (NUTCH-317) Clarify what the queryLanguage argument of Query.parse(...) means
[ http://issues.apache.org/jira/browse/NUTCH-317?page=all ]

Jerome Charron resolved NUTCH-317:
--
    Fix Version: 0.8-dev
     Resolution: Fixed

Fixed

Clarify what the queryLanguage argument of Query.parse(...) means
-
         Key: NUTCH-317
         URL: http://issues.apache.org/jira/browse/NUTCH-317
     Project: Nutch
        Type: Bug
  Components: searcher
    Versions: 0.8-dev
    Reporter: KuroSaka TeruHiko
     Fix For: 0.8-dev

API document on Query.parse(String queryString, String queryLang, Configuration conf) does not explain what queryLang is, and should be explained. There can be at least two interpretations:

(1) Create a Query that restricts the search to include only the documents written in the specified language. So this would be equivalent of specifying lang:xx where xx is a two-letter language code.

(2) Create a Query interpreting the queryString according to the rules of the specified languages. In reality, this is used to select the proper language Analyzer to parse the query string.

I am guessing that (2) is intended.

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Error with Hadoop-0.4.0
Jérôme Charron wrote:

    Hi, I encountered some problems with Nutch trunk version. In fact it seems to be related to changes related to Hadoop-0.4.0 and JDK 1.5 (more precisely since HADOOP-129 and File replacement by Path).

    Does somebody have the same error?

I am not seeing this. I just ran inject on a single-machine (Linux) configuration with the local fs, without problems.

--
Sami Siren