Error with Hadoop-0.4.0

2006-07-06 Thread Jérôme Charron

Hi,

I encountered some problems with the Nutch trunk version.
They seem to be related to the Hadoop-0.4.0 and JDK 1.5 changes
(more precisely, to HADOOP-129 and the replacement of File by Path).

In my environment, the crawl command terminates with the following error:
2006-07-06 17:41:49,735 ERROR mapred.JobClient (JobClient.java:submitJob(273))
- Input directory /localpath/crawl/crawldb/current in local is invalid.
Exception in thread main java.io.IOException: Input directory
/localpathcrawl/crawldb/current in local is invalid.
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
   at org.apache.nutch.crawl.Injector.inject(Injector.java:146)
   at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

Looking at the Nutch code, simply changing line 145 of Injector to
mergeJob.setInputPath(tempDir) (instead of mergeJob.addInputPath(tempDir))
makes everything work fine. Looking more closely at the CrawlDb code, I
don't understand why the createJob method contains the following line:
job.addInputPath(new Path(crawlDb, CrawlDatum.DB_DIR_NAME));
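
To make the difference between the two calls concrete, here is a minimal
sketch in plain Java. ToyJobConf, createJob(String) and the path strings are
hypothetical stand-ins, not the Hadoop or Nutch classes. The sketch assumes
that addInputPath appends to the job's list of input directories while
setInputPath replaces it, and that DB_DIR_NAME is "current", as the path in
the error message suggests. Under those assumptions, the addInputPath variant
leaves the crawldb's current directory among the merge job's inputs, while the
setInputPath variant drops it:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: ToyJobConf is a hypothetical stand-in, not Hadoop's JobConf. */
public class InputPathSketch {

    /** Hypothetical stand-in that only records input directories. */
    public static class ToyJobConf {
        private final List<String> inputPaths = new ArrayList<String>();

        /** Appends dir to whatever inputs are already configured. */
        public void addInputPath(String dir) {
            inputPaths.add(dir);
        }

        /** Discards any configured inputs and keeps only dir. */
        public void setInputPath(String dir) {
            inputPaths.clear();
            inputPaths.add(dir);
        }

        /** Returns a copy of the configured input directories. */
        public List<String> getInputPaths() {
            return new ArrayList<String>(inputPaths);
        }
    }

    /** Mimics the quoted createJob line: the job starts with crawlDb/current as input. */
    public static ToyJobConf createJob(String crawlDb) {
        ToyJobConf job = new ToyJobConf();
        job.addInputPath(crawlDb + "/current"); // assumed value of DB_DIR_NAME
        return job;
    }

    public static void main(String[] args) {
        // Variant 1: add the temp dir on top of the input set by createJob.
        ToyJobConf added = createJob("crawl/crawldb");
        added.addInputPath("inject-temp");
        System.out.println("addInputPath: " + added.getInputPaths());

        // Variant 2: replace the input set by createJob with the temp dir.
        ToyJobConf replaced = createJob("crawl/crawldb");
        replaced.setInputPath("inject-temp");
        System.out.println("setInputPath: " + replaced.getInputPaths());
    }
}
```

If the job submission rejects any input directory it considers invalid (for
example one that does not exist yet), only the first variant would be
affected. This is a guess at the mechanism, not a confirmed diagnosis.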

Out of curiosity, could a Hadoop guru explain why there is such a
regression?

Does anybody else get the same error?

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


[jira] Resolved: (NUTCH-317) Clarify what the queryLanguage argument of Query.parse(...) means

2006-07-06 Thread Jerome Charron (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-317?page=all ]
 
Jerome Charron resolved NUTCH-317:
----------------------------------

Fix Version: 0.8-dev
 Resolution: Fixed

Fixed

 Clarify what the queryLanguage argument of Query.parse(...) means
 -----------------------------------------------------------------

         Key: NUTCH-317
         URL: http://issues.apache.org/jira/browse/NUTCH-317
     Project: Nutch
        Type: Bug
  Components: searcher
    Versions: 0.8-dev
    Reporter: KuroSaka TeruHiko
     Fix For: 0.8-dev


 The API documentation for
   Query.parse(String queryString,
               String queryLang,
               Configuration conf)
 does not explain what queryLang means, and it should.
 There are at least two possible interpretations:
 (1) Create a Query that restricts the search to documents written in the
     specified language. This would be equivalent to specifying lang:xx,
     where xx is a two-letter language code.
 (2) Create a Query by interpreting queryString according to the rules of
     the specified language. In reality, queryLang is used to select the
     proper language Analyzer to parse the query string.
 I am guessing that (2) is intended.




Re: Error with Hadoop-0.4.0

2006-07-06 Thread Sami Siren

Jérôme Charron wrote:


> Hi,
>
> I encountered some problems with Nutch trunk version.
> In fact it seems to be related to changes related to Hadoop-0.4.0 and JDK 1.5
> (more precisely since HADOOP-129 and File replacement by Path).
> Does somebody have the same error?


I am not seeing this: I just ran inject on a single-machine (Linux)
configuration with the local fs, without problems.


--
Sami Siren