Error with Hadoop-0.4.0

2006-07-06 Thread Jérôme Charron

Hi,

I have run into a problem with the Nutch trunk version.
It seems to be caused by the changes related to Hadoop-0.4.0 and JDK 1.5
(more precisely, since HADOOP-129 and the replacement of File by Path).

In my environment, the crawl command terminates with the following error:
2006-07-06 17:41:49,735 ERROR mapred.JobClient (JobClient.java:submitJob(273))
- Input directory /localpath/crawl/crawldb/current in local is invalid.
Exception in thread "main" java.io.IOException: Input directory
/localpathcrawl/crawldb/current in local is invalid.
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
   at org.apache.nutch.crawl.Injector.inject(Injector.java:146)
   at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

Looking at the Nutch code, simply changing line 145 of Injector to
mergeJob.setInputPath(tempDir) (instead of mergeJob.addInputPath(tempDir))
makes everything work fine. Taking a closer look at the CrawlDb code, I still
don't understand why the createJob method contains the following line:
job.addInputPath(new Path(crawlDb, CrawlDatum.DB_DIR_NAME));
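To make the difference between the two calls concrete, here is a minimal sketch. It uses a hypothetical ToyJobConf class that only models "replace the input list" versus "append to the input list"; it is not Hadoop's JobConf, and the path strings are made up for illustration. The assumption being illustrated is that createJob has already registered the crawldb "current" directory as an input, so a later addInputPath(tempDir) leaves both directories as inputs, while setInputPath(tempDir) leaves only tempDir. Whether this is what actually happens in Hadoop-0.4.0 has not been verified here.

```java
import java.util.ArrayList;
import java.util.List;

public class InputPathSketch {

    /** Hypothetical stand-in for a job configuration; NOT Hadoop's JobConf. */
    static class ToyJobConf {
        private final List<String> inputPaths = new ArrayList<String>();

        /** Replaces any previously registered input paths with this single one. */
        void setInputPath(String path) {
            inputPaths.clear();
            inputPaths.add(path);
        }

        /** Appends one more input path to those already registered. */
        void addInputPath(String path) {
            inputPaths.add(path);
        }

        /** Returns a copy of the currently registered input paths. */
        List<String> getInputPaths() {
            return new ArrayList<String>(inputPaths);
        }
    }

    public static void main(String[] args) {
        // Assumed situation: a createJob-style helper has already added the
        // crawldb "current" directory (illustrative path, not a real one).
        ToyJobConf job = new ToyJobConf();
        job.addInputPath("crawl/crawldb/current");

        // Appending keeps the earlier entry, so both directories are inputs.
        job.addInputPath("tmp/inject-temp");
        System.out.println("after add: " + job.getInputPaths());

        // Setting discards the earlier entry, so only the temp dir is an input.
        job.setInputPath("tmp/inject-temp");
        System.out.println("after set: " + job.getInputPaths());
    }
}
```

Under that assumption, a job whose inputs still include a crawldb/current directory that does not exist yet (for example on a first inject) would be rejected at submission, which would match the error above.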

Out of curiosity, could a Hadoop guru explain why there is such a
regression?

Does somebody have the same error?

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


[jira] Resolved: (NUTCH-317) Clarify what the queryLanguage argument of Query.parse(...) means

2006-07-06 Thread Jerome Charron (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-317?page=all ]
 
Jerome Charron resolved NUTCH-317:
--

Fix Version: 0.8-dev
 Resolution: Fixed

Fixed

> Clarify what the queryLanguage argument of Query.parse(...) means
>
>         Key: NUTCH-317
>         URL: http://issues.apache.org/jira/browse/NUTCH-317
>     Project: Nutch
>        Type: Bug
>  Components: searcher
>    Versions: 0.8-dev
>    Reporter: KuroSaka TeruHiko
>     Fix For: 0.8-dev
>
> The API documentation for
>   Query.parse(String queryString,
>               String queryLang,
>               Configuration conf)
> does not explain what queryLang is; it should.
> There are at least two possible interpretations:
> (1) Create a Query that restricts the search to documents written in the
> specified language. This would be equivalent to specifying "lang:xx",
> where xx is a two-letter language code.
> (2) Create a Query that interprets the queryString according to the rules of
> the specified language. In practice, this is used to select the proper
> language Analyzer to parse the query string.
> I am guessing that (2) is intended.




Re: Error with Hadoop-0.4.0

2006-07-06 Thread Sami Siren

Jérôme Charron wrote:


> Hi,
>
> I have run into a problem with the Nutch trunk version.
> It seems to be caused by the changes related to Hadoop-0.4.0 and JDK 1.5
> (more precisely, since HADOOP-129 and the replacement of File by Path).
> Does somebody have the same error?


I am not seeing this (I just ran inject on a single-machine (Linux)
configuration with the local fs, without problems).


--
Sami Siren


Re: Error with Hadoop-0.4.0

2006-07-06 Thread Jérôme Charron

>> I have run into a problem with the Nutch trunk version.
>> It seems to be caused by the changes related to Hadoop-0.4.0 and JDK 1.5
>> (more precisely, since HADOOP-129 and the replacement of File by Path).
>> Does somebody have the same error?
>
> I am not seeing this (I just ran inject on a single-machine (Linux)
> configuration with the local fs, without problems).


Thanks for your feedback, Sami.
The strange thing is that I see exactly the same behavior on two different
boxes!

Jérôme