Nutch Newbie wrote:
Hi:
Could someone please be kind enough to confirm whether the 0.9-dev trunk is
broken? I did a total of 4 fresh installs, and every time I get
stuck in the indexing/reduce process. (Yes, Speculative = false.)
I would feel much better if I were not the only one with this problem!
Thank you for your help.
On 1/8/07, Nutch Newbie <[EMAIL PROTECTED]> wrote:
Hi:
I am getting the following error after updating to revision 494024. My
Hadoop-site.xml has (mapred.speculative) set to false. I am not sure
what I am doing wrong; everything worked before the update. Any
help would be appreciated.
Regards
Language identifier configuration [1-4/2048]
map 100% reduce 0%
Language identifier plugin supports: it(1000) is(1000) hu(1000)
th(1000) sv(1000) fr(1000) ru(1000) fi(1000) es(1000) en(1000)
el(1000) ee(1000) pt(1000) de(1000) da(1000) pl(1000) no(1000)
nl(1000)
Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
running sort pass
flushing segment 0
reduce > sort
found resource common-terms.utf8 at
file:/usr/local/nutch-0.9-dev/conf/common-terms.utf8
Optimizing index.
Optimizing index.
job_qmhsvz
java.lang.RuntimeException: Unexpected status: 67
at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:307)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:137)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:134)
I can confirm that it is indeed a bug. I'll provide a patch soon - in
the meantime you can just remove the "throw" statement in Indexer.reduce
that raises this exception - datums with other statuses will simply be
ignored.
The underlying issue is quite interesting - the status code that it's
complaining about is CrawlDatum.STATUS_LINKED, which indicates a page
that was redirected. However, as you can see there are probably some
inlinks pointing to this page. Now, the question is - should we discard
this page (and index only the target)? The answer is not simple.
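To illustrate the interim workaround (skipping unrecognised statuses instead of failing the job), here is a minimal standalone sketch. It is not the actual Indexer code and not the forthcoming patch. The value 67 comes from the trace and is the one identified here as CrawlDatum.STATUS_LINKED; the class name, the method name and the two HYPOTHETICAL_* constants are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class StatusFilterSketch {
    // 67 is the status reported in the stack trace (CrawlDatum.STATUS_LINKED
    // according to this thread). The two values below are placeholders
    // invented for this sketch; they are NOT real Nutch constants.
    static final int STATUS_LINKED = 67;
    static final int HYPOTHETICAL_DB_STATUS = 1;
    static final int HYPOTHETICAL_FETCH_STATUS = 2;

    /** Returns the statuses the loop acts on, silently skipping all others. */
    static List<Integer> keepHandled(int[] statuses) {
        List<Integer> handled = new ArrayList<>();
        for (int status : statuses) {
            switch (status) {
                case HYPOTHETICAL_DB_STATUS:
                case HYPOTHETICAL_FETCH_STATUS:
                    handled.add(status);
                    break;
                default:
                    // Before the workaround, this branch would have been:
                    //   throw new RuntimeException("Unexpected status: " + status);
                    // Now the datum is ignored and the loop continues.
                    break;
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        int[] seen = {HYPOTHETICAL_DB_STATUS, STATUS_LINKED, HYPOTHETICAL_FETCH_STATUS};
        System.out.println(keepHandled(seen)); // prints [1, 2]
    }
}
```

Note that this only hides the symptom: the status 67 datum is dropped without any decision on whether the redirected page or its target should be indexed.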
BTW, if you guys are brave enough to use the bleeding-edge code from SVN,
then you are expected to discuss any issues that may arise from its use
on nutch-dev - this mailing list is for users of regular releases, or
stable versions ...
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com