Re: [Nutch-general] Error after SVN update

Nutch Newbie Tue, 09 Jan 2007 15:08:14 -0800

Thank you for your confirmation Andrzej!

Yes, next time I will report it to the dev list :-p


Regards



On 1/9/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Nutch Newbie wrote:
> > Hi:
> >
> > Could someone please be kind enough to confirm whether the 0.9-dev trunk
> > is broken? I did a total of 4 fresh installs, and every time I get
> > stuck in the indexing/reduce process. (Yes, Speculative = false.)
> >
> > It would feel much better if I am not the only one with this problem!
> >
> > Thank you for your help.
> >
> >
> > On 1/8/07, Nutch Newbie <[EMAIL PROTECTED]> wrote:
> >> Hi:
> >>
> >> I am getting the following error after updating to revision 494024. My
> >> hadoop-site.xml has (mapred.speculative) set to false, so I am not sure
> >> what I am doing wrong. Everything worked before the update. Any
> >> help would be appreciated.
> >>
> >> Regards
> >>
> >> Language identifier configuration [1-4/2048]
> >>  map 100% reduce 0%
> >> Language identifier plugin supports: it(1000) is(1000) hu(1000)
> >> th(1000) sv(1000) fr(1000) ru(1000) fi(1000) es(1000) en(1000)
> >> el(1000) ee(1000) pt(1000) de(1000) da(1000) pl(1000) no(1000)
> >> nl(1000)
> >> Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
> >> running sort pass
> >> flushing segment 0
> >> reduce > sort
> >> found resource common-terms.utf8 at
> >> file:/usr/local/nutch-0.9-dev/conf/common-terms.utf8
> >> Optimizing index.
> >> Optimizing index.
> >> job_qmhsvz
> >> java.lang.RuntimeException: Unexpected status: 67
> >>         at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:198)
> >>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:307)
> >>         at
> >> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:137)
> >> Exception in thread "main" java.io.IOException: Job failed!
> >>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
> >>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
> >>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:134)
>
> I can confirm that indeed it is a bug. I'll provide a patch soon - in
> the meantime you can just remove the "throws" clause - other datums will
> simply be ignored.
>
> The underlying issue is quite interesting - the status code that it's
> complaining about is CrawlDatum.STATUS_LINKED, which indicates a page
> that was redirected. However, as you can see there are probably some
> inlinks pointing to this page. Now, the question is - should we discard
> this page (and index only the target)? The answer is not simple.
>
> BTW, if you guys are brave enough to use the bleeding-edge from SVN,
> then you are expected to discuss any issues that may arise from its use
> on nutch-dev - this mailing list is for users of regular releases, or
> stable versions ...
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
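
For anyone reading this in the archive, here is a rough, self-contained sketch of the workaround Andrzej describes above: skip a datum with an unexpected status instead of throwing. It is not the actual Indexer.reduce code and not the forthcoming patch. The class name, the method names and HYPOTHETICAL_INDEXABLE are made up for illustration. Only the value 67 comes from the stack trace, which Andrzej identifies as CrawlDatum.STATUS_LINKED.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: plain ints stand in for CrawlDatum status codes.
class LenientStatusSketch {
    // 67 is the value reported in the stack trace (STATUS_LINKED per Andrzej).
    static final int STATUS_LINKED = 67;
    // Made-up placeholder for "a status the indexer knows how to handle".
    static final int HYPOTHETICAL_INDEXABLE = 1;

    // Strict variant: any status it does not recognise aborts the whole job.
    static List<Integer> strict(List<Integer> statuses) {
        List<Integer> kept = new ArrayList<>();
        for (int status : statuses) {
            if (status == HYPOTHETICAL_INDEXABLE) {
                kept.add(status);
            } else {
                throw new RuntimeException("Unexpected status: " + status);
            }
        }
        return kept;
    }

    // Lenient variant: unrecognised statuses are silently skipped.
    static List<Integer> lenient(List<Integer> statuses) {
        List<Integer> kept = new ArrayList<>();
        for (int status : statuses) {
            if (status == HYPOTHETICAL_INDEXABLE) {
                kept.add(status);
            }
            // Anything else (e.g. STATUS_LINKED) is ignored, not fatal.
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> statuses = Arrays.asList(
                HYPOTHETICAL_INDEXABLE, STATUS_LINKED, HYPOTHETICAL_INDEXABLE);
        System.out.println(lenient(statuses).size()); // prints 2
        try {
            strict(statuses);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // prints "Unexpected status: 67"
        }
    }
}
```

The lenient variant keeps the job running, but it also hides the open question Andrzej raises: whether a redirected page with inlinks should be indexed at all.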

