Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
> Are you planning to update Hadoop to trunk? I'd rather be careful with
> that - I'm not sure if it's still compatible with Java 1.4, besides being
> unreleased/unstable ...

I'm not planning an upgrade, I just want to know if it resolves the issues. We can then decide what the best thing to do is.

--
Sami Siren
Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
Sami Siren (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12422929 ]
>
> Sami Siren commented on NUTCH-266:
> ----------------------------------
>
> I finally found the time to set up an environment with cygwin and try this
> out. I can confirm that the hadoop.jar version provided with nutch gives
> these errors. I then tested nutch with the hadoop nightly jar and
> everything worked just fine. Can someone try the hadoop nightly jar with
> nutch and see if it works for you? Nightly builds for hadoop are available
> from http://people.apache.org/dist/lucene/hadoop/nightly/

Are you planning to update Hadoop to trunk? I'd rather be careful with that - I'm not sure if it's still compatible with Java 1.4, besides being unreleased/unstable ...

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
RE: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
Thank you for your reply, Sami.

> > I do not intend to run hadoop at all, so this hadoop-site.xml is empty. ...
>
> You should at least set values for 'mapred.system.dir' and 'mapred.local.dir'
> and point them to a dir that has enough space available (I think they
> default to under /tmp, at least on my system, which is far too small for
> larger jobs)

OK, I copied the definitions for these properties from hadoop-default.xml and prepended "C:" to each value so that they really refer to C:\tmp. C: has 65 GB of free space, and this practice crawl covers a directory that contains 20 documents with a total byte count of less than 10 MB, so I figure C: has more than adequate free space. But I still get the same error:

2006-06-22 10:54:01,548 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_x5jmir
java.io.IOException: Couldn't rename C:/tmp/hadoop/mapred/local/map_ye7oza/part-0.out
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

After nutch exited, I checked the directory: C:/tmp/hadoop/mapred/local/map_ye7oza/ does exist, but there was no file called part-0.out. The directory was empty. I'd appreciate any other suggestions you might have.

-kuro
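For reference, the overrides described above would look something like the following in conf/hadoop-site.xml. This is a sketch, not taken from the thread: the property names are the standard Hadoop ones, but the C:/tmp paths are example values for a Windows box and should be adjusted to a volume with enough free space.

```xml
<?xml version="1.0"?>
<!-- Sketch of the overrides discussed above. These entries shadow the
     defaults in hadoop-default.xml; the paths are example values. -->
<configuration>
  <property>
    <name>mapred.system.dir</name>
    <value>C:/tmp/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>C:/tmp/hadoop/mapred/local</value>
  </property>
</configuration>
```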
Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
KuroSaka TeruHiko (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416945 ]
>
> KuroSaka TeruHiko commented on NUTCH-266:
> -----------------------------------------
>
> I am experiencing pretty much the same symptom with the nightly builds of
> 5/31/2006 up to 6/14/2006, which I tested the last time. Here's the result
> of my "nutch crawl" run with DEBUG level logging turned on:
>
> 2006-06-16 17:04:05,932 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(140)) - C:/opt/nutch-060614/test/index/segments/20060616170358/crawl_parse/part-0:0+62
> 2006-06-16 17:04:05,948 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_4wsxze
> java.io.IOException: Couldn't rename /tmp/hadoop/mapred/local/map_5n5aid/part-0.out
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
>         at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
>
> Prior to this fatal exception, I've seen many occurrences of this exception:
>
> 2006-06-16 17:04:05,854 INFO conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/C:/opt/nutch-060614/conf/hadoop-site.xml

This isn't really an exception; it's logged just to print the stack trace (so one can track who is calling it).

> I do not intend to run hadoop at all, so this hadoop-site.xml is empty. It
> just has an empty root element.

You should at least set values for 'mapred.system.dir' and 'mapred.local.dir' and point them to a dir that has enough space available (I think they default to under /tmp, at least on my system, which is far too small for larger jobs).

--
Sami Siren
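A note on the "Couldn't rename" message in the traces above: it originates from java.io.File.renameTo, which signals failure only through its boolean return value and gives no reason (on Windows it fails, for example, when the source file is missing or still held open by another handle). The sketch below is illustrative only; the file names are hypothetical, not Hadoop's actual internals.

```java
import java.io.File;
import java.io.IOException;

public class RenameDemo {
    public static void main(String[] args) throws IOException {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        // Hypothetical file names, for illustration only.
        File src = new File(tmp, "part-0.out");
        File dst = new File(tmp, "part-0");
        src.delete(); // ensure the source is absent, as in the report above
        dst.delete();
        // renameTo reports failure only via its return value; code that
        // needs an error has to wrap a false return in an IOException,
        // which is the kind of message seen in the logs above.
        if (!src.renameTo(dst)) {
            System.out.println("Couldn't rename " + src.getPath());
        }
    }
}
```

Because renameTo reports no cause, the directory can exist while the rename still fails, which matches the observation that map_ye7oza/ was present but empty.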