[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12445077 ]

pdchen commented on NUTCH-266:
------------------------------

But I still get the same error. Environment: Windows XP, JDK 1.4.2_04, Nutch 0.8; the Hadoop version is hadoop-0.5.0.jar. I also changed src/java/org/apache/nutch/searcher/DistributedSearch.java according to patch_hadoop-0.5.0.diff. Can anyone help me?

> hadoop bug when doing updatedb
> ------------------------------
>
> Key: NUTCH-266
> URL: http://issues.apache.org/jira/browse/NUTCH-266
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 0.8
> Environment: windows xp, JDK 1.4.2_04
> Reporter: Eugen Kochuev
> Fix For: 0.9.0, 0.8.1
>
> Attachments: patch.diff, patch_hadoop-0.5.0.diff
>
>
> I constantly get the following error message
> 060508 230637 Running job: job_pbhn3t
> 060508 230637 c:/nutch/crawl-20060508230625/crawldb/current/part-0/data:0+245
> 060508 230637 c:/nutch/crawl-20060508230625/segments/20060508230628/crawl_fetch/part-0/data:0+296
> 060508 230637 c:/nutch/crawl-20060508230625/segments/20060508230628/crawl_parse/part-0:0+5258
> 060508 230637 job_pbhn3t
> java.io.IOException: Target /tmp/hadoop/mapred/local/reduce_qnd5sx/map_qjp7tf.out already exists
>     at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:162)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:62)
>     at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:191)
>     at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:306)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:101)
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
>     at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12432984 ]

Richard Braman commented on NUTCH-266:
--------------------------------------

latest and greatest from svn
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12432871 ]

Sami Siren commented on NUTCH-266:
----------------------------------

What version of Nutch are you running?
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12432707 ]

Richard Braman commented on NUTCH-266:
--------------------------------------

I am having the same problem. I tried getting the latest Hadoop from svn, built it, put it into my nutch/lib, and rebuilt. I am still having the same issue. For me it happens no matter what command I run. It didn't happen the first time, but rather after I had about 21 segments done.
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12426579 ]

Renaud Richardet commented on NUTCH-266:
----------------------------------------

KuroSaka, yes, you can download the Hadoop jar, release 0.5.0, from the project website: http://lucene.apache.org/hadoop/ and http://www.apache.org/dyn/closer.cgi/lucene/hadoop/
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12426377 ]

KuroSaka TeruHiko commented on NUTCH-266:
-----------------------------------------

Renaud, thank you for posting the patch. Is there a patched version of the Hadoop jar file (precompiled) that I can download?
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12425753 ]

Sami Siren commented on NUTCH-266:
----------------------------------

I am planning to build a patched version of hadoop 0.4.0 that includes a fix for this problem. If there are no objections I will commit the patched jar (to the 0.8 branch and trunk) in a few days. As soon as hadoop-0.5.x is available it should override the patched version in trunk.
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12424930 ]

Sami Siren commented on NUTCH-266:
----------------------------------

Just adding a reminder: there are two options to get this fixed. Use a patched version of hadoop-0.4.0, or wait until hadoop-0.5.0.
Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
> Are you planning to update Hadoop to trunk/ ? I'd rather be careful with that - I'm not sure if it's still compatible with Java 1.4, besides being unreleased/unstable ...

Not planning an upgrade, I just want to know if it resolves the issues. We can then decide what's the best thing to do.

--
Sami Siren
Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
Sami Siren (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12422929 ]
>
> Sami Siren commented on NUTCH-266:
> ----------------------------------
>
> I finally found the time to set up an environment with cygwin and try this out. I can confirm that the hadoop.jar version provided with nutch gives these errors. I then tested nutch with the hadoop nightly jar and everything worked just fine.
>
> Can someone try the hadoop nightly jar with nutch and see if it works for you. Nightly builds for hadoop are available from http://people.apache.org/dist/lucene/hadoop/nightly/

Are you planning to update Hadoop to trunk/ ? I'd rather be careful with that - I'm not sure if it's still compatible with Java 1.4, besides being unreleased/unstable ...

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12422929 ]

Sami Siren commented on NUTCH-266:
----------------------------------

I finally found the time to set up an environment with cygwin and try this out. I can confirm that the hadoop.jar version provided with Nutch gives these errors. I then tested Nutch with the Hadoop nightly jar and everything worked just fine.

Can someone try the Hadoop nightly jar with Nutch and see if it works for you? Nightly builds for Hadoop are available from http://people.apache.org/dist/lucene/hadoop/nightly/

Just extract the archive, grab the hadoop-nightly.jar from there, and replace the one in the Nutch installation with it.

Thanks
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417922 ]

KuroSaka TeruHiko commented on NUTCH-266:
-----------------------------------------

I opened a Hadoop bug as this is more likely a bug in Hadoop: http://issues.apache.org/jira/browse/HADOOP-323

Close this bug (#266) when the other one (#323) is fixed and vice versa.
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417391 ]

KuroSaka TeruHiko commented on NUTCH-266:
-----------------------------------------

I'm sorry for adding so many comments. This will be the last for today.

As an experiment, I replaced hadoop-0.2-dev.jar, which came with the Nutch 0.8 GUI build (that worked), with hadoop-0.3.3-dev.jar, which was bundled with the nightly builds. (I also had to add commons-logging-1.0.4.jar and log4j-1.2.13.jar to the lib dir in order to remove the ClassNotFound exception.) The Nutch 0.8 GUI build then showed the same exceptions and stopped working.

So I have to conclude that a change introduced in Hadoop between versions 0.2 and 0.3.3 is causing this Nutch failure.
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417387 ]

KuroSaka TeruHiko commented on NUTCH-266:
-----------------------------------------

Both Eugen's case and my case are failing in the call chain started at line 101 of LocalJobRunner.java, which reads:

    if (!localFs.rename(mapOut, reduceIn))                  // Line 101
      throw new IOException("Couldn't rename " + mapOut);   // Line 102

This eventually calls LocalFileSystem.renameRaw(Path, Path), whose implementation is:

    public boolean renameRaw(Path src, Path dst) throws IOException {
      if (useCopyForRename) {
        return FileUtil.copy(this, src, this, dst, true, getConf());
      } else return pathToFile(src).renameTo(pathToFile(dst));
    }

The difference in the error message between Eugen's and mine is whether useCopyForRename was true or false.

I inserted a LOG.debug() call at the entrance of FileSystem.rename() to see what rename is asked to do. Below is the output:

    2006-06-22 15:45:11,996 DEBUG dfs.DistributedFileSystem (FileSystem.java:rename(308)) - Renaming "C:/tmp/hadoop/mapred/local/map_iwp4ih/part-0.out" to "C:/tmp/hadoop/mapred/local/reduce_ilpajy/map_1.out"...
    2006-06-22 15:45:12,012 DEBUG dfs.DistributedFileSystem (FileSystem.java:rename(308)) - Renaming "C:/tmp/hadoop/mapred/local/map_iwp4ih/part-0.out" to "C:/tmp/hadoop/mapred/local/reduce_ilpajy/map_2.out"...
    2006-06-22 15:45:12,028 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_i2gl4i
    java.io.IOException: Couldn't rename C:/tmp/hadoop/mapred/local/map_iwp4ih/part-0.out

As seen, the same source file is renamed twice (first to map_1.out, then to map_2.out); the first attempt succeeded while the second one failed. Is this how rename is supposed to be called?

Another thing I noticed, by comparing the source code of the version that works and the version that doesn't, is that "File" (java.io.File?) has recently been replaced by "Path" (org.apache.hadoop.fs.Path?). This may relate to the problem we are having.
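The double-rename symptom in the debug output above can be reproduced outside Hadoop. The following is only a minimal sketch, assuming the non-copy branch of renameRaw, which ends in java.io.File.renameTo; the class name RenameTwice and the temporary directory are hypothetical and are not taken from the Hadoop source. It renames one source file to two different targets in a row, as the log shows LocalJobRunner doing:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class RenameTwice {

    /**
     * Renames a single source file to two different targets, one after
     * the other, and returns the two results of File.renameTo.
     */
    public static boolean[] renameSameSourceTwice() {
        try {
            // Hypothetical scratch directory standing in for mapred.local.dir
            File dir = Files.createTempDirectory("rename-demo").toFile();
            File mapOut = new File(dir, "part-0.out");
            Files.write(mapOut.toPath(), new byte[] {1, 2, 3});

            File first = new File(dir, "map_1.out");
            File second = new File(dir, "map_2.out");

            // The source exists, so this rename moves it to map_1.out
            boolean firstOk = mapOut.renameTo(first);
            // The source was already moved away, so there is nothing to rename
            boolean secondOk = mapOut.renameTo(second);

            // Remove the scratch files again
            first.delete();
            second.delete();
            dir.delete();
            return new boolean[] {firstOk, secondOk};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] results = renameSameSourceTwice();
        System.out.println("first rename succeeded:  " + results[0]);
        System.out.println("second rename succeeded: " + results[1]);
    }
}
```

A false result from the second call is what line 101 would turn into the "Couldn't rename" IOException. The sketch says nothing about why the same map output is handed to rename twice, which is the actual open question.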
RE: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
Thank you for your reply, Sami.

> > I do not intend to run hadoop at all, so this
> > hadoop-site.xml is empty.
> ...
> You should at least set values for 'mapred.system.dir' and 'mapred.local.dir'
> and point them to a dir that has enough space available (I think they
> default to under /tmp at least on my system which is far too small for
> larger jobs)

OK, I just copied the definitions for these properties from hadoop-default.xml and prepended "C:" to each value so that they really refer to C:\tmp. C: has 65 GB of free space, and this practice crawl covers a directory that contains 20 documents with a total byte count of less than 10 MB. So I figure C: has more than adequate free space.

But I still get the same error:

    2006-06-22 10:54:01,548 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_x5jmir
    java.io.IOException: Couldn't rename C:/tmp/hadoop/mapred/local/map_ye7oza/part-0.out
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
    Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
        at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

After nutch exited, I checked the directory: C:/tmp/hadoop/mapred/local/map_ye7oza/ does exist, but there was no file called part-0.out. The directory was empty.

I'd appreciate any other suggestions you might have.

-kuro
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416971 ]

KuroSaka TeruHiko commented on NUTCH-266:
-----------------------------------------

The Nutch binary (w/ Admin GUI) downloadable from http://68.178.249.66/nutch-admin/nutch-0.8-dev_guiBundle_05_02_06.tar.gz does not exhibit the same problem. Some change made after May 2, 2006 must be causing this problem.
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416958 ]

KuroSaka TeruHiko commented on NUTCH-266:
-----------------------------------------

I noticed that there is no drive letter C: in the path quoted in the exception messages in both cases. Since both cases are observed on the Windows platform, the lack of a drive letter may lead to an access to the wrong drive, which might be a cause of these fatal errors.
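The drive-letter observation can be checked mechanically against the paths that appear in the logs. This is a small sketch only; the class DriveLetterCheck and its method are hypothetical helpers written for illustration and are not part of Hadoop or Nutch:

```java
public class DriveLetterCheck {

    /** Returns true if the path starts with a Windows drive prefix such as "C:". */
    public static boolean hasDriveLetter(String path) {
        if (path.length() < 2 || path.charAt(1) != ':') {
            return false;
        }
        char c = path.charAt(0);
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        // Paths copied from the messages in this thread
        String[] paths = {
            "/tmp/hadoop/mapred/local/reduce_qnd5sx/map_qjp7tf.out",
            "/tmp/hadoop/mapred/local/map_5n5aid/part-0.out",
            "C:/tmp/hadoop/mapred/local/map_ye7oza/part-0.out"
        };
        for (String p : paths) {
            System.out.println(hasDriveLetter(p) + "  " + p);
        }
    }
}
```

On Windows a path without a drive prefix is resolved against the current drive of the running process, so such a path only points at C: when the process itself was started from C:.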
Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
KuroSaka TeruHiko (JIRA) wrote:

> [ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416945 ]
>
> KuroSaka TeruHiko commented on NUTCH-266:
> -----------------------------------------
>
> I am experiencing pretty much the same symptom with the nightly builds of 5/31/2006 up to 6/14/2006, the last one I tested.
> Here's the result of my "nutch crawl" run with DEBUG level log turned on.
>
> 2006-06-16 17:04:05,932 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(140)) - C:/opt/nutch-060614/test/index/segments/20060616170358/crawl_parse/part-0:0+62
> 2006-06-16 17:04:05,948 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_4wsxze
> java.io.IOException: Couldn't rename /tmp/hadoop/mapred/local/map_5n5aid/part-0.out
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
>     at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
>
> Prior to this fatal exception, I've seen many occurrences of this exception:
>
> 2006-06-16 17:04:05,854 INFO conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/C:/opt/nutch-060614/conf/hadoop-site.xml

This isn't really an exception; it's there just to print the stack trace (so one can track who is calling it).

> I do not intend to run hadoop at all, so this hadoop-site.xml is empty.
> It just has this empty element:

You should at least set values for 'mapred.system.dir' and 'mapred.local.dir' and point them to a directory that has enough space available (I think they default to somewhere under /tmp, at least on my system, which is far too small for larger jobs).

--
 Sami Siren
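For illustration only, a hadoop-site.xml following the suggestion above might look roughly like the sketch below. The two property names are the ones mentioned in the reply; the directory values are hypothetical placeholders and are not taken from this thread. Nothing here establishes that this setting resolves the rename failure reported in the issue.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the paths below are assumed examples, not values from the thread. -->
<configuration>
  <property>
    <name>mapred.system.dir</name>
    <!-- hypothetical location with enough free space -->
    <value>C:/hadoop-work/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <!-- hypothetical location with enough free space -->
    <value>C:/hadoop-work/mapred/local</value>
  </property>
</configuration>
```

Spelling out the drive letter in these values is a choice made for this sketch, prompted by the observation elsewhere in the thread that the failing paths under /tmp carry no drive letter.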
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416945 ]

KuroSaka TeruHiko commented on NUTCH-266:
-----------------------------------------

I am experiencing pretty much the same symptom with the nightly builds of 5/31/2006 up to 6/14/2006, the last one I tested.
Here's the result of my "nutch crawl" run with DEBUG level log turned on.

2006-06-16 17:04:05,932 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(140)) - C:/opt/nutch-060614/test/index/segments/20060616170358/crawl_parse/part-0:0+62
2006-06-16 17:04:05,948 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_4wsxze
java.io.IOException: Couldn't rename /tmp/hadoop/mapred/local/map_5n5aid/part-0.out
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
    at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

Prior to this fatal exception, I've seen many occurrences of this exception:

2006-06-16 17:04:05,854 INFO conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/C:/opt/nutch-060614/conf/hadoop-site.xml
2006-06-16 17:04:05,870 DEBUG conf.Configuration (Configuration.java:<init>(67)) - java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:67)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:115)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:61)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:181)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:277)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:312)
    at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

I do not intend to run hadoop at all, so this hadoop-site.xml is empty.
It just has this empty element:

> hadoop bug when doing updatedb
> ------------------------------
>
>          Key: NUTCH-266
>          URL: http://issues.apache.org/jira/browse/NUTCH-266
>      Project: Nutch
>         Type: Bug
>     Versions: 0.8-dev
>  Environment: windows xp, JDK 1.4.2_04
>     Reporter: Eugen Kochuev
>
> I constantly get the following error message
> 060508 230637 Running job: job_pbhn3t
> 060508 230637 c:/nutch/crawl-20060508230625/crawldb/current/part-0/data:0+245
> 060508 230637 c:/nutch/crawl-20060508230625/segments/20060508230628/crawl_fetch/part-0/data:0+296
> 060508 230637 c:/nutch/crawl-20060508230625/segments/20060508230628/crawl_parse/part-0:0+5258
> 060508 230637 job_pbhn3t
> java.io.IOException: Target /tmp/hadoop/mapred/local/reduce_qnd5sx/map_qjp7tf.out already exists
>     at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:162)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:62)
>     at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:191)
>     at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:306)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:101)
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
>     at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)