[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-10-26 Thread pdchen (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12445077 ] 

pdchen commented on NUTCH-266:
--

I still get the same error. My environment is Windows XP, JDK 1.4.2_04, and 
Nutch 0.8 with hadoop-0.5.0.jar. I also changed 
src/java/org/apache/nutch/searcher/DistributedSearch.java according to 
patch_hadoop-0.5.0.diff. Can anyone help me?

 hadoop bug when doing updatedb
 --

 Key: NUTCH-266
 URL: http://issues.apache.org/jira/browse/NUTCH-266
 Project: Nutch
  Issue Type: Bug
Affects Versions: 0.8
 Environment: windows xp, JDK 1.4.2_04
Reporter: Eugen Kochuev
 Fix For: 0.9.0, 0.8.1

 Attachments: patch.diff, patch_hadoop-0.5.0.diff


 I constantly get the following error message
 060508 230637 Running job: job_pbhn3t
 060508 230637 c:/nutch/crawl-20060508230625/crawldb/current/part-0/data:0+245
 060508 230637 c:/nutch/crawl-20060508230625/segments/20060508230628/crawl_fetch/part-0/data:0+296
 060508 230637 c:/nutch/crawl-20060508230625/segments/20060508230628/crawl_parse/part-0:0+5258
 060508 230637 job_pbhn3t
 java.io.IOException: Target /tmp/hadoop/mapred/local/reduce_qnd5sx/map_qjp7tf.out already exists
     at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:162)
     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:62)
     at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:191)
     at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:306)
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:101)
 Exception in thread "main" java.io.IOException: Job failed!
     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
     at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:54)
     at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-09-06 Thread Sami Siren (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12432871 ] 

Sami Siren commented on NUTCH-266:
--

What version of Nutch are you running?





[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-08-08 Thread Renaud Richardet (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12426579 ] 

Renaud Richardet commented on NUTCH-266:


KuroSaka, yes, you can download the Hadoop 0.5.0 release jar from the project 
website: http://lucene.apache.org/hadoop/ and 
http://www.apache.org/dyn/closer.cgi/lucene/hadoop/





[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-08-07 Thread KuroSaka TeruHiko (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12426377 ] 

KuroSaka TeruHiko commented on NUTCH-266:
-

Renaud, thank you for posting the patch.  Is there a precompiled, patched 
version of the Hadoop jar file that I can download?






[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-08-04 Thread Sami Siren (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12425753 ] 

Sami Siren commented on NUTCH-266:
--

I am planning to build a patched version of hadoop 0.4.0 that includes a fix 
for this problem.

If there are no objections, I will commit the patched jar (to the 0.8 branch 
and trunk) in a few days. 

As soon as hadoop-0.5.x is available, it should replace the patched version in 
trunk.





[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-08-01 Thread Sami Siren (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12424930 ] 

Sami Siren commented on NUTCH-266:
--

Just adding a reminder:

There are two options to get this fixed: use a patched version of hadoop-0.4.0, 
or wait for hadoop-0.5.0.





Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-07-24 Thread Andrzej Bialecki

Sami Siren (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12422929 ]
>
> Sami Siren commented on NUTCH-266:
> --
>
> I finally found the time to set up an environment with Cygwin and try this
> out. I can confirm that the hadoop.jar version provided with Nutch gives
> these errors.
>
> I then tested Nutch with the Hadoop nightly jar, and everything worked just
> fine.
>
> Can someone try the Hadoop nightly jar with Nutch and see if it works for you?
> Nightly builds for Hadoop are available from
> http://people.apache.org/dist/lucene/hadoop/nightly/

  



Are you planning to update Hadoop to trunk/ ? I'd rather be careful with 
that - I'm not sure if it's still compatible with Java 1.4, besides 
being unreleased/unstable ...


--
Best regards,
Andrzej Bialecki 
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-07-24 Thread Sami Siren




> Are you planning to update Hadoop to trunk/ ? I'd rather be careful 
> with that - I'm not sure if it's still compatible with Java 1.4, 
> besides being unreleased/unstable ...

I'm not planning an upgrade; I just want to know whether it resolves the 
issue. We can then decide what the best thing to do is.


--
Sami Siren




[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-07-23 Thread Sami Siren (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12422929 ] 

Sami Siren commented on NUTCH-266:
--

I finally found the time to set up an environment with Cygwin and try this out. 
I can confirm that the hadoop.jar version provided with Nutch gives these 
errors. 

I then tested Nutch with the Hadoop nightly jar, and everything worked just 
fine.

Can someone try the Hadoop nightly jar with Nutch and see if it works for you? 
Nightly builds for Hadoop are available from
http://people.apache.org/dist/lucene/hadoop/nightly/

Just extract the archive, grab hadoop-nightly.jar from it, and replace the 
Hadoop jar in your Nutch installation with it.

thanks





[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-26 Thread KuroSaka TeruHiko (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417922 ] 

KuroSaka TeruHiko commented on NUTCH-266:
-

I opened a Hadoop bug as this is more likely a bug in Hadoop:
http://issues.apache.org/jira/browse/HADOOP-323

Close this bug (#266) when the other one (#323) is fixed and vice versa.





RE: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread Teruhiko Kurosaka
Thank you for your reply, Sami.

>> I do not intend to run Hadoop at all, so this 
>> hadoop-site.xml is empty.
> ...
> You should at least set values for 'mapred.system.dir' and
> 'mapred.local.dir' and point them to a dir that has enough space 
> available (I think they default to under /tmp, at least on my system, 
> which is far too small for larger jobs)

OK, I copied the definitions for these properties from hadoop-default.xml 
and prepended C: to each value so that they really refer to C:\tmp. 
C: has 65 GB of free space, and this practice crawl covers a directory that
contains 20 documents with a total size of less than 10 MB. So I figure
C: has more than adequate free space.

But I've still got the same error:
2006-06-22 10:54:01,548 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_x5jmir
java.io.IOException: Couldn't rename C:/tmp/hadoop/mapred/local/map_ye7oza/part-0.out
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
    at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

After Nutch exited, I checked the directory:
C:/tmp/hadoop/mapred/local/map_ye7oza/
It does exist, but it did not contain a file called part-0.out.  The directory
was empty.

I'd appreciate any other suggestions you might have.

-kuro





[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread KuroSaka TeruHiko (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417387 ] 

KuroSaka TeruHiko commented on NUTCH-266:
-

Both Eugen's case and my case fail in the call chain that starts at line 
101 of LocalJobRunner.java, which reads:
    if (!localFs.rename(mapOut, reduceIn))                 // Line 101
      throw new IOException("Couldn't rename " + mapOut);  // Line 102

This eventually calls LocalFileSystem.renameRaw(Path, Path), whose 
implementation is:
    public boolean renameRaw(Path src, Path dst) throws IOException {
      if (useCopyForRename) {
        return FileUtil.copy(this, src, this, dst, true, getConf());
      } else return pathToFile(src).renameTo(pathToFile(dst));
    }

The difference between Eugen's error message and mine comes down to whether 
useCopyForRename was true (Eugen's trace goes through FileUtil.copy) or false.
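
To make the two failure modes concrete, here is a minimal sketch in plain Java. It is not Hadoop code: the class RenameBranchSketch and its rename method are hypothetical stand-ins for the two branches quoted above, and it uses java.nio.file, which is newer than the JDK 1.4.2 in this report. The copy branch refuses to overwrite an existing target and fails with an exception, while the direct branch only returns false, which the caller then turns into "Couldn't rename".

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-in for the two branches of renameRaw; not Hadoop code.
class RenameBranchSketch {

    // Mirrors the shape of renameRaw: copy-then-delete, or a direct rename.
    static boolean rename(boolean useCopyForRename, File src, File dst) throws IOException {
        if (useCopyForRename) {
            // Copy branch: refuse to overwrite, which surfaces as an exception.
            if (dst.exists()) {
                throw new IOException("Target " + dst + " already exists");
            }
            Files.copy(src.toPath(), dst.toPath());
            return src.delete();
        }
        // Direct branch: java.io.File.renameTo reports failure only as false.
        return src.renameTo(dst);
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("rename-sketch").toFile();
        File src = new File(dir, "part-0.out");
        File dst = new File(dir, "map_1.out");
        Files.write(src.toPath(), "data".getBytes());
        Files.write(dst.toPath(), "old".getBytes());

        // Copy branch with an existing target: fails with an exception.
        try {
            rename(true, src, dst);
        } catch (IOException e) {
            System.out.println("copy branch: " + e.getMessage());
        }

        // Direct branch with a source that does not exist: returns false.
        File missing = new File(dir, "no-such-file.out");
        if (!rename(false, missing, new File(dir, "map_2.out"))) {
            System.out.println("rename branch: Couldn't rename " + missing);
        }
    }
}
```

Under these assumptions the same underlying problem can show up as either message, depending only on which branch is taken.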

I inserted a LOG.debug() call at the entrance of FileSystem.rename() to see 
what rename is asked to do.  Below is the output:
2006-06-22 15:45:11,996 DEBUG dfs.DistributedFileSystem (FileSystem.java:rename(308)) - Renaming C:/tmp/hadoop/mapred/local/map_iwp4ih/part-0.out to C:/tmp/hadoop/mapred/local/reduce_ilpajy/map_1.out...
2006-06-22 15:45:12,012 DEBUG dfs.DistributedFileSystem (FileSystem.java:rename(308)) - Renaming C:/tmp/hadoop/mapred/local/map_iwp4ih/part-0.out to C:/tmp/hadoop/mapred/local/reduce_ilpajy/map_2.out...
2006-06-22 15:45:12,028 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_i2gl4i
java.io.IOException: Couldn't rename C:/tmp/hadoop/mapred/local/map_iwp4ih/part-0.out

As seen, the same source file is renamed twice, to two different targets; the 
first rename succeeded while the second one failed.
Is this how rename is supposed to be called? 
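
One possible reading of this log, sketched below in plain Java, is that the second rename fails simply because the first one already moved the source away. This is only an illustration of java.io.File.renameTo behaviour; the class DoubleRenameSketch and the directory names are hypothetical, and it does not use Hadoop or reproduce its code path.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical illustration; not Hadoop code.
class DoubleRenameSketch {

    // Renames one source to each target in turn and records each result.
    static boolean[] renameToEach(File src, File[] targets) {
        boolean[] results = new boolean[targets.length];
        for (int i = 0; i < targets.length; i++) {
            results[i] = src.renameTo(targets[i]);
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("local-sketch").toFile();
        File mapDir = new File(root, "map_sketch");
        File reduceDir = new File(root, "reduce_sketch");
        mapDir.mkdir();
        reduceDir.mkdir();

        // A single map output file, as in the log above.
        File src = new File(mapDir, "part-0.out");
        Files.write(src.toPath(), "map output".getBytes());

        File[] targets = {
            new File(reduceDir, "map_1.out"),
            new File(reduceDir, "map_2.out")
        };
        boolean[] results = renameToEach(src, targets);
        System.out.println("first rename succeeded:  " + results[0]);
        System.out.println("second rename succeeded: " + results[1]);
        System.out.println("files left in map dir:   " + mapDir.list().length);
    }
}
```

If this reading is right, it would also match the empty map_* directory seen after the failed run.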

Another thing I noticed, by comparing the source code of the version that works 
and the version that doesn't, is that File (java.io.File?) has been replaced 
by Path (org.apache.hadoop.fs.Path?) recently.  This may relate to the 
problem we are having.






[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread KuroSaka TeruHiko (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417391 ] 

KuroSaka TeruHiko commented on NUTCH-266:
-

I'm sorry for adding so many comments.  This will be the last one for today.
As an experiment, I replaced the hadoop-0.2-dev.jar that came with the Nutch 
0.8 GUI build (which worked) with the hadoop-0.3.3-dev.jar that was bundled 
with the nightly builds.  (I also had to add commons-logging-1.0.4.jar and 
log4j-1.2.13.jar to the lib dir to get rid of the ClassNotFound 
exception.)  
Then the Nutch 0.8 GUI build showed the same exceptions and stopped working.

So I have to conclude that a change introduced in Hadoop between 
versions 0.2 and 0.3.3 is causing this Nutch failure.





[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-20 Thread KuroSaka TeruHiko (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416945 ] 

KuroSaka TeruHiko commented on NUTCH-266:
-

I am experiencing pretty much the same symptom with the nightly builds from 
5/31/2006 up to 6/14/2006, the last one I tested.
Here's the result of my nutch crawl run with DEBUG-level logging turned on.
2006-06-16 17:04:05,932 INFO  mapred.LocalJobRunner (LocalJobRunner.java:progress(140)) - C:/opt/nutch-060614/test/index/segments/20060616170358/crawl_parse/part-0:0+62
2006-06-16 17:04:05,948 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_4wsxze
java.io.IOException: Couldn't rename /tmp/hadoop/mapred/local/map_5n5aid/part-0.out
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
    at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

Prior to this fatal exception, I've seen many occurrences of this exception:
2006-06-16 17:04:05,854 INFO  conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/C:/opt/nutch-060614/conf/hadoop-site.xml
2006-06-16 17:04:05,870 DEBUG conf.Configuration (Configuration.java:<init>(67)) - java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:67)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:115)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:61)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:181)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:277)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:312)
    at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

I do not intend to run Hadoop at all, so this hadoop-site.xml is empty.
It just has this empty element:
<configuration>
</configuration>






Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-20 Thread Sami Siren

KuroSaka TeruHiko (JIRA) wrote:

> [ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416945 ]
>
> KuroSaka TeruHiko commented on NUTCH-266:
> -
>
> I am experiencing pretty much the same symptom with the nightly builds from
> 5/31/2006 up to 6/14/2006, the last one I tested.
> Here's the result of my nutch crawl run with DEBUG-level logging turned on.
>
> 2006-06-16 17:04:05,932 INFO  mapred.LocalJobRunner (LocalJobRunner.java:progress(140)) - C:/opt/nutch-060614/test/index/segments/20060616170358/crawl_parse/part-0:0+62
> 2006-06-16 17:04:05,948 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(119)) - job_4wsxze
> java.io.IOException: Couldn't rename /tmp/hadoop/mapred/local/map_5n5aid/part-0.out
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
>     at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
>
> Prior to this fatal exception, I've seen many occurrences of this exception:
> 2006-06-16 17:04:05,854 INFO  conf.Configuration (Configuration.java:loadResource(397)) - parsing file:/C:/opt/nutch-060614/conf/hadoop-site.xml

<snip>

This isn't really an exception, it's there just to print the stacktrace (so one 
can track who is calling it).




> I do not intend to run Hadoop at all, so this hadoop-site.xml is empty.
> It just has this empty element:
> <configuration>
> </configuration>

 

You should at least set values for 'mapred.system.dir' and 'mapred.local.dir' 
and point them to a dir that has enough space available (I think they default 
to under /tmp, at least on my system, which is far too small for larger jobs).
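
Before pointing those properties at a directory, it can be checked with a few lines of plain Java. This is only a sketch: the class LocalDirCheckSketch, the isUsable helper, and the 100 MB threshold are made up for illustration, and File.getUsableSpace needs a newer JDK than 1.4.

```java
import java.io.File;

// Hypothetical helper for checking a candidate local directory; not part of Hadoop or Nutch.
class LocalDirCheckSketch {

    // True when dir is an existing, writable directory with at least minFreeBytes usable.
    static boolean isUsable(File dir, long minFreeBytes) {
        return dir.isDirectory()
                && dir.canWrite()
                && dir.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Candidate value for 'mapred.local.dir'; the fallback is only an example.
        String path = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        File candidate = new File(path);
        long needed = 100L * 1024 * 1024; // 100 MB, an arbitrary threshold for illustration
        System.out.println(candidate + " usable: " + isUsable(candidate, needed));
    }
}
```

Pass the directory as the first argument, for example the value you plan to put into hadoop-site.xml.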

--
Sami Siren


[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-20 Thread KuroSaka TeruHiko (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416958 ] 

KuroSaka TeruHiko commented on NUTCH-266:
-

I noticed that there is no drive letter C: in the paths quoted in the exception 
messages in both cases.  Since both cases were observed on Windows, the 
missing drive letter may lead to access to the wrong drive, which might 
be a cause of these fatal errors.
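
A quick way to spot such paths in a log is a small string check like the one below. It is a hypothetical helper (DriveLetterSketch and hasDriveLetter are not part of Hadoop or Nutch) that looks only at the text of the path and does not touch the file system.

```java
// Hypothetical helper for inspecting log paths; not part of Hadoop or Nutch.
class DriveLetterSketch {

    // True when the path text starts with a Windows drive letter such as "C:".
    static boolean hasDriveLetter(String path) {
        if (path.length() < 2 || path.charAt(1) != ':') {
            return false;
        }
        char c = path.charAt(0);
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        String[] paths = {
            "/tmp/hadoop/mapred/local/reduce_qnd5sx/map_qjp7tf.out",
            "c:/nutch/crawl-20060508230625/crawldb/current/part-0/data"
        };
        for (String p : paths) {
            System.out.println(hasDriveLetter(p) + "  " + p);
        }
    }
}
```

The two sample paths are taken from the issue description: the temporary path has no drive letter, while the crawl path does.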







[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-20 Thread KuroSaka TeruHiko (JIRA)
[ 
http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416971 ] 

KuroSaka TeruHiko commented on NUTCH-266:
-

The Nutch binary (with Admin GUI) downloadable from:
http://68.178.249.66/nutch-admin/nutch-0.8-dev_guiBundle_05_02_06.tar.gz
does not exhibit the same problem.  Some change made after May 2, 2006 must 
be causing this problem.

