Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-07-24 Thread Sami Siren




Andrzej Bialecki wrote:
> Are you planning to update Hadoop to trunk? I'd rather be careful
> with that - I'm not sure if it's still compatible with Java 1.4,
> besides being unreleased/unstable ...


Not planning an upgrade, just want to know if it resolves the issues.
We can then decide what's the best thing to do.


--
Sami Siren




Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-07-24 Thread Andrzej Bialecki

Sami Siren (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12422929 ] 

Sami Siren commented on NUTCH-266:

--

I finally found the time to set up an environment with cygwin and try this out. I can confirm that the hadoop.jar version provided with nutch gives these errors.


I then tested nutch with the hadoop nightly jar and everything worked just
fine.

Can someone try the hadoop nightly jar with nutch and see if it works for you?
Nightly builds for hadoop are available from
http://people.apache.org/dist/lucene/hadoop/nightly/

  



Are you planning to update Hadoop to trunk? I'd rather be careful with
that - I'm not sure if it's still compatible with Java 1.4, besides
being unreleased/unstable ...


--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




RE: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread Teruhiko Kurosaka
Thank you for your reply, Sami.

> > I do not intend to run hadoop at all, so this
> > hadoop-site.xml is empty.
...
> You should at least set values for 'mapred.system.dir' and
> 'mapred.local.dir' and point them to a dir that has enough space
> available (I think they default to under /tmp, at least on my system,
> which is far too small for larger jobs)

OK, I just copied the definitions for these properties from
hadoop-default.xml and prepended "C:" to each value so that they really
refer to C:\tmp. C: has 65 GB of free space, and this practice crawl
crawls a directory that contains 20 documents with a total size of less
than 10 MB. So I figure C: has more than adequate free space.
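
For reference, the overrides I ended up with look something like this
(values copied from hadoop-default.xml with "C:" prepended, so the exact
paths may differ slightly on other setups):

<?xml version="1.0"?>
<configuration>
  <!-- example values: the defaults from hadoop-default.xml with C: prepended -->
  <property>
    <name>mapred.system.dir</name>
    <value>C:/tmp/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>C:/tmp/hadoop/mapred/local</value>
  </property>
</configuration>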

But I've still got the same error:
2006-06-22 10:54:01,548 WARN  mapred.LocalJobRunner
(LocalJobRunner.java:run(119)) - job_x5jmir
java.io.IOException: Couldn't rename
C:/tmp/hadoop/mapred/local/map_ye7oza/part-0.out
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

After nutch exited, I checked the directory:
C:/tmp/hadoop/mapred/local/map_ye7oza/
does exist, but there was no file called part-0.out. The directory
was empty.

I'd appreciate any other suggestions you might have.

-kuro





Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-20 Thread Sami Siren

KuroSaka TeruHiko (JIRA) wrote:

   [ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12416945 ] 


KuroSaka TeruHiko commented on NUTCH-266:
-

I am experiencing pretty much the same symptom with the nightly builds of
5/31/2006 up to 6/14/2006, the last one I tested.
Here's the result of my "nutch crawl" run with DEBUG-level logging turned on.

2006-06-16 17:04:05,932 INFO  mapred.LocalJobRunner 
(LocalJobRunner.java:progress(140)) - 
C:/opt/nutch-060614/test/index/segments/20060616170358/crawl_parse/part-0:0+62
2006-06-16 17:04:05,948 WARN  mapred.LocalJobRunner 
(LocalJobRunner.java:run(119)) - job_4wsxze
java.io.IOException: Couldn't rename 
/tmp/hadoop/mapred/local/map_5n5aid/part-0.out
   at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:102)
Exception in thread "main" java.io.IOException: Job failed!
   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:342)
   at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:55)
   at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)

Prior to this fatal exception, I saw many occurrences of this exception:
2006-06-16 17:04:05,854 INFO  conf.Configuration
(Configuration.java:loadResource(397)) - parsing 
file:/C:/opt/nutch-060614/conf/hadoop-site.xml
 




This isn't really an exception; it's there just to print the stack trace
(so one can track who is calling it).




I do not intend to run hadoop at all, so this hadoop-site.xml is empty.
It just has this empty element:

<configuration>
</configuration>


You should at least set values for 'mapred.system.dir' and
'mapred.local.dir' and point them to a dir that has enough space
available (I think they default to under /tmp, at least on my system,
which is far too small for larger jobs).
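
Something along these lines in conf/hadoop-site.xml should do it (the
paths below are only examples - point them at whatever disk has enough
room):

<?xml version="1.0"?>
<configuration>
  <!-- example values only: use any directories with plenty of free space -->
  <property>
    <name>mapred.system.dir</name>
    <value>/bigdisk/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/bigdisk/hadoop/mapred/local</value>
  </property>
</configuration>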

--
Sami Siren