Hello Nutch devs,

I have the same problem. I have 10 hosts and one master; each host runs a datanode and a tasktracker.
My mapred configuration uses 100 maps and 25 reducers. The logs with the errors are below.

Thanks

051107 144101 task_r_pd3ybk 0.224% reduce > copy >
051107 144102 Moving bad file /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out to /tmp/bad_files/part-18.out.-1505193967
051107 144102 Server handler on 48724 caught: java.io.IOException: Checksum error: /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out
java.io.IOException: Checksum error: /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out
       at org.apache.nutch.fs.NFSDataInputStream$Checker.verifySum(NFSDataInputStream.java:115)
       at org.apache.nutch.fs.NFSDataInputStream$Checker.read(NFSDataInputStream.java:95)
       at org.apache.nutch.fs.NFSDataInputStream$PositionCache.read(NFSDataInputStream.java:152)
       at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
       at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
       at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
       at java.io.DataInputStream.read(DataInputStream.java:80)
       at org.apache.nutch.mapred.MapOutputFile.write(MapOutputFile.java:95)
       at org.apache.nutch.io.ObjectWritable.writeObject(ObjectWritable.java:117)
       at org.apache.nutch.io.ObjectWritable.write(ObjectWritable.java:64)
       at org.apache.nutch.ipc.Server$Handler.run(Server.java:213)
051107 144103 task_r_pd3ybk 0.24400002% reduce > copy >
051107 144103 parsing file:/d1/mapred/conf/nutch-default.xml
051107 144103 parsing file:/d1/mapred/conf/mapred-default.xml
051107 144103 parsing /tmp/nutch/mapred/local/taskTracker/task_r_pd3ybk/job.xml
051107 144103 parsing file:/d1/mapred/conf/nutch-site.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-default.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-site.xml
051107 144104 task_r_pd3ybk  Child starting
051107 144104 task_r_pd3ybk  Client connection to 0.0.0.0:33273: starting
051107 144104 Server connection on port 33273 from 127.0.0.1: starting
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-default.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/mapred-default.xml
051107 144104 task_r_pd3ybk parsing /tmp/nutch/mapred/local/taskTracker/task_r_pd3ybk/job.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-site.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-default.xml
051107 144104 task_r_pd3ybk parsing /tmp/nutch/mapred/local/taskTracker/task_r_pd3ybk/job.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-site.xml
051107 144105 task_r_pd3ybk 0.25640127% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9b2agp.out
051107 144106 task_r_pd3ybk 0.26105025% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_iwbx48.out
051107 144107 task_r_pd3ybk 0.30607307% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144108 task_r_pd3ybk 0.30645084% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144109 task_r_pd3ybk 0.30679235% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144110 task_r_pd3ybk 0.30714962% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144111 task_r_pd3ybk 0.30751395% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144112 task_r_pd3ybk 0.3078882% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144113 task_r_pd3ybk 0.3246999% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_ahej3w.out
051107 144114 task_r_pd3ybk 0.33490744% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_rebwrf.out
051107 144115 task_r_pd3ybk 0.3441058% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_atf6cb.out
051107 144116 task_r_pd3ybk 0.3537717% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_objo5q.out
051107 144117 task_r_pd3ybk 0.35881257% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_ybv2xw.out
051107 144118 task_r_pd3ybk 0.36855537% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_pv6b9d.out
051107 144119 task_r_pd3ybk 0.37860525% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_lj8ljn.out
051107 144120 task_r_pd3ybk 0.3887727% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_5jjyb8.out
051107 144121 task_r_pd3ybk 0.39831316% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_q24lb2.out
051107 144122 task_r_pd3ybk 0.44835892% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9yx6r2.out
051107 144123 task_r_pd3ybk 0.4488136% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9yx6r2.out
051107 144124 task_r_pd3ybk 0.4492674% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9yx6r2.out
051107 144125 task_r_pd3ybk 0.44971693% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9yx6r2.out
051107 144126 task_r_pd3ybk 0.48041725% reduce > append > /tmp/nutch/mapred/local/task_r_pd3ybk/task_m_xnbtvi.out
051107 144128 task_r_pd3ybk 0.5% reduce > sort
051107 144129 task_r_pd3ybk 0.5% reduce > sort
051107 144130 task_r_pd3ybk 0.5% reduce > sort
051107 144131 task_r_pd3ybk 0.5% reduce > sort
051107 144132 task_r_pd3ybk 0.5% reduce > sort
051107 144133 task_r_pd3ybk 0.5% reduce > sort
051107 144134 task_r_pd3ybk 0.5% reduce > sort
051107 144135 task_r_pd3ybk 0.5% reduce > sort
051107 144136 task_r_pd3ybk 0.5% reduce > sort
051107 144137 task_r_pd3ybk 0.5% reduce > sort
051107 144138 task_r_pd3ybk 0.5% reduce > sort
051107 144139 task_r_pd3ybk 0.5% reduce > sort
051107 144140 task_r_pd3ybk 0.5% reduce > sort
051107 144141 task_r_pd3ybk 0.5% reduce > sort
051107 144142 task_r_pd3ybk 0.5% reduce > sort
051107 144144 task_r_pd3ybk 0.5% reduce > sort
051107 144145 task_r_pd3ybk 0.5% reduce > sort
051107 144146 task_r_pd3ybk 0.5% reduce > sort
051107 144147 task_r_pd3ybk 0.5% reduce > sort
051107 144148 task_r_pd3ybk 0.5% reduce > sort
051107 144149 task_r_pd3ybk 0.5% reduce > sort
051107 144150 task_r_pd3ybk 0.5% reduce > sort
051107 144151 task_r_pd3ybk 0.5% reduce > sort
051107 144151 task_r_pd3ybk  Client connection to 10.2.0.11:7000: starting
051107 144152 task_r_pd3ybk 0.75141895% reduce > reduce
051107 144153 task_r_pd3ybk 0.75535446% reduce > reduce
051107 144154 task_r_pd3ybk 0.7593212% reduce > reduce
051107 144155 task_r_pd3ybk 0.7630673% reduce > reduce
051107 144156 task_r_pd3ybk 0.7669503% reduce > reduce
051107 144157 task_r_pd3ybk 0.770851% reduce > reduce
051107 144158 task_r_pd3ybk 0.774693% reduce > reduce
051107 144159 task_r_pd3ybk 0.77830505% reduce > reduce
051107 144200 task_r_pd3ybk 0.78223264% reduce > reduce
051107 144201 task_r_pd3ybk 0.7861667% reduce > reduce
051107 144202 task_r_pd3ybk 0.7900911% reduce > reduce
051107 144203 Server connection on port 48724 from 10.2.0.9: exiting
051107 144203 task_r_pd3ybk 0.79412013% reduce > reduce
051107 144203 Server connection on port 48724 from 10.2.0.9: starting
051107 144203 Server handler on 48724 caught: java.io.FileNotFoundException: /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out
java.io.FileNotFoundException: /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out
       at org.apache.nutch.fs.LocalFileSystem.openRaw(LocalFileSystem.java:106)
       at org.apache.nutch.fs.NFSDataInputStream$Checker.<init>(NFSDataInputStream.java:45)
       at org.apache.nutch.fs.NFSDataInputStream.<init>(NFSDataInputStream.java:217)
       at org.apache.nutch.fs.NutchFileSystem.open(NutchFileSystem.java:143)
       at org.apache.nutch.fs.NutchFileSystem.open(NutchFileSystem.java:132)
       at org.apache.nutch.mapred.MapOutputFile.write(MapOutputFile.java:91)
       at org.apache.nutch.io.ObjectWritable.writeObject(ObjectWritable.java:117)
       at org.apache.nutch.io.ObjectWritable.write(ObjectWritable.java:64)
       at org.apache.nutch.ipc.Server$Handler.run(Server.java:213)
051107 144204 task_r_pd3ybk 0.79818034% reduce > reduce
051107 144205 task_r_pd3ybk 0.80157274% reduce > reduce
051107 144206 task_r_pd3ybk 0.8053863% reduce > reduce
051107 144207 task_r_pd3ybk 0.8092159% reduce > reduce
....



Rod Taylor wrote:

On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote:
Rod Taylor wrote:
Here you go. local filesystem and a single job tracker on another
machine. When the tasktracker and jobtracker are on the same box there
isn't a problem. When they are on different machines it runs into
issues.

This is using mapred.local.dir on the local machine (not shared between
sbider4 and sbider5):
       parsing /home/sitesell/localt/taskTracker/task_m_o59djj/job.xml
       [Fatal Error] :-1:-1: Premature end of file.
What is mapred.system.dir? That must be shared. Also, filenames you pass to commands must be pathnames that work on all hosts.
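
For illustration only, the point about a shared mapred.system.dir could look something like the fragment below in nutch-site.xml. The property names are the ones mentioned in this thread; the directory values are hypothetical and assume that /shared/nutch is mounted at the same path on the jobtracker and on every tasktracker, while /local/nutch is a per-machine disk. This is a sketch under those assumptions, not a tested configuration.

```xml
<!-- Hypothetical paths: adjust to your own mounts. -->
<!-- Must resolve to the same directory on every host. -->
<property><name>mapred.system.dir</name><value>/shared/nutch/mapred/system</value></property>
<!-- Per-machine scratch space; this one is not shared between hosts. -->
<property><name>mapred.local.dir</name><value>/local/nutch/mapred/local</value></property>
```

The same reasoning applies to paths given on the command line: an absolute path that exists on every host is safer than one that only resolves on the machine where the command is run.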

I managed to get past all of the initial injection problems by running a
local crawl (no jobtracker) which created the crawldb/current/part-00000
files. So I was able to do a real inject, with jobtracker, for all of
the urls system wide without any complaints about files or directories
not existing.

Now, when trying to run a generate with a jobtracker it seems to have a
hard time finding the temporary working areas from one job to the next.
I cannot figure out where it is creating generate-temp-908680235. With
NDFS it would be /user/$USER/

<-- nutch generate -->
051107 091256 topN: 10000
051107 091256 Generator: starting
051107 091256 Generator: segment: /opt/sitesell/sbider_data/test2/segments/20051107091256
051107 091256 Generator: Selecting most-linked urls due for fetch.
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051107 091256 Client connection to 192.168.100.14:5464: starting
051107 091256 Running job: job_xhvq9b
051107 091258  map 0%
051107 091300  map 5%
051107 091303  map 16%
051107 091305  map 21%
051107 091306  map 26%
051107 091308  map 32%
051107 091309  map 37%
051107 091312  map 47%
051107 091315  map 58%
051107 091318  map 68%
051107 091320  map 74%
051107 091321  map 79%
051107 091324  map 89%
051107 091327  map 100%
051107 091330  reduce 5%
051107 091332  reduce 11%
051107 091333  reduce 16%
051107 091335  reduce 21%
051107 091337  reduce 26%
051107 091339  reduce 37%
051107 091342  reduce 47%
051107 091344  reduce 53%
051107 091345  reduce 58%
051107 091347  reduce 63%
051107 091348  reduce 68%
051107 091351  reduce 79%
051107 091354  reduce 89%
051107 091357  reduce 100%
051107 091359 Job complete: job_xhvq9b
051107 091359 Generator: Partitioning selected urls by host, for politeness.
051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051107 091359 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
Exception in thread "main" java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /home/sitesell/local/jobTracker/job_h22fvi.xml , nutch-site.xml
       at org.apache.nutch.ipc.Client.call(Client.java:294)
       at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
       at $Proxy0.submitJob(Unknown Source)
       at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
       at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
       at org.apache.nutch.crawl.Generator.generate(Generator.java:213)
       at org.apache.nutch.crawl.Generator.main(Generator.java:258)

[EMAIL PROTECTED] sbider_data]$ cat /home/sitesell/local/jobTracker/job_h22fvi.xml | grep input
<property><name>mapred.input.format.class</name><value>org.apache.nutch.mapred.SequenceFileInputFormat</value></property>
<property><name>mapred.input.dir</name><value>generate-temp-908680235</value></property>
<property><name>mapred.input.value.class</name><value>org.apache.nutch.io.UTF8</value></property>
<property><name>mapred.input.key.class</name><value>org.apache.nutch.crawl.CrawlDatum</value></property>



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
