Hi,
I am using Nutch trunk (revision 493556) and it is failing in the Indexer:
java.io.IOException: Not a file: /nutch/nutchcrawl/segments/20070110171621/crawl_fetch/part-00000/data
at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:125)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:93)
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)
I also noticed some errors during parsing (which runs prior to indexing).
This is what I got when I enabled finer logging:
Moving bad file /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data to /bad_files/data.1751375967
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/content/part-00000/data error : Checksum error: /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data at 0
map 100% reduce 0%
Moving bad file /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/part-00000 to /bad_files/part-00000.377330604
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/crawl_parse/part-00000 error : Checksum error: /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/part-00000 at 0
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/content/part-00000/index error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/index: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_data/part-00000/index error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/index: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/crawl_fetch/part-00000/data error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_text/part-00000/index error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/index: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_text/part-00000/data error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/parse_data/part-00000/data error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/data: No such file or directory
PhasedFileSystem failed to commit file : /nutch/nutchcrawl/mergesegs_dir/20070110171621/crawl_generate/part-00000 error : /nutch/hadoop/mapred/system/job_hs0t76/tip_r_0001/reduce_w5a819/part-00000: No such file or directory
I am not sure whether these errors can cause the IOException shown above.
Does anybody know what I did wrong?
Regards,
Lukas