Hello, please help me. I am using nutch-0.9 with Lucene 2.2 and Hadoop 0.15.0.
I have commented out the dedup line, so I am able to crawl the site http://www.traguiden.se, but it is not indexed properly (other sites work properly). If I uncomment the dedup line, I get the exception below:

Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)

Only a file named segments is created in the index folder. The files listed below are not generated when I crawl this site, and I need them. I have been trying to solve this for nearly a week.

_0.fdt       _0.tis       _0.fdx       _0.prx
._0.fdt.crc  _0.tii       ._0.fdx.crc  _0.nrm
._0.fnm.crc  ._0.frq.crc  _0.fnm       _0.frq
._0.nrm.crc  ._0.tii.crc  ._0.tis.crc  ._0.prx.crc

A response with a solution would be appreciated.

Thanks,
S Patil.

--
View this message in context: http://www.nabble.com/files-are-not-generated-in-index-folder-by-indexer-for-the-site-http%3A--www.traguiden.se%28for-other-sites-its-working-good%29-while-crwaling-tp14330778p14330778.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.