I am very new to the Nutch source code and have been reading through the
Injector class. From what I understand of the MapReduce model, a job needs
both a map step and a reduce step in order to function properly. However, in
CrawlDb.createJob(Configuration, Path) a new job is created for merging the
injected URLs, and it has no mapper class set.
..
JobConf job = new NutchJob(config);
job.setJobName("crawldb " + crawlDb);

// If a current CrawlDb already exists, add it as input so its entries
// get merged with the newly injected URLs.
Path current = new Path(crawlDb, CrawlDatum.DB_DIR_NAME);
if (FileSystem.get(job).exists(current)) {
  job.addInputPath(current);
}

job.setInputFormat(SequenceFileInputFormat.class);
job.setInputKeyClass(UTF8.class);
job.setInputValueClass(CrawlDatum.class);

// Note: a reducer is set, but no mapper class.
job.setReducerClass(CrawlDbReducer.class);

job.setOutputPath(newCrawlDb);
job.setOutputFormat(MapFileOutputFormat.class);
job.setOutputKeyClass(UTF8.class);
job.setOutputValueClass(CrawlDatum.class);
return job;
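
For comparison, the first job that Injector.inject() builds does set a
mapper explicitly. If I am reading it right, it looks roughly like this
(quoting from memory, so the details may be off):

JobConf sortJob = new NutchJob(config);
sortJob.setJobName("inject " + urlDir);
sortJob.addInputPath(urlDir);                 // plain-text URL lists
sortJob.setMapperClass(InjectMapper.class);   // one CrawlDatum per URL line
sortJob.setOutputPath(tempDir);
sortJob.setOutputFormat(SequenceFileOutputFormat.class);
sortJob.setOutputKeyClass(UTF8.class);
sortJob.setOutputValueClass(CrawlDatum.class);
JobClient.runJob(sortJob);

That asymmetry between the two jobs is what confuses me.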
How does the merge job function properly without a mapper?
Is it designed to run only on a single machine, and therefore does not
need a mapper class set?
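
One guess: perhaps the framework substitutes a default identity mapper when
none is configured. I did spot an IdentityMapper class under
org.apache.hadoop.mapred.lib, but I have not found where (or whether)
JobConf actually falls back to it. To make sure I am asking the right
question, here is a minimal pass-through mapper I sketched myself
(hypothetical code, not from Nutch or Hadoop) showing the behavior I mean:
each record would be emitted unchanged, so the shuffle would still group
the CrawlDatum entries by URL before CrawlDbReducer merges them.

import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical identity mapper, just to illustrate my question; not
// actual Nutch or Hadoop code.
public class PassThroughMapper implements Mapper {

  public void configure(JobConf job) {
    // no setup needed
  }

  // Emit every record unchanged; the framework still sorts and groups
  // the output by key before it reaches the reducer.
  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter)
      throws IOException {
    output.collect(key, value);
  }

  public void close() {
    // nothing to clean up
  }
}

If something equivalent to this is implied whenever setMapperClass() is
never called, that would explain why createJob() works as written.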
Thanks for any help,
-Charles Williams