Jason Camp wrote:
I'd like to generate multiple segments in a row and send them off to another server. Is this possible using the local file system?
The Hadoop-based Nutch now automates multiple, parallel fetches for you, so there is less need to manually generate multiple segments. Try configuring your servers as slaves (by adding them to conf/slaves), configuring a master (by setting fs.default.name and mapred.job.tracker in conf/hadoop-site.xml), and then using bin/start-all.sh to start the daemons. Next, copy your root URL directory into DFS with something like 'bin/hadoop dfs -put roots roots'. Then you can run a multi-machine crawl with 'bin/nutch crawl'. Or, if you need finer-grained control, you can still step through the inject, generate, fetch, updatedb, generate, fetch, ... cycle, except that now each step runs across all the slave nodes.
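A minimal sketch of that setup, assuming example hostnames, ports, and a local 'roots' directory of seed-URL files (the directory names, crawl depth, and property values below are illustrative, not from the original message):

```shell
# Sketch of the multi-machine setup described above.
# Hostnames, ports, and directory names are assumptions.

# 1. List the slave machines, one per line, in conf/slaves:
#      slave1.example.com
#      slave2.example.com

# 2. Point all nodes at the master in conf/hadoop-site.xml:
#      <property>
#        <name>fs.default.name</name>
#        <value>master.example.com:9000</value>
#      </property>
#      <property>
#        <name>mapred.job.tracker</name>
#        <value>master.example.com:9001</value>
#      </property>

# 3. Start the DFS and MapReduce daemons on the master and slaves:
bin/start-all.sh

# 4. Copy the local directory of root-URL files into DFS:
bin/hadoop dfs -put roots roots

# 5. Run a multi-machine crawl (depth and output dir are examples):
bin/nutch crawl roots -dir crawl -depth 3

# Or, for finer-grained control, step through the cycle manually;
# each step now runs as a MapReduce job across all slave nodes:
bin/nutch inject crawldb roots
bin/nutch generate crawldb segments
# ...then fetch and updatedb the newest segment, generate again,
# and repeat as needed.
```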
This is outlined in the Hadoop javadoc: http://lucene.apache.org/hadoop/docs/api/overview-summary.html#overview_description

Doug

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
