Many thanks, Andrzej. It makes sense that in a pure hadoop environment, where nutch has not been distributed to the tasktracker machines, there needs to be a method to pass configurations and plugins to them. Thus, I can begin to understand why the hadoop code would need to prioritise the sources of these configs and plugins. My brain still aches as to why this would apply in our case (where nutch and the configs HAVE been distributed to the tasktrackers), but I'm willing to accept that it is so, and have compiled and distributed the job file!
As yesterday, I'm out at meetings all day today, and will be able to report "our" progress :). I will also clear an extra 10 hours with the bosses, although even if they decline I will be good for it. Sadly, my 2.5m fetch failed during the reduce phase, which raises the priority of having the tools for combining db's and segments (if you have time on your hands today :)). I'll be back in the evening with any news.

Thanks for all your support, and talk to you later,
Monu

-----Original Message-----
From: Michael Stack [mailto:[EMAIL PROTECTED]
Sent: 04 May 2006 20:30
To: [email protected]
Subject: Re: plugins in job file.

Stefan Groschupf wrote:
> Hi,
>
> I'm wondering why the plugins are in the job file, since it looks like
> the plugins are never loaded from the job file but from the outside
> (plugin folder).
> Should they?

If running your job jar on a pure hadoop platform, there are no plugins on local disk. The job jar needs to carry all it needs to run.

If you have nutch everywhere on your cluster, there will be plugins on disk and plugins in your job jar. Which gets favored should just be a matter of the CLASSPATH when the child runs: the first plugin found wins (it looks like those on disk will be found first, going by the TaskRunner classpath).

In the past, I've had some trouble trying to load up extra plugins and overrides of plugins already present in the nutch default 'plugins' directory. At the time, naming the plugins directory in my job jar something other than 'plugins' -- e.g. 'xtra-plugins' -- and then adding it to the plugins.include property, in the configuration loaded into my job jar, AHEAD of the default 'plugins' directory got me further.

Nowadays, I build a job jar that picks and chooses, from multiple plugin sources, the plugins I need, aggregating them under a plugin dir in the job jar. The resultant job jar is run on a pure hadoop rather than nutch platform.

St.Ack
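For anyone following along, a rough sketch of the kind of override Stack describes: repack the job jar with a renamed plugin directory and have the configuration search it ahead of the default 'plugins' dir. The jar name and the plugin.folders property shown here are illustrative assumptions (the thread itself mentions a plugins.include property), so check the property names against your Nutch version.

    # Sketch only, not from the thread: repack a job jar so a custom plugin
    # directory is searched before the default 'plugins' directory.
    mkdir job-tmp
    cd job-tmp
    unzip ../nutch.job                 # hypothetical job jar name
    mv plugins xtra-plugins            # rename so it won't clash with on-disk plugins
    # In nutch-site.xml inside the unpacked jar, list the custom directory first,
    # e.g. (property name as found in nutch-default.xml in some versions):
    #   <property>
    #     <name>plugin.folders</name>
    #     <value>xtra-plugins,plugins</value>
    #   </property>
    zip -r ../nutch-custom.job .       # repack the modified job jar
    cd ..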
