Many thanks, Andrzej, 

It makes sense that in a pure hadoop environment, where nutch has not
been distributed to the tasktracker machines, there needs to be a
method to pass configurations and plugins to them.  Thus, I can begin to
understand why the hadoop code would need to prioritise the sources of
these configs and plugins.  My brain still aches as to why this would
apply in our case (where nutch and the configs HAVE been distributed to
the tasktrackers), but I'm willing to accept that it is so, and have
compiled and distributed the job file!

As yesterday, I'm out at meetings all day today, and will be able to
report "our" progress :).  I will also clear an extra 10 hours with the
bosses, although even if they decline I will be good for it.

Sadly my 2.5m fetch failed during the reduce phase, which raises the
priority of having the tools for combining db's and segments (if you
have time on your hands today :)).

I'll be back in the evening with any news.

Thanks for all your support, and talk to you later, 

Monu

-----Original Message-----
From: Michael Stack [mailto:[EMAIL PROTECTED] 
Sent: 04 May 2006 20:30
To: [email protected]
Subject: Re: plugins in job file.


Stefan Groschupf wrote:
> Hi,
>
> I'm wondering why the plugins are in the job file, since it looks like
> the plugins are never loaded from the job file but from the outside 
> (plugin folder).
> Should they?

If running your job jar on a pure hadoop platform, there are no plugins 
on local disk.  The job jar needs to carry all it needs to run.

If you have nutch everywhere on your cluster, there will be plugins on 
disk and plugins in your job jar.  Which gets favored should just be a 
matter of the CLASSPATH when the child runs: the first plugin found wins
(looks like those on disk will be found first, going by the TaskRunner 
classpath).
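
To illustrate the "first plugin found wins" rule, here is a minimal
sketch in Java.  It is not hadoop or nutch code: the class name
FirstPluginWins, the method resolve, and the source and plugin names
are all made up for the example.  It only assumes that the sources are
searched in a fixed order, the way entries on a classpath are.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; none of these names come from hadoop or nutch.
public class FirstPluginWins {

    /**
     * Returns the name of the first source, in the map's iteration order,
     * that contains the given plugin id. Later sources are never consulted
     * once a match is found, which is the "first found wins" behaviour.
     */
    public static Optional<String> resolve(Map<String, Set<String>> orderedSources,
                                           String pluginId) {
        for (Map.Entry<String, Set<String>> source : orderedSources.entrySet()) {
            if (source.getValue().contains(pluginId)) {
                return Optional.of(source.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, standing in for classpath order.
        // Assumption for the example: the on-disk plugins come first.
        Map<String, Set<String>> searchOrder = new LinkedHashMap<>();
        searchOrder.put("local-disk/plugins", Set.of("parse-html", "protocol-http"));
        searchOrder.put("job-jar/plugins", Set.of("parse-html", "index-extra"));

        // Present in both sources: the earlier source is reported.
        System.out.println(resolve(searchOrder, "parse-html").orElse("not found"));
        // Present only in the job jar: the job jar is reported.
        System.out.println(resolve(searchOrder, "index-extra").orElse("not found"));
    }
}
```

With this ordering, a plugin shipped in the job jar is shadowed whenever
a plugin of the same name is already on local disk, which is why an
override in the job jar may appear to be ignored.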

In the past, I've had some trouble trying to load up extra plugins and 
overrides of plugins already present in the nutch default 'plugins' 
directory.  At the time, naming the plugins directory in my job jar 
something other than 'plugins' -- e.g. 'xtra-plugins' -- and then adding 
it to the plugins.include property, in the configuration loaded into my 
job jar, AHEAD of the default 'plugins' directory got me further.

Nowadays, I build a job jar that picks and chooses the plugins I need 
from multiple plugin sources, aggregating them under a plugin dir in 
the job jar.  The resultant job jar is run on a pure hadoop rather than 
nutch platform.

St.Ack


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
