Re: no static NutchConf

2006-01-04 Thread Andrzej Bialecki

Stefan Groschupf wrote:


> Hi,
> to move forward in the direction of having a nutch gui, I would love
> to start removing the static access of NutchConf.
> Based on experience, I would first love to get a kind of general
> agreement and a 'go' before wasting too much time on an unaccepted
> solution.



I agree with the general direction. Some comments below:



> I suggest:
>
> + removing NutchConf.get().



I'm not sure about this... Somewhere you need to instantiate the default 
config, and this looks like a good place.


> + in case a lower-level object uses only one or two, but not more than 3,
> parameters from the nutch configuration, we add these parameters to the
> constructor of this object.
>
> (e.g. MapFile.Reader needs only the parameter INDEX_SKIP)



I don't fully agree with this. In most such cases, you already have a 
NutchConf instance in the method or class context, so it makes sense to 
use it in the constructor. You could add these constructors with all
parameters enumerated, but I'd expect that the constructors using
NutchConf would be used most frequently.


> + for higher-level objects like the fetcher tool, which need more than 3
> parameters for the lower-level objects, we add an instance of
> NutchConf to the constructor



Ok.

> + for all dynamically used objects that implement a specific interface
> (interface, so no control over the object constructor) we use the
> Configurable interface to set the NutchConf in an
> inversion-of-control-like style.
>
> (e.g. plugin extension implementations like Parsers or Protocols)



Ok.

> + PluginRegistry will no longer be a singleton but will get a
> constructor with a NutchConf instance.



Definitely yes.

> + Getting an Extension also requires a NutchConf, which is injected in
> case the Extension object (e.g. a Parser) implements the Configurable
> interface.



Yes. If you remember our discussion, I'd like also to follow a pattern 
where such instances are cached inside this NutchConf instance, if 
appropriate (i.e. if they are reusable and multi-threaded).




> Any comments, improvement suggestions, more use-cases?
> I would love to do this job, can I get a go from the other developers?



+1 from me.

--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: no static NutchConf

2006-01-04 Thread Stefan Groschupf


> I don't fully agree with this. In most such cases, you already have
> a NutchConf instance in the method or class context, so it makes
> sense to use it in the constructor. You could add these constructors
> with all parameters enumerated, but I'd expect that the constructors
> using NutchConf would be used most frequently.


My idea is to be able to use the low-level things outside of nutch as well.
It may be a philosophical question: in the case of the map file writer
you pass a complete hashmap with a bunch of properties to the object,
but the object only reads one int from this hashmap. I personally
don't like to use a hashmap to 'transport' just one value.


So my suggestion looks like:
  new MapFile.Reader(parameterA, nutchConf.getInt("parameterKey", 0));
if I understand you correctly, you prefer:
  new MapFile.Reader(parameterA, nutchConf);
  ...
  public Reader(..., NutchConf nutchConf) {
    // the Reader looks up its own parameter from the passed-in configuration
    this.parameter = nutchConf.getInt("parameterKey", 0);
  }

As mentioned, this is more a question of code philosophy and is not
important to me; my only idea was to decouple things as much as
possible if we touch them anyway.


>> + Getting an Extension also requires a NutchConf, which is injected
>> in case the Extension object (e.g. a Parser) implements the
>> Configurable interface.
>
> Yes. If you remember our discussion, I'd like also to follow a
> pattern where such instances are cached inside this NutchConf
> instance, if appropriate (i.e. if they are reusable and
> multi-threaded).



I'm afraid I still do not clearly understand your idea here. As
discussed, from my point of view it makes no sense to cache any
objects in a NutchConf.
In particular, extension implementations like parsers are used from
multiple threads, and there are as many instances as we have threads.
Caching would make more sense behind the scenes of the plugin registry,
but it may be difficult since you can run into trouble with resource
life-cycle management. PluginClass instances are already cached and work
like a kind of singleton for each existing plugin registry.
I also see some trouble with this caching mechanism since
NutchConf can be serialized. Actually I have no idea where this
mechanism is used, but I guess distributed map reduce will use it
heavily.

So the cached objects would need to be Serializable as well.

Stefan



Re: no static NutchConf

2006-01-04 Thread Jérôme Charron
> My  idea is to be able using low level things outside of nutch also.
> It is may a philosophically question in case of the map file writer
> you pass a complete hashmap with a bunch of properties to the object,
> but the objects only reads one int from this hashmap. I personal
> don't like to use a hashmap to 'transport' just one value.

Yes Stefan, but passing only the NutchConf in the constructor
1. avoids breaking compatibility if a new parameter is used in a future
version of the constructor.
2. gives control of default values to the class itself instead of the
calling object.
I think that we can accept the general convention that all NutchConfigurable
objects must provide a constructor with a single NutchConf parameter.
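
The two constructor styles under discussion can be sketched side by side.
This is only an illustration and not Nutch code: SketchConf, SketchReader and
the key name "sketch.index.skip" are hypothetical stand-ins (the thread only
mentions MapFile.Reader and a parameter called INDEX_SKIP). The
configuration-based constructor delegates to the explicit one, so the class
keeps control of the key and its default while remaining usable without any
configuration object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for NutchConf: a plain key/value holder. */
class SketchConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    int getInt(String key, int defaultValue) {
        String raw = values.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}

/** Hypothetical low-level reader that offers both constructor styles. */
class SketchReader {
    // Assumed key name and default value, chosen only for this example.
    static final String INDEX_SKIP_KEY = "sketch.index.skip";
    static final int DEFAULT_INDEX_SKIP = 0;

    private final String file;
    private final int indexSkip;

    /** Explicit-parameter style: no dependency on a configuration object. */
    SketchReader(String file, int indexSkip) {
        this.file = file;
        this.indexSkip = indexSkip;
    }

    /** Configuration style: the class, not the caller, owns key and default. */
    SketchReader(String file, SketchConf conf) {
        this(file, conf.getInt(INDEX_SKIP_KEY, DEFAULT_INDEX_SKIP));
    }

    String getFile() {
        return file;
    }

    int getIndexSkip() {
        return indexSkip;
    }
}
```

If the reader later needs a second setting, only the body of the
configuration-based constructor changes; callers of that constructor are not
affected, which is the compatibility argument made in point 1.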

Excuse me in advance, I probably missed something, but what are the use
cases for having many NutchConf instances with different values?

Regards

Jérôme


Re: no static NutchConf

2006-01-04 Thread Andrzej Bialecki

Jérôme Charron wrote:


> Excuse me in advance, I probably missed something, but what are the use
> cases for having many NutchConf instances with different values?



Running many different tasks in parallel, each using different config, 
inside the same JVM.


--
Best regards,
Andrzej Bialecki <><




RE: no static NutchConf

2006-01-04 Thread Steve Betts
If you are going to be able to reconfigure a nutch component at runtime, you
need to remove any configuration from the constructor and have a method that
allows you to get/set the configuration for the component. The problem with
keeping the entire configuration in a single component is trying to
display/filter the configuration information for the user, so that the user
knows which component they are configuring.

Eclipse has a very good pattern for handling configuration for each of the
components. Basically each component is responsible for its own
configuration, and the tool just provides the framework to allow the
configuration to be displayed, updated, and stored.

The drawback of that approach is that Nutch doesn't really have a GUI, or at
least has to be able to run without one.

I think that, at the very least, removing the configuration information from
the constructor is the first step.  You can still have a properties object
set the configuration. Then we can discuss the relative merits of
displaying, changing, and storing the configuration.  (Like, how a user is
supposed to know what component is affected by which property.)

Thanks,

Steve Betts






Re: no static NutchConf

2006-01-04 Thread Jérôme Charron
> >Excuse me in advance, I probably missed something, but what are the use
> >cases for having many NutchConf instances with different values?
> Running many different tasks in parallel, each using different config,
> inside the same JVM.

Ok, I understand this Andrzej, but it is not really what I call a use case.
It is more a feature that you describe here.
In fact, what I mean is that I don't understand in which cases it will be
useful. And I don't understand how a particular
NutchConf will be selected for a particular task...

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


Re: no static NutchConf

2006-01-04 Thread Andrzej Bialecki

Jérôme Charron wrote:


>>> Excuse me in advance, I probably missed something, but what are the use
>>> cases for having many NutchConf instances with different values?
>>
>> Running many different tasks in parallel, each using different config,
>> inside the same JVM.
>
> Ok, I understand this Andrzej, but it is not really what I call a use case.
> It is more a feature that you describe here.
> In fact, what I mean is that I don't understand in which cases it will be
> useful. And I don't understand how a particular
> NutchConf will be selected for a particular task...



Use case: executing multiple tasks on any single tasktracker node, but
with drastically different configurations for each task.


Example: what happens now if you try to run more than one fetcher at the 
same time, where the fetcher parameters differ (or a set of activated 
plugins differs)? You can't - the local tasks on each tasktracker will 
use whatever local config is there. What happens if you change the 
config on a node that  submits the job? The changes won't be propagated 
to the tasktracker nodes, because tasktrackers use local configuration 
(through a singleton NutchConf.get()), instead of supplying a 
serialized/deserialized instance of the config from the originating 
node... etc.


NutchConf instances will be created when you create a JobConf. Then they 
will have to be serialized/deserialized when job descriptors are sent by 
jobtracker to tasktrackers on mapred nodes, and used locally by 
tasktrackers to instantiate local tasks using copies of the original 
NutchConf instance.
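
A minimal sketch of that flow, assuming nothing about the real
JobConf/NutchConf classes: JobConfSketch is a hypothetical wrapper around
java.util.Properties, and the key "fetcher.threads.fetch" is used purely as
an example setting. The submitting side serializes its configuration
instance, and each task rebuilds an independent copy instead of asking a
JVM-wide singleton.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical per-job configuration; Properties stands in for NutchConf. */
class JobConfSketch {
    private final Properties props = new Properties();

    void set(String key, String value) {
        props.setProperty(key, value);
    }

    String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }

    /** Submitting side: bytes that would travel with the job descriptor. */
    byte[] serialize() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Task side: rebuild an independent copy of the originating config. */
    static JobConfSketch deserialize(byte[] bytes) {
        try {
            JobConfSketch copy = new JobConfSketch();
            copy.props.load(new ByteArrayInputStream(bytes));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Two jobs with different values for the same key can then run side by side in
one JVM, because each task only ever sees the copy it was handed.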


--
Best regards,
Andrzej Bialecki <><




Re: no static NutchConf

2006-01-04 Thread Piotr Kosiorowski

+1 in general
In fact I like the approach presented by Stefan to pass only the required
parameters to objects that have a small number of configurable params
instead of NutchConf. It makes it obvious which parameters are required
for such basic objects to run, and as they are usually building blocks
for something bigger, it makes it easier to reuse them with different
params in different parts of the code. But I like the direction and will
not oppose passing the whole NutchConf in this case.

Regards
Piotr


Re: no static NutchConf

2006-01-04 Thread Thomas Jaeger
Hi,

Stefan Groschupf wrote:
[...]
> Any comments, improvement suggestions, more use-cases?

I completely agree with you.

I have two more ideas:
1) create NutchConf as interface (not class)
2) make it work as plugin

1) If NutchConf is an interface, the NutchConf implementation can be
written with a hashmap in mind (like now) or with JMX or
commons-configuration.
2) There are only 4 required configuration options (plugin.excludes,
plugin.includes, plugin.folders, plugin.auto-activation) the plugin
registry needs to start up. If these options are provided by a bootstrap
configuration, configuration plugins will be possible.

If help is needed, I would like to implement a JMX implementation of
NutchConf (since I will need it myself ;).


Regards,

Thomas


Re: no static NutchConf

2006-01-04 Thread Doug Cutting

Andrzej Bialecki wrote:
> Example: what happens now if you try to run more than one fetcher at the
> same time, where the fetcher parameters differ (or a set of activated
> plugins differs)? You can't - the local tasks on each tasktracker will
> use whatever local config is there.


That's true when mapred.job.tracker=local, but when things are 
distributed the config can vary since each task is spawned in a separate 
JVM with a separate classpath.  The nutch-site.xml on each node can
never be overridden.  For example, so long as plugin.includes is not
specified in nutch-site.xml on each node, then each task can override 
plugin.includes to use different plugins.


Also note that plugin implementations can be submitted in a jar file with
the job, and plugin.folders can be overridden in the job to find the new 
plugins.  So a job jar might include a folder named "my.plugins" and set 
plugin.folders to "my.plugins, plugins", then alter plugin.includes to 
include job-specific plugins.


> What happens if you change the
> config on a node that submits the job? The changes won't be propagated
> to the tasktracker nodes, because tasktrackers use local configuration
> (through a singleton NutchConf.get()), instead of supplying a
> serialized/deserialized instance of the config from the originating
> node... etc.


Again, I'm not sure this is a problem.  Properties which tasks should be 
able to override should not be specified in nutch-site.xml, but rather 
in mapred-default.xml.  Lots of job-specific properties are currently 
passed this way.
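
The precedence described here can be modelled in a few lines. This is a
simplified sketch and not the actual NutchConf lookup code:
LayeredConfSketch and its three layers are hypothetical, and the plugin names
in the usage are placeholders. A value in the node-local site layer always
wins, a job-level value overrides the defaults, and the defaults apply
otherwise.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical three-layer lookup: site beats job, job beats defaults. */
class LayeredConfSketch {
    private final Map<String, String> defaults = new HashMap<>(); // e.g. shipped defaults
    private final Map<String, String> job = new HashMap<>();      // per-job overrides
    private final Map<String, String> site = new HashMap<>();     // node-local, not overridable

    void setDefault(String key, String value) {
        defaults.put(key, value);
    }

    void setJob(String key, String value) {
        job.put(key, value);
    }

    void setSite(String key, String value) {
        site.put(key, value);
    }

    /** Returns the effective value, or null if no layer defines the key. */
    String get(String key) {
        if (site.containsKey(key)) {
            return site.get(key);
        }
        if (job.containsKey(key)) {
            return job.get(key);
        }
        return defaults.get(key);
    }
}
```

With this model, a job can change plugin.includes only as long as the site
layer on the node leaves that key unset, which matches the rule of thumb
above about where overridable properties should live.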


Another use case for eliminating the static uses of NutchConf is to 
simplify the construction of a configuration gui.  It would be nice to 
have a web-based interface which permits one to configure parameters and 
then have it run the system.  This should be able to run multiple Nutch 
instances in a single JVM.  For example, a single Nutch-based "search 
appliance" daemon should be able to crawl and search both your intranet 
and your public websites, each configured separately.


Doug


Re: no static NutchConf

2006-01-04 Thread David Wallace
Hi Stefan,
I think these are fine things to be doing.  Just two points:
 
(1) Why not just always pass the NutchConf to the constructor of any
class that needs it?  Instead of distinguishing between the case of
whether the class will use 1 or 2 configuration parameters; or more than
that.  Just for consistency.  Also, it's possible that a class that
CURRENTLY only uses 2 configuration parameters will use 3 or 4 at some
point in the future, and it would be a shame to have to rewrite its
constructor when that happens.
 
(2) What I'd REALLY like to see is if NutchConf were an interface, with
methods that allow the retrieval of properties from any source.  There
could be a class NutchXmlConf which implements the NutchConf interface,
which works the current way (with nutch-default.xml, nutch-site.xml and
so on).  Where we need to create a NutchConf, we actually create a
NutchXmlConf, but pass it to class constructors whose arguments are of
type NutchConf.  That way, if I want to use a non-standard mechanism for
storing my Nutch parameters (eg, a properties file, a relational
database, the Windows Registry, whatever), I can write my own class that
implements the NutchConf interface; then instantiate it and pass it
around, without having to re-write every Nutch class that uses it.
 
The benefits of (2) are legion.  In particular, for people who want to
use a Nutch search engine as part of an existing web application, where
that existing application uses a specific (non-XML) mechanism for
storing configuration parameters.  It would also give extra flexibility
for people working on Nutch installations that sit in multiple
environments (Development, System Test, UAT, Production etc) and get
deployed from one environment to the next.
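
Idea (2) might look roughly like the following. All names here are
hypothetical: ConfSourceSketch plays the role of the proposed NutchConf
interface, PropertiesConfSketch is one possible backing store (an XML-backed
class would be another), SearcherSketch is a stand-in consumer, and
"sketch.hits.per.page" is an invented key.

```java
import java.util.Properties;

/** Hypothetical configuration interface; consumers depend only on this. */
interface ConfSourceSketch {
    String get(String key, String defaultValue);

    /** Convenience accessor built on get(), shared by all implementations. */
    default int getInt(String key, int defaultValue) {
        String raw = get(key, null);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}

/** One possible implementation, backed by java.util.Properties. */
class PropertiesConfSketch implements ConfSourceSketch {
    private final Properties props;

    PropertiesConfSketch(Properties props) {
        this.props = props;
    }

    @Override
    public String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }
}

/** Stand-in consumer: it never learns where the values come from. */
class SearcherSketch {
    private final int hitsPerPage;

    SearcherSketch(ConfSourceSketch conf) {
        this.hitsPerPage = conf.getInt("sketch.hits.per.page", 10);
    }

    int getHitsPerPage() {
        return hitsPerPage;
    }
}
```

A database-backed or registry-backed source would only need to implement
get(); classes such as SearcherSketch would stay unchanged.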
 
Regards,
David.
 
 
 
 
From: Stefan Groschupf <[EMAIL PROTECTED]>
Date: Wed, 4 Jan 2006 15:39:38 +0100
Subject: [Nutch-dev] no static NutchConf

Hi,
to move forward in the direction of having a nutch gui, I would love
to start removing the static access of NutchConf.
Based on experience, I would first love to get a kind of general
agreement and a 'go' before wasting too much time on an unaccepted
solution.

I suggest:

+ removing NutchConf.get().
+ in case a lower-level object uses only one or two, but not more than 3,
parameters from the nutch configuration, we add these parameters to the
constructor of this object.
(e.g. MapFile.Reader needs only the parameter INDEX_SKIP)
+ for higher-level objects like the fetcher tool, which need more than 3
parameters for the lower-level objects, we add an instance of
NutchConf to the constructor
+ for all dynamically used objects that implement a specific interface
(interface, so no control over the object constructor) we use the
Configurable interface to set the NutchConf in an
inversion-of-control-like style.
(e.g. plugin extension implementations like Parsers or Protocols)
+ PluginRegistry will no longer be a singleton but will get a
constructor with a NutchConf instance.
+ Getting an Extension also requires a NutchConf, which is injected in
case the Extension object (e.g. a Parser) implements the Configurable
interface.

Any comments, improvement suggestions, more use-cases?
I would love to do this job, can I get a go from the other developers?
From my point of view NutchConf is actually a showstopper, since a
lot of people run into trouble integrating nutch into other projects;
also, my suggestions are required to write a nutch gui.

Stefan







Re: no static NutchConf

2006-01-04 Thread Stefan Groschupf
> Another use case for eliminating the static uses of NutchConf is to
> simplify the construction of a configuration gui.  It would be nice
> to have a web-based interface which permits one to configure
> parameters and then have it run the system.  This should be able to
> run multiple Nutch instances in a single JVM.  For example, a
> single Nutch-based "search appliance" daemon should be able to
> crawl and search both your intranet and your public websites, each
> configured separately.


Well, this is my long-term goal; I have to do that for my project in
any case. :-)


Stefan



Re: no static NutchConf

2006-01-04 Thread ilango gurusamy
Stefan
  I would like to help you with your project on the Nutch-based search
appliance daemon. The reason is: I want to gain experience and learn stuff. I
started playing around with Nutch. I wrote a scraper in perl and now I am
trying to run one of the sample plugins too.
  
  ilango







Re: no static NutchConf

2006-01-05 Thread Jérôme Charron
> Another use case for eliminating the static uses of NutchConf is to
> simplify the construction of a configuration gui.  It would be nice to
> have a web-based interface which permits one to configure parameters and
> then have it run the system.

Yes, it is a really needed feature.


>   This should be able to run multiple Nutch
> instances in a single JVM.  For example, a single Nutch-based "search
> appliance" daemon should be able to crawl and search both your intranet
> and your public websites, each configured separately.

Ok, but why not use two JVMs in such a case?

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


Re: no static NutchConf

2006-01-05 Thread Stefan Groschupf

Hey Steve,

> Eclipse has a very good pattern for handling configuration for each
> of the components. Basically each component is responsible for its own
> configuration, and the tool just provides the framework to allow the
> configuration to be displayed, updated, and stored.



I know the eclipse configuration mechanism, but we have a different case
with nutch.
The eclipse mechanism does not allow running two eclipse instances in the
same jvm. Sure, for eclipse that makes no sense, but for nutch it makes a
lot of sense (e.g. having search engines for different parts of a corporate
intranet on one box).
The eclipse mechanism is a kind of singleton configuration, and each
component (eclipse plugin) loads what it is interested in. For nutch
we need to pass the configuration properties down the call stack, to
be able to run 2 fetchers with different configurations and to have 2
instances of the same parser plugin with different configuration
values.



Stefan


 


Re: no static NutchConf

2006-01-05 Thread Stefan Groschupf


> But I like the direction and will not oppose passing the
> whole NutchConf in this case.

Ok, then we will pass the NutchConf in the constructor.
It is a lot of work and may take some time.



Re: no static NutchConf

2006-01-05 Thread Stefan Groschupf

> I have two more ideas:
> 1) create NutchConf as interface (not class)
> 2) make it work as plugin


I like the idea to make the conf as a singleton and understand the
need to be able to integrate nutch.
However, I would love to do one first step, and later on we can make
this second step. My experience is that if you change too much,
people do not accept your patch.
This is painful since you invest some days of work and in the end
your time is wasted and the work lands in the trash.
So let's add this to the jira as an improvement suggestion and do this
step after the actual change.


Stefan 


Re: no static NutchConf

2006-01-05 Thread Stefan Groschupf

> (2) What I'd REALLY like to see is if NutchConf were an interface,


As mentioned, give us some time to get the first step done, and then
I'm sure such community contributions are always welcome.

Maybe people can work together on this.

Stefan




Re: no static NutchConf

2006-01-05 Thread Stefan Groschupf

Hi Andrzej,
maybe I am coming closer to your idea of caching some objects.

> Yes. If you remember our discussion, I'd like also to follow a
> pattern where such instances are cached inside this NutchConf
> instance, if appropriate (i.e. if they are reusable and
> multi-threaded).


As mentioned, I think it makes no sense to cache things like plugin
extension objects, but what do you think about caching the
PluginRepository that was already created with this specific
configuration instance?
Of course we cannot serialize this, but I guess this will improve
performance somewhat, since we do not need to scan the plugin
folder every time.
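
One way to picture this: the configuration instance carries a small,
non-serialized object cache, and the repository is looked up there before a
new one is built. This is a sketch under assumptions and not the actual Nutch
classes: CachingConfSketch, PluginRepositorySketch, and the getObject/setObject
methods are hypothetical names, and the "scan" is simulated by a counter.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical serializable configuration with a transient object cache. */
class CachingConfSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final HashMap<String, String> values = new HashMap<>();

    // transient: skipped by Java serialization, so cached objects
    // do not have to be Serializable themselves.
    private transient Map<String, Object> objectCache;

    void set(String key, String value) {
        values.put(key, value);
    }

    synchronized Object getObject(String name) {
        return objectCache == null ? null : objectCache.get(name);
    }

    synchronized void setObject(String name, Object value) {
        if (objectCache == null) {
            objectCache = new HashMap<>();
        }
        objectCache.put(name, value);
    }
}

/** Hypothetical repository that is expensive to build (folder scan). */
class PluginRepositorySketch {
    static int scans = 0; // counts simulated plugin-folder scans

    private PluginRepositorySketch(CachingConfSketch conf) {
        scans++; // a real implementation would scan the plugin folders here
    }

    /** Returns the repository cached in this conf, creating it on first use. */
    static synchronized PluginRepositorySketch get(CachingConfSketch conf) {
        String key = PluginRepositorySketch.class.getName();
        PluginRepositorySketch repo = (PluginRepositorySketch) conf.getObject(key);
        if (repo == null) {
            repo = new PluginRepositorySketch(conf);
            conf.setObject(key, repo);
        }
        return repo;
    }
}
```

Each configuration instance gets exactly one repository, so two differently
configured Nutch instances in one JVM do not share plugin state, and a
deserialized copy simply rebuilds its repository on first access.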




Stefan 


Re: no static NutchConf

2006-01-05 Thread Andrzej Bialecki

Stefan Groschupf wrote:


> Hi Andrzej,
> maybe I am coming closer to your idea of caching some objects.
>
>> Yes. If you remember our discussion, I'd like also to follow a
>> pattern where such instances are cached inside this NutchConf
>> instance, if appropriate (i.e. if they are reusable and
>> multi-threaded).
>
> As mentioned, I think it makes no sense to cache things like plugin
> extension objects, but what do you think about caching the
> PluginRepository that was already created with this specific
> configuration instance?
> Of course we cannot serialize this, but I guess this will improve
> performance somewhat, since we do not need to scan the plugin
> folder every time.



Yes, I agree on both counts. :-)

--
Best regards,
Andrzej Bialecki <><




Re: no static NutchConf

2006-01-05 Thread Doug Cutting

Stefan Groschupf wrote:

>> I have two more ideas:
>> 1) create NutchConf as interface (not class)
>> 2) make it work as plugin
>
> I like the idea to make the conf as a singleton and understand the need
> to be able to integrate nutch.
> However I would love to do one first step and later on we can make this
> second step. I made the experience that if you change too much people do
> not accept your patch.


+1

I don't see a big advantage in trying to make both of these changes at 
the same time.  And, when possible, small incremental changes are easier 
for the community to process.


Doug


Re: no static NutchConf

2006-01-05 Thread Thomas Jaeger
Doug Cutting wrote:
> Stefan Groschupf wrote:
> 
>>> I have two more ideas:
>>> 1) create NutchConf as interface (not class)
>>> 2) make it work as plugin
>>
>>
>> I like the idea to make the conf as a singleton and understand the 
>> need to be able to integrate nutch.
>> However I would love to do one first step and later on we can make 
>> this second step. I made the experience that if you change to much 
>> people do not accept your patch.
> 
> 
> +1
> 
> I don't see a big advantage in trying to make both of these changes at
> the same time.  And, when possible, small incremental changes are easier
> for the community to process.

I never thought to make these changes at once. These were just some
thoughts on how to improve the nutch configuration. I agree with Stefan
in this point.


Thomas




Re: no static NutchConf

2006-01-08 Thread Marko Bauhardt
> + Getting an Extension also requires a NutchConf, which is injected in
> case the Extension object (e.g. a Parser) implements the Configurable
> interface.




I think this is a good idea. But many plugins like
BasicIndexingFilter or ExtParse require some fields in the "parse" or
"filter" method. These fields are loaded the static way (via the
static NutchConf or static blocks). And this is ok, because the
fields are loaded only once. If we load the fields in the "parse"
or "filter" methods, the fields would be loaded many times, and this is
a performance problem.
Initializing the fields in the constructor does not work,
because setConf() is called after the constructor.


Should we add a method like "loadNutchConfiguration()" to the
NutchConfigurable interface, to load the NutchConfiguration
parameters? Hm, I don't know.
Should the fields be loaded in the setConf() method? Hm, the name
of the method says: set the NutchConf, not load the required
NutchConfiguration parameters.

Does anyone have another elegant solution?

Marko



Re: no static NutchConf

2006-01-08 Thread Stefan Groschupf

Marko,
as mentioned...
All these classes will implement the NutchConfigurable interface. The
plugin system will instantiate these objects and inject the nutch
configuration object *BEFORE* it returns the object instance to
the caller.
So we can be sure that setConf is called before any other method,
e.g. parse, is called.
So the answer is that the fields will be set / initialized in the
setConf method, which needs to be implemented by each extension class,
and we have the agreement that this method is called directly after
the constructor but before any other call.
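
The contract described above could be sketched like this. The names are
hypothetical stand-ins, not the real plugin API: InjectConfSketch replaces
NutchConf, ConfigurableSketch replaces NutchConfigurable, ParserSketch is a
toy extension, and "sketch.parser.max.length" is an invented key. The factory
calls setConf() right after construction and before handing the instance out,
so fields are read from the configuration once per instance rather than on
every parse() call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for NutchConf. */
class InjectConfSketch {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    int getInt(String key, int defaultValue) {
        String raw = values.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}

/** Hypothetical stand-in for the NutchConfigurable interface. */
interface ConfigurableSketch {
    void setConf(InjectConfSketch conf);
}

/** Toy extension: caches its setting in setConf() and reuses it in parse(). */
class ParserSketch implements ConfigurableSketch {
    private int maxLength; // initialized once, in setConf()

    @Override
    public void setConf(InjectConfSketch conf) {
        this.maxLength = conf.getInt("sketch.parser.max.length", 1024);
    }

    String parse(String content) {
        // No configuration lookup here: the value was loaded once already.
        return content.length() > maxLength ? content.substring(0, maxLength) : content;
    }
}

/** Toy plugin system: injects the conf before returning the instance. */
class ExtensionFactorySketch {
    static <T> T create(Supplier<T> constructor, InjectConfSketch conf) {
        T instance = constructor.get();
        if (instance instanceof ConfigurableSketch) {
            ((ConfigurableSketch) instance).setConf(conf);
        }
        return instance;
    }
}
```

Because each instance receives its own configuration, two parsers of the same
class can coexist in one JVM with different settings, for example via
ExtensionFactorySketch.create(ParserSketch::new, confA) and
ExtensionFactorySketch.create(ParserSketch::new, confB).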

Does that clarify my suggestion?

Stefan







---
company:http://www.media-style.com
forum:http://www.text-mining.org
blog:http://www.find23.net




Re: no static NutchConf

2006-01-08 Thread Marko Bauhardt


On 08.01.2006 at 16:08, Stefan Groschupf wrote:


> Marko,
> as mentioned...
> All these classes will implement the NutchConfigurable interface.
> The plugin system will instantiate these objects and inject the
> nutch configuration object *BEFORE* it returns the object
> instance to the caller.
> So we can be sure that setConf is called before any other method,
> e.g. parse, is called.


That's right.

> So the answer is that the fields will be set / initialized in the
> setConf method, which needs to be implemented by each extension class,
> and we have the agreement that this method is called directly after
> the constructor but before any other call.
>
> Does that clarify my suggestion?


Yes. I thought that the call of the method setConf only set the  
NutchConf. This is a philosophical question. All right, the  
implementation class can also load/set the fields.


Thanks, Marko



Re: no static NutchConf

2006-01-08 Thread Stefan Groschupf
> Yes. I thought that the call of the method setConf only set the
> NutchConf. This is a philosophical question. All right, the
> implementation class can also load/set the fields.


Some Map/Reduce classes already use this mechanism. E.g. see
CrawlDbReducer, where there is a configure method. But there the
JobConfigurable interface is used.


Stefan