I propose we clean up Nutch's tools as follows.

First, some definitions:

1. An "action" is an operation on Nutch data. For example, GenerateSegmentFromDB, FetchSegment, UpdateDB, IndexSegment, MergeIndexes and SearchServer are all actions.

2. A "tool" invokes an action from the command line.

The proposal:

1. Actions and tools should be separate classes, in separate files.

2. A tool class should define no methods other than main() and, where needed, methods that parse the command line. All application logic should be in the action class.

3. All actions must implement the following interface:

  public interface NutchConfigurable {
    void setConf(NutchConf conf);
    NutchConf getConf();
  }

4. Most actions should implement this by extending:

  public class NutchConfigured implements NutchConfigurable {
    private NutchConf conf;
    public NutchConfigured(NutchConf conf) { setConf(conf); }
    public void setConf(NutchConf conf) { this.conf = conf; }
    public NutchConf getConf() { return conf; }
  }
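A minimal sketch of how an action and its tool might be split under points 1 to 4. UpdateDB is one of the actions named above, but its run(String dbDir, String segmentDir) method, the UpdateDBTool class and the "updatedb.mode" property are hypothetical, and NutchConf is stubbed with java.util.Properties only so the example compiles on its own.

```java
import java.util.Properties;

// Hypothetical stand-in for Nutch's configuration class, so the sketch compiles alone.
class NutchConf {
  private final Properties props = new Properties();
  String get(String name, String defaultValue) { return props.getProperty(name, defaultValue); }
  void set(String name, String value) { props.setProperty(name, value); }
}

interface NutchConfigurable {
  void setConf(NutchConf conf);
  NutchConf getConf();
}

class NutchConfigured implements NutchConfigurable {
  private NutchConf conf;
  public NutchConfigured(NutchConf conf) { setConf(conf); }
  public void setConf(NutchConf conf) { this.conf = conf; }
  public NutchConf getConf() { return conf; }
}

// The action: all application logic lives here and reads settings via getConf().
class UpdateDB extends NutchConfigured {
  public UpdateDB(NutchConf conf) { super(conf); }

  // Hypothetical signature; returns a summary string instead of touching real data.
  public String run(String dbDir, String segmentDir) {
    String mode = getConf().get("updatedb.mode", "default"); // hypothetical property
    return "updating " + dbDir + " from " + segmentDir + " (" + mode + ")";
  }
}

// The tool: only main() and argument checking, then it delegates to the action.
class UpdateDBTool {
  public static void main(String[] args) {
    if (args.length != 2) {
      System.err.println("Usage: UpdateDBTool <db_dir> <segment_dir>");
      System.exit(1);
    }
    UpdateDB action = new UpdateDB(new NutchConf());
    System.out.println(action.run(args[0], args[1]));
  }
}
```

Because the tool holds no logic, other code (or a unit test) can construct UpdateDB with any NutchConf and call it directly, without going through the command line.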

5. All plugins must implement NutchConfigurable.

6. Plugin factory methods must accept a NutchConf.

For example:

  public static Protocol ProtocolFactory.getProtocol(String url);

will become:

  public static Protocol ProtocolFactory.getProtocol(NutchConf conf, String url);
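A rough sketch of points 5 and 6 together, assuming plugins are created through a no-arg constructor and receive their configuration afterwards through setConf(). The Protocol method describe(), the HttpProtocol class and the scheme-to-plugin map are all hypothetical (real Nutch locates plugins through its plugin repository), NutchConf is a stub, and the "http.agent.name" property is used only for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical stand-in for Nutch's configuration class.
class NutchConf {
  private final Properties props = new Properties();
  String get(String name, String defaultValue) { return props.getProperty(name, defaultValue); }
  void set(String name, String value) { props.setProperty(name, value); }
}

interface NutchConfigurable {
  void setConf(NutchConf conf);
  NutchConf getConf();
}

// Hypothetical minimal extension point; every plugin is NutchConfigurable (point 5).
interface Protocol extends NutchConfigurable {
  String describe(String url);
}

// Hypothetical plugin with an implicit no-arg constructor.
class HttpProtocol implements Protocol {
  private NutchConf conf;
  public void setConf(NutchConf conf) { this.conf = conf; }
  public NutchConf getConf() { return conf; }
  public String describe(String url) {
    return "http fetch of " + url + " as " + conf.get("http.agent.name", "unknown");
  }
}

class ProtocolFactory {
  // Hypothetical registry keyed by URL scheme.
  private static final Map<String, Supplier<Protocol>> PLUGINS = new HashMap<>();
  static {
    PLUGINS.put("http", HttpProtocol::new);
  }

  // Point 6: the caller passes its NutchConf explicitly.
  public static Protocol getProtocol(NutchConf conf, String url) {
    String scheme = URI.create(url).getScheme();
    Supplier<Protocol> plugin = scheme == null ? null : PLUGINS.get(scheme.toLowerCase());
    if (plugin == null) {
      throw new IllegalArgumentException("no protocol plugin for: " + url);
    }
    Protocol protocol = plugin.get();
    protocol.setConf(conf); // hand the caller's configuration to the new plugin
    return protocol;
  }
}
```

One consequence of passing the NutchConf explicitly is that two callers with different settings can obtain instances of the same plugin class, each configured independently.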

Comments?

Doug



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
