I propose we clean up Nutch's tools as follows.
First, some definitions:
1. An "action" is an operation on Nutch data. For example, GenerateSegmentFromDB, FetchSegment, UpdateDB, IndexSegment, MergeIndexes, SearchServer, etc. are all actions.
2. A "tool" invokes an action from the command line.
The proposal:
1. Actions and tools should be separate classes, in separate files.
2. A tool class should define no methods other than a main() and perhaps those required to parse the command line. All application logic should be in the action class.
3. All actions must implement the following interface:
    public interface NutchConfigurable {
      void setConf(NutchConf conf);
      NutchConf getConf();
    }

4. Most actions should implement this by extending:
    public class NutchConfigured implements NutchConfigurable {
      private NutchConf conf;
      public NutchConfigured(NutchConf conf) { setConf(conf); }
      public void setConf(NutchConf conf) { this.conf = conf; }
      public NutchConf getConf() { return conf; }
    }

5. All plugins must implement NutchConfigurable.
6. Plugin factory methods must accept a NutchConf.
For example:
public static Protocol ProtocolFactory.getProtocol(String url);
will become:
public static Protocol ProtocolFactory.getProtocol(NutchConf, String);
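To make the split concrete, here is a rough sketch of how an action, its tool, and a factory might look under this proposal. Everything beyond NutchConfigurable and NutchConfigured is an assumption for illustration: the NutchConf stub with its get/set methods, the run() method on FetchSegment, the FetchSegmentTool and HttpProtocol classes, the empty Protocol interface, and the "http.agent.name" key are all placeholders, not existing code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NutchConf; the real class's API is not shown here.
class NutchConf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }
}

// The interface and base class from items 3 and 4 of the proposal.
interface NutchConfigurable {
    void setConf(NutchConf conf);
    NutchConf getConf();
}

class NutchConfigured implements NutchConfigurable {
    private NutchConf conf;
    public NutchConfigured(NutchConf conf) { setConf(conf); }
    public void setConf(NutchConf conf) { this.conf = conf; }
    public NutchConf getConf() { return conf; }
}

// Action: holds all application logic and knows nothing about the command line.
// run() and the "http.agent.name" key are illustrative assumptions.
class FetchSegment extends NutchConfigured {
    FetchSegment(NutchConf conf) { super(conf); }

    String run(String segment) {
        String agent = getConf().get("http.agent.name", "unknown");
        return "fetching " + segment + " as " + agent;
    }
}

// Tool: only main() and argument parsing, then it delegates to the action.
class FetchSegmentTool {
    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: FetchSegmentTool <segment>");
            return;
        }
        NutchConf conf = new NutchConf();
        System.out.println(new FetchSegment(conf).run(args[0]));
    }
}

// Plugin side: the factory takes the NutchConf and passes it to the plugin.
interface Protocol extends NutchConfigurable { }

class HttpProtocol extends NutchConfigured implements Protocol {
    HttpProtocol(NutchConf conf) { super(conf); }
}

class ProtocolFactory {
    static Protocol getProtocol(NutchConf conf, String url) {
        // Real plugin lookup by URL scheme is omitted; this always returns
        // the hypothetical HttpProtocol.
        return new HttpProtocol(conf);
    }
}
```

With this shape, code that embeds Nutch can construct a NutchConf itself and call the action directly, without going through main().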
Comments?
Doug
