Well, unfortunately a large post I had written got lost in the Gmail abyss, grrr.
But, in response to this:

> From a dumb user standpoint, I like the config files.
> I want something I can just copy around, version control, and edit with
> vi, and don't need
> to hire a java developer to configure.

I don't have a problem with this. We could always provide a file, a simple property file, for users to configure. This is also an easy thing to do with Spring/HiveMind, so it is definitely not an issue. In Spring it's handled by the "org.springframework.beans.factory.config.PropertyPlaceholderConfigurer" class. This lets the developers handle all the wiring of the application; the user then fills in the properties we expose from within our context files.

Where I am coming from is the fact that the application has no easy way for developers to package up the implementations they want, or to have a 'fail-fast' mechanism for misconfigured/missing attributes on the internal components of Nutch. There is also no way to introspect the implementations (or the interfaces that need to be defined, I'll get back to that) of the components for their dependencies/attributes.

There is also a huge problem with the global namespacing that is happening within the configurations. Take a property such as 'http.max.delays'. What class defined that property? What happens if another component decides to use the same property name? How is that developer supposed to find out what has already been used? Search through the code? I say yuck. I have also seen properties that are defined in the config files, but nothing references them. This would not happen with a dependency injection container.

So, when it comes to API interfaces, I have a few issues that I believe could be rectified if Nutch were to follow a DI (Dependency Injection) approach. Take for instance the Fetcher/Fetcher2 classes. How would one change the implementation within their application? There is no way. They (the developer, deployer, configurer, whatever) actually have to edit source code to use a different implementation.
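To make the 'fail-fast' and namespacing points above a bit more concrete, here is a minimal sketch in plain Java (no Spring or HiveMind involved). The class name, the 'fetcher.http.*' keys and the HttpSettings holder are all made up for illustration and are not Nutch code; a real DI container would do this binding for us.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative sketch only: none of these names exist in Nutch.
final class FailFastConfigSketch {

    /** Hypothetical settings holder: the component declares exactly what it needs. */
    static final class HttpSettings {
        final int maxDelays;
        final int timeoutMs;

        HttpSettings(int maxDelays, int timeoutMs) {
            this.maxDelays = maxDelays;
            this.timeoutMs = timeoutMs;
        }
    }

    /**
     * Binds a user-edited property file to the component at startup.
     * Keys carry the owning component as a prefix (an assumed convention),
     * and every missing or malformed value is reported in one exception.
     */
    static HttpSettings bind(Properties props) {
        List<String> problems = new ArrayList<>();
        int maxDelays = requireInt(props, "fetcher.http.maxDelays", problems);
        int timeoutMs = requireInt(props, "fetcher.http.timeoutMs", problems);
        if (!problems.isEmpty()) {
            throw new IllegalStateException("Misconfigured: " + problems);
        }
        return new HttpSettings(maxDelays, timeoutMs);
    }

    /** Reads one integer property, recording a problem instead of silently defaulting. */
    private static int requireInt(Properties props, String key, List<String> problems) {
        String raw = props.getProperty(key);
        if (raw == null) {
            problems.add(key + " is missing");
            return 0;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            problems.add(key + " is not an integer: '" + raw + "'");
            return 0;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("fetcher.http.maxDelays", "3");
        // fetcher.http.timeoutMs is deliberately left out to trigger the check.
        try {
            bind(props);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The user still only edits a flat name/value file with vi; the difference is that a typo or a forgotten key is caught when the application starts rather than halfway through a crawl.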
There is no common interface, and this is just where I have seen a lot of errors in the design of the application. I'll speak of the major components being Fetcher, Indexer, Generator and NutchBean. None of these are defined anywhere other than as concrete implementations, yet over time there have been several versions that have no compatibility with each other. This is where I believe that defining actual requirements and developing clean interfaces/abstract classes to allow custom implementations would benefit the development process. If we could define what a Fetcher's interface should look like, we could easily have many implementations that could just be swapped within the configuration file(s).

Also, by moving to a modern DI approach, tools could easily discover the properties/dependencies that the components require. This allows a 'fail-fast' mechanism for misconfigured and missing attributes. It also allows better namespacing of properties and easier type casting/checking. Don't forget that the javadoc will also document this easily (rather than a bunch of public static strings that define property names).

Another example I see is within the plugin section. I notice that just about every plugin has the same initialization code copied from one plugin to another. Inheritance by clipboard should be discouraged. This could also be solved by adding setters on the plugins for their dependencies and allowing the DI framework to inject them.

A second issue with the plugins is that their configuration files are bundled with the plugin itself. This does not allow multiple instances of Nutch to use the same plugin repository. So, for every instance of Nutch, you have to have a copy of all the plugins. Having the application provide the plugins' configurations would be a better arrangement.

When I look at CrawlDB I see no real interface that tells me what it does (other than that it's some tool).
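As a rough illustration of the Fetcher and setter-injection points above, here is a small sketch in plain Java. Everything in it is hypothetical: the Fetcher interface, the two implementations, the 'fetcher.impl' and 'fetcher.userAgent' keys and the tiny wire() method (which only stands in for what Spring or HiveMind would do) are assumptions for the sake of the example, not existing Nutch API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Illustrative sketch only: none of these types exist in Nutch.
final class FetcherWiringSketch {

    /** Hypothetical contract that every fetcher implementation would share. */
    interface Fetcher {
        String fetch(String url);
    }

    /** Shared setup lives in one base class instead of being copied around. */
    abstract static class AbstractFetcher implements Fetcher {
        private String userAgent;

        /** Setter the container calls to inject the dependency. */
        public void setUserAgent(String userAgent) {
            this.userAgent = userAgent;
        }

        protected String userAgent() {
            if (userAgent == null) {
                throw new IllegalStateException("userAgent was not injected");
            }
            return userAgent;
        }
    }

    static final class SequentialFetcher extends AbstractFetcher {
        @Override
        public String fetch(String url) {
            return "sequential[" + userAgent() + "] " + url;
        }
    }

    static final class QueuedFetcher extends AbstractFetcher {
        @Override
        public String fetch(String url) {
            return "queued[" + userAgent() + "] " + url;
        }
    }

    /** Known implementations, keyed by the name used in the config file. */
    private static final Map<String, Supplier<AbstractFetcher>> IMPLEMENTATIONS = new HashMap<>();
    static {
        IMPLEMENTATIONS.put("sequential", SequentialFetcher::new);
        IMPLEMENTATIONS.put("queued", QueuedFetcher::new);
    }

    /** Stand-in for the container: picks the implementation and injects its dependency. */
    static Fetcher wire(Properties config) {
        String name = config.getProperty("fetcher.impl");
        Supplier<AbstractFetcher> factory = (name == null) ? null : IMPLEMENTATIONS.get(name);
        if (factory == null) {
            throw new IllegalStateException("Unknown or missing fetcher.impl: " + name);
        }
        String userAgent = config.getProperty("fetcher.userAgent");
        if (userAgent == null) {
            throw new IllegalStateException("Missing fetcher.userAgent");
        }
        AbstractFetcher fetcher = factory.get();
        fetcher.setUserAgent(userAgent);
        return fetcher;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("fetcher.impl", "queued");
        config.setProperty("fetcher.userAgent", "NutchSketch/0.1");
        System.out.println(wire(config).fetch("http://example.org/"));
    }
}
```

Switching from one fetcher to the other is then a one-line change to 'fetcher.impl' in the configuration, with no source edits, and callers only ever see the Fetcher interface.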
If we could have an interface with business methods on it that describe what a "CrawlDB" is/does, we could easily have different implementations (which many people are asking for), such as a JDBC version, a Hadoop Map version, a JNDI version, etc.

I'll stop for now, and I hope I haven't made anyone angry. I am just pointing out some issues that I can see are causing problems (in my case at least).

briggs.

On 7/5/07, Ian Holsman <[EMAIL PROTECTED]> wrote:
> Briggs wrote:
> >
> > One thing I would love to do in the future of nutch is to get rid of
> > all the custom '*-config.xml" files and replace it with a more
> > standard (well, more accepted) DI container (such as spring or
> > hivemind [probably hivemind]). It would be nice to be able to
> > configure each component within nutch in this way. I think it would
> > really help in "componentizing" the apis (fetcher, indexer, generator
> > etc) so that they can have more implemenations and making plugins more
> > manageable.
> >
> > Anyway, have fun!
>
> From a dumb user standpoint, I like the config files.
> I want something I can just copy around, version control, and edit with
> vi, and don't need
> to hire a java developer to configure.
>
>
> What would make life easier for me is if you removed all the XML bs and
> just had name/value pairs
> with a # comment above it describing what it is for, and the default
> setting.
>
> I do agree with briggs that there are too many seperate places to edit,
> and having a single file would be nice.
>
> regards
> Ian
>
>

--
"Conscious decisions by conscious minds are what make reality real"
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
