I think that if you share some of the code you're actually writing (preliminary, broken, or working), along with:

1) Expected results
2) Actual results

it could help us start diagnosing your problem. IMHO, there's nothing more specific than the actual test code ;)

Regards,
Roman

On Thu, Aug 7, 2008 at 12:07 PM, Rick Moynihan <[EMAIL PROTECTED]> wrote:
> I first posted this to Nutch-Dev, but had no response, so I'm reposting it
> here. If you've already seen it, apologies for the dupe.
>
> Hi all,
>
> A colleague I have been working with has developed a plugin to index
> content with Nutch. Though it does the job admirably, the
> complexity and design of Nutch have proven resistant to easily writing
> automated tests for this component.
>
> I'm desperately trying to write some JUnit unit/integration tests for
> this component; however, Nutch doesn't make this simple enough, and I
> fear this, amongst other things, is a barrier to Nutch adoption.
>
> What I want to do is:
>
> - Set up a Jetty server within the test with the content I want to index
>   (easy enough with CrawlDBTestUtil).
> - Configure a crawl (i.e. fetch, index, merge, dedup, etc.) and
>   override the configuration with my plugin and configuration.
> - Store the index (preferably in memory, but on disk is OK).
> - Assert that particular searches return items, etc.
>
>
> At first I thought this would be a simple matter of using
> CrawlDBTestUtil to establish the server side, then using
> org.apache.nutch.crawl.Crawl to perform all the relevant steps, resulting
> in an index of the content, which I can then run assertions on via
> NutchBean.
>
> Ideally I'd like to create just one Configuration object, override the
> settings as I wish, and then pass this object into Crawl and NutchBean
> appropriately.
>
> Sadly, however, org.apache.nutch.crawl.Crawl isn't really a class, as it
> only has a static main method which performs all the operations
> in batch. This design makes the class hard to reuse within the context
> of my test.
> This leaves me with the following options:
>
> - Call the main method and pass it an ugly array of Strings to do what I
>   require. This is also ugly due to assumptions underlying the design of
>   this component (configuration files on the classpath, etc.). It also
>   allows little or no reuse of configuration with other parts of the code
>   (e.g. NutchBean).
>
> - Copy/paste/modify Crawl into my test. The code in Crawl recently
>   changed to account for Hadoop 0.17, so I don't really want to do this
>   only to find that the API has changed again. Plus, I believe that tests
>   should be simple to read. Explicitly performing 30 steps in order to test
>   a component isn't a good idea, as you can no longer see the forest for
>   the trees.
>
> CrawlDBTestUtil is a step in the right direction, but more work is
> needed. Is it possible to get this marked as a bug/feature request and
> fixed in time for 1.0?
>
> Thanks again for your help.
>
> R.
