On 10/6/05, Dawid Weiss <[EMAIL PROTECTED]> wrote:
>
> > That would be great, I looked already to the code base in the plug-in
> > directory and it seems you use this call to get the clustering results:
> >
> > controller.query("lingo-nmf-km-3", "pseudo-query", requestParams);
> > am I right ?
> >
> > anyway, I want to have the type of algorithm used for clustering, picked up
> > from the xml file, it should be easy to do so.
>
> Yes, it is quite easy -- the controller above can be instantiated from
> an XML file or from a Beanshell script using a local controller
> component (not in the Nutch codebase yet). There are unit tests of that
> controller in Carrot2 CVS, but it has been added recently so I didn't
> have the time to integrate it in a solid working example.

Hi Dawid,

I was able to hack the Clusterer class and made it work for STC. Here is my
hack ;-)

    // Clustering component here.
    LocalComponentFactory stcFactory = new LocalComponentFactoryBase() {
        public LocalComponent getInstance() {
            HashMap defaults = new HashMap();

            // These are adjustments settings for the clustering algorithm...
            // You can play with them, but the values below are our 'best guess'
            // settings that we acquired experimentally.
            defaults.put("lsi.threshold.clusterAssignment", "0.150");
            defaults.put("lsi.threshold.candidateCluster", "0.775");

            // TODO: this should be eventually replaced with documents from Nutch
            // tagged with a language tag. There is no need to again determine
            // the language of a document.
            return new STCLocalFilterComponent();
        }
    };

    controller.addLocalComponentFactory("filter.lingo-old", stcFactory);
    }

But I have two questions:

1. AHC doesn't have any local filter that implements LocalFilterComponent,
   RawClusterProducer and so on. How can I achieve that? From a very
   superficial point of view it seems that nobody uses the AHC class.
2. How do the stopwords and stemmers work for STC?

> There is one potential problem that I see -- Nutch plugins require
> explicit JAR references. If you want to switch between algorithms you'll
> need to either put all Carrot2 JARs in the descriptor, put them in
> CLASSPATH before Nutch starts or do some other trickery with class
> loading.

I just put the stc.jar in the lib directory, I will optimize it later ;-).

Cheers,
R.

> I won't be able to help you until next week, but after then I'll try to
> find some time to prepare you an example of how the scriptable
> controller is used (or look at the unit tests, the component is called
> carrot2-local-controller).
>
> Dawid
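
For reference, a rough sketch of the "pick the clustering algorithm from an XML file" idea discussed above. It does not use the Carrot2 or Nutch APIs: the Clusterer interface, the registry, the `algorithm` attribute and the element name are all hypothetical stand-ins chosen for illustration. Only the JDK's XML parser is real. The point is just that the algorithm name becomes data looked up in a map of factories, instead of a string hard-coded in the plugin.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ClusteringConfigSketch {

    /** Hypothetical stand-in for a clustering component (not a Carrot2 type). */
    interface Clusterer {
        String name();
    }

    /** Hypothetical registry: algorithm name -> factory for that algorithm. */
    private static final Map<String, Supplier<Clusterer>> FACTORIES = new HashMap<>();

    static {
        FACTORIES.put("lingo", () -> named("lingo"));
        FACTORIES.put("stc", () -> named("stc"));
    }

    /** Builds a trivial Clusterer that only remembers its name. */
    private static Clusterer named(String name) {
        return () -> name;
    }

    /** Reads the (assumed) "algorithm" attribute of the root element. */
    static String readAlgorithmName(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse config", e);
        }
        // getAttribute returns an empty string when the attribute is absent.
        String name = doc.getDocumentElement().getAttribute("algorithm");
        if (name.isEmpty()) {
            throw new IllegalArgumentException("missing 'algorithm' attribute");
        }
        return name;
    }

    /** Looks up the factory registered under the given name and calls it. */
    static Clusterer create(String algorithmName) {
        Supplier<Clusterer> factory = FACTORIES.get(algorithmName);
        if (factory == null) {
            throw new IllegalArgumentException("unknown algorithm: " + algorithmName);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        // Inline config for the sketch; a real plugin would read a file.
        String xml = "<clustering algorithm=\"stc\"/>";
        Clusterer clusterer = create(readAlgorithmName(xml));
        System.out.println(clusterer.name());
    }
}
```

Switching algorithms then means editing the config only, although the class-loading caveat quoted above still applies: every JAR a registered factory needs has to be visible to the plugin.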