On 10/6/05, Dawid Weiss <[EMAIL PROTECTED]> wrote:
>
>
> > That would be great. I already looked at the code base in the plug-in
> > directory, and it seems you use this call to get the clustering results:
> >
> > controller.query("lingo-nmf-km-3", "pseudo-query", requestParams);
> > Am I right?
> >
> > Anyway, I want the type of algorithm used for clustering to be picked up
> > from the XML file; it should be easy to do so.
>
> Yes, it is quite easy -- the controller above can be instantiated from
> an XML file or from a Beanshell script using a local controller
> component (not in the Nutch codebase yet). There are unit tests of that
> controller in Carrot2 CVS, but it has been added recently so I didn't
> have the time to integrate it in a solid working example.


Hi Dawid,

I was able to hack the Clusterer class and make it work for STC. Here is my
hack ;-)

    // Clustering component here.
    LocalComponentFactory stcFactory = new LocalComponentFactoryBase() {
      public LocalComponent getInstance() {
        // Left over from the original Lingo factory: this map is never
        // passed to the STC component returned below.
        HashMap defaults = new HashMap();

        // These are adjustments settings for the clustering algorithm...
        // You can play with them, but the values below are our 'best guess'
        // settings that we acquired experimentally.
        defaults.put("lsi.threshold.clusterAssignment", "0.150");
        defaults.put("lsi.threshold.candidateCluster", "0.775");

        // TODO: this should be eventually replaced with documents from Nutch
        // tagged with a language tag. There is no need to again determine
        // the language of a document.
        return new STCLocalFilterComponent();
      }
    };
    // The factory is still registered under the old Lingo component id.
    controller.addLocalComponentFactory("filter.lingo-old", stcFactory);
}
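
For picking the algorithm from an XML file, as discussed above, a minimal sketch in plain Java could look like the following. It is only an illustration under assumptions: the class name ClusteringConfig, the Nutch-style <property>/<name>/<value> layout and the property name "clustering.process.id" are all hypothetical and are not part of Nutch or Carrot2. The value read this way would replace the hard-coded "lingo-nmf-km-3" in the controller.query(...) call.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch or Carrot2.
public class ClusteringConfig {

    /**
     * Returns the <value> of the first <property> whose <name> matches,
     * or the fallback when no such property exists.
     */
    public static String readProperty(String xml, String name, String fallback)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && name.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return fallback;
    }
}
```

With something like this, the call would become controller.query(processId, "pseudo-query", requestParams), where processId comes from readProperty(...) and falls back to "lingo-nmf-km-3" when the property is missing.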

But I have two questions:

1. AHC doesn't have any local filter that implements LocalFilterComponent,
RawClusterProducer and so on. How can I achieve that? From a very
superficial point of view, it seems that nobody uses the AHC class.
2. How do the stopwords and stemmers work for STC?


> There is one potential problem that I see -- Nutch plugins require
> explicit JAR references. If you want to switch between algorithms you'll
> need to either put all Carrot2 JARs in the descriptor, put them in
> CLASSPATH before Nutch starts or do some other trickery with class
> loading.


I just put stc.jar in the lib directory; I will optimize it later ;-).

Cheers,
R.

> I won't be able to help you until next week, but after that I'll try to
> find some time to prepare you an example of how the scriptable
> controller is used (or look at the unit tests; the component is called
> carrot2-local-controller).
>
> Dawid
>
