Jérôme Charron wrote:
jar. A short-term solution could be to move the core classes (which have no
dependencies on Nutch) to a new lib-plugin (lib-lang, for instance) and add a
dependency on this plugin in the language-identifier, so that this code could
be used as a standalone lib.

Are you OK with such changes?

Perhaps you could isolate the ngram-specific stuff into its own plugin and the lang-id into another.

Or the other option would be (this is what I implemented some time ago) something like the following, since an ngram categorizer can also be used for other
interesting stuff:

new package in core nutch containing classes like:

NGramProfile - pretty much as is
Categorizer - generic configurable ngram categorizer, configure profiles, ngram sizes etc.
CategorizerFactory - to get hold of different categorizers

In the LangId plugin you just get a correctly configured categorizer (set up to use the language ngram profiles and predefined settings for ngram sizes etc.) from the factory and tell it to do its job when needed.
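
As a rough illustration of that split, here is a minimal sketch in Java. It is not the actual Nutch code: the class bodies, method names (addProfile, categorize, register, get), the cosine-similarity scoring and the registry-style factory are all assumptions made for the example, and the real NGramProfile is considerably richer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Simplified, hypothetical stand-in for NGramProfile: n-gram counts of one text. */
class NGramProfile {
    private final Map<String, Integer> counts = new HashMap<>();

    NGramProfile(String text, int minSize, int maxSize) {
        String t = text.toLowerCase(Locale.ROOT);
        // Count every character n-gram of length minSize..maxSize.
        for (int n = minSize; n <= maxSize; n++) {
            for (int i = 0; i + n <= t.length(); i++) {
                counts.merge(t.substring(i, i + n), 1, Integer::sum);
            }
        }
    }

    /** Cosine similarity of the two count vectors; 0 if either profile is empty. */
    double similarity(NGramProfile other) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            dot += e.getValue() * (double) other.counts.getOrDefault(e.getKey(), 0);
        }
        double norms = norm() * other.norm();
        return norms == 0 ? 0 : dot / norms;
    }

    private double norm() {
        double sum = 0;
        for (int c : counts.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }
}

/** Generic categorizer (assumed API): configured with n-gram sizes and named profiles. */
class Categorizer {
    private final Map<String, NGramProfile> profiles = new HashMap<>();
    private final int minSize;
    private final int maxSize;

    Categorizer(int minSize, int maxSize) {
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    /** Builds a profile for the category from sample text; returns this for chaining. */
    Categorizer addProfile(String category, String trainingText) {
        profiles.put(category, new NGramProfile(trainingText, minSize, maxSize));
        return this;
    }

    /** Returns the best-matching category, or null if no profile shares any n-gram. */
    String categorize(String text) {
        NGramProfile doc = new NGramProfile(text, minSize, maxSize);
        String best = null;
        double bestScore = 0;
        for (Map.Entry<String, NGramProfile> e : profiles.entrySet()) {
            double score = doc.similarity(e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}

/** Hypothetical factory: a registry of preconfigured categorizers looked up by name. */
class CategorizerFactory {
    private static final Map<String, Categorizer> REGISTRY = new HashMap<>();

    static void register(String name, Categorizer categorizer) {
        REGISTRY.put(name, categorizer);
    }

    static Categorizer get(String name) {
        Categorizer c = REGISTRY.get(name);
        if (c == null) {
            throw new IllegalArgumentException("No categorizer configured: " + name);
        }
        return c;
    }
}
```

With something along these lines, the LangId plugin would only need a call such as CategorizerFactory.get("language").categorize(text), while other plugins could register categorizers with their own profiles and ngram sizes.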

--
 Sami Siren


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers