Hi,

I would like to work on this now: passing a factory class name to the CLI tools and saving it in the model as configuration. Do you still think it is a good idea, or should we find a better way to load custom feature generators and custom sequence validators? I would like to do it for the SentenceDetector and the POS Tagger for now.
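To make the proposal concrete, here is a rough sketch of the idea. It is only an illustration under assumptions: `ToolFactory`, `FeatureGenerator`, `SuffixFactory` and the `factory` key are hypothetical names, not existing OpenNLP API, and a plain `Properties` object stands in for the model's configuration. The point is just that the class name is stored at training time and the same factory is instantiated by reflection at runtime and evaluation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: none of these types are part of the OpenNLP API.
class FactoryLoadingSketch {

    /** Hypothetical extension point: produces features for one token. */
    interface FeatureGenerator {
        List<String> features(String[] tokens, int index);
    }

    /** Hypothetical factory; implementations need a public no-arg constructor. */
    interface ToolFactory {
        FeatureGenerator createFeatureGenerator();
    }

    /** Example factory a user could ship in an additional jar on the classpath. */
    public static class SuffixFactory implements ToolFactory {
        @Override
        public FeatureGenerator createFeatureGenerator() {
            return (tokens, i) -> {
                List<String> feats = new ArrayList<>();
                String token = tokens[i];
                feats.add("w=" + token);
                // Last two characters of the token (or the whole token if shorter).
                feats.add("suf=" + token.substring(Math.max(0, token.length() - 2)));
                return feats;
            };
        }
    }

    /** Hypothetical configuration key under which the class name is saved. */
    static final String FACTORY_KEY = "factory";

    /** Loads a factory by class name; the class must be on the classpath. */
    static ToolFactory loadFactory(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            if (!ToolFactory.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(
                        className + " does not implement ToolFactory");
            }
            return (ToolFactory) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot load factory " + className, e);
        }
    }

    public static void main(String[] args) {
        // Training side: remember which factory was used.
        Properties modelConfig = new Properties();
        modelConfig.setProperty(FACTORY_KEY, SuffixFactory.class.getName());

        // Runtime / evaluation side: recreate the same factory from the stored name.
        ToolFactory factory = loadFactory(modelConfig.getProperty(FACTORY_KEY));
        FeatureGenerator generator = factory.createFeatureGenerator();
        System.out.println(generator.features(new String[] {"Hello", "world"}, 1));
    }
}
```

Only the class name travels with the model, so whoever loads the model must have the factory class on the classpath; a missing or wrong class fails fast with an `IllegalArgumentException` in this sketch.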
Thanks,
William

On Tue, Jun 21, 2011 at 11:58 AM, Jörn Kottmann <[email protected]> wrote:

> On 6/14/11 4:23 AM, [email protected] wrote:
>
>> Hi,
>>
>> Currently we have only implemented custom feature generators that can be
>> passed from the command line for the NameFinder, but it would be very nice
>> to have this for all tools.
>> The Thai sentence detector customization is nice and simple, but to do
>> something for other languages the user would need to branch the code. We
>> should allow users to pass a factory class name from the command line.
>> Maybe we could do it for every tool that doesn't use a sequence feature
>> generator. It would also be nice to save the factory class name to the
>> model, to make sure we are using the same feature generator during
>> runtime and evaluation.
>>
>> What do you think? Maybe you have thought of a better solution for that.
>>
>
> The first approach OpenNLP came up with to customize the feature generation
> of a component was to simply pass in a context generator. Well, that does
> not really work with the new model packages and the command line.
> We never really came up with a solution to this problem or discussed it.
>
> William suggests that we should use a class name to load a factory class.
> And I think we should then also remove the support for passing in a context
> generator.
>
> I believe it is a good way of solving the issue, since the model can then
> be used by any code which integrates OpenNLP and has an additional jar on
> the classpath. That will, for example, work well with our UIMA integration.
>
> These models might not be well suited for distribution to a wider group of
> people, since they always need the factory class, which we cannot put
> inside the model because of security issues.
>
> For components where we need to adapt the feature generation to a language,
> I still suggest that we continue to define default feature generation which
> is dependent on the language, as we already do for Thai in the sentence
> detector.
>
> Well, I am not yet sure how it should be done for the parser, doccat and
> coref.
>
> Jörn
>
