Hi,

I would like to work on that now: passing a factory class name to the CLI
tools and saving it in the model as configuration.
Do you still think it is a good idea? Or should we find a better way to
load custom feature generators and custom sequence validators? I would like
to do it for the SentenceDetector and the POS Tagger for now.
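
To make the idea concrete, here is a rough sketch of what I have in mind.
It uses only plain Java reflection and java.util.Properties as a stand-in
for the model's configuration. The names ToolFactory,
MySentenceDetectorFactory and the "factory" key are hypothetical
placeholders, not existing OpenNLP API.

```java
import java.util.Properties;

public class FactoryLoaderSketch {

    /** Hypothetical extension point; not an existing OpenNLP interface. */
    public interface ToolFactory {
        String describe();
    }

    /** Example user-supplied factory that would live in an extra jar on the classpath. */
    public static class MySentenceDetectorFactory implements ToolFactory {
        @Override
        public String describe() {
            return "custom sentence detector features";
        }
    }

    // Hypothetical configuration key; the real key name is still to be decided.
    public static final String FACTORY_KEY = "factory";

    /** Training side: record the factory class name in the model's properties. */
    public static void saveFactoryName(Properties manifest, ToolFactory factory) {
        manifest.setProperty(FACTORY_KEY, factory.getClass().getName());
    }

    /** Runtime side: instantiate the factory named in the model's properties. */
    public static ToolFactory loadFactory(Properties manifest)
            throws ReflectiveOperationException {
        String className = manifest.getProperty(FACTORY_KEY);
        if (className == null) {
            return null; // no custom factory recorded: caller falls back to the defaults
        }
        Class<?> clazz = Class.forName(className);
        // Requires a public no-argument constructor on the factory class.
        return clazz.asSubclass(ToolFactory.class).getDeclaredConstructor().newInstance();
    }
}
```

Because only the class name is stored, the same factory is used for
training, evaluation and runtime, provided the jar containing it is on the
classpath wherever the model is loaded.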

Thanks,
William

On Tue, Jun 21, 2011 at 11:58 AM, Jörn Kottmann <[email protected]> wrote:

> On 6/14/11 4:23 AM, [email protected] wrote:
>
>> Hi,
>>
>> Currently, custom feature generators can be passed from the command line
>> only for NameFinder, but it would be very nice to have this for all tools.
>> The Thai sentence detector customization is nice and simple, but to do
>> something similar for other languages the user would need to branch the
>> code. We should allow users to pass a factory class name from the command
>> line. Maybe we could do it for every tool that doesn't use a sequence
>> feature generator. It would also be nice to save the factory class name
>> in the model, to make sure we use the same feature generator during
>> runtime and evaluation.
>>
>> What do you think? Maybe you have thought of a better solution for that.
>>
>
> The first approach OpenNLP came up with to customize the feature generation
> of a component was to simply pass in a context generator. That does not
> really work with the new model packages and the command line.
> We never really came up with a solution to this problem or discussed it.
>
> William suggests that we use a class name to load a factory class.
> I think we should then also remove the support for passing in a context
> generator.
>
> I believe it is a good way of solving the issue, since the model can then
> be used by any code which integrates OpenNLP and has an additional jar on
> the classpath. That will, for example, work well with our UIMA integration.
>
> These models might not be well suited for distribution to a wider group of
> people, since they always need the factory class, which we cannot put inside
> the model because of security issues.
>
> For components where we need to adapt the feature generation to a language,
> I still suggest that we continue to define default feature generation that
> depends on the language, as we already do for Thai in the sentence detector.
>
> Well, I am not yet sure how it should be done for the parser, doccat and
> coref.
>
> Jörn
>
