On 6/14/11 4:23 AM, [email protected] wrote:
Hi,
Currently, custom feature generators that can be passed from the command
line are implemented only for the NameFinder, but it would be very nice to
have this for all tools.
The Thai sentence detector customization is nice and simple, but to do
something for other languages the user would need to branch the code. We
should allow users to pass a factory class name from the command line. Maybe
we could do it for every tool that doesn't use a sequence feature generator.
It would also be nice to save the factory class name to the model to make
sure we are using the same feature generator during runtime and evaluation.
What do you think? Maybe you have thought of a better solution for that.
The first approach OpenNLP came up with to customize the feature generation
of a component was to simply pass in a context generator. Well, that does not
really work with the new model packages and the command line.
We never really came up with a solution to this problem or discussed it.
William suggests that we should use a class name to load a factory class.
And I think we should then also remove the support for passing in a context
generator.
I believe it is a good way of solving the issue, since the model can then
be used by any code which integrates OpenNLP and has an additional jar on
the classpath.
That will, for example, work well with our UIMA integration.
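To make the idea concrete, here is a minimal sketch of loading a factory by
its class name. The FeatureGeneratorFactory interface, the example factory
and the FactoryLoader helper are all hypothetical names made up for this
mail, not existing OpenNLP types. The only assumption is that the factory
has a no-argument constructor and is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface, invented for illustration; not an OpenNLP API.
interface FeatureGeneratorFactory {
    List<String> createFeatures(String[] tokens, int index);
}

// Example of a user-supplied factory that would live in an additional jar.
class LowercaseTokenFactory implements FeatureGeneratorFactory {
    @Override
    public List<String> createFeatures(String[] tokens, int index) {
        List<String> features = new ArrayList<>();
        features.add("w=" + tokens[index].toLowerCase());
        return features;
    }
}

// Hypothetical helper that turns a class name into a factory instance.
class FactoryLoader {
    static FeatureGeneratorFactory load(String className) {
        try {
            // Look the class up on the classpath and check that it really
            // implements the expected interface before instantiating it.
            Class<?> clazz = Class.forName(className);
            return clazz.asSubclass(FeatureGeneratorFactory.class)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot load feature generator factory: " + className, e);
        }
    }
}
```

The command line tool would only need the class name as a string, and an
integration such as UIMA would only need the jar on its classpath.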
These models might not be well suited for distribution to a wider group of
people, since they always need the factory class, which we cannot put inside
the model because of security issues.
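So the model would carry only the name of the factory, never its code. A
rough sketch of that, assuming the model keeps its metadata as plain
key/value properties: the ModelManifest class and the
"featureGeneratorFactory" key are hypothetical and only illustrate the idea.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, invented for illustration; not an OpenNLP class.
class ModelManifest {
    // Hypothetical key name for the factory entry.
    static final String FACTORY_KEY = "featureGeneratorFactory";

    // Training side: record the language and the factory class name.
    static byte[] write(String language, String factoryClassName) {
        Properties manifest = new Properties();
        manifest.setProperty("language", language);
        manifest.setProperty(FACTORY_KEY, factoryClassName);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Runtime and evaluation side: read the same class name back.
    static String readFactoryClassName(byte[] manifestBytes) {
        Properties manifest = new Properties();
        try {
            manifest.load(new ByteArrayInputStream(manifestBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return manifest.getProperty(FACTORY_KEY);
    }
}
```

Whoever loads the model then has to provide the named class themselves,
which is exactly why such models are harder to distribute.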
For components where we need to adapt the feature generation to a language,
I still suggest that we continue to define default feature generation which
depends on the language, as we already do for Thai in the sentence detector.
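As a sketch of what such a language-dependent default could look like: the
class name below is hypothetical, and the character sets are illustrative
values only, not necessarily what the sentence detector actually uses.

```java
// Hypothetical helper, invented for illustration; not an OpenNLP class.
class DefaultEosCharacters {
    // Illustrative values only; the real per-language defaults may differ.
    private static final char[] GENERIC = {'.', '!', '?'};
    private static final char[] THAI = {' ', '\n'};

    // Pick the default end-of-sentence characters from the language code,
    // falling back to the generic set for every other language.
    static char[] forLanguage(String languageCode) {
        if ("th".equals(languageCode)) {
            return THAI.clone();
        }
        return GENERIC.clone();
    }
}
```

A user-provided factory would then only be needed when these defaults are
not good enough.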
Well, I am not yet sure how it should be done for the parser, doccat and coref.
Jörn