On 10/12/11 2:36 PM, Nicolas Hernandez wrote:
Looking at the Name Finder and the chunker tool, I wonder why they
do not use the same training format.
For example, this
Mr. <START:person> Pierre Vinken <END> is chairman
may also be represented like this
Mr. NNP O
Pierre NNP B-person
Vinken NNP I-person
is VBZ O
chairman NN O
I have noted that the Name Finder API offers the possibility to
customize the feature generation used for training, but both the Name
Finder and the chunker use the same implementation of the learning
algorithm, don't they?
There are historical reasons for that: the name finder development was
inspired by the MUC shared tasks, and the chunker development was
inspired by the CoNLL-2000 shared task.
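To make the relationship between the two formats concrete, here is a
minimal sketch in Python (not OpenNLP code) that turns a name finder
style line into one token per line with B-/I-/O tags. The function name
spans_to_bio is made up for illustration, and the sketch assumes the
<START:type> and <END> markers are separated from the tokens by
whitespace. It does not produce POS tags, since the name finder format
does not carry them.

```python
import re

# Matches "<START>" or "<START:type>"; group 1 is the optional type.
START = re.compile(r"^<START(?::([^>]+))?>$")
END = "<END>"


def spans_to_bio(line):
    """Convert one name-finder-style line to (token, tag) pairs.

    Hypothetical helper for illustration only; not part of OpenNLP.
    """
    pairs = []
    entity = None   # type of the currently open span, or None outside a span
    first = False   # True until the first token of the open span is emitted
    for token in line.split():
        match = START.match(token)
        if match:
            # Assumption: untyped spans are labeled "default" here.
            entity = match.group(1) or "default"
            first = True
        elif token == END:
            entity = None
        elif entity is None:
            pairs.append((token, "O"))
        else:
            pairs.append((token, ("B-" if first else "I-") + entity))
            first = False
    return pairs


for token, tag in spans_to_bio("Mr. <START:person> Pierre Vinken <END> is chairman"):
    print(token, tag)
```

Going the other way, from B-/I-/O tags back to spans, is just as
mechanical, so the two formats carry the same span information; what
differs is the extra POS column in the chunker data.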
The implementations are actually different, and the biggest difference
is the way features are generated. The chunker can use POS tags, and
the name finder cannot. We plan to use the feature generation framework
that was created for the name finder in the POS tagger and chunker as
well.
Anyway, the reason we have different components for sequence tagging is
that it is easier to integrate them if there is one component per task.
Everything in OpenNLP uses maxent or perceptron, yes.
Jörn