Re: Name Finder and chunker training format

Jörn Kottmann Wed, 12 Oct 2011 05:47:34 -0700

On 10/12/11 2:36 PM, Nicolas Hernandez wrote:

Looking at the Name Finder and the chunker tool, I wonder why they
do not use the same training format?


For example, this

Mr. <START:person> Pierre Vinken <END> is chairman

may also be represented like this

Mr. NNP O
Pierre NNP B-person
Vinken NNP I-person
is VBZ O
chairman NN O
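
A minimal sketch of how the two annotations correspond, assuming the name finder line is whitespace-tokenized with spaces around the <START:type> and <END> markers. The function name to_bio is hypothetical and not part of OpenNLP. The name finder format carries no POS tags, so the sketch emits only the token and its B-/I-/O tag; the POS column would have to come from a separate tagger.

```python
import re

# Matches "<START:person>" and also an untyped "<START>".
START = re.compile(r"^<START(?::([^>]+))?>$")
END = "<END>"


def to_bio(sentence):
    """Convert one name finder training line to (token, tag) pairs."""
    pairs = []
    entity = None   # type of the span we are currently inside, or None
    first = False   # True until the first token of the span is emitted
    for tok in sentence.split():
        m = START.match(tok)
        if m:
            # "default" for an untyped span is an assumption of this sketch.
            entity = m.group(1) or "default"
            first = True
        elif tok == END:
            entity = None
        elif entity is None:
            pairs.append((tok, "O"))
        else:
            pairs.append((tok, ("B-" if first else "I-") + entity))
            first = False
    return pairs


if __name__ == "__main__":
    line = "Mr. <START:person> Pierre Vinken <END> is chairman"
    for token, tag in to_bio(line):
        print(token, tag)
```

Run on the sentence above, this yields the same B-person / I-person / O column as the chunker-style listing, minus the POS tags.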

I have noted that the Name Finder API makes it possible to customize
the feature generation used for training, but both the Name
Finder and the chunker use the same implementation of the learning
algorithm, don't they?


That has historical reasons: the name finder development was inspired by
the MUC shared tasks, and the chunker development was inspired by the
CoNLL 2000 shared task.

The implementations are actually different, and the biggest difference is
the way features are generated. The chunker can use POS tags, and the
name finder cannot.

We plan to use the feature generation framework, which was created for
the name finder, in the POS tagger and chunker as well.

Anyway, the reason why we have different components for sequence tagging
is that it makes it easier to integrate them if there is one component
per task.


Everything in OpenNLP uses maxent or perceptron, yes.

Jörn
