On Sun, Dec 4, 2011 at 12:58 PM, Olivier Grisel <[email protected]> wrote:

> 2011/12/4 Erel Segal <[email protected]>:
> > Hello,
> >
> > I am trying to use NameFinder for a project of detecting organization names
> > in Wikipedia. I use the *en-ner-organization.bin* model that I downloaded
> > from the site, and I get strange results. Can you please help me understand
> > these results, and what I should do to correct them?
> >
> > I ran NameFinder on two sentences from this page:
> > http://en.wikipedia.org/wiki/Air_France
> >
> > The first sentence was: "*For its Première cabin , Air France 's first
> > class menu is designed by Guy Martin , chef of Le Grand Vefour , a Michelin
> > three-star restaurant in Paris .*" For this sentence, the name finder
> > correctly tagged "*Air France*" as an organization.
> >
> > However, the second sentence was "*The following day , Air France was
> > further instructed to share African routes with Air Afrique and UAT .*".
> > For this, the name finder tagged only "*Air*" as an organization.
> >
> > This seems strange as the two contexts seem similar. How can you explain
> > this?
>
> In the first example the "'s" possessive marker must be a strong clue
> that the token right before it must be some kind of named entity
> (hence organization in that case) whereas this clue is missing in the
> second example. But as OpenNLP NER models are statistical models with
> many dimensions (features) it's hard to pick any single reason to
> explain individual failures.
>
> Anyway, the only way to fix this is to retrain the model on a larger
> annotated dataset than the one used initially to build the default
> model, or to come up with better features (the only way to tell is by
> evaluating them on an annotated corpus). Both tasks require significant
> investment, unfortunately.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>



OK, well, how can I find the original corpus that was used to create this
model? I would like to use it as a basis for training a new model.
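
To make "evaluating them on an annotated corpus" concrete, here is a minimal
sketch in Python of exact-match span scoring. It does not call OpenNLP. The
function name span_prf, the gold annotations, and the choice of which "Air"
was tagged in the second sentence are all assumptions made for illustration;
they are not output from the real model or from any real corpus.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (sentence, start, end) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# The two whitespace-tokenized sentences quoted earlier in this thread.
sentences = [
    ("For its Première cabin , Air France 's first class menu is designed "
     "by Guy Martin , chef of Le Grand Vefour , a Michelin three-star "
     "restaurant in Paris .").split(),
    ("The following day , Air France was further instructed to share "
     "African routes with Air Afrique and UAT .").split(),
]

# ASSUMPTION: hand-made organization spans (sentence index, start token,
# end token exclusive), written for this example only.
gold = [(0, 5, 7), (1, 4, 6), (1, 14, 16), (1, 17, 18)]

# ASSUMPTION: "Air France" found in sentence 0, and only the first "Air"
# in sentence 1, mirroring the behaviour described in the thread.
predicted = [(0, 5, 7), (1, 4, 5)]

precision, recall, f1 = span_prf(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Show the gold spans the hypothetical prediction missed.
for sent, start, end in sorted(set(gold) - set(predicted)):
    print("missed:", " ".join(sentences[sent][start:end]))
```

Exact matching is deliberately strict here: tagging "Air" instead of
"Air France" counts as both a false positive and a miss, which is why a
truncated span hurts precision and recall at the same time.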

Thanks,

Erel
