You need to train with a corpus that is as close as possible to your
runtime corpus. If your runtime corpus looks like that (short windows around
each entity), I think it is OK. Otherwise, the model can learn that entities
occur far too often; for example, that there is an entity in the middle of
every window.
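
A minimal Python sketch of why the whole sentence matters, assuming a
simplified event generator. The helpers `parse_annotated`, `window_features`
and `events`, the feature names and the `*pad*` marker are hypothetical, not
OpenNLP's API, and the outcome labels only roughly mirror OpenNLP's
"start"/"cont"/"other" outcomes. The point it shows: every token of the
sentence, not only the entity, becomes one training event, and tokens outside
entities get the outcome "other". Cropping the sentence to the window around
the entity throws most of those negative events away.

```python
import re
from collections import Counter


# Hypothetical helper (not OpenNLP code): split a sentence in the
# <START:type> ... <END> training format into tokens and per-token
# outcomes, using BIO-style labels similar to "start"/"cont"/"other".
def parse_annotated(line):
    tokens, outcomes = [], []
    entity_type = None  # type of the entity we are inside, if any
    at_start = False    # True until the first token of the entity is seen
    for tok in line.split():
        m = re.fullmatch(r"<START:(\w+)>", tok)
        if m:
            entity_type, at_start = m.group(1), True
        elif tok == "<END>":
            entity_type = None
        else:
            tokens.append(tok)
            if entity_type is None:
                outcomes.append("other")
            else:
                outcomes.append(f"{entity_type}-{'start' if at_start else 'cont'}")
                at_start = False
    return tokens, outcomes


# Hypothetical feature generator: the current token plus a window of
# `size` tokens on each side, padded at the sentence boundaries.
def window_features(tokens, i, size=2):
    feats = [f"w={tokens[i]}"]
    for off in range(-size, size + 1):
        if off == 0:
            continue
        j = i + off
        word = tokens[j] if 0 <= j < len(tokens) else "*pad*"
        feats.append(f"w[{off}]={word}")
    return feats


# One (features, outcome) training event per token in the sentence.
def events(line):
    tokens, outcomes = parse_annotated(line)
    return [(window_features(tokens, i), outcomes[i]) for i in range(len(tokens))]


full = ("<START:person> Pierre Vinken <END> , 61 years old , will join "
        "the board as a nonexecutive director Nov. 29 .")
cropped = "<START:person> Pierre Vinken <END> , 61"

print(events(full)[0])
# (['w=Pierre', 'w[-2]=*pad*', 'w[-1]=*pad*', 'w[1]=Vinken', 'w[2]=,'], 'person-start')
for name, line in [("full", full), ("cropped", cropped)]:
    print(name, dict(Counter(outcome for _, outcome in events(line))))
# full {'person-start': 1, 'person-cont': 1, 'other': 16}
# cropped {'person-start': 1, 'person-cont': 1, 'other': 2}
```

In the full sentence 16 of the 18 events are "other"; in the cropped one only
2 of 4 are, so the model sees an entity at half of all positions, which is the
skew described above. The real OpenNLP default features go beyond this plain
token window, so treat the counts as illustrative only.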


2016-08-12 11:35 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:

> OK, but why not just ignore all the other tokens? I mean, when I write 2
> TOKENS + ENTITY + 2 TOKENS, I am interested in finding the entity with these
> surrounding tokens, so it should mean that the other "cases" can be ignored,
> no?
>
> Why do I need to write all the other cases when they must be ignored?
>
> 2016-08-12 16:26 GMT+02:00 William Colen <william.co...@gmail.com>:
>
> > You also need examples of what is not an entity.
> >
> >
> > 2016-08-12 11:21 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
> >
> > > Hello everyone,
> > > pardon the stupid question, but I really do not get the point of
> > > training a maxent model with complete sentences.
> > >
> > > For example:
> > >
> > > <START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
> > >
> > > It has ~20 tokens.
> > > As described here:
> > > https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
> > > the default window should be 2 tokens on the left and 2 tokens on the
> > > right of the entity. So, what's the point of writing the entire sentence
> > > if there are no other entities?
> > >
> > > As far as I understand it, it should take into account Pierre Vinken
> > > (as the entity name) and "," "61" as the next 2 tokens. So, why do we
> > > need "*years old , will join the board as a nonexecutive*"?
> > >
> > > Thank you in advance for the clarification!
> > >
> > > Best
> > > Damiano
> > >
> >
>
