The non-entity tokens are not ignored; they serve as negative examples and as 
context.

A machine learning algorithm learns from positive and negative examples.

It also learns from context, e.g. it learns that a PERSON
entity appears in surroundings like "My name is PERSON ." or "I gave PERSON a 
present".
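
To make this concrete, here is a rough sketch of how one annotated sentence
could be turned into one training example per token. It is only an
illustration, not OpenNLP's actual feature generation: the function names, the
feature strings and the "<BOS>"/"<EOS>" padding are all made up for this
example, and the "person-start" / "person-cont" / "other" outcome labels are
used here just as a convenient naming scheme.

```python
import re

def parse_annotated(sentence):
    """Split a <START:type> ... <END> annotated sentence into (token, outcome) pairs."""
    pairs = []
    entity_type = None  # type of the entity we are currently inside, if any
    first = False       # True until the first token of that entity is emitted
    for tok in sentence.split():
        match = re.fullmatch(r"<START:(\w+)>", tok)
        if match:
            entity_type, first = match.group(1), True
        elif tok == "<END>":
            entity_type = None
        elif entity_type:
            pairs.append((tok, f"{entity_type}-{'start' if first else 'cont'}"))
            first = False
        else:
            # Non-entity tokens are kept: they become negative examples.
            pairs.append((tok, "other"))
    return pairs

def window_features(tokens, i, size=2):
    """Hypothetical context features: the token plus `size` neighbours on each side."""
    feats = [f"w={tokens[i]}"]
    for d in range(1, size + 1):
        prev_tok = tokens[i - d] if i - d >= 0 else "<BOS>"
        next_tok = tokens[i + d] if i + d < len(tokens) else "<EOS>"
        feats.append(f"p{d}={prev_tok}")
        feats.append(f"n{d}={next_tok}")
    return feats

def training_events(sentence):
    """One (outcome, features) event per token, entity or not."""
    pairs = parse_annotated(sentence)
    tokens = [tok for tok, _ in pairs]
    return [(outcome, window_features(tokens, i))
            for i, (_, outcome) in enumerate(pairs)]

SENTENCE = ("<START:person> Pierre Vinken <END> , 61 years old , will join the "
            "board as a nonexecutive director Nov. 29 .")
events = training_events(SENTENCE)
```

In this sketch the sentence yields 18 events: 2 for the name and 16 labelled
"other". The window only limits which neighbours each single event sees; it
does not limit which tokens produce an event.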

Without negative examples and without context, you cannot learn.
In that case you could just as well look words up in a word list, e.g.
a list of names.
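
For comparison, a plain word-list lookup might look like the sketch below. The
list contents and the function name are invented for illustration. The point
is that the lookup never looks at the surrounding words, so a name that is not
on the list is missed even in an obvious context like "My name is ...".

```python
# Hypothetical word list of known names (illustration only).
NAME_LIST = {"Pierre", "Vinken"}

def lookup_tag(tokens, names=NAME_LIST):
    """Mark a token as a name only if it literally appears in the list."""
    return [(tok, tok in names) for tok in tokens]

seen = lookup_tag("My name is Pierre .".split())
unseen = lookup_tag("My name is Damiano .".split())

print(seen[3])    # the listed name is found
print(unseen[3])  # the unlisted name is not, despite the identical context
```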

Cheers,

-- Richard

> On 12.08.2016, at 16:35, Damiano Porta <damianopo...@gmail.com> wrote:
> 
> Ok, but why not just ignore all the other tokens? I mean... when I write 2
> TOKENS + ENTITY + 2 TOKENS I am interested in finding the entity with these
> surrounding tokens, so it should mean that other "cases" can be ignored. No?
> 
> Why do I need to write all the other cases when those must be ignored?
> 
> 2016-08-12 16:26 GMT+02:00 William Colen <william.co...@gmail.com>:
> 
>> You also need examples of what is not an entity.
>> 
>> 
>> 2016-08-12 11:21 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
>> 
>>> Hello everyone,
>>> pardon the stupid question, but I really do not get the point of
>>> training a maxent model with complete sentences.
>>> 
>>> For example:
>>> 
>>> <START:person> Pierre Vinken <END> , 61 years old , will join the board as
>>> a nonexecutive director Nov. 29 .
>>> 
>>> it has ~20 tokens.
>>> As described here:
>>> https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
>>> the default window should be 2 tokens on the left and 2 tokens on the right
>>> of the entity. So, what's the point of writing the entire sentence if there
>>> are no other entities ?
>>> 
>>> As far as I have understood it, it should take into account the
>>> Pierre Vinken (as entity name) and "," "61" as the next 2 tokens. So, why
>>> do we need "*years old , will join the board as a nonexecutive*" ?
>>> 
>>> Thank you in advance for the clarification!
>>> 
>>> Best
>>> Damiano
