Re: Why are you using complete sentences to train a model?
Oh, I forgot one thing: does the order of the surrounding tokens matter? I mean, if I train "my name is PERSON", when it searches for the entity, will it match "name is" exactly, or is "is name" treated as the same thing? (Or maybe I need to write the "negative" version of it?)

In reply to Damiano Porta (2016-08-12 16:51 GMT+02:00)
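On the word-order question, here is a minimal Python sketch. It assumes the left-context features are position-indexed, meaning each previous token is tagged with its offset. That is how we read the documented default window generator. The feature strings such as "p1=is" are hypothetical names, not OpenNLP's exact feature format. For comparison, the sketch also builds an unordered bag of the same words.

```python
def left_context(tokens, i, size=2):
    """Position-indexed left-context features for token i (hypothetical 'pK=word' naming)."""
    return {f"p{k}={tokens[i - k].lower()}" for k in range(1, size + 1) if i - k >= 0}


def left_bag(tokens, i, size=2):
    """Unordered bag of the same left-context words, for comparison (order is discarded)."""
    return {tokens[i - k].lower() for k in range(1, size + 1) if i - k >= 0}


trained = "my name is PERSON".split()   # entity token at index 3
swapped = "my is name PERSON".split()   # same words, left context reversed

print(sorted(left_context(trained, 3)))   # ['p1=is', 'p2=name']
print(sorted(left_context(swapped, 3)))   # ['p1=name', 'p2=is']
print(left_context(trained, 3) & left_context(swapped, 3))  # set()
print(left_bag(trained, 3) == left_bag(swapped, 3))         # True
```

With ordered features like these, "is name PERSON" does not reuse the context evidence learned from "name is PERSON". The model may still fire on it through other features, such as the token itself, so the reversed phrase is not guaranteed to be rejected either.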
Re: Why are you using complete sentences to train a model?
Ok, thank you so much, guys!

In reply to William Colen (2016-08-12 16:43 GMT+02:00)
Re: Why are you using complete sentences to train a model?
You need to train with a corpus that is as close as possible to your runtime corpus. If your runtime corpus looks like that, I think it is OK. Otherwise, the model can learn that entities occur far more often than they really do, as if there were an entity in the middle of every window.

2016-08-12 11:35 GMT-03:00 Damiano Porta:
> Ok, but why not just ignore all the other tokens? I mean... when I write 2 TOKENS + ENTITY + 2 TOKENS, I am interested in finding the entity with these surrounding tokens, so it should mean that other "cases" can be ignored. No?
>
> Why do I need to write all the other cases when they must be ignored?
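William's point about entity frequency can be made concrete with a small Python sketch. This is not OpenNLP code. It compares the share of entity-labeled tokens in the full example sentence with the share in a corpus cropped to "2 tokens + entity + 2 tokens". The "person"/"other" labels are a simplified IO scheme, assumed here for illustration.

```python
def entity_rate(labels):
    """Fraction of tokens labeled as part of an entity."""
    return sum(1 for lab in labels if lab != "other") / len(labels)


def crop_to_windows(labels, size=2):
    """Keep only entity tokens plus `size` tokens on each side (the cropped corpus)."""
    keep = set()
    for i, lab in enumerate(labels):
        if lab != "other":
            keep.update(j for j in range(i - size, i + size + 1) if 0 <= j < len(labels))
    return [labels[j] for j in sorted(keep)]


# Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
full = ["person", "person"] + ["other"] * 16
cropped = crop_to_windows(full)

print(len(full), round(entity_rate(full), 3))        # 18 0.111
print(len(cropped), round(entity_rate(cropped), 3))  # 4 0.5
```

A trained classifier picks up how often each outcome occurs in its training data. Cropping inflates the entity share from about 1 token in 9 to 1 in 2, while most tokens in real runtime text are not entities. As a result, such a model would tend to over-predict entities.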
Re: Why are you using complete sentences to train a model?
You also need examples of what is not an entity.

In reply to Damiano Porta (2016-08-12 11:21 GMT-03:00)
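The following small Python sketch shows why non-entity examples matter. It makes three assumptions for brevity. First, a toy naive Bayes classifier stands in for OpenNLP's maxent model. Second, labels use a simplified person/other scheme. Third, the feature names are hypothetical. Two models are trained on the same example sentence: one sees every token, and the other sees only the entity tokens.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, i):
    """Current word plus position-indexed 2-left / 2-right context (hypothetical feature names)."""
    feats = [f"w={tokens[i]}"]
    for k in (1, 2):
        feats.append(f"p{k}={tokens[i - k] if i - k >= 0 else '*bos*'}")
        feats.append(f"n{k}={tokens[i + k] if i + k < len(tokens) else '*eos*'}")
    return feats


class NaiveBayes:
    """Tiny multinomial naive Bayes, swapped in for maxent purely to keep the sketch short."""

    def __init__(self):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, events):
        for feats, label in events:
            self.label_counts[label] += 1
            for f in feats:
                self.feat_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, feats):
        total = sum(self.label_counts.values())
        scores = {}
        for label, n in self.label_counts.items():
            # Add-one smoothing over the feature vocabulary.
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            scores[label] = math.log(n / total) + sum(
                math.log((self.feat_counts[label][f] + 1) / denom) for f in feats)
        return max(scores, key=scores.get)


tokens = "Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .".split()
labels = ["person", "person"] + ["other"] * (len(tokens) - 2)
events = [(window_features(tokens, i), lab) for i, lab in enumerate(labels)]

full_model = NaiveBayes()
full_model.train(events)                                       # entities and non-entities
positive_only = NaiveBayes()
positive_only.train([e for e in events if e[1] == "person"])   # entity tokens only

test = "Mary Smith will join the board .".split()
join = window_features(test, 3)
print(full_model.predict(join))     # other
print(positive_only.predict(join))  # person
```

Naive Bayes and maxent differ, but the point carries over. A classifier can only predict outcomes it has seen in training, and the non-entity tokens of a full sentence are what teach it the "other" outcome.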
Why are you using complete sentences to train a model?
Hello everyone,

Pardon the stupid question, but I really do not get the point of training a maxent model with complete sentences. For example:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .

It has ~20 tokens. As described here:

https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen

the default window should be 2 tokens on the left and 2 tokens on the right of the entity. So what's the point of writing the entire sentence if there are no other entities?

If I have understood it correctly, it should take into account Pierre Vinken (as the entity name) and "," "61" as the next 2 tokens. So why do we need "*years old , will join the board as a nonexecutive*"?

Thank you in advance for the clarification!

Best,
Damiano
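The replies in this thread come down to one idea, as far as we understand OpenNLP's name finder. It does not look at a single window around the entity. Instead, it classifies every token, and each token gets its own 2 + 2 window. The minimal Python sketch below shows how one annotated sentence turns into training events. It assumes whitespace tokenization and the <START:person> ... <END> training markup. The outcome names and feature strings are hypothetical; OpenNLP's real feature generator produces more features, named differently.

```python
import re


def to_events(annotated, size=2):
    """Turn an OpenNLP-style annotated sentence into one (features, outcome) event per token.

    Assumes whitespace tokenization and <START:type> ... <END> markers; outcome
    names ('person-start', 'person-cont', 'other') are a BIO-style assumption.
    """
    tokens, outcomes = [], []
    entity_type, first = None, False
    for tok in annotated.split():
        m = re.fullmatch(r"<START:(\w+)>", tok)
        if m:
            entity_type, first = m.group(1), True
        elif tok == "<END>":
            entity_type = None
        else:
            tokens.append(tok)
            if entity_type is None:
                outcomes.append("other")
            else:
                outcomes.append(f"{entity_type}-{'start' if first else 'cont'}")
                first = False

    events = []
    for i, outcome in enumerate(outcomes):
        # The window is centered on *each* token, not only on the entity.
        feats = [f"w={tokens[i]}"]
        for k in range(1, size + 1):
            feats.append(f"p{k}={tokens[i - k] if i - k >= 0 else '*bos*'}")
            feats.append(f"n{k}={tokens[i + k] if i + k < len(tokens) else '*eos*'}")
        events.append((feats, outcome))
    return events


sentence = ("<START:person> Pierre Vinken <END> , 61 years old , will join "
            "the board as a nonexecutive director Nov. 29 .")
events = to_events(sentence)
print(len(events))                                # 18
print(sum(1 for _, o in events if o == "other"))  # 16
print(events[8])  # (['w=join', 'p1=will', 'n1=the', 'p2=,', 'n2=board'], 'other')
```

Under this view, "years old , will join the board as a nonexecutive" is not wasted text. It contributes 16 events whose outcome is "other", which are the negative examples the model needs.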