Hi Damiano,
Why are you so sure that your model will not work? A couple of things to
remember: 1. You need quite a bit of training data; two sentences do not
make a training set. 2. You probably need more than a window of words as your
features. However, you can see that word-2=“name" and word-1=“is” tend to
precede a name. Look into other potential features, get a larger dataset,
and your results may surprise you.
Daniel
On May 1, 2016, at 3:13 PM, Jeffrey Zemerick <[email protected]> wrote:
I'm sure the others on this list can give you a more complete answer, so I
will try not to lead you astray.
The WindowFeatureGenerator is only one of the available feature generators.
There are many classes that implement the AdaptiveFeatureGenerator
interface [1] and you can, of course, provide your own implementation of
that interface to support additional features. For example, the
SentenceFeatureGenerator [2] looks at the beginning and end of each
training sentence. So to answer your question, the length of the training
sentence should not matter - what matters is whether the combination of
configured feature generators can produce a model that accurately
describes the training text.
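To make the interface idea concrete, here is a rough standalone sketch of a feature generator in the spirit of SentenceFeatureGenerator. This is not OpenNLP code - the real interface method is AdaptiveFeatureGenerator's createFeatures, and the class and feature names below are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of what a feature generator does conceptually.
// The real OpenNLP contract is AdaptiveFeatureGenerator#createFeatures;
// the feature strings here are illustrative only.
public class SentencePositionFeatures {

    // Adds a feature when the token sits at the start or end of the
    // sentence, similar in spirit to SentenceFeatureGenerator.
    public static List<String> createFeatures(String[] tokens, int index) {
        List<String> features = new ArrayList<>();
        if (index == 0) {
            features.add("S=begin");
        }
        if (index == tokens.length - 1) {
            features.add("S=end");
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"My", "name", "is", "Barack"};
        System.out.println(createFeatures(tokens, 0)); // prints [S=begin]
        System.out.println(createFeatures(tokens, 3)); // prints [S=end]
        System.out.println(createFeatures(tokens, 1)); // prints []
    }
}
```

A custom generator like this is combined with the others, so each token's feature set is the union of what all configured generators emit.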
Jeff
[1]
https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/AdaptiveFeatureGenerator.html
[2]
https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/SentenceFeatureGenerator.html
On Sun, May 1, 2016 at 12:02 PM, Damiano Porta <[email protected]>
wrote:
Hi Jeff!
Thank you so much for your fast reply.
I have a doubt. Let's suppose we use this feature with a window of:
2 tokens on the left + *ENTITY* + 2 tokens on the right
My doubt is: how can I train the model correctly?
If only the previous 2 tokens and the next 2 tokens matter, I should not
use long sentences to train the model. Right?
For example (person-model.train):
1. I am <START:person> Barack <END> and I am the president of USA
2. My name is <START:person> Barack <END> and my surname is Obama
...
Those are two trivial training samples, just to illustrate my doubt.
In this case I should have:
*I am Barack and I*
*name is Barack and my*
The other tokens (left and right) do not matter. So the sentences in my
training set could be very short, right? Basically, I would only need to
define all the "combinations" of the previous/next 2 tokens, right?
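For illustration, which tokens survive a ±2 window around the entity can be checked mechanically. A minimal standalone sketch (not OpenNLP code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: which tokens fall inside a +/-2 window around an entity token.
public class WindowDemo {

    // Returns the tokens from `left` before to `right` after position
    // `index`, including the token at `index`, clipped at the sentence
    // boundaries.
    public static List<String> window(String[] tokens, int index,
                                      int left, int right) {
        int start = Math.max(0, index - left);
        int end = Math.min(tokens.length, index + right + 1);
        return new ArrayList<>(Arrays.asList(tokens).subList(start, end));
    }

    public static void main(String[] args) {
        String[] sentence =
            "My name is Barack and my surname is Obama".split(" ");
        // "Barack" is at index 3; only 5 tokens survive the +/-2 window.
        System.out.println(window(sentence, 3, 2, 2));
        // prints [name, is, Barack, and, my]
    }
}
```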
Thank you!
Damiano
2016-05-01 16:07 GMT+02:00 Jeffrey Zemerick <[email protected]>:
I think you are looking for the WindowFeatureGenerator [1]. You can set
the
size of the window by specifying the number of previous tokens and number
of next tokens.
Jeff
[1]
https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/WindowFeatureGenerator.html
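To picture the effect of those two parameters, here is a rough standalone sketch of window features with separate previous/next sizes. The real WindowFeatureGenerator wraps another feature generator and prefixes its output by distance; the exact feature-string format below is made up for readability, not OpenNLP's actual output:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of window features with separate previous/next
// window sizes. Not OpenNLP code; feature names are invented.
public class WindowFeaturesSketch {

    public static List<String> createFeatures(String[] tokens, int index,
                                              int prevLength, int nextLength) {
        List<String> features = new ArrayList<>();
        // Previous tokens, prefixed by their distance to the left.
        for (int i = 1; i <= prevLength && index - i >= 0; i++) {
            features.add("p" + i + "=" + tokens[index - i]);
        }
        // The current token itself.
        features.add("w=" + tokens[index]);
        // Next tokens, prefixed by their distance to the right.
        for (int i = 1; i <= nextLength && index + i < tokens.length; i++) {
            features.add("n" + i + "=" + tokens[index + i]);
        }
        return features;
    }

    public static void main(String[] args) {
        String[] sentence =
            "My name is Barack and my surname is Obama".split(" ");
        // Features for "Barack" with 2 previous and 2 next tokens.
        System.out.println(createFeatures(sentence, 3, 2, 2));
        // prints [p1=is, p2=name, w=Barack, n1=and, n2=my]
    }
}
```

Because the prefix encodes the distance, the model can learn, for example, that p2="name" and p1="is" are strong cues for a person name, as discussed later in the thread.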
On Sun, May 1, 2016 at 5:16 AM, Damiano Porta <[email protected]>
wrote:
Hello everybody,
How many surrounding tokens are taken into account to find an entity
using a maxent model?
Basically, a maxent model should detect an entity by looking at the
surrounding tokens, right?
I would like to understand:
1. Can I set the number of tokens on the left side?
2. Can I set the number of tokens on the right side too?
Thank you in advance for the clarification.
Best
Damiano