On 1/22/11 7:22 AM, Khurram wrote:
> How can I train SentenceDetectorME so that it does not treat dates written
> like mm.dd.yyyy. as the end of a sentence? I tried adding a few examples to
> sentences.txt and re-running the test, but it always seems to treat the dots
> as the end of a sentence...
To be able to better help you, we need to know which language you want to
train the sentence detector for.
To get good results you should train it with a few thousand sentences;
the few lines in our regression test data are not enough to produce a
usable model.
> Is there a minimum number of times the model has to see the pattern as
> a word within a sentence during training before it learns that the dots
> do not indicate the end of a sentence?
There is a cutoff, which defaults to 5, so every feature that should be
part of the model must be seen at least as often as the cutoff value.
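To illustrate what the cutoff does, here is a minimal sketch in plain
Python. It is not the OpenNLP implementation; the function name
apply_cutoff and the feature strings are made up for the example, and
the real features the sentence detector generates look different.

```python
from collections import Counter


def apply_cutoff(events, cutoff=5):
    """Return the features seen at least `cutoff` times across all events.

    `events` is a list of feature lists, one list per training event.
    This only mimics the idea of a cutoff; it is not OpenNLP code.
    """
    counts = Counter(feature for features in events for feature in features)
    return {feature for feature, n in counts.items() if n >= cutoff}


# Hypothetical feature names, for illustration only.
events = [["next=Capitalized"]] * 6 + [["prefix=looks-like-date"]] * 3

# The date-like feature was seen only 3 times, so it is dropped.
print(sorted(apply_cutoff(events)))  # ['next=Capitalized']
```

In this sketch, a feature seen only three times never reaches the model
when the cutoff is 5, which is why a handful of example sentences with
dates does not change the result.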
Depending on your language we might be able to point you to training data.
There is also a bit of documentation about the sentence detector in our
opennlp-docs project. If you think something is missing there, we would
really appreciate receiving a patch for it.
Jörn