On 01/26/2014 11:29 PM, Miller, Timothy wrote:
Yes, this fixes the whitespace sentence issue but the evaluation issue
remains. I believe the problem is in SentenceSampleStream, where in the
following block the whitespace trim happens before the <LF> character is
replaced with the \n character. So test sentences that ended with <LF>
will be one character longer than they should be.
> sentence = sentence.trim();
> sentence = replaceNewLineEscapeTags(sentence);
> sentencesString.append(sentence);
> int end = sentencesString.length();
> sentenceSpans.add(new Span(begin, end));
> sentencesString.append(' ');
Yes, that must be the issue. During training the newline is included in
the span, while during detection the whitespace remover creates a span
without the newline character.
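
To make the off-by-one concrete, here is a minimal sketch of the ordering
effect. It is not the actual SentenceSampleStream code: the class name and
the stand-in replaceNewLineEscapeTags are hypothetical, and the stand-in
assumes that the method only turns the literal tag "<LF>" into "\n".

```java
public class TrimOrderDemo {

    // Hypothetical stand-in for replaceNewLineEscapeTags; assumed to
    // replace the literal tag "<LF>" with a newline character.
    static String replaceNewLineEscapeTags(String s) {
        return s.replace("<LF>", "\n");
    }

    // Order used in the quoted block: trim first, then replace.
    // The trailing "<LF>" is not whitespace yet, so trim() keeps it,
    // and the result ends with a newline.
    static String trimThenReplace(String s) {
        return replaceNewLineEscapeTags(s.trim());
    }

    // Swapped order: replace first, then trim.
    // The newline is now trailing whitespace and trim() removes it.
    static String replaceThenTrim(String s) {
        return replaceNewLineEscapeTags(s).trim();
    }

    public static void main(String[] args) {
        String raw = "A sentence.<LF>";
        System.out.println(trimThenReplace(raw).length()); // 12
        System.out.println(replaceThenTrim(raw).length()); // 11
    }
}
```

With the first order the span end computed from sentencesString.length()
lands one character further right than with the second order.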
I suggest that the evaluator just ignore whitespace differences between
sentences. With that change, my test case has the expected performance
numbers.
What do you think?
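
A rough sketch of what ignoring whitespace differences could look like
follows. This is an illustration under assumptions, not the committed
change: the class and method names are hypothetical, and spans are plain
begin/end offsets (end exclusive) rather than OpenNLP Span objects.

```java
import java.util.Arrays;

public class WhitespaceInsensitiveSpans {

    // Shrink [begin, end) so that it neither starts nor ends on whitespace.
    static int[] trimSpan(CharSequence text, int begin, int end) {
        while (begin < end && Character.isWhitespace(text.charAt(begin))) {
            begin++;
        }
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return new int[] {begin, end};
    }

    // Two spans count as equal if they match after whitespace is trimmed
    // from both ends of each span.
    static boolean sameIgnoringWhitespace(CharSequence text,
                                          int b1, int e1, int b2, int e2) {
        return Arrays.equals(trimSpan(text, b1, e1), trimSpan(text, b2, e2));
    }

    public static void main(String[] args) {
        String text = "First one.\n Second one.";
        // Reference span [0, 11) includes the newline; predicted span
        // [0, 10) does not. They should still count as a match.
        System.out.println(sameIgnoringWhitespace(text, 0, 11, 0, 10)); // true
    }
}
```

Comparing trimmed offsets keeps the evaluation strict about the sentence
text itself while tolerating a trailing newline on either side.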
Anyway, I committed the change. Please give it a try.
Jörn