On 01/26/2014 09:59 AM, Jörn Kottmann wrote:
>
> The evaluation should ignore white spaces. I have now committed my fix;
> it would be nice if you could
> test it.
>
> There might still be something wrong. In my test data I replaced all
> question marks with white spaces, and the result
> is slightly worse than with the original data.
>
> Jörn
Yes, this fixes the whitespace sentence issue, but the evaluation issue
remains. I believe the problem is in SentenceSampleStream: in the
following block, the whitespace trim happens before the <LF> tag is
replaced with the \n character. The resulting trailing \n is never
trimmed, so test sentences that ended with <LF> will be one character
longer than they should be.

>       sentence = sentence.trim();
>       sentence = replaceNewLineEscapeTags(sentence);
>       sentencesString.append(sentence);
>       int end = sentencesString.length();
>       sentenceSpans.add(new Span(begin, end));
>       sentencesString.append(' ');
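
To illustrate the ordering effect, here is a minimal standalone sketch. It
is not the OpenNLP code: the class TrimOrderDemo and the two helper methods
are hypothetical, and replaceNewLineEscapeTags below is only a stand-in
that I assume turns "<LF>" into '\n', as described above.

```java
public class TrimOrderDemo {

    // Stand-in for the real method (assumption: it replaces "<LF>" with '\n').
    public static String replaceNewLineEscapeTags(String s) {
        return s.replace("<LF>", "\n");
    }

    // Same order as the quoted block: trim first, then replace the tag.
    // A trailing "<LF>" survives the trim and becomes a trailing '\n'.
    public static String trimThenReplace(String sentence) {
        return replaceNewLineEscapeTags(sentence.trim());
    }

    // Swapped order: replace the tag first, then trim.
    // String.trim() removes the trailing '\n' along with other whitespace.
    public static String replaceThenTrim(String sentence) {
        return replaceNewLineEscapeTags(sentence).trim();
    }

    public static void main(String[] args) {
        String sentence = "A sentence.<LF>";
        System.out.println(trimThenReplace(sentence).length()); // 12
        System.out.println(replaceThenTrim(sentence).length()); // 11
    }
}
```

If the real method behaves like the stand-in, swapping the two calls would
remove the extra character from the span end, though I have not checked
whether other callers rely on the current order.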
