Mike Unwalla <m...@techscribe.co.uk> wrote:
> I agree with Purodha. Do not be 'smart'. Do not change the meaning of a 
> regexp.
>
> Regards,
>
> Mike Unwalla


OK. It looks like the majority does not want to pre-process the sentence
to collapse consecutive whitespace (including tabs, DOS/Unix line breaks,
form feeds, vertical space, non-breaking spaces) before matching the regexp.
So I will go with that.

On the other hand, nobody has indicated how to avoid the regression. A
line break between words typically doesn't happen in LibreOffice
documents or in our tests, but it often happens in text files. In emails,
line breaks are used to avoid lines longer than ~80 characters. Taking the
German rule GIRLS_DAY for example, it will now fail to match when "girl's
day" is on a broken line as in this sentence. I see this as a severe
regression.

I suppose that I care more than most because I only use LT to check
text files where the situation is frequent.

For the grammar.xml files that I maintain (br, eo, fr), I will use \s+ or
even [\s\xA0]+ in the regexp to make it work. I can change this later if
another solution is decided on.
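
As a rough illustration of the difference, here is a small Python sketch
(not LanguageTool code). The pattern text is hypothetical and is not the
real GIRLS_DAY rule. The re.ASCII flag is an assumption used to mimic
Java's default behaviour, where \s covers only ASCII whitespace and so
does not match a non-breaking space.

```python
import re

# Hypothetical patterns for illustration only; not copied from grammar.xml.
# re.ASCII restricts \s to [ \t\n\r\f\v], similar to Java's default \s.
LITERAL_SPACE = re.compile(r"girl's day", re.ASCII)
ANY_WHITESPACE = re.compile(r"girl's\s+day", re.ASCII)
WITH_NBSP = re.compile(r"girl's[\s\xA0]+day", re.ASCII)

SAMPLES = {
    "single space": "the girl's day event",
    "unix line break": "the girl's\nday event",
    "dos line break": "the girl's\r\nday event",
    "non-breaking space": "the girl's\xa0day event",
}


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(
            name,
            matches(LITERAL_SPACE, text),
            matches(ANY_WHITESPACE, text),
            matches(WITH_NBSP, text),
        )
```

In this sketch the literal-space pattern only matches the single-space
sample, \s+ also matches across the line breaks, and only the variant
that lists \xA0 explicitly matches the non-breaking space.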

Regards
Dominique

------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
