Mike Unwalla <m...@techscribe.co.uk> wrote:
> I agree with Purodha. Do not be 'smart'. Do not change the meaning of a
> regexp.
>
> Regards,
>
> Mike Unwalla
OK. It looks like the majority does not want to pre-process the sentence to collapse consecutive whitespace (including tabs, DOS/Unix newlines, form feeds, vertical tabs, non-breaking spaces) before matching the regexp. So I will go with that.

On the other hand, nobody has indicated how to avoid the regression. A line break between words, for example, typically doesn't happen in LibreOffice documents or in our tests, but it often happens in text files. In emails, line breaks are used to avoid lines longer than ~80 characters. Taking the German rule GIRLS_DAY as an example, it will now fail to match when "girl's
day" is on a broken line, as in this sentence. I see this as a severe regression. I suppose that I care more than most because I use LT only to check text files, where this situation is frequent.

For the grammar.xml files that I maintain (br, eo, fr), I will use \s+ or even [\s\xA0]+ in the regexp to make it work. But I can change that later if another solution is decided.

Regards
Dominique
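
To make the regression concrete, here is a minimal sketch in Python rather than LanguageTool's own Java code. The patterns and sample sentences are hypothetical illustrations, not the actual GIRLS_DAY rule. The re.ASCII flag is used on the assumption that it approximates Java's default behaviour, where \s covers only ASCII whitespace and so does not match a non-breaking space.

```python
import re

# Hypothetical patterns for illustration only; not the real GIRLS_DAY rule.
# re.ASCII restricts \s to ASCII whitespace, so U+00A0 is not matched by \s.
LITERAL_SPACE = re.compile(r"girl's day", re.ASCII)
ANY_WHITESPACE = re.compile(r"girl's\s+day", re.ASCII)
WHITESPACE_OR_NBSP = re.compile(r"girl's[\s\xA0]+day", re.ASCII)

same_line = "It is girl's day today."
broken_line = "It is girl's\nday today."   # hard-wrapped text file or email
nbsp_line = "It is girl's\xa0day today."   # non-breaking space between words


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


for name, pattern in [
    ("literal space", LITERAL_SPACE),
    ("\\s+", ANY_WHITESPACE),
    ("[\\s\\xA0]+", WHITESPACE_OR_NBSP),
]:
    results = [matches(pattern, t) for t in (same_line, broken_line, nbsp_line)]
    print(name, results)
```

Under these assumptions, the literal-space pattern matches only the unbroken sentence, \s+ also handles the line break, and only [\s\xA0]+ covers all three cases.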