Mattias Gärtner wrote:
The character sets in SynEdit are 'set of char', which means only 8-bit.
So, I guess the patch tries to fix an ANSI codepage accented chars problem,
right?
The fix is probably useless on other codepages including UTF-8, right?
Not as such. The problem is twofold.
1. If we ignore encoding (i.e. just work in ANSI space), then the old
style was simply plain wrong. It only allowed alpha (not numeric) chars, and
worked on the principle of "what's not alpha isn't part of a word".
2. If we also consider UTF-8 encoded content, then getting words by
boundaries (i.e. not-allowed chars) and not by allowed chars means that
as long as the given boundaries and whitespace are < 128 (which the default
ones are), UTF-8 words will be parsed right, even if they contain
special multibyte chars, because every byte of a multibyte UTF-8
sequence is >= 128 and so can never match a boundary.
I'm not sure whether #2 also applies to other encodings.
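
To illustrate both points, here is a minimal sketch in Python (not the
actual SynEdit code, which is Pascal). The boundary set, the function
names and the sample line are all made up for the example. The only
assumption carried over from above is that every boundary byte is < 128.

```python
# Hypothetical boundary set: ASCII whitespace and punctuation only,
# so every boundary byte is < 128.
BOUNDARIES = frozenset(b" \t\r\n.,;:()[]{}'\"+-*/=<>!?@#$%^&|\\~`")

# Old style "allowed chars" set: ASCII letters only (no digits).
ASCII_ALPHA = frozenset(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


def _runs(data, is_word_byte):
    """Collect maximal runs of bytes for which is_word_byte() is true."""
    words, start = [], None
    for i, b in enumerate(data):
        if is_word_byte(b):
            if start is None:
                start = i          # a new word begins here
        elif start is not None:
            words.append(data[start:i])
            start = None
    if start is not None:          # word runs to the end of the data
        words.append(data[start:])
    return words


def words_by_allowed(data, allowed=ASCII_ALPHA):
    """Old style: a word is a run of bytes found in the allowed set."""
    return _runs(data, lambda b: b in allowed)


def words_by_boundaries(data, boundaries=BOUNDARIES):
    """New style: a word is a run of bytes not found in the boundary set."""
    return _runs(data, lambda b: b not in boundaries)


# Sample line with accented (multibyte) chars and a digit, as raw UTF-8.
line = "naïve čaj2 := žluťoučký;".encode("utf-8")

for w in words_by_boundaries(line):
    print(w.decode("utf-8"))       # each word decodes cleanly
print(words_by_allowed(line))      # words get cut at every non-ASCII byte
```

With the boundary approach the three words come out whole, digit
included. With the allowed-chars approach the same line is chopped at
every accented char and at the digit.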
Ales
Mattias
_________________________________________________________________
To unsubscribe: mail [EMAIL PROTECTED] with
"unsubscribe" as the Subject
archives at http://www.lazarus.freepascal.org/mailarchives