Re: [fpc-devel] Re: enumerators

Michael Schnell Thu, 18 Nov 2010 05:16:18 -0800

On 11/18/2010 12:33 AM, Hans-Peter Diettrich wrote:

Separator characters can be assumed as ASCII, so that they can be found by a dumb byte/char scan; only few encodings have to be recognized and handled, based on the char size: MBCS (UTF-8...), WideChars (UTF-16/UCS2) and UTF-32.

In fact I suppose that for UTF-8 ("pure UTF-8" without surrogates) pos() works for all strings, and a UTF-8 "character" is a string. It's just not allowed to use the result of pos() other than as the position argument of copy() or delete(), and to calculate the length argument for copy() or delete() as a difference between pos() results or Length(String) values. This makes it hard to extract a single Unicode character from a UTF-8 string, but of course it's easy to create a library function that takes a pos() result and, decoding the UTF-8 code, creates a UTF-8 string containing the next Unicode character. (UTF-8 coded surrogate pairs may need additional attention.)
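
A minimal sketch of such a library function might look like the following. The names Utf8SeqLen and Utf8CharAt are made up for illustration and are not existing RTL functions. The sketch assumes the string holds well-formed UTF-8 and only looks at the lead byte; it does not validate continuation bytes, and a UTF-8 coded surrogate pair would come back as two separate 3-byte sequences.

```pascal
program Utf8Demo;
{$mode objfpc}{$H+}

{ Hypothetical helper: length in bytes of the UTF-8 sequence whose
  lead byte is B. Invalid lead bytes (including stray continuation
  bytes) yield 1 so that a caller always advances. }
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

{ Hypothetical helper: returns the UTF-8 "character" (as a string)
  that starts at byte index P, where P is a pos() result. Returns
  an empty string if P is out of range. }
function Utf8CharAt(const S: AnsiString; P: Integer): AnsiString;
begin
  Result := '';
  if (P < 1) or (P > Length(S)) then Exit;
  { Copy() truncates by itself if the sequence is cut off at the end. }
  Result := Copy(S, P, Utf8SeqLen(Ord(S[P])));
end;

var
  S, Key, Value, Ch: AnsiString;
  P: Integer;
begin
  { "groesse=5" with o-umlaut (C3 B6) and sharp s (C3 9F), as raw bytes }
  S := 'gr'#$C3#$B6#$C3#$9F'e=5';

  { The ASCII separator is found by a plain byte scan; the pos()
    result is used only as a position and to compute lengths. }
  P := Pos('=', S);
  Key := Copy(S, 1, P - 1);
  Value := Copy(S, P + 1, Length(S) - P);

  { Extract the character following the two ASCII letters "gr". }
  Ch := Utf8CharAt(S, 3);

  WriteLn('separator at byte ', P);
  WriteLn('key bytes: ', Length(Key), ', value: ', Value);
  WriteLn('bytes in extracted character: ', Length(Ch));
end.
```

The byte index returned by pos() never lands inside a multi-byte sequence here, because no byte of a multi-byte UTF-8 sequence is below $80 and so none can match an ASCII separator.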


-Michael
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
