On 16/09/2011 00:03, cobines wrote:
> 2011/9/15 Hans-Peter Diettrich<drdiettri...@aol.com>:
>> cobines wrote:
>>> When doing:
>>> MyChar := MyString[1]
>>>
>>> the appropriate function retrieves the first Unicode character,
>>> regardless of encoding.
>> This is just wrong :-(
>>
>> MyString[1] accesses the first element of the *physical* character array,
>> regardless of any encoding. Also, Length returns the array size, not the
>> number of *logical* characters in it.
> Right. My point was that if I come from Ansi, knowing that MyString[1]
> retrieves the first character, and know nothing about Unicode, I might
> still think it continues to retrieve the first character in Unicode
> regardless of string encoding (the RTL handles that). It is, as you say,
> wrong, hence the need for the developer to adapt the code if he uses
> such access, but people might not know this. Having a UTF-16 RTL might
> help them in the sense that they will never have to learn, until they
> deal with characters outside of the BMP.

Which means they will have to learn it immediately.

That is, of course, unless the application has no user input at all. As soon as any text input from a user is processed, never mind what language the user speaks, that user may for some reason enter text for which string[x] returns half a character (a lone surrogate).
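
To make the half-a-character case concrete, here is a minimal Free Pascal sketch. It assumes FPC's UnicodeString type, which stores UTF-16 code units. The program name and the IsHighSurrogate helper are made up for illustration, and the string is built by hand from the two code units of U+1F600 so that nothing depends on the source file encoding.

```pascal
program HalfCharDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: true if C is the first (high) half of a
// UTF-16 surrogate pair, i.e. lies in the range $D800..$DBFF.
function IsHighSurrogate(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $D800) and (Ord(C) <= $DBFF);
end;

var
  S: UnicodeString;
begin
  // U+1F600 lies outside the BMP, so UTF-16 needs two code units:
  // high surrogate $D83D followed by low surrogate $DE00.
  SetLength(S, 2);
  S[1] := WideChar($D83D);
  S[2] := WideChar($DE00);

  // Length counts code units, not logical characters.
  WriteLn('Length(S) = ', Length(S));
  // S[1] is only the first half of the pair, not a whole character.
  WriteLn('S[1] is a high surrogate: ', IsHighSurrogate(S[1]));
end.
```

One logical character gives a Length of 2 here, and S[1] on its own is not valid text.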

If it were UTF-8, the developer would probably encounter the error fairly soon, and learn before creating tons of wrong code. With UTF-16, the developer may get away with it for many months, creating tons of code that he then needs to correct.
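
A small Free Pascal sketch of why the error surfaces sooner with UTF-8, assuming AnsiString holds the raw UTF-8 bytes and UnicodeString holds UTF-16 code units. The program and variable names are only for illustration. It uses the character U+00E9 (e with acute accent), which is inside the BMP and common in everyday input.

```pascal
program VisibleEarlyDemo;
{$mode objfpc}{$H+}

var
  W: UnicodeString; // UTF-16 code units
  U: AnsiString;    // used here as a plain container for UTF-8 bytes
begin
  // U+00E9 in UTF-16 is a single code unit, so W[1] happens to be
  // the whole character and naive indexing appears to work.
  SetLength(W, 1);
  W[1] := WideChar($00E9);

  // The same character in UTF-8 is two bytes, $C3 $A9, so U[1] is
  // already only part of a character.
  SetLength(U, 2);
  U[1] := AnsiChar($C3);
  U[2] := AnsiChar($A9);

  WriteLn('UTF-16 code units: ', Length(W));
  WriteLn('UTF-8 bytes:       ', Length(U));
end.
```

With UTF-8 the mismatch between Length and the number of characters shows up on the first accented letter. With UTF-16 it stays hidden until a character outside the BMP arrives.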

There are use cases where UTF-16 beats UTF-8, and vice versa.

But the "easier to learn" argument is bogus. Trying to hide a problem and hoping it will not surface has never been a good idea, so why should it be one here?

