Re: [fpc-devel] Unicode and UTF8String

Martin Friebe Mon, 01 Dec 2008 07:27:33 -0800

Marco van de Voort wrote:

In our previous episode, Martin Friebe said:
Of course they are still there, to be used in the few parts of your code where you specialize on whatever string type you deal with. But otherwise, using RTLString IMHO will abandon this part of Pascal syntax.
It removes ASCII legacy. I don't see you complaining about the fact that
char is not 8 bits anymore, and that that abandons that part of the Pascal
syntax.

It does not abandon the syntax. It only adds to its meaning (*adds*; any existing meaning is unaltered).

I can still do foo[1] for *any* type of string (well yes, even RTLString, but see below):
- If the string happens to be an old ASCII string, that still works as it always has.
- If the string happens to be any Unicode string, that is still the same syntax, but with a new meaning.

The new meaning does not break anything, because it only applies to new types. It is usable too, because I know I am dealing with code points or sub-code points, and I know how they look and how to identify them.
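To illustrate the claim that sub-code points can be identified: in UTF-8 every byte (code unit) reveals its role from its value alone. The sketch below is in Python rather than Pascal purely for brevity. `classify_utf8_unit` is a hypothetical helper, not an RTL function, and the byte ranges follow RFC 3629.

```python
def classify_utf8_unit(b: int) -> str:
    """Classify one UTF-8 code unit (a "sub-code point") by its value alone."""
    if not 0 <= b <= 0xFF:
        raise ValueError(f"not a byte: {b}")
    if b <= 0x7F:
        return "ascii"         # 0xxxxxxx: a complete code point by itself
    if b <= 0xBF:
        return "continuation"  # 10xxxxxx: inside a multi-byte sequence
    if 0xC2 <= b <= 0xF4:
        return "lead"          # starts a 2-, 3- or 4-byte sequence
    return "invalid"           # 0xC0, 0xC1, 0xF5..0xFF never occur in UTF-8


data = "aé€".encode("utf-8")  # b'a\xc3\xa9\xe2\x82\xac'
for i, b in enumerate(data, start=1):  # start=1 mirrors Pascal's 1-based s[i]
    print(f"s[{i}] = 0x{b:02X} -> {classify_utf8_unit(b)}")
```

Here s[2] is 0xC3, a lead byte rather than the character é. The value is still meaningful, because its role can be read directly from the byte.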

The introduction of RTLString is fine; I do say it is a good thing. RTLString does not interfere with the above. In fact, even for RTLString the syntax foo[1] exists; it is just not useful. If I treat the result as a UTF-8 sub-code point, I can be wrong. If I treat it as ASCII, I can be wrong. If I treat it as UTF-16, I can be wrong.

My argument was not against RTLString. However, it was my understanding that RTL functions will "enforce" RTLString: that they will only exist for RTLString, and will *not* exist for other string types. That is what I would call enforcing RTLString, because of the conversion penalties for using other string types.

I acknowledge that if the end result of calling the RTL function is an OS call, the conversion penalty is always there. But not every RTL function ends up in an OS call.

I admit that the problem (and that has been discussed more than enough) starts with UTF8String (yes, even with UTF-16 strings). But in this case those functions gained a new, but predictable, meaning. I can do utf8string[1], and I can use the result; I only have to be aware of what it means.
Yes. As widestring[1] also requires interpretation. That's Unicode.

See above: yes, it requires interpretation. But it allows me to do so.

I cannot see how I can interpret RtlString[1]. If the result is 128 or above, then I must know what type it is. If it is ANSI, it is a single-byte char. If it is UTF-8, it is a sub-code point, which will be part of a code point. If it is a WideString, well yes, here my assumption that RtlString[1] returns a byte breaks... ouch.
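A small Python sketch of why the first element cannot be interpreted without knowing the encoding. Python is used only for brevity, and the sketch does not model any actual FPC type. The same one-character text "€" is stored three ways, and the first code unit differs each time. The cp1252 code page and little-endian UTF-16 are assumptions, and `REPRESENTATIONS` and `first_code_unit` are hypothetical names.

```python
# Hypothetical storage formats an encoding-agnostic "RtlString" might use.
# Each entry: (label, Python codec name, bytes per code unit).
REPRESENTATIONS = [
    ("ANSI (cp1252)", "cp1252", 1),                 # assumed Windows-1252 code page
    ("UTF-8", "utf-8", 1),
    ("UTF-16 (WideString-like)", "utf-16-le", 2),   # assumes little-endian storage
]


def first_code_unit(data: bytes, unit_size: int) -> int:
    """Return the first code unit, i.e. what Pascal's s[1] would yield."""
    return int.from_bytes(data[:unit_size], "little")


text = "€"  # U+20AC, outside ASCII
results = {
    label: first_code_unit(text.encode(codec), size)
    for label, codec, size in REPRESENTATIONS
}
for label, unit in results.items():
    print(f"{label}: s[1] = 0x{unit:X}")
```

The ANSI result is 0x80 (128), the first value outside ASCII. The UTF-8 result 0xE2 is only the lead byte of a three-byte sequence, and the UTF-16 result 0x20AC does not fit in a byte at all.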

I can *not* do rtlString[1], as at the time of writing the code I cannot know what it means.
You don't have to. You carry it around as long as you can, and when you
can't, you assign it to your type of choice and take the penalty.

As I said in another mail, every programmer starts as a beginner, and for many of them this is the last thing they think about.


Best Regards
Martin

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
