Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Felipe Monteiro de Carvalho said: On Mon, Dec 1, 2008 at 7:33 PM, Martin Friebe [EMAIL PROTECTED] wrote: I suggested to have a rtl, that has overloaded functions for each string type. of course that sounds easier than in fact it will be. This is about the same as having all string routines in 3 flavours: RTLString, utf-8 and utf-16 the utf-8 and utf-16 could be done by assigning rtlstring to the adequate type. I think this is probably what we will end up with, because users of a particular encoding will build convenience routines for their favorite RTL routines. Yes, for the core routines. It is nuts to make stuff like scandatetime in two different encodings. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
I never suggested that the RTL be in a fixed encoding. I fully agree that this would be far worse. I suppose there are (quite decently workable) solutions for this. Either the RTL (and the LCL, FWIW) comes in multiple versions that are used as appropriate (user-selectable and/or automatically selected), or a string type is used that knows its internal encoding, and conversions are done dynamically when appropriate.
-Michael
Re: [fpc-devel] Unicode and UTF8String
> This is about the same as having all string routines in 3 flavours: RTLString, utf-8 and utf-16
What about (real) ANSIString (encoded according to the OS/locale)? This needs to be allowed, as the program might need to read such files.
-Michael
Re: [fpc-devel] Unicode and UTF8String
Felipe Monteiro de Carvalho wrote:
> On Mon, Dec 1, 2008 at 8:27 PM, Mattias Gaertner wrote:
>> I don't see how a TLCLStrings will *not* break Delphi and Lazarus compatibility. Maybe you can give some more details on how it should work.
> It was just an initial idea. I now see that TStrings could be improved.
Maybe we should simply make such classes generics: using wrappers, as the map and list classes already do, the size impact shouldn't be that big.
Re: [fpc-devel] Unicode and UTF8String
> For me, these attempts to make compiler do everything automatically sound like getting yet another typing saver.
Maybe I am just being lazy, but it's not a typing saver; compared with the previous non-Unicode-aware versions it's more a preventer of a typing enhancer :) . OTOH it's not just the typing: to work with commonly used things that just work in other programming systems (including previous versions of FPC/Lazarus) - like doing a case on a character type - the user programmer needs to learn about the internal encoding of Unicode text. I think this should be avoided. Pascal has been a great language for programming newcomers up till now. Simple things - like characters and strings - should just work (unless you explicitly need extended handling). I don't suggest that there is a simple solution for this (other than not doing Unicode at all), but it's worth discussing.
-Michael
Re: [fpc-devel] Unicode and UTF8String
I now understand that GB2312 and JIS 0213 in fact are the ANSI code pages 936 and 932.
-Michael
Re: [fpc-devel] Unicode and UTF8String
Quoting Michael Schnell:
>> For me, these attempts to make compiler do everything automatically sound like getting yet another typing saver.
> Maybe I am just being lazy, but it's not a typing saver but regarding the previous not-Unicode aware versions it's more a preventer of a typing enhancer :) . OTOH it's not just the typing but to work with commonly used things that just work in other programming systems (including previous versions of FPC/Lazarus) - like doing a case of a character type -
... and some things that just don't work, like i18n.
> the user programmer needs to learn about the internal encoding of Unicode text. I think this should be avoided.
Tell the Unicode consortium. My guess: they know already.
> Pascal has been a great language for programming newcomers up till now. Simple things - like characters and strings - should just work (unless you explicitly need extended handling). I don't suggest that there is a simple solution for this (other than not doing Unicode at all) but it's worth discussing.
IMHO it has already been discussed too often.
Mattias
Re: [fpc-devel] Unicode and UTF8String
> IMHO it has been already discussed too often.
I did not start it, and only 1% of the contributions are mine (and yours), so quite obviously there is a decent common wish for a solution to what is perceived as a problem.
-Michael
Re: [fpc-devel] Unicode and UTF8String
> avoids automatic conversion between types as much as possible.
I feel that it's a benefit of a strongly typed language that automatic type conversions can be done by generating the appropriate code statically, instead of having this embedded in the objects as with variants. When doing a simple assignment a := b; the types are either converted appropriately or a compiler error is generated. All integer and real types are converted automatically. If you try to do myInteger := myString; you get a compiler error. But if you do myANSIString := myUTF8String; the compiler generates an assignment without a conversion, even though the types are provided by the system (not by the user) and named according to their possible internal encoding. (We don't need to discuss why it is like that; the discussion is about whether it should stay this way.)
-Michael
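To make the contrast above concrete, here is a minimal sketch of a conversion that is spelled out in the source rather than left to the assignment. It assumes the System unit's AnsiToUtf8 and Utf8ToAnsi routines and, on Unix, the cwstring unit to install a widestring manager; the program name is made up for illustration. The text is kept plain ASCII so the result should not depend on the system code page.

```pascal
program ExplicitConversion;

{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // assumption: provides the widestring manager used by the conversions
{$endif}

var
  myANSIString: AnsiString;
  myUTF8String: UTF8String;
begin
  myANSIString := 'plain ASCII text';

  // Explicit, visible conversion from the system encoding to UTF-8.
  myUTF8String := AnsiToUtf8(myANSIString);

  // And back again. For pure ASCII on an ASCII-compatible code page,
  // both directions leave the bytes unchanged.
  myANSIString := Utf8ToAnsi(myUTF8String);

  WriteLn(myANSIString);
end.
```

Whether a plain assignment between the two types converts or not depends on the compiler version, which is exactly what this thread is debating; the explicit calls make the intent visible either way.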
Re: [fpc-devel] Unicode and UTF8String
The type is called ansistring simply for backwards compatibility. You could start arguing that everything should be intuitive. Take C for example. What does the operator tell you about what it does? Shouldn't it have an intuitive form? But in the end this is how the language is, and this is a useless discussion. I don't think that all C programmers will rewrite their code, any more than Pascal programmers will rewrite theirs, just so that you can find a better name for ansistring.
-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
> Don't forget that the ansistring type is actually multiple encodings and even multi byte (even not considering UTF-8). The point is: nobody took care of it.
IMHO a major confusion is generated by calling a string that is supposed to hold UTF8 data ANSIString. This never should have happened! If the Unicode support requires that there are strings that hold ANSI code and strings that hold UTF8 code, they should be denoted correctly as ANSIString and UTF8String. Storing UTF8 in an ANSIString is a sin :). IMHO, not providing automatic conversion between these types is a major shortcoming of the compiler/RTL, and if it does not do so, it should not provide the types. (Which does not mean that providing the best possible automatic conversion between these types solves all problems!)
-Michael
Re: [fpc-devel] Unicode and UTF8String
Mattias Gärtner wrote:
> Quoting Florian Klaempfl:
>> Mattias Gaertner wrote:
>>> You can optimize for one encoding or optimize for one per platform. I know how to optimize for widestrings, for ansistring and for UTF-8 strings, but I have no experience in optimizing for multiple encodings.
>> Don't forget that the ansistring type is actually multiple encodings and even multi byte (even not considering UTF-8). The point is: nobody took care of it.
> Yes, they did. They ran their programs only on systems with ansi encoded strings or simply passed the strings unchanged. That's why the lazarus solution even works with broken UTF-8 strings. But now a lot of implicit conversions will be added, so all strings must have valid encodings. You can no longer pass unknown encoded strings through the functions.
First, there will be a bytestring type that is not converted. Second, I'm rather sure we will find ways to cut down these conversions as much as possible.
Re: [fpc-devel] Unicode and UTF8String
> I meant more that a lot of people simply ignored in their code that ansistrings could be also multibyte even not considering UTF-8.
Ignoring that ANSI characters above $7F are locale dependent makes a program work perfectly in a single country and mostly decently in many others. Ignoring that UTF-8 code points can be encoded as two code units in an ANSIString makes a program work only in countries that use just ASCII, and thus not at all in Europe.
-Michael
Re: [fpc-devel] Unicode and UTF8String
On Tue, 2 Dec 2008, Michael Schnell wrote:
>> Nobody talks in this case about UTF-8. Even *ANSIstrings* in their native meaning can contain multi byte chars, there are *multi byte* ansi char sets.
> If there is a widely used multi-byte ANSI encoding, why do we need Unicode ? IMHO the introduction of Unicode has been necessary as (like you suggested) multi-byte ANSI encoding was commonly ignored nearly completely and there never has been _compiler_ support for them.
What compiler support should be necessary to handle e.g. EUC-JP? You want a variable of type char to contain the JIS-0213 coordinates? Unicode, and in particular UTF-8, has not taken off because languages got support for it, either. In fact, the most common language, C, has no string support at all. One reason Unicode has taken off is document exchange, which became very common in the internet age. Another reason is the growing importance of the Far East: developers therefore want better support for Far East languages. But note that this Unicode motivation exists mainly for Western software developers.
Daniël
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
>> Don't forget that the ansistring type is actually multiple encodings and even multi byte (even not considering UTF-8). The point is: nobody took care of it.
> IMHO a major confusion is generated by calling a string that is supposed to hold UTF8 data ANSIString. This never should have happened !
Nobody talks in this case about UTF-8. Even *ANSIstrings* in their native meaning can contain multi byte chars; there are *multi byte* ansi char sets. However, everybody codes 1 char = 1 byte when using ansistrings, which is plainly wrong. Guess why Delphi has functions like CharToByteIndex or NextCharIndex.
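The "1 char = 1 byte" assumption can be illustrated with a small sketch that walks a double-byte ANSI string. It does not use the Delphi routines named above; IsCp932LeadByte and Cp932CharCount are hypothetical helpers written for this example, and the lead-byte ranges for code page 932 are hard-coded as an assumption (real code would ask the OS or the RTL). The sample string is written as raw bytes so that no source-file encoding is involved.

```pascal
program DbcsWalk;

{$mode objfpc}{$H+}

// Hypothetical helper: True if B is a lead byte in code page 932.
// The ranges $81..$9F and $E0..$FC are hard-coded here as an assumption.
function IsCp932LeadByte(B: Byte): Boolean;
begin
  Result := ((B >= $81) and (B <= $9F)) or ((B >= $E0) and (B <= $FC));
end;

// Hypothetical helper: counts characters by stepping over two-byte sequences.
function Cp932CharCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if IsCp932LeadByte(Ord(S[I])) and (I < Length(S)) then
      Inc(I, 2)   // lead byte plus trail byte form one character
    else
      Inc(I);     // single-byte character (or a truncated lead byte)
    Inc(Result);
  end;
end;

var
  S: AnsiString;
begin
  // Two double-byte characters followed by 'A', written as raw bytes.
  // Note the trail byte $7B, which on its own would read as '{'.
  S := #$93#$FA#$96#$7B + 'A';
  WriteLn('bytes: ', Length(S), ', characters: ', Cp932CharCount(S));
end.
```

The trail byte $7B is the interesting part: a byte-wise search for '{' would report a match inside the second character, which is the kind of bug that code assuming one byte per character tends to hide until it meets a multi-byte code page.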
Re: [fpc-devel] Unicode and UTF8String
> So, really? What is not supported?
If just ignoring the fact is enough support, OK, it's supported :).
> ... tell this 1+ Billion (Billion=10^9 in this case) people in China.
I did not know (or suppose) that code used for Chinese characters is called ANSI (American National Standards Institute). I supposed one of the main intentions for the move to Unicode was the ability to support Chinese above all. So they did not seem to have been content with what was done before. Is anybody from China here to offer the footage ?
-Michael
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
>> The more I think about it the more I like this solution. I think it's better than the previous idea of a string with encode information inside it.
> Would Lazarus be able to follow ? Do you think it's possible to have the compiler take care of any necessary conversions automatically ?
For me, these attempts to make the compiler do everything automatically sound like getting yet another typing saver. The situation is already dangerously close to "write once, debug forever". Recently (after the Lazarus 0.9.26 release) I encountered some cases where a trivial function call resulted in a couple of conversions being inserted silently, and the resulting outcome could be explained only by tracing it or looking at the assembler code. Another example is issue #11327. Initially I perceived it as a code generation issue, but after digging in it was clear that it is caused by first choosing an incorrect overloaded function, then inlining it, then attempting to optimize. There are already at least 17 overloaded Pos() functions, and the compiler simply gets lost between them. That issue is likely to be fixed, but the fix will be for the consequences of the problem, not for its origin. Making the conversions automatic does not make the language clean; instead it hides potential errors and the author's intentions. Moreover, it forces everyone to (implicitly) use these conversions, even those who don't need them. It is also notable that, amid all this endless talk about the lack of Unicode support in the compiler, nearly all well-known Unicode processing software is written in languages that have built-in support neither for Unicode nor for strings themselves.
Regards, Sergei
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
>> Nobody talks in this case about UTF-8. Even *ANSIstrings* in their native meaning can contain multi byte chars, there are *multi byte* ansi char sets.
> If there is a widely used multi-byte ANSI encoding, why do we need Unicode ? IMHO the introduction of Unicode has been necessary as (like you suggested) multi-byte ANSI encoding was commonly ignored nearly completely and there never has been _compiler_ support for them.
So, really? What is not supported?
> Thus IMHO it's quite appropriate to only call ANSI only the 1-Byte ANSI code versions
... tell this 1+ Billion (Billion=10^9 in this case) people in China.
> (to be able to tell them technically from Unicode, the compiler support of which is discussed right here).
Re: [fpc-devel] Unicode and UTF8String
Felipe Monteiro de Carvalho wrote:
> Ignore the name ansi. Take it as a string type with the system encoding. I think it will solve the confusion.
Of course, if you ignore ANSI and just use the type named String, there is no confusion, as it's clear that the encoding is not predefined. That is exactly what I wanted to say: if you don't use it for ANSI-encoded information, don't name the type ANSIString. As FPC provides the type ANSIString out of the box, it should be used appropriately, and thus any new user will suppose that there is support for conversion between this type and other string types that are explicitly named differently according to their suggested internal encoding (such as UTF8String). If it does not work that way, this calls for major confusion.
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
>> I meant more that a lot of people simply ignored in their code that ansistrings could be also multibyte even not considering UTF-8.
> Ignoring that ANSI Characters $7F are locale depending makes a program work perfectly in a single country and mostly decently in many others.
So it works in the Far East with its multi byte ansi encodings?
Re: [fpc-devel] Unicode and UTF8String
> The point is: if everybody takes care of the fact that ansistrings can be multibyte, having utf-8 in ansistrings (if it's the locale encoding), is no big deal at all.
I do understand. But (in the real world) do you know anybody who does? If it were appropriate for ANSI code handling to take care of multi-byte encoding, we would not need locale-based code tables, and in effect Unicode would not have been invented.
-Michael
Re: [fpc-devel] Unicode and UTF8String
Mattias Gaertner wrote:
> You can optimize for one encoding or optimize for one per platform. I know how to optimize for widestrings, for ansistring and for UTF-8 strings, but I have no experience in optimizing for multiple encodings.
Don't forget that the ansistring type is actually multiple encodings and even multi byte (even not considering UTF-8). The point is: nobody took care of it.
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
>> So, really? What is not supported?
> If just ignoring the fact is enough support, OK, it's supported :).
What FUD is this? Please give an example where the FPC compiler doesn't handle multi byte ansistrings properly. Or do you just want to troll around?
This problem can be solved ...
Re: [fpc-devel] Unicode and UTF8String
Quoting Florian Klaempfl:
> Mattias Gaertner wrote:
>> You can optimize for one encoding or optimize for one per platform. I know how to optimize for widestrings, for ansistring and for UTF-8 strings, but I have no experience in optimizing for multiple encodings.
> Don't forget that the ansistring type is actually multiple encodings and even multi byte (even not considering UTF-8). The point is: nobody took care of it.
Yes, they did. They ran their programs only on systems with ansi encoded strings or simply passed the strings unchanged. That's why the lazarus solution even works with broken UTF-8 strings. But now a lot of implicit conversions will be added, so all strings must have valid encodings. You can no longer pass unknown encoded strings through the functions.
Mattias
Re: [fpc-devel] Unicode and UTF8String
> Nobody talks in this case about UTF-8. Even *ANSIstrings* in their native meaning can contain multi byte chars, there are *multi byte* ansi char sets.
If there is a widely used multi-byte ANSI encoding, why do we need Unicode? IMHO the introduction of Unicode was necessary because (like you suggested) multi-byte ANSI encodings were commonly ignored nearly completely and there never has been _compiler_ support for them. Thus IMHO it's quite appropriate to call only the 1-byte ANSI code versions ANSI (to be able to tell them apart technically from Unicode, the compiler support of which is discussed right here).
-Michael
Re: [fpc-devel] Unicode and UTF8String
>> If just ignoring the fact is enough support, OK, it's supported :).
> What FUD is this? Please give an example where the FPC compiler doesn't handle multi byte ansistrings properly.
Sorry for the bad language :( ! I did not mean to be aggressive. (Did you see the smile indicator?) I did not suggest it handles this wrongly in any way, but I just don't see in what way there might be any explicit compiler support for multi-byte ANSI. (You did mention the RTL functions provided.) I understand that there never has been a discussion on whether there should be any explicit compiler support for multi-byte ANSI, but this thread _is_ a discussion on explicit compiler support for Unicode.
-Michael
Re: [fpc-devel] Unicode and UTF8String
> It simply needs no explicit support except what it has already. Mainly the rtl and the user program has to take care of it and we did this already in the rtl but the compiler required no fix in this regard so far.
I do see your point! But my point is that with the introduction of Unicode, compiler support for handling these things is being introduced (and likewise in the RTL and the LCL). I think this should result in making user code largely unnecessary, and not in requiring those programmers who did not need multi-byte support to serve the users they deploy their software to, to finally start introducing multi-byte handling in their own program code. IMHO, if at all possible, a new version of a program should make life easier for the majority of the actual users and avoid making life more complicated for those who have not deliberately decided that they need the complexity.
-Michael
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
>>> If just ignoring the fact is enough support, OK, it's supported :).
>> What FUD is this? Please give an example where the FPC compiler doesn't handle multi byte ansistrings properly.
> Sorry for bad language :( ! I did not mean to be aggressive. (Did you see the smile indicator ?) I did not suggest it handles this wrong in any way, but I just don't see in what way there might be any explicit compiler support for multi-byte ANSI. (You did mention the RTL function provided.)
It simply needs no explicit support except what it has already. Mainly the rtl and the user program have to take care of it, and we did this already in the rtl, but the compiler required no fix in this regard so far.
Re: [fpc-devel] Unicode and UTF8String
On Tue, 2 Dec 2008, Michael Schnell wrote:
> Thanks for pointing this out.
>> GB2312 suits them well. Likewise, JIS 0213 suits the Japanese well.
> Are these called ANSI ?
Yes, code page 936 and code page 932 are valid ANSI code pages. These standards by themselves are of course not, because they are Chinese and Japanese industrial standards, respectively.
Daniël
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
>> Felipe Monteiro de Carvalho wrote:
>> Ignore the name ansi. Take it as a string type with the system encoding. I think it will solve the confusion.
> Of course if you ignore ANSI and just use the type named String there is no confusion as it's clear that the coding is not predefined. That is exactly what I wanted to say: If you don't use it for ANSI coded information don't name the type ANSIString. As FPC provides the type ANSIString out of the box it should be used appropriately and thus any new user will suppose
Really? Pascal is a strongly typed language which avoids automatic conversion between types as much as possible.
> that there is support for conversion between this type and other string types that explicitly are called differently according to their suggested internal coding (such as UTF8String).
Re: [fpc-devel] Unicode and UTF8String
> Btw will the LCL remain forcedly UTF-8 ? I thought the current Lazarus unicode support was temporary and all options were still open, depending on the outcome of FPC unicode support options?
I understand they could not do it differently (other than just providing no Unicode support at all), maybe because there is no automatic type conversion support in FPC (e.g. ANSIString to UTF8String). As the current version is not very satisfying, I do hope for a change, but I also understand that the Lazarus team will wait for what the next FPC version offers in that respect.
-Michael
Re: [fpc-devel] Unicode and UTF8String
On Tue, 2 Dec 2008, Michael Schnell wrote:
> I supposed one of the main intentions for the move to Unicode was the ability to support Chinese above all. So they did not seem to have been content with what was done before. Is anybody from China here to offer the footage ?
It is not the Chinese who are pushing for Unicode. GB2312 suits them well. Likewise, JIS 0213 suits the Japanese well. Those encodings also have the characters to support Western languages, or Greek, or Russian. Using Unicode as a technical solution to handle Chinese is, for an important part, a Western development. Eastern developers already support Eastern languages.
Daniël
Re: [fpc-devel] Unicode and UTF8String
On Tue, Dec 2, 2008 at 9:00 AM, Michael Schnell wrote:
> I still don't understand what ANSI has to do with System.
Ignore the name ansi. Take it as a string type with the system encoding. I think it will solve the confusion.
-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
> The more I think about it the more I like this solution. I think it's better than the previous idea of a string with encode information inside it.
Would Lazarus be able to follow? Do you think it's possible to have the compiler take care of any necessary conversions automatically?
-Michael
Re: [fpc-devel] Unicode and UTF8String
Quoting Michael Schnell:
>> The point is: if everybody takes care of the fact that ansistrings can be multibyte, having utf-8 in ansistrings (if it's the locale encoding), is no big deal at all.
> I do understand. But (in a real world) do you know anybody who does. If it would be appropriate for ANSI code handling to take care of Multi-byte encoding we would not need locale-based code tables and in effect Unicode would not have been invented.
UTF-8 is unicode and it is the system encoding on linux, OS X, some BSDs and Solaris. So ansistrings are UTF-8 there.
Mattias
Re: [fpc-devel] Unicode and UTF8String
Thanks for pointing this out.
> GB2312 suits them well. Likewise, JIS 0213 suits the Japanese well.
Are these called ANSI ?
-Michael
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
> Thanks for pointing this out.
>> GB2312 suits them well. Likewise, JIS 0213 suits the Japanese well.
> Are these called ANSI ?
Every well-educated Windows programmer knows that the ansi functions/strings/whatever are not limited to the so-called ansi code pages (which aren't ANSI either, afaik). Or is CP850/CP1252, as on my German Windows, an ansi code page ;)?
Re: [fpc-devel] Unicode and UTF8String
> UTF-8 is unicode and it is the system encoding on linux, OS X, some BSDs and Solaris. So ansistrings are UTF-8 there.
I still don't understand what ANSI has to do with System. AFAIK, the term ANSI code stands for a (code page dependent) definition of a character encoding, and Unicode is another one. Both are independent of operating systems.
-Michael
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
>> The point is: if everybody takes care of the fact that ansistrings can be multibyte, having utf-8 in ansistrings (if it's the locale encoding), is no big deal at all.
> I do understand. But (in a real world) do you know anybody who does. If it would be appropriate for ANSI code handling to take care of Multi-byte encoding we would not need locale-based code tables and in effect Unicode would not have been invented.
Multibyte ansi chars are still not unique and require the code page for proper interpretation; this is why Unicode was invented. If functions like CharToByteIndex or NextCharIndex are used properly, it very likely makes no difference to string processing code whether the strings are multi byte ansi or utf-8 unicode.
Re: [fpc-devel] Unicode and UTF8String
Not to mention: What would the alternative be? Even if I am not satisfied with the current state of Lazarus in that respect, I would not dare to suggest that Lazarus should change anything here before the next version of FPC offers new string handling: either a string type that knows its internal encoding, so that any conversions can be done automatically, or multiple string types that the compiler knows how to convert between when necessary, or whatever solution the FPC team comes up with.
-Michael
Re: [fpc-devel] Unicode and UTF8String
Felipe Monteiro de Carvalho wrote:
> Hello, Some things weren't clear from the previous discussion, so I would like to clarify them. For instance, the GetTempFileName routine: http://www.freepascal.org/docs-html/rtl/sysutils/gettempfilename.html The routine is currently ANSI, but we need a unicode version of it. What would that unicode version look like? We currently have 3 unicode string types planned AFAIK:
No.
> I assume that the new variable encoding type would be used for all unicode routines, am I right?
No, it will be RTLString, whose type depends on the OS.
> Or would versions for all 3 types be added? (for example, if someone donates utf8 routines). thanks,
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 10:13 AM, Florian Klaempfl wrote:
> No, it will be RTLString which type depends on the OS.
Ok, so code would be something like this:
  var
    OSString: RTLString;
    MyString: UTF8String;
  begin
    OSString := SomeRTLRoutine;
    MyString := OSString;
?
It will be funny to use a string type about which nothing is known. I wonder if people will abuse this and start writing operating system dependent code.
-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 10:42 AM, Florian Klaempfl wrote:
> Why would you do this and not MyString := SomeRTLRoutine;
You are right, that should do it. I was thinking about var parameters.
-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
Florian Klaempfl wrote:
> Felipe Monteiro de Carvalho wrote:
>> On Mon, Dec 1, 2008 at 10:13 AM, Florian Klaempfl wrote:
>>> No, it will be RTLString which type depends on the OS.
>> Ok, so code would be something like this: var OSString: RTLString; MyString: UTF8String; begin OSString := SomeRTLRoutine; MyString := OSString; ?
> Why would you do this and not MyString := SomeRTLRoutine; ?
If I understand that right, this may cause some overhead that in some (few) cases is not needed. If I write an application using string type X (WideString, for example), then in the above MyString would be a WideString. The inputs and outputs of SomeRTLRoutine are RTLString, so they are OS dependent. If I compile for an OS using UTF8, then that means each and every call needs a string conversion. Of course I understand that *if* some RTL function calls the OS, then the string must be converted. But what if I simply want to extract the drive letter, or trim the path and get the file name, without actually accessing the file or the OS? Should it be possible to skip the conversion?
Best Regards
Martin
Re: [fpc-devel] Unicode and UTF8String
Felipe Monteiro de Carvalho wrote:
> On Mon, Dec 1, 2008 at 10:13 AM, Florian Klaempfl [EMAIL PROTECTED] wrote:
>> No, it will be RTLString which type depends on the OS.
> Ok, so code would be something like this:
>   var OSString: RTLString; MyString: UTF8String;
>   begin OSString := SomeRTLRoutine; MyString := OSString; ?

Why would you do this and not MyString := SomeRTLRoutine; ?
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Martin Friebe said:
>> Why would you do this and not MyString := SomeRTLRoutine; ?
> If I understand that right, this may cause some overhead that in some (few) cases is not needed.

Correct.

> If I write an application using string type X (WideString, for example), then in the above MyString would be WideString.

Correct.

> The input and output of SomeRTLRoutine are RTLString; they are OS dependent. If I compile for an OS using UTF8, then each and every call needs a string conversion.

Correct.

> Of course I understand that *if* some RTL function calls the OS, then the string must be converted. But what if I simply want to extract the drive letter, or trim the path and get the file name, without actually accessing the file or the OS? Should it be possible to skip converting?

Use rtlstring. Do the conversion to widestring after.

IOW, you should do it the other way around. Use the OS-dependent string type for mostly encoding-independent operations, and force a certain encoding (using utf8string or widestring) only for the few things where you need a specific encoding.
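The "carry the platform type around, convert late" advice above can be sketched as follows. This is only an illustration: RTLString did not exist as a type when this was written, so the alias below is a hypothetical stand-in, LastPathPart is a made-up helper, and the sketch assumes a compiler with code-page-aware UTF8String and UnicodeString. The sample text is plain ASCII so the final implicit conversion does not depend on which widestring manager is installed.

```pascal
program ConvertLate;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for the proposed RTLString.
  {$IFDEF WINDOWS}
  RTLString = UnicodeString;   // UTF-16 code units
  {$ELSE}
  RTLString = UTF8String;      // assumes a UTF-8 Unix system
  {$ENDIF}

// Hypothetical encoding-agnostic helper: returns the part after the last
// '/'. It only looks for an ASCII delimiter, which is a single code unit
// in both UTF-8 and UTF-16, so it never splits a character.
function LastPathPart(const Path: RTLString): RTLString;
var
  i: Integer;
begin
  i := Length(Path);
  while (i > 0) and (Path[i] <> '/') do
    Dec(i);
  Result := Copy(Path, i + 1, Length(Path) - i);
end;

var
  FullName, FileName: RTLString;
  ForDisplay: UnicodeString;
begin
  FullName := '/home/user/notes.txt';
  FileName := LastPathPart(FullName);  // same type throughout, no conversion
  ForDisplay := FileName;              // the only place a conversion can occur
  WriteLn(ForDisplay);
end.
```

On a target where RTLString and UnicodeString are the same type, the last assignment is a plain reference-counted copy; elsewhere it is the single conversion Marco describes.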
Re: [fpc-devel] Unicode and UTF8String
Marco van de Voort wrote:
> In our previous episode, Martin Friebe said:
>>> Why would you do this and not MyString := SomeRTLRoutine; ?
>> If I understand that right, this may cause some overhead that in some (few) cases is not needed.
> Correct.
>> If I write an application using string type X (WideString, for example), then in the above MyString would be WideString.
> Correct.
>> The input and output of SomeRTLRoutine are RTLString; they are OS dependent. If I compile for an OS using UTF8, then each and every call needs a string conversion.
> Correct.
>> Of course I understand that *if* some RTL function calls the OS, then the string must be converted. But what if I simply want to extract the drive letter, or trim the path and get the file name, without actually accessing the file or the OS? Should it be possible to skip converting?
> Use rtlstring. Do the conversion to widestring after. IOW, you should do it the other way around. Use the OS-dependent string type for mostly encoding-independent operations, and force a certain encoding (using utf8string or widestring) only for the few things where you need a specific encoding.

I agree, using RTLString will probably help fpc to optimize your exe for each OS.

But using RTLString means you do not know whether you have UTF8 or not. Because UTF8 behaves slightly differently from other strings, many operations cannot be performed on an RTLString (foo[1], copy, pos, ...), simply because you do not know whether the result is a char, a codepoint or a sub-codepoint (a single UTF8 byte).

RTLString is or will be great if you simply need to store an OS-dependent string in order to give it back to the OS later (e.g. open a file, remember the file name without processing it (displaying it would be via the OS), and save the file back under the same name).

For this you could also use ByteString, if there is such a thing and if it behaves as non-converting when assigned to any string.

Best Regards
Martin

--- Disclaimer: Just to keep this discussion where it was:
- I do understand why the above is as it is (string index not being utf8 char access).
- I also do not believe that this is correct (and any discussion should be a new thread)
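The ambiguity of foo[1] described above can be made concrete with a small sketch. It is not taken from the thread; it simply stores the same two characters in a UTF-16 string and in a UTF-8 string and indexes both. The text is built from explicit code units so the result does not depend on the source file encoding, and UTF8Encode is used instead of an implicit conversion so no widestring manager is needed.

```pascal
program IndexMeaning;
{$mode objfpc}{$H+}

var
  U16: UnicodeString;
  U8: UTF8String;
begin
  // The text is U+00E4 (a-umlaut) followed by 'b'.
  SetLength(U16, 2);
  U16[1] := WideChar($00E4);
  U16[2] := WideChar(Ord('b'));
  U8 := UTF8Encode(U16);

  WriteLn(Length(U16));   // 2: UTF-16 code units
  WriteLn(Length(U8));    // 3: UTF-8 bytes ($C3 $A4 for U+00E4, then 'b')
  WriteLn(Ord(U16[1]));   // 228: the whole code point
  WriteLn(Ord(U8[1]));    // 195: only the lead byte of the sequence
end.
```

If the variable were declared with a platform-dependent type, which of the two pairs of results applies would only be known once the target is chosen, which is the concern raised in this message.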
Re: [fpc-devel] Unicode and UTF8String
Marco van de Voort wrote: In our previous episode, Martin Friebe said: I agree, using RTlString will probably help fpc to optimize your exe for each OS. But, using RTLString means you do not know, if you have UTF8 or not. Correct. Because UTF8 behaves slightly different from other Strings, many operations can not be performed on RTLString foo[1], copy, pos ... simply because you do not know, if the result is a char, a codepoint or a subcodepoint (single utf8 byte) You don't know that about UTF-16 either. Even though that is no problem in True, good point most cases, it is slowly time to abandon too simplistic thinking about strings. The best solution is to minimize editing, and localize them in certain parts of the code, keeping most of the code encoding agnostic. True, too. But we are talking Pascal, not some other language. string[index], copy, pos, length have always been part of Pascal. Of course they are still there, to be used in the few parts of your code, where you specialize on whatever string type you deal with. But otherwise, using RTLString IMHO will abandon this part of pascal syntax. A function of which the result can not be used, as it can change at compile time = such a function can not be used. (or we will have buffer overflows, code injection and more ...) I admit that the Problem started (and that has been discussed more than enough) starts with UTF8string (yes even with utf16 string). But in this case those functions became a new, but predictable meaning. I can do utf8string[1], and I can use the result. Only I have to be aware what it means. I can *not* do rtlString[1], as at the time of code writing I can not be aware what it means. It is only decided, at compilation time. IFDEFs won't help neither, because they can only cope with the set of stringtypes know at the time the code is written. This breaks each time FPC will be extended. and localize them in certain parts of the code, keeping most of the code encoding agnostic. 
Sorry I can't help taking that into another direction, (which also has been discussed before). The above quote sounds like a sentence from a introduction into object orientation. Sure it is the right thing.. It is right for OO. So it should be right for strings as well. Just again, it simply will be a new language, which a string-object, instead of pascal. And yes, if you lazy, you lose performance due to automatic conversions. It has always been that way (also when mixing short and ansistring) In other words, write pascal code, just do not use some of the (imho) most common elements of pascal syntax? I acknowledge a language is a living thing, and needs to be adjusted to the new things, that come up over time. I only ask, if this is the best way? This is not just a good thing for OS interfacing code, but a good thing in general. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
Martin Friebe wrote:
> In other words, write pascal code, just do not use some of the (imho) most common elements of pascal syntax? I acknowledge a language is a living thing, and needs to be adjusted to the new things that come up over time. I only ask whether this is the best way.

We're open to proposals, make one.
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Martin Friebe said: most cases, it is slowly time to abandon too simplistic thinking about strings. The best solution is to minimize editing, and localize them in certain parts of the code, keeping most of the code encoding agnostic. True, too. But we are talking Pascal, not some other language. string[index], copy, pos, length have always been part of Pascal. So keep using ansistring? It doesn't change. Of course they are still there, to be used in the few parts of your code, where you specialize on whatever string type you deal with. But otherwise, using RTLString IMHO will abandon this part of pascal syntax. It removes ASCII legacy. I don't see you complaining about the fact that char is not 8 bit anymore, and that that abandons that part of the pascal syntax. A function of which the result can not be used, as it can change at compile time = such a function can not be used. (or we will have buffer overflows, code injection and more ...) Hence my suggestion to minimize this functionality. I admit that the Problem started (and that has been discussed more than enough) starts with UTF8string (yes even with utf16 string). But in this case those functions became a new, but predictable meaning. I can do utf8string[1], and I can use the result. Only I have to be aware what it means. Yes. As widestring[1] also requires interpretation. That's unicode. I can *not* do rtlString[1], as at the time of code writing I can not be aware what it means. You don't have to. You carry it around as long as you can, and when you don't can, you assign it to your type of choice and bite the penalty. Delaying that as long as possible avoids excessive penalities, which IMHO are as much part of the Pascal language. Doing that would hurt the general purpose nature by turning into basic. (and then I mean the real Basics, not the C-with-basic-syntax that is FreeBasic), or worse: Excel. It is only decided, at compilation time. 
IFDEFs won't help neither, because they can only cope with the set of stringtypes know at the time the code is written. This breaks each time FPC will be extended. Any such big transition as ASCII - Unicode will break. However we have had these discussions before, but avoiding all pitfalls is simply too costly, and that breaks other Pascal traditions. and localize them in certain parts of the code, keeping most of the code encoding agnostic. Sorry I can't help taking that into another direction, (which also has been discussed before). The above quote sounds like a sentence from a introduction into object orientation. It is an introduction to abstraction maybe. I don't see the OO in there. It is right for OO. So it should be right for strings as well. Just again, it simply will be a new language, which a string-object, instead of pascal. This is all gibberish for me. I never said OO, and never will. And yes, if you lazy, you lose performance due to automatic conversions. It has always been that way (also when mixing short and ansistring) In other words, write pascal code, just do not use some of the (imho) most common elements of pascal syntax? There is no just. Strings simply get more complicated if you go unicode, and that can't be hidden. Either you stay with safe ASCII strings, or you use Unicode. If you do the latter, you will have to adapt anyway. And top-heavy emulation layers are not Pascallike either. I acknowledge a language is a living thing, and needs to be adjusted to the new things, that come up over time. I only ask, if this is the best way? IMHO there is not even a choice, since there simply no is a viable alternative. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
Marco van de Voort wrote:
> In our previous episode, Martin Friebe said:
>>> most cases, it is slowly time to abandon too simplistic thinking about strings. The best solution is to minimize editing, and localize them in certain parts of the code, keeping most of the code encoding agnostic.
>> True, too. But we are talking Pascal, not some other language. string[index], copy, pos, length have always been part of Pascal.
> So keep using ansistring? It doesn't change.

Not true if fpc follows Delphi. The new AnsiString type is also automatically converted in Delphi 2009. See the Marco Cantu doc about Unicode (linked some threads ago).

Luiz
Re: [fpc-devel] Unicode and UTF8String
Martin Friebe schrieb: Marco van de Voort wrote: In our previous episode, Martin Friebe said: Of course they are still there, to be used in the few parts of your code, where you specialize on whatever string type you deal with. But otherwise, using RTLString IMHO will abandon this part of pascal syntax. It removes ASCII legacy. I don't see you complaining about the fact that char is not 8 bit anymore, and that that abandons that part of the pascal syntax. It does not abandon the syntax. It only adds to it's meaning (*adds*, any existing meaning is unaltered.). I can still do: foo[1] for *any* type of string. (well yes even RTLstring, but see below) - If string happens to be an old ascii string, that still works as it always has - If string happens to be any unicode = that is still the same syntax, but with a new meaning. The new meaning doe snot break anything, because it only applies to new types. It is usable too, because I know, I am dealing with codepoints, or sub code points. And I know how they look, and how to identify them The introduction of RTLString is fine. I do say it is a good thing. RTLString does not interfere with the above. In fact even for RTLstring the syntax foo[1] does exist. Just it is not useful. If I tread it as utf8 sub code point, I can be wrong. If I tread it as ascii, I can be wrong. If I tread it as UTF16 I can be wrong My argument was not against RTLString. However it was my understanding that RTL functions will enforce RTLString. That they will only exist for RTLString, and they will *not* exist for other string types. That I would call enforcing RTLString, because of penalties on using other string types. I acknowledge, that if the end result of calling the RTL function, is an OS call, the conversation/penalty is always there. But not every RTL function ends up in an OS call. I admit that the Problem started (and that has been discussed more than enough) starts with UTF8string (yes even with utf16 string). 
But in this case those functions became a new, but predictable meaning. I can do utf8string[1], and I can use the result. Only I have to be aware what it means. Yes. As widestring[1] also requires interpretation. That's unicode. See above: Yes it requires interpretation. But it allows me to do so I can not see how I can interpret RtlString[1]. If the result is bigger than 128, then I must know what type it is. If it is ANSI, it is a single byte char. If it is utf8, it is a sub-codepoint which will be part of a codepoint. If it is widestring, well yes, here breaks my assumption that RtlString[1] returns a byte ouch I see this as a theoretic consideration. Please give a real world (!) code example when this causes a problem. If you assign the result of an rtl function to an rtlstring, this means you don't care about the type of rtlstring[1] or the knowledge, that it's type is rtlchar is enough for you. If you assign it to an ansistring/widestring whatever, you know what you get. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
Martin Friebe escreveu: Marco van de Voort wrote: In our previous episode, Martin Friebe said: I agree, using RTlString will probably help fpc to optimize your exe for each OS. But, using RTLString means you do not know, if you have UTF8 or not. Correct. Because UTF8 behaves slightly different from other Strings, many operations can not be performed on RTLString foo[1], copy, pos ... simply because you do not know, if the result is a char, a codepoint or a subcodepoint (single utf8 byte) You don't know that about UTF-16 either. Even though that is no problem in True, good point most cases, it is slowly time to abandon too simplistic thinking about strings. The best solution is to minimize editing, and localize them in certain parts of the code, keeping most of the code encoding agnostic. True, too. But we are talking Pascal, not some other language. string[index], copy, pos, length have always been part of Pascal. Of course they are still there, to be used in the few parts of your code, where you specialize on whatever string type you deal with. But otherwise, using RTLString IMHO will abandon this part of pascal syntax. A function of which the result can not be used, as it can change at compile time = such a function can not be used. (or we will have buffer overflows, code injection and more ...) To use safely RTLString, at first look, would be be sufficient to use overloaded functions from the Characters unit (introduced in Delphi 2009). See http://www.jacobthurman.com/?p=30 how you can use them to get Copy, Pos behavior. Next week, i'll implement those functions for UTF16 and UTF8 and do some tests. Luiz ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
And yes, if you lazy, you lose performance due to automatic conversions. It has always been that way (also when mixing short and ansistring) Of course you are very right here ! If you are lazy and write your code like you are used to, you will not get optimum performance with a new compiler that now allows for Unicode. But the code still needs to be working as expected (as with a compiler version that does not allow for Unicode, but simply uses ANSI or whatever OS and locale depending 8-Bit code). In most programs that will not be a problem at all as doing extensive string calculations in user-code is not necessary. Of course, if you want to take real advantage of Unicode (using characters outside your current locale) or if you want to optimize (for speed or for memory size) you need to be aware of the Unicode stuff and write your code appropriately. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
From: Michael Schnell [EMAIL PROTECTED] And yes, if you lazy, you lose performance due to automatic conversions. It has always been that way (also when mixing short and ansistring) Of course you are very right here ! If you are lazy and write your code like you are used to, you will not get optimum performance with a new compiler that now allows for Unicode. But the code still needs to be working as expected (as with a compiler version that does not allow for Unicode, but simply uses ANSI or whatever OS and locale depending 8-Bit code). In most programs that will not be a problem at all as doing extensive string calculations in user-code is not necessary. Of course, if you want to take real advantage of Unicode (using characters outside your current locale) or if you want to optimize (for speed or for memory size) you need to be aware of the Unicode stuff and write your code appropriately. It is planned to allow users to build ANSI version of RTL which will be fully compatible with existing user code. But if you choose to use unicode RTL, you must keep in mind all unicode specific things... Yury. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
> So keep using ansistring? It doesn't change.

Only if the bytes in the ANSIString are in fact ANSI (which the compiler at the moment does not take care of when doing myANSIString := myUTF8String etc.).

I feel that with Widestring the pos() etc. paradigms stay usable in more cases than with ANSIString.

-Michael
Re: [fpc-devel] Unicode and UTF8String
> I don't see you complaining about the fact that char is not 8 bit anymore, and that that abandons that part of the pascal syntax.

When doing the most common string stuff like

  case s[i] of
    '1', 'a', 'ä': ...

this does not really hurt. Even

  n := ord(s[i]) - ord('0');

works with 16 bit/char strings.

-Michael
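A complete version of the idiom above, as a minimal sketch: SumDigits is a made-up name, and the case statement switches on Ord(S[i]) rather than on the character itself so that the labels are plain integer constants regardless of the character width.

```pascal
program DigitValue;
{$mode objfpc}{$H+}

// Sums the ASCII digits found in S and skips every other code unit.
// S holds 16-bit code units, yet the classic Ord arithmetic is unchanged.
function SumDigits(const S: UnicodeString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    case Ord(S[i]) of
      Ord('0')..Ord('9'):
        Inc(Result, Ord(S[i]) - Ord('0'));
    end;
end;

begin
  WriteLn(SumDigits('a1b22'));   // 1 + 2 + 2 = 5
end.
```

This works because the ASCII digits keep the same numeric values in UTF-16; it says nothing about characters outside the Basic Multilingual Plane, which still need two code units.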
Re: [fpc-devel] Unicode and UTF8String
> It is planned to allow users to build ANSI version of RTL which will be fully compatible with existing user code. But if you choose to use unicode RTL, you must keep in mind all unicode specific things...

This will be very helpful for the time being.

Let's hope that the LCL will follow the path of allowing the user to choose whether he actually wants to use Unicode in the user code without explicitly calling Unicode functions.

-Michael
Re: [fpc-devel] Unicode and UTF8String
Michael Schnell wrote:
>> It is planned to allow users to build ANSI version of RTL which will be fully compatible with existing user code. But if you choose to use unicode RTL, you must keep in mind all unicode specific things...
> This will be very helpful for the time being.

It is not helpful because on a utf-8 system ansistring contains utf-8. Ansistring just means: use the system locale 8 bit encoding.

> Let's hope that the LCL will follow the path of allowing the user to choose whether he actually wants to use Unicode in the user code without explicitly calling Unicode functions.
> -Michael
Re: [fpc-devel] Unicode and UTF8String
Martin Friebe wrote:
> All the code
>   Widestring := RtlFunction;
>   Utf8string := RtlFunction;
> will run, it may just perform badly.

Yes and no.

Let's assume the platforms Windows and Unix have UnicodeString (UTF-16) and UTF8String as native types, respectively.

You choose to use the UnicodeString type in your app. Using the rtlstring approach you get:
- Under Windows: the native string type of the platform is the same as the one you are using, so no conversion takes place. Good.
- Under Unix: the native string type of the platform is NOT the same as the one you are using, so ONE conversion takes place. Bad.

Now let's assume that the fpc team decided to use a fixed unicode encoding for the RTL, say a UnicodeString RTL.

You choose to use the UnicodeString type in your app:
- Under Windows: no conversions. Everything is UTF16. Good.
- Under Unix: the RTL must internally convert from the native type (UTF8) to UTF16. Bad. The same result as above.

But someone else wants/needs to use UTF8 strings in your project:
- Under Windows: you get one conversion, UTF16 -> UTF8. Bad.
- Under Unix: you get TWO conversions, UTF8 -> UTF16 -> UTF8. Very bad.

Luiz
Re: [fpc-devel] Unicode and UTF8String
On Mon, 01 Dec 2008 16:36:23 +0100 Florian Klaempfl [EMAIL PROTECTED] wrote: [...] Martin Friebe schrieb: I can not see how I can interpret RtlString[1]. If the result is bigger than 128, then I must know what type it is. If it is ANSI, it is a single byte char. If it is utf8, it is a sub-codepoint which will be part of a codepoint. If it is widestring, well yes, here breaks my assumption that RtlString[1] returns a byte ouch I see this as a theoretic consideration. Please give a real world (!) code example when this causes a problem. Can you give a real world example where a different RTLString for each platform solves a problem? If you assign the result of an rtl function to an rtlstring, this means you don't care about the type of rtlstring[1] or the knowledge, that it's type is rtlchar is enough for you. If you assign it to an ansistring/widestring whatever, you know what you get. What string type will be TStrings.Items and the many other strings in the classes.pp? Mattias ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
Mattias Gaertner schrieb: On Mon, 01 Dec 2008 16:36:23 +0100 Florian Klaempfl [EMAIL PROTECTED] wrote: [...] Martin Friebe schrieb: I can not see how I can interpret RtlString[1]. If the result is bigger than 128, then I must know what type it is. If it is ANSI, it is a single byte char. If it is utf8, it is a sub-codepoint which will be part of a codepoint. If it is widestring, well yes, here breaks my assumption that RtlString[1] returns a byte ouch I see this as a theoretic consideration. Please give a real world (!) code example when this causes a problem. Can you give a real world example where a different RTLString for each platform solves a problem? It solves for example the problem that there are platforms where no unicode support is available or desired and it avoids unneeded conversions. I'd be fine using utf-16 on all platforms :) If you assign the result of an rtl function to an rtlstring, this means you don't care about the type of rtlstring[1] or the knowledge, that it's type is rtlchar is enough for you. If you assign it to an ansistring/widestring whatever, you know what you get. What string type will be TStrings.Items and the many other strings in the classes.pp? Not yet decided though I'd make them RTLString as well. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Luiz Americo Pereira Camara said:
>>> string[index], copy, pos, length have always been part of Pascal.
>> So keep using ansistring? It doesn't change.
> Not true if fpc will follow Delphi. The new AnsiString type will be also automatically converted in Delphi 2009.

As far as I know, the default is still ascii in the default system ascii encoding.

> See the Marco Cantu doc about Unicode (linked some threads ago).

I got it from Allen Bauer's blog in May (before Tiburon was out), but while ansistring changes, afaik the widestring to ansistring-without-qualifier conversion stays the same?
Re: [fpc-devel] Unicode and UTF8String
On Mon, 01 Dec 2008 15:06:45 + Martin Friebe [EMAIL PROTECTED] wrote: Florian Klaempfl wrote: [...] My opinion is that it should be the programmers choice. I a programmer wants or needs a simpler way (keeping all the strings in is application in one format, which will be known to him) then he/she should have that choice. And then on this type the person could perform any index or index-like operation. About: keeping all the strings in is application in one format, which will be known to him Only small programs can do that. All others use third party packages. If you want choice, then all used third packages must support all possible choices. Unlikely. That would mean that in order to avoid conversation, some functions of the RTL would be needed in overloaded versions for each string type. IMHO this applies only to those, which do not (or not always) make calls to the OS. Any other function does the conversation anyway. (It will be a case by case base) Sorry, I can't follow here. Please enlighten me, why an overloaded function with an internal conversion is better than an implicit conversion? [...] Also it would be nice (so I do not know how) not to have to duplicate code, in order to archive this. Something like generics, maybe. The goal of RTLString is to avoid duplicate code in the RTL. Mattias ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Mattias Gaertner said: I see this as a theoretic consideration. Please give a real world (!) code example when this causes a problem. Can you give a real world example where a different RTLString for each platform solves a problem? It avoids pingpong repeated conversions between OS encoding and whatever encoding is default. If you assign the result of an rtl function to an rtlstring, this means you don't care about the type of rtlstring[1] or the knowledge, that it's type is rtlchar is enough for you. If you assign it to an ansistring/widestring whatever, you know what you get. What string type will be TStrings.Items and the many other strings in the classes.pp? IMHO rtlstring. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 10:13 AM, Florian Klaempfl [EMAIL PROTECTED] wrote:
>> I assume that the new variable encoding type would be used for all unicode routines, am I right?
> No, it will be RTLString which type depends on the OS.

The more I think about it the more I like this solution. I think it's better than the previous idea of a string with encoding information inside it.

--
Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Mattias Gaertner said:
>> My opinion is that it should be the programmer's choice. If a programmer wants or needs a simpler way (keeping all the strings in his application in one format, which will be known to him) then he/she should have that choice. And then on this type the person could perform any index or index-like operation.
> About: keeping all the strings in his application in one format, which will be known to him

This is not possible, since you don't control OS + headers. Most stuff will come from the outside in the system encoding. This way you can do the whole app in the system encoding, and only face conversions when outputting to the GUI, which is (relatively) infinitely slow anyway.

You did nail a big problem though, and a weakness in Delphi's design. What to do with classes that are used both straight and in the GUI?

> Only small programs can do that. All others use third party packages. If you want choice, then all used third party packages must support all possible choices. Unlikely.

If you want to be the lowest common denominator and ask nothing from the 3rd party packages, it is best to stay with Ascii.
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 5:50 PM, Marco van de Voort [EMAIL PROTECTED] wrote:
> You did nail a big problem though, and a weakness in Delphi's design. What to do with classes that are used both straight and in the GUI?

You mean like TStrings? I think we will eventually roll our own TUTF8Strings. We could add a unit in FPC for all kinds of UTF-8 versions of routines.

--
Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 5:40 PM, Florian Klaempfl [EMAIL PROTECTED] wrote:
>> What string type will be TStrings.Items and the many other strings in the classes.pp?
> Not yet decided though I'd make them RTLString as well.

I think you can't change TStrings because that would break all code using it (huge amounts of code). I would recommend adding a similar class with a different name.

--
Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
On Mon, 01 Dec 2008 20:40:14 +0100 Florian Klaempfl [EMAIL PROTECTED] wrote:
> Mattias Gaertner wrote:
>> On Mon, 01 Dec 2008 16:36:23 +0100 Florian Klaempfl [EMAIL PROTECTED] wrote: [...]
>>> Martin Friebe wrote:
>>>> I can not see how I can interpret RtlString[1]. If the result is bigger than 128, then I must know what type it is. If it is ANSI, it is a single byte char. If it is utf8, it is a sub-codepoint which will be part of a codepoint. If it is widestring, well yes, here breaks my assumption that RtlString[1] returns a byte ouch
>>> I see this as a theoretic consideration. Please give a real world (!) code example when this causes a problem.
>> Can you give a real world example where a different RTLString for each platform solves a problem?
> It solves for example the problem that there are platforms where no unicode support is available or desired

:)

> and it avoids unneeded conversions.

I understand it 'avoids unneeded conversions' *inside* the RTL, by adding implicit conversions to the code accessing the RTL.

> I'd be fine using utf-16 on all platforms :)

Me2. At least for the file functions. I have some doubt about the classes.pp.

>>> If you assign the result of an rtl function to an rtlstring, this means you don't care about the type of rtlstring[1] or the knowledge, that it's type is rtlchar is enough for you. If you assign it to an ansistring/widestring whatever, you know what you get.
>> What string type will be TStrings.Items and the many other strings in the classes.pp?
> Not yet decided though I'd make them RTLString as well.

:(

TStrings is dog slow, and the only reason why it was still reasonable was that assigning strings was only reference counting. If TStrings uses a platform-dependent string, this is a big performance problem.

Mattias
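The cost difference Mattias refers to can be seen in a short sketch. It is an illustration under assumptions, not a benchmark or TStrings code: it only contrasts an assignment between two variables of the same string type with a transcoding step between different types, and it uses UTF8Encode and UTF8Decode explicitly so that nothing depends on the installed widestring manager.

```pascal
program RefCountVsConvert;
{$mode objfpc}{$H+}

var
  A, B: UTF8String;
  W: UnicodeString;
begin
  A := UTF8Encode(UnicodeString('some list item'));

  // Same string type: the assignment only bumps the reference count,
  // so both variables point at the same buffer.
  B := A;
  WriteLn(Pointer(A) = Pointer(B));   // TRUE

  // Different string type: the text has to be transcoded into a newly
  // allocated buffer. UTF8Decode does explicitly what an implicit
  // assignment W := A would trigger.
  W := UTF8Decode(A);
  WriteLn(Length(W));                 // 14 UTF-16 code units
end.
```

A container whose element type differs from the caller's string type would pay the second kind of cost on every read and write, which is the performance concern raised here.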
Re: [fpc-devel] Unicode and UTF8String
On Mon, 1 Dec 2008 20:44:32 +0100 (CET) [EMAIL PROTECTED] (Marco van de Voort) wrote:
> In our previous episode, Mattias Gaertner said:
>>> I see this as a theoretical consideration. Please give a real world (!) code example where this causes a problem.
>> Can you give a real world example where a different RTLString for each platform solves a problem?
> It avoids ping-pong repeated conversions between the OS encoding and whatever encoding is the default.

A real world example please.

Mattias
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Felipe Monteiro de Carvalho said:
>> You did nail a big problem though, and a weakness in Delphi's design. What to do with classes that are used both straight and in the GUI?
> You mean like TStrings? I think we will eventually roll our own TUTF8Strings.
> We could add a unit in FPC for all kinds of UTF-8 versions of routines.

Doesn't work per se. TStringList is also used in libraries, to save GUI parts etc.

A better solution would be to simply not try to fix this and give Lazarus its own copy of said classes, so that they can keep the encoding of those in sync with whatever they decide for their own encoding. That would actually require fewer fixups (a few conversion procedures for the rare points where TLCLStrings are passed to e.g. registry units; Lazarus already has its own XML units).
Re: [fpc-devel] Unicode and UTF8String
On 01 Dec 2008, at 20:57, Felipe Monteiro de Carvalho wrote:
> On Mon, Dec 1, 2008 at 5:40 PM, Florian Klaempfl [EMAIL PROTECTED] wrote:
>>> What string type will be TStrings.Items and the many other strings in classes.pp?
>> Not yet decided, though I'd make them RTLString as well.
> I think you can't change TStrings, because that would break all code using it (a huge amount of code). I would recommend adding a similar class with a different name.

In that case, I would recommend giving it the "string with attached encoding" style type, so you don't need 5 TStrings variants.

Regarding how to deal with file system representations, conversions etc., it may also be interesting to look at Apple's NSString class (http://developer.apple.com/documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html) or, if you prefer a procedural approach, CFStrings (http://developer.apple.com/documentation/CoreFoundation/Reference/CFStringRef/index.html).

I'm not suggesting to mimic that exact API, but only to see what kind of APIs they support (and are deprecating). NSString/CFString (one is just an OOP version of the other) are also a universal string container type, with embedded encoding. For example, there are routines such as

* CFStringGetCharacterAtIndex() (and more optimised approaches as documented there, such as CFStringGetRangeOfComposedCharactersAtIndex())
* CFStringGetFileSystemRepresentation() (basically the rtlstring version of the string)
* CFStringConvertWindowsCodepageToEncoding() (returns the Core Foundation encoding constant that is the closest mapping to a given Windows codepage identifier)
* ...

The advantage when using such a type is that you also only need to convert it (internally, hidden from the user) on demand or when some helper routine requires it (such as e.g. case-insensitive comparisons). Otherwise, no conversion whatsoever is necessary.

Jonas
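The "string with attached encoding, converted only on demand" idea might look roughly like the sketch below. It is not NSString/CFString and not anything FPC provides: TEncodingTag, TTaggedString, FromUtf8, FromUtf16 and AsUtf8 are all hypothetical names made up for this illustration, and only two encodings are covered.

```pascal
program TaggedStringDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical encoding tag; a real design would cover more encodings.
  TEncodingTag = (etUtf8, etUtf16);

  // Hypothetical container: the payload stays in the encoding it arrived in.
  TTaggedString = record
    Tag: TEncodingTag;
    Utf8Data: UTF8String;   // valid when Tag = etUtf8
    Utf16Data: WideString;  // valid when Tag = etUtf16
  end;

function FromUtf8(const S: UTF8String): TTaggedString;
begin
  Result.Tag := etUtf8;
  Result.Utf8Data := S;
  Result.Utf16Data := '';
end;

function FromUtf16(const S: WideString): TTaggedString;
begin
  Result.Tag := etUtf16;
  Result.Utf8Data := '';
  Result.Utf16Data := S;
end;

// Converts only when the stored encoding differs from the requested one.
function AsUtf8(const S: TTaggedString): UTF8String;
begin
  if S.Tag = etUtf8 then
    Result := S.Utf8Data                 // no conversion needed
  else
    Result := UTF8Encode(S.Utf16Data);   // conversion on demand
end;

var
  A, B: TTaggedString;
begin
  A := FromUtf8('abc');
  B := FromUtf16('abc');
  // Both values yield the same UTF-8 text; only B pays for a conversion.
  WriteLn(AsUtf8(A) = AsUtf8(B));
end.
```

A container like this keeps assignment cheap and defers the cost to the routines that actually need a specific encoding, which is the advantage described above.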
Re: [fpc-devel] Unicode and UTF8String
On Mon, 1 Dec 2008 17:53:58 -0200 Felipe Monteiro de Carvalho [EMAIL PROTECTED] wrote:
> On Mon, Dec 1, 2008 at 5:50 PM, Marco van de Voort [EMAIL PROTECTED] wrote:
>> You did nail a big problem though, and a weakness in Delphi's design. What to do with classes that are used both straight and in the GUI?
> You mean like TStrings? I think we will eventually roll our own TUTF8Strings.

This must be added to classes.pp, and TStrings must know it, so that Assign et al. work.

> We could add a unit in FPC for all kinds of UTF-8 versions of routines.

Yes, that is a good idea anyway, independent of RTLString and the current topic. Perhaps this should be discussed in a separate thread.

Mattias
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Felipe Monteiro de Carvalho said:
> On Mon, Dec 1, 2008 at 5:40 PM, Florian Klaempfl [EMAIL PROTECTED] wrote:
>>> What string type will be TStrings.Items and the many other strings in classes.pp?
>> Not yet decided, though I'd make them RTLString as well.
> I think you can't change TStrings, because that would break all code using it (a huge amount of code).

Depends. The last few messages got me thinking, and if what I saw on the web about Tiburon is correct, they simply control the type of ansistring in TStringList, and by default let it be the system encoding (the default, as in old Delphi). For unicode controls they set some new property or so to change it to UTF-8, and take the conversion penalties for granted. This allows them to do

    MyStringList.SaveToFile('SomeFilename.txt', TEncoding.Unicode);

It has to be something like that, since if MyStringList was always ansistring in whatever ISO encoding, that would be a pretty pointless unicode control.
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Jonas Maebe said:
(nsstring)
> The advantage when using such a type is that you also only need to convert it (internally, hidden from the user) on demand or when some helper routine requires it (such as e.g. case-insensitive comparisons). Otherwise, no conversion whatsoever is necessary.

Do they have some way to indicate that a procedure/method only supports a certain encoding? Or do you have to manually force the encoding in that way?

I prefer a declarative way to solve this.
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Mattias Gaertner said:
>> and it avoids unneeded conversions.
> I understand it 'avoids unneeded conversions' *inside* the RTL, by adding implicit conversions to the code accessing the RTL.

It allows the user to stay conversion free, and have some control over how many conversions are being done. It is way better than making this decision for him, and forcing him to an encoding he normally wouldn't use in the first place.

>> I'd be fine using utf-16 on all platforms :)
> Me2. At least for the file functions.

I would too. If all platforms had chosen it. But they didn't.

>> Not yet decided, though I'd make them RTLString as well.
> :( TStrings is dog slow, and the only reason it was still reasonable was that assigning strings was only reference counting. If TStrings uses a platform-dependent string, this is a big performance problem.

Because exactly why? See also my previous msg.
Re: [fpc-devel] Unicode and UTF8String
On 01 Dec 2008, at 21:17, Marco van de Voort wrote:
> In our previous episode, Jonas Maebe said:
> (nsstring)
>> The advantage when using such a type is that you also only need to convert it (internally, hidden from the user) on demand or when some helper routine requires it (such as e.g. case-insensitive comparisons). Otherwise, no conversion whatsoever is necessary.
> Do they have some way to indicate that a procedure/method only supports a certain encoding?

No.

> Or do you have to manually force the encoding in that way?

Yes.

> I prefer a declarative way to solve this.

In the Pascal case, you could simply declare your parameter as UTF8String (or whatever) and the compiler would insert a conversion from this universal string type into a utf8string.

Jonas
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 6:22 PM, Mattias Gaertner [EMAIL PROTECTED] wrote:
> Compatibility was always the bigger goal for Lazarus. IMHO a TLCLStrings breaks more than it would solve.

I don't fully understand how the Tiburon TStrings works, but consider that we are used to mixing TStrings with LCL code, and then we migrate to the proposed UTF8String. Now every assignment of a string to TStrings will have an implicit conversion. Unless there are many overloaded methods in TStrings, one for each encoding, and it is able to internally use our desired encoding so that no useless conversions are done.

If these requirements aren't met, we need a new class.

-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Mattias Gaertner said:
>> encoding. That would actually require fewer fixups (a few conversion procedures for the rare points where TLCLStrings are passed to e.g. registry units; Lazarus already has its own XML units).
> Only at places where we had the choice. The LCL uses the FCL xml units.

As said, it can be fixed. Maybe even more easily than I thought.

> Compatibility was always the bigger goal for Lazarus. IMHO a TLCLStrings breaks more than it would solve.

A lot will change. Even with Delphi not everything is automatically unicode, and they only have one platform to regard.

I usually am somebody who is pretty serious about Delphi compatibility, except for some of the more bizarre post-D7 experiments. This however is different. While I really like Tiburon as a Delphi user, I loathe it as an FPC user.

It will never be totally transparent, whatever you do. Just like some things never were transparent when porting to Linux. We are a multi-OS compiler, not a version of Wine oriented towards Pascal development.
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 10:03 PM, Mattias Gaertner [EMAIL PROTECTED] wrote:
>> and it avoids unneeded conversions.
> I understand it 'avoids unneeded conversions' *inside* the RTL, by adding implicit conversions to the code accessing the RTL.

This is exactly what I was thinking. The conversion is simply passed on to a different piece of code. So the end result is the same: you still have a conversion.

Regards,
- Graeme -

fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 6:22 PM, Mattias Gaertner [EMAIL PROTECTED] wrote:
> Compatibility was always the bigger goal for Lazarus. IMHO a TLCLStrings breaks more than it would solve.

You mean compatibility with Delphi? With Tiburon I think this will become very hard, if possible at all. We can, however, stay compatible with previous Delphi versions.

-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Graeme Geldenhuys said:
>>> and it avoids unneeded conversions.
>> I understand it 'avoids unneeded conversions' *inside* the RTL, by adding implicit conversions to the code accessing the RTL.
> This is exactly what I was thinking. The conversion is simply passed on to a different piece of code. So the end result is the same: you still have a conversion.

Not necessarily, since you might not use a different type. Or only use one in the few rare cases where you must do character-level access. Or you might convert to UTF-32, because the char-level access is particularly difficult, or you want to be correct.

IOW, you give the programmer the choice about the type, instead of forcing an arbitrary one on him, based on your favorite platform.
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 6:38 PM, Marco van de Voort [EMAIL PROTECTED] wrote:
> IOW, you give the programmer the choice about the type, instead of forcing an arbitrary one on him, based on your favorite platform.

This is the part I like about this approach. The most likely fixed encoding to be adopted would be UTF-16, and something not very nice would happen to Lazarus users on UNIXes:

LCL (UTF-8) --> RTL (UTF-16) --> Operating System (UTF-8)

2 useless conversions.

-- Felipe Monteiro de Carvalho
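The conversion chain Felipe describes can be sketched as follows. This is a toy model, not real RTL code: RtlOpen and OsOpen are hypothetical stand-ins for an RTL routine with a fixed UTF-16 interface and an OS call that takes UTF-8 bytes. Only UTF8Decode and UTF8Encode are actual RTL functions.

```pascal
program RoundTripDemo;
{$mode objfpc}{$H+}

// Hypothetical stand-in for an OS call that expects UTF-8 bytes.
function OsOpen(const Name: UTF8String): Integer;
begin
  Result := Length(Name); // pretend to do something with the bytes
end;

// Hypothetical RTL routine with a fixed UTF-16 interface.
function RtlOpen(const Name: WideString): Integer;
begin
  // Conversion 2: UTF-16 back to UTF-8 for the OS.
  Result := OsOpen(UTF8Encode(Name));
end;

var
  LclName: UTF8String;
begin
  LclName := 'example.txt';
  // Conversion 1: UTF-8 to UTF-16, only to satisfy the RTL interface.
  WriteLn(RtlOpen(UTF8Decode(LclName)));
end.
```

The bytes that reach OsOpen are the same ones the caller started with, so both conversions are pure overhead in this scenario. How much that overhead matters in practice is a separate question.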
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Jonas Maebe said:
>> Do they have some way to indicate that a procedure/method only supports a certain encoding?
> No.
>> Or do you have to manually force the encoding in that way?
> Yes.

Clear. I just wondered how they solved it.

>> I prefer a declarative way to solve this.
> In the Pascal case, you could simply declare your parameter as UTF8String (or whatever) and the compiler would insert a conversion from this universal string type into a utf8string.

I know. With that modification I thought that was the best too, until the Tiburon details emerged. Actually I still think that our original is the best, not considering compatibility issues, but I don't think the difference is worth losing at least base-level Tiburon compatibility over.

Especially because their solution has some advantages too (ansistring code can be changed more gradually).
Re: [fpc-devel] Unicode and UTF8String
On Mon, 1 Dec 2008 21:07:50 +0100 (CET) [EMAIL PROTECTED] (Marco van de Voort) wrote:
> In our previous episode, Felipe Monteiro de Carvalho said:
>>> You did nail a big problem though, and a weakness in Delphi's design. What to do with classes that are used both straight and in the GUI?
>> You mean like TStrings? I think we will eventually roll our own TUTF8Strings.
>> We could add a unit in FPC for all kinds of UTF-8 versions of routines.
> Doesn't work per se. TStringList is also used in libraries, to save GUI parts etc.
> A better solution would be to simply not try to fix this and give Lazarus its own copy of said classes, so that they can keep the encoding of those in sync with whatever they decide for their own encoding. That would actually require fewer fixups (a few conversion procedures for the rare points where TLCLStrings are passed to e.g. registry units; Lazarus already has its own XML units).

Only at places where we had the choice. The LCL uses the FCL xml units.

Compatibility was always the bigger goal for Lazarus. IMHO a TLCLStrings breaks more than it would solve.

Mattias
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Felipe Monteiro de Carvalho said:
> This is the part I like about this approach. The most likely fixed encoding to be adopted would be UTF-16, and something not very nice would happen to Lazarus users on UNIXes:
> LCL (UTF-8) --> RTL (UTF-16) --> Operating System (UTF-8)
> 2 useless conversions.

Btw, will the LCL remain forcedly UTF-8? I thought the current Lazarus unicode support was temporary and all options were still open, depending on the outcome of the FPC unicode support options?
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 6:48 PM, Marco van de Voort [EMAIL PROTECTED] wrote:
> Btw, will the LCL remain forcedly UTF-8? I thought the current Lazarus unicode support was temporary and all options were still open, depending on the outcome of the FPC unicode support options?

It is certainly not temporary, also considering people won't be very happy to see us make a big incompatible change right after telling them to convert their source code to UTF-8. I think we have a responsibility to stay coherent here.

I think we may consider migrating to UTF8String when it's implemented and if it proves a viable solution.

Not to mention: What would the alternative be?

-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
On Mon, 1 Dec 2008 18:45:46 -0200 Felipe Monteiro de Carvalho [EMAIL PROTECTED] wrote:
> On Mon, Dec 1, 2008 at 6:38 PM, Marco van de Voort [EMAIL PROTECTED] wrote:
>> IOW, you give the programmer the choice about the type, instead of forcing an arbitrary one on him, based on your favorite platform.
> This is the part I like about this approach. The most likely fixed encoding to be adopted would be UTF-16, and something not very nice would happen to Lazarus users on UNIXes:
> LCL (UTF-8) --> RTL (UTF-16) --> Operating System (UTF-8)
> 2 useless conversions.

The LCL is a visual component library. Its string speed is slow anyway. Except maybe for TMemo.Lines and running through a big TreeView.

The same is true for file functions. The OS overhead of checking permissions and the other file system issues makes even a triple encoding/decoding a non-issue. For example, under Mac OS X the Lazarus IDE uses the CFString functions to compare a filename. This normalizes the string each time. CompareFilenames is easily called hundreds of thousands of times, and no one has yet said that the IDE runs slowly under OS X.

It's a different thing for TStrings. Many algorithms need O(1) time for accessing Items[i].

Mattias
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Felipe Monteiro de Carvalho said:
> On Mon, Dec 1, 2008 at 6:48 PM, Marco van de Voort [EMAIL PROTECTED] wrote:
>> Btw, will the LCL remain forcedly UTF-8? I thought the current Lazarus unicode support was temporary and all options were still open, depending on the outcome of the FPC unicode support options?
> It is certainly not temporary, also considering people won't be very happy to see us make a big incompatible change right after telling them to convert their source code to UTF-8. I think we have a responsibility to stay coherent here.

Well, euh, you will need a change anyway, from manual to automated?

> I think we may consider migrating to UTF8String when it's implemented and if it proves a viable solution.
> Not to mention: What would the alternative be?

Well, the logical one of course: RTLString, IOW a platform-dependent encoding. Except maybe selected widgets like SynEdit. (Borland stores source in UTF-8 on Windows too.)
Re: [fpc-devel] Unicode and UTF8String
Felipe Monteiro de Carvalho wrote:
> LCL (UTF-8) --> RTL (UTF-16) --> Operating System (UTF-8)

Is the last step always true? Doesn't Qt support UTF-16?

Bye
-- Luca
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 7:03 PM, Luca Olivetti [EMAIL PROTECTED] wrote:
>> LCL (UTF-8) --> RTL (UTF-16) --> Operating System (UTF-8)
> Is the last step always true? Doesn't Qt support UTF-16?

This is for operating system calls, not graphical library calls.

-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 7:01 PM, Marco van de Voort [EMAIL PROTECTED] wrote:
> RTLString, IOW a platform-dependent encoding. Except maybe selected widgets like SynEdit. (Borland stores source in UTF-8 on Windows too.)

A string whose encoding is unknown is very inconvenient for developers. The idea only justifies itself in the RTL because of the eventual need to write some extremely high-performance applications. For Lazarus it would simply be a useless inconvenience.

-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
Felipe Monteiro de Carvalho wrote:
> On Mon, Dec 1, 2008 at 7:01 PM, Marco van de Voort [EMAIL PROTECTED] wrote:
>> RTLString, IOW a platform-dependent encoding. Except maybe selected widgets like SynEdit. (Borland stores source in UTF-8 on Windows too.)
> A string whose encoding is unknown is very inconvenient for developers. The idea only justifies itself in the RTL because of the eventual need to write some extremely high-performance applications. For Lazarus it would simply be a useless inconvenience.

So how did people work for years with ansistring?
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Felipe Monteiro de Carvalho said:
>> RTLString, IOW a platform-dependent encoding. Except maybe selected widgets like SynEdit. (Borland stores source in UTF-8 on Windows too.)
> A string whose encoding is unknown is very inconvenient for developers.

I don't see that as strongly as most.

> The idea only justifies itself in the RTL because of the eventual need to write some extremely high-performance applications. For Lazarus it would simply be a useless inconvenience.

Same as above. That is an opinion, not fact, and I don't agree.

It is, btw, not just about performance, but also about predictability. Fewer encodings in use means better predictability.

If RTL+LCL are in the system encoding (with the LCL mostly hiding oddball libs such as Qt, if that is the widgetset), you have a fair chance of not having a multi-encoding app, without thick layers of emulation. Or at least of keeping the encoding-dependent part localised to a fairly small part of the program.

To be honest, I think a case could also be made for the LCL following the widget set encoding.
Re: [fpc-devel] Unicode and UTF8String
In our previous episode, Florian Klaempfl said:
>> A string whose encoding is unknown is very inconvenient for developers. The idea only justifies itself in the RTL because of the eventual need to write some extremely high-performance applications. For Lazarus it would simply be a useless inconvenience.
> So how did people work for years with ansistring?

Depends on the country, I guess. Here one simply skips accents, except for a few apps/fields where e.g. proper names (of persons, cities etc.) are used. At least until recent years.
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 7:24 PM, Florian Klaempfl [EMAIL PROTECTED] wrote:
> So how did people work for years with ansistring?

An ansistring used in the way proposed by FPC is extremely inconvenient for any GUI application which will be run in different parts of the globe.

You develop an application on a Russian machine, send it to an English machine, and it shows rubbish instead of text. Even if you actually could read that Russian GUI. It makes what is shown at runtime depend on the operating system you are running on. It's exactly the mess Unicode was invented to end.

People worked for years with ansistring, suffering from its shortcomings.

-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
Luiz Americo Pereira Camara wrote:
> Martin Friebe wrote:
>> All the code
>>   Widestring := RtlFunction;
>>   Utf8string := RtlFunction;
>> will run, it may just perform badly.
> Yes and no.
> Let's assume the platforms Windows and Unix have UnicodeString (UTF-16) and UTF8String as native types, respectively.
> You choose to use the UnicodeString type in your app. Using the rtlstring approach you get:
> Under Windows: the native string type of the platform is the same as the one you are using, so no conversion is done. Good.
> Under Unix: the native string type of the platform is NOT the same as the one you are using, so ONE conversion is done. Bad.
> Now let's assume that the fpc team decided to use a fixed unicode encoding for the RTL. Let's say a UnicodeString RTL. You choose to use the UnicodeString type in your app.

I never suggested the RTL should be in a fixed encoding. I fully agree that this would be far worse.

I suggested having an RTL that has overloaded functions for each string type. Of course that sounds easier than it will in fact be. Florian pointed out a few issues, like overloading by result not being possible (yet?). And code duplication would be a maintenance hell.

But those limits can be overcome. Maybe not in full for the first Unicode fpc release. I can see that in order to get at least something (and in a forward-compatible way) to all the waiting users of fpc, the RTLString solution is a good solution (or compromise: full function, limited optimization).

The functions that can be overloaded with what fpc already has could be written for the various types. Maybe a template system for plain functions (like generics for objects) could be found? Then code would not be duplicated. Maybe fpc could be extended to allow overloading by result? (Surely that has other uses too?)

It's just a suggestion.

Best Regards
Martin
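As a rough sketch of what "overloaded functions for each string type" means in practice, and of the duplication it brings, consider the following. UnitCount is a hypothetical routine invented for this example; it is not an existing RTL function.

```pascal
program OverloadDemo;
{$mode objfpc}{$H+}

// Hypothetical RTL-style routine, written once per string type.
// The two bodies are identical, which is the duplication problem.
function UnitCount(const S: UTF8String): Integer; overload;
begin
  Result := Length(S);
end;

function UnitCount(const S: WideString): Integer; overload;
begin
  Result := Length(S);
end;

// What cannot be declared: two functions that differ only in their
// result type, e.g. "function GetName: UTF8String" next to
// "function GetName: WideString". That is overloading by result.

var
  W: WideString;
  U: UTF8String;
begin
  W := WideChar($00E9);   // one UTF-16 element
  U := UTF8Encode(W);     // two UTF-8 bytes
  WriteLn(UnitCount(U));  // selects the UTF8String overload
  WriteLn(UnitCount(W));  // selects the WideString overload
end.
```

Overloading on parameters works because the compiler picks the variant from the argument type. Functions that only return a string (the common case for many RTL routines) give it nothing to choose from, which is why overloading by result comes up.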
Re: [fpc-devel] Unicode and UTF8String
On Mon, Dec 1, 2008 at 7:33 PM, Martin Friebe [EMAIL PROTECTED] wrote:
> I suggested having an RTL that has overloaded functions for each string type. Of course that sounds easier than it will in fact be.

This is about the same as having all string routines in 3 flavours: RTLString, utf-8 and utf-16.

The utf-8 and utf-16 ones could be done by assigning the rtlstring to the adequate type.

I think this is probably what we will end up with, because users of a particular encoding will build convenience routines for their favorite RTL routines.

-- Felipe Monteiro de Carvalho
Re: [fpc-devel] Unicode and UTF8String
Felipe Monteiro de Carvalho wrote:
> On Mon, Dec 1, 2008 at 7:24 PM, Florian Klaempfl [EMAIL PROTECTED] wrote:
>> So how did people work for years with ansistring?
> An ansistring used in the way proposed by FPC is extremely inconvenient for any GUI application which will be run in different parts of the globe.

I meant more that a lot of people simply ignored in their code that ansistrings could also be multibyte, even without considering UTF-8.