Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On 12.01.2011 07:16, LacaK wrote:
> P.S. I still do not understand how things can work correctly if the LCL expects that all AnsiStrings (String) are UTF8Strings, but the RTL/FCL does not strictly follow this (at least on Windows)?

The LCL uses SysToUTF8 and UTF8ToSys when it uses the RTL (and the FCL). This is often done with wrappers that wrap the RTL method and do the conversion (e.g. FileExistsUTF8, etc.).

Regards,
Sven
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
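The wrapper pattern described in this message can be sketched roughly as follows. This is Python rather than Pascal and only an illustration: `SYSTEM_ENCODING`, `KNOWN_FILES` and `rtl_file_exists` are hypothetical stand-ins for the system code page and an RTL routine. The names `utf8_to_sys`, `sys_to_utf8` and `file_exists_utf8` merely mirror the LCL functions mentioned above; their bodies are assumptions, not the LCL implementation.

```python
# Sketch only: the "LCL side" works with UTF-8 bytes, the "RTL side"
# with bytes in the system code page.
SYSTEM_ENCODING = "cp1250"  # assumption: a Central European Windows code page

FILE_NAME = "\u017elt\u00fd.txt"  # a file name with two non-ASCII letters

# Hypothetical stand-in for what an RTL routine can see: names are kept
# as bytes in the system encoding.
KNOWN_FILES = {FILE_NAME.encode(SYSTEM_ENCODING)}


def rtl_file_exists(name_sys: bytes) -> bool:
    """Stand-in for an RTL call that expects a system-encoded name."""
    return name_sys in KNOWN_FILES


def utf8_to_sys(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes into the system code page."""
    return data.decode("utf-8").encode(SYSTEM_ENCODING)


def sys_to_utf8(data: bytes) -> bytes:
    """Re-encode system code page bytes into UTF-8."""
    return data.decode(SYSTEM_ENCODING).encode("utf-8")


def file_exists_utf8(name_utf8: bytes) -> bool:
    """Wrapper: convert the UTF-8 name, then call the RTL stand-in."""
    return rtl_file_exists(utf8_to_sys(name_utf8))
```

Passing the UTF-8 name straight to `rtl_file_exists` fails in this model, while the wrapper succeeds. That is the reason such wrappers exist on platforms where the system encoding is not UTF-8.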
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
Sven Barth wrote:
> On 12.01.2011 07:16, LacaK wrote:
>> P.S. I still do not understand how things can work correctly if the LCL expects that all AnsiStrings (String) are UTF8Strings, but the RTL/FCL does not strictly follow this (at least on Windows)?
>
> The LCL uses SysToUTF8 and UTF8ToSys when it uses the RTL (and the FCL). This is often done with wrappers that wrap the RTL method and do the conversion (e.g. FileExistsUTF8, etc.).

As I wrote in one of my previous messages, AFAIK this is not true in the case of fcl-db and the Lazarus data-aware components like TDBGrid, TDBEdit ... They use the TField.Text: String property to get the string content of a field and display it. AFAIU the LCL expects that TField.Text will always return a UTF-8 encoded string (because no conversion (SysToUTF8) is done in dbgrids.pas or dbedit.inc), but this is not always true.

So where is the error?
1. Is the LCL's expectation wrong that TField.Text is always a UTF-8 string, -or-
2. Is the implementation of the TSQLConnectors wrong, which write data into the record buffer (of TStringField) and do not always convert it to UTF-8? (If the data should always be in UTF-8, it would be good to redefine the TField.Text property as "property Text: UTF8String" to make clear that we always work with UTF-8 strings.) -or-
3. Did I miss something? ;-)

-Laco.
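The mismatch described in this message can be reproduced in a few lines. The sketch below is Python, not Pascal, and every name in it is hypothetical: `field_text_from_connector` models a connector that stores data in the system code page (assumed here to be cp1250), and `widget_display` models a control that treats whatever it receives as UTF-8.

```python
SYSTEM_ENCODING = "cp1250"  # assumption: the ANSI code page of the client machine
TEXT = "\u010de\u0161tina"  # a word with two non-ASCII letters


def field_text_from_connector() -> bytes:
    """Hypothetical connector: leaves the data in the system code page."""
    return TEXT.encode(SYSTEM_ENCODING)


def widget_display(raw: bytes) -> str:
    """Hypothetical UTF-8-only widget: invalid bytes show up as U+FFFD."""
    return raw.decode("utf-8", errors="replace")


raw = field_text_from_connector()

# No conversion between field and widget: the non-ASCII letters are lost.
shown_unconverted = widget_display(raw)

# Explicit conversion to UTF-8 before display: the text survives.
shown_converted = widget_display(raw.decode(SYSTEM_ENCODING).encode("utf-8"))
```

In this model only the ASCII letters survive the unconverted path, which matches the symptom of a grid that looks fine for plain English data and breaks on accented characters.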
Re[2]: [fpc-devel] String and UnicodeString and UTF8Stringt
Hello FPC,

Wednesday, January 12, 2011, 9:45:47 AM, you wrote:

L> 2. Is the implementation of the TSQLConnectors wrong, which write data
L> into the record buffer (of TStringField) and do not always convert it
L> to UTF-8?

Do you set the CHARSET field in your TSQLConnector to UTF-8? Do you define the right code page in each field of your database?

--
Best regards, José
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
> L> 2. Is the implementation of the TSQLConnectors wrong, which write data
> L> into the record buffer (of TStringField) and do not always convert it
> L> to UTF-8?
> Do you set the CHARSET field in your TSQLConnector to UTF-8?

Not all connectors support the CharSet property. When I look into the sources, only MySQL and IB support it (SQLite always returns UTF-8 encoded ... ODBC, PostgreSQL and Oracle ignore it).

> Do you define the right code page in each field of your database?

Yes, this is not primarily a question of the database side, but of the db client library API which is used by the SQLConnector to retrieve data. For example, in ODBC we use SQLGetData in the LoadField method to retrieve data from the ODBC interface. And in the case of MS SQL Server, for example, character data are retrieved in the current ANSI code page (on Windows of course; it may be that on *nix, for example, data are retrieved in UTF-8 naturally). (AFAIK there is no universal way to explicitly request a character encoding from the ODBC interface.)

So is it true that every SQL connector must write character data in UTF-8? Or can it write in some native format (ANSI, UTF-16) ... but in that case it must write additional info about the actual encoding somewhere.

-Laco.
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On Wednesday, 12. January 2011 09.45:47 LacaK wrote:
> So where is the error?
> 1. Is the LCL's expectation wrong that TField.Text is always a UTF-8 string, -or-
> 2. Is the implementation of the TSQLConnectors wrong, which write data into the record buffer (of TStringField) and do not always convert it to UTF-8? (If the data should always be in UTF-8, it would be good to redefine the TField.Text property as "property Text: UTF8String" to make clear that we always work with UTF-8 strings.) -or-
> 3. Did I miss something? ;-)

The MSEgui sqldb version converts to UTF-16 from/to the system encoding or UTF-8 (selectable by option properties) and uses the FPC 16-bit UnicodeString to store string field values in the dataset; tmsestringfield returns UnicodeString values. So one can use either UTF-8 encoded database connections or connections with the current system encoding.

MSEgui uses 16-bit UnicodeString everywhere; the conversion from/to the system encoding is done transparently by the FPC unicode/widestring manager if necessary. This is a solution which works now; no additional complicated and possibly less performant codepage- and encoding-aware string type is necessary...

Martin
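As a rough model of the approach described in this message (not MSEgui code), the following Python sketch normalises incoming bytes to UTF-16 regardless of whether the connection delivers UTF-8 or the system encoding. The function `to_utf16`, its `connection_is_utf8` flag and the cp1252 default are hypothetical; they stand in for the "option properties" mentioned above.

```python
def to_utf16(raw: bytes, connection_is_utf8: bool,
             system_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the connection and store them as UTF-16 (LE).

    connection_is_utf8 plays the role of the selectable option; the
    system_encoding default is an assumption for this sketch.
    """
    source = "utf-8" if connection_is_utf8 else system_encoding
    return raw.decode(source).encode("utf-16-le")


TEXT = "Gr\u00fc\u00dfe"  # five characters, two of them non-ASCII

# The same text arriving over two differently configured connections.
from_utf8_connection = to_utf16(TEXT.encode("utf-8"), connection_is_utf8=True)
from_sys_connection = to_utf16(TEXT.encode("cp1252"), connection_is_utf8=False)
```

Both paths end in the same UTF-16 bytes, so everything downstream of the dataset only ever sees one representation.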
Re[2]: [fpc-devel] String and UnicodeString and UTF8Stringt
Hello FPC,

Wednesday, January 12, 2011, 11:02:00 AM, you wrote:

L>>> 2. Is the implementation of the TSQLConnectors wrong, which write data
L>>> into the record buffer (of TStringField) and do not always convert it
L>>> to UTF-8?
L>> Do you set the CHARSET field in your TSQLConnector to UTF-8?
L> Not all connectors support the CharSet property. When I look into the sources,
L> only MySQL and IB support it (SQLite always returns UTF-8 encoded ...
L> ODBC, PostgreSQL and Oracle ignore it).

So partially it is a lack of support in TSQLConnector. Also, UTF-8 in Firebird does not work as expected due to a design decision (I think).

L> Yes, this is not primarily a question of the database side,

Oh yes it is! If you miss any of the three steps, it will fail:
1) Database field
2) SQLConnector and client DLL/so
3) GUI

L> but of the db client library API which is used by the SQLConnector to
L> retrieve data.

How can a UTF-8 SQLConnector retrieve UTF-8 data from a field defined as binary? Client libraries have all the needed resources to handle the database; whether SQLConnector implements them and/or does it right is a different matter.

L> For example, in ODBC we use SQLGetData in the LoadField
L> method to retrieve data from the ODBC interface. And in the
L> case of MS SQL Server, for example, character data are retrieved in the current ANSI
L> code page (on Windows of course; it may be that on *nix, for example,
L> data are retrieved in UTF-8 naturally).

Via ODBC?

L> (AFAIK there is no universal way to explicitly request a
L> character encoding from the ODBC interface.)

That is a problem of ODBC, but see: http://web.datadirect.com/resources/odbc/unicode/unix.html

L> So is it true that every SQL connector must write character
L> data in UTF-8?

No. It is mandatory that you send/receive UTF-8 to/from the GUI LCL elements. In case you are using a DBF, for example, which does not have encoding information, you can use the transliterate facility of the dataset, but it is a bit awful.

--
Best regards, José
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
Martin Schreiber wrote:
> On Wednesday, 12. January 2011 09.45:47 LacaK wrote:
>> So where is the error?
>> 1. Is the LCL's expectation wrong that TField.Text is always a UTF-8 string, -or-
>> 2. Is the implementation of the TSQLConnectors wrong, which write data into the record buffer (of TStringField) and do not always convert it to UTF-8? (If the data should always be in UTF-8, it would be good to redefine the TField.Text property as "property Text: UTF8String" to make clear that we always work with UTF-8 strings.) -or-
>> 3. Did I miss something? ;-)
>
> The MSEgui sqldb version converts to UTF-16 from/to the system encoding or UTF-8 (selectable by option properties) and uses the FPC 16-bit UnicodeString to store string field values in the dataset; tmsestringfield returns UnicodeString values. So one can use either UTF-8 encoded database connections or connections with the current system encoding.
>
> MSEgui uses 16-bit UnicodeString everywhere; the conversion from/to the system encoding is done transparently by the FPC unicode/widestring manager if necessary. This is a solution which works now; no additional complicated and possibly less performant codepage- and encoding-aware string type is necessary...

Yes, that sounds logical to me. Do you then propose the same way for TStringField? (Store internally as UnicodeString (UTF-16), and TStringField.Text should also return UnicodeString instead of String? ... What will happen in the LCL when a visual component reads a UTF-16 string: will it be translated into UTF-8 automagically?)

-Laco.
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On Wed, 2011-01-12 at 09:45 +0100, LacaK wrote:
> Sven Barth wrote:
>> On 12.01.2011 07:16, LacaK wrote:
>>> P.S. I still do not understand how things can work correctly if the LCL expects that all AnsiStrings (String) are UTF8Strings, but the RTL/FCL does not strictly follow this (at least on Windows)?
>>
>> The LCL uses SysToUTF8 and UTF8ToSys when it uses the RTL (and the FCL). This is often done with wrappers that wrap the RTL method and do the conversion (e.g. FileExistsUTF8, etc.).
>
> As I wrote in one of my previous messages, AFAIK this is not true in the case of fcl-db and the Lazarus data-aware components like TDBGrid, TDBEdit ... They use the TField.Text: String property to get the string content of a field and display it. AFAIU the LCL expects that TField.Text will always return a UTF-8 encoded string (because no conversion (SysToUTF8) is done in dbgrids.pas or dbedit.inc), but this is not always true.
>
> So where is the error?
> 1. Is the LCL's expectation wrong that TField.Text is always a UTF-8 string, -or-
> 2. Is the implementation of the TSQLConnectors wrong, which write data into the record buffer (of TStringField) and do not always convert it to UTF-8? (If the data should always be in UTF-8, it would be good to redefine the TField.Text property as "property Text: UTF8String" to make clear that we always work with UTF-8 strings.) -or-
> 3. Did I miss something? ;-)

Didn't I explain this to you and others a few times? The database components themselves are encoding-agnostic. This means: encoding in = encoding out. So it is up to the developer which codepage he wants to use, and TField.Text can have the encoding _you_ want.

So, if you want to work with Lazarus, which uses UTF-8, you have to use UTF-8 encoded strings in your database. If there is some strange reason why you don't want the strings in your database to be UTF-8 encoded, you have to convert the strings from the encoding your database uses to UTF-8 while reading data from the database.

Luckily, for most databases you can specify the encoding of the strings you want to use: not only the encoding in which the strings are stored, but also the encoding that is used when you send data to and retrieve data from the database. And you can set this for each connection made. I.e., you can resolve the problem by changing the connection string or by adding some connection parameter.

There's also another solution, which you can find on the forum and in other places: you can convert the strings to UTF-8 not only when they are read from the database, but also when they are read from internal memory. There's a hook for that.

Joost.
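The "encoding in = encoding out" behaviour and the conversion hook mentioned in this message can be modelled as below. This is a Python sketch, not fcl-db code: the `Field` class and its `on_get_text` callback are hypothetical, since the message does not name the hook, and cp1250 is an assumed database encoding.

```python
from typing import Callable, Optional


class Field:
    """Hypothetical encoding-agnostic field: it stores bytes and returns
    them unchanged unless a conversion hook has been installed."""

    def __init__(self, raw: bytes,
                 on_get_text: Optional[Callable[[bytes], bytes]] = None):
        self.raw = raw                  # bytes exactly as the database sent them
        self.on_get_text = on_get_text  # optional hook applied on every read

    @property
    def text(self) -> bytes:
        if self.on_get_text is not None:
            return self.on_get_text(self.raw)
        return self.raw


def cp1250_to_utf8(raw: bytes) -> bytes:
    """Example hook; cp1250 as the database encoding is an assumption."""
    return raw.decode("cp1250").encode("utf-8")


stored = "\u017eena".encode("cp1250")
plain = Field(stored)                               # encoding in = encoding out
hooked = Field(stored, on_get_text=cp1250_to_utf8)  # converted when read
```

The stored bytes stay untouched in both cases; only the value handed to the reader changes. Configuring the connection to deliver UTF-8 in the first place, as recommended above, avoids paying for the conversion on every read.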
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On Wed, 2011-01-12 at 11:02 +0100, LacaK wrote:
> Yes, this is not primarily a question of the database side, but of the db client library API which is used by the SQLConnector to retrieve data. For example, in ODBC we use SQLGetData in the LoadField method to retrieve data from the ODBC interface. And in the case of MS SQL Server, for example, character data are retrieved in the current ANSI code page (on Windows of course; it may be that on *nix, for example, data are retrieved in UTF-8 naturally). (AFAIK there is no universal way to explicitly request a character encoding from the ODBC interface.)

Almost every DB server has a setting to specify the encoding, which has to be added to the connection string.

> So is it true that every SQL connector must write character data in UTF-8? Or can it write in some native format (ANSI, UTF-16) ... but in that case it must write additional info about the actual encoding somewhere.

If you add a hook that converts this data, yes. (I wouldn't do that; use the database server's functionality instead.)

Joost.
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On Wednesday, 12. January 2011 14.27:14 LacaK wrote:
> Yes, that sounds logical to me. Do you then propose the same way for TStringField? (Store internally as UnicodeString (UTF-16), and TStringField.Text should also return UnicodeString instead of String?

It is done so in the MSEgui fork of sqldb. In case you don't know MSEide+MSEgui, it is here: http://developer.berlios.de/projects/mseide-msegui/

> ... What will happen in the LCL when a visual component reads a UTF-16 string: will it be translated into UTF-8 automagically?)

It works for MSEgui, where all strings are UTF-16 FPC UnicodeString. It does not work for Lazarus with its UTF-8 encoded ansistrings.

Martin
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
> L> but of the db client library API which is used by the SQLConnector to
> L> retrieve data.
> How can a UTF-8 SQLConnector retrieve UTF-8 data from a field defined as binary?

It can't. Here I am speaking about TStringField, which IMHO is designed for character data; TBinaryField is designed for binary data.

> L> For example, in ODBC we use SQLGetData in the LoadField
> L> method to retrieve data from the ODBC interface. And in the
> L> case of MS SQL Server, for example, character data are retrieved in the current ANSI
> L> code page (on Windows of course; it may be that on *nix, for example,
> L> data are retrieved in UTF-8 naturally).
> Via ODBC?
> L> (AFAIK there is no universal way to explicitly request a
> L> character encoding from the ODBC interface.)
> That is a problem of ODBC, but see: http://web.datadirect.com/resources/odbc/unicode/unix.html

Yes, in the UNIX world it may be so (I do not know), but in Windows ODBC we have no such possibility AFAIK.

> L> So is it true that every SQL connector must write character
> L> data in UTF-8?
> No. It is mandatory that you send/receive UTF-8 to/from the GUI LCL elements.

As the LCL elements use the TStringField.Text property, this property should return a UTF8String (not an AnsiString in the ANSI code page), right? If yes, then TStringField must also store data internally in some Unicode format (so as not to lose any characters), right? So it can be UTF-8, UTF-16 or UTF-32 ... in all cases we must allocate space of 4*[max. number of characters in field], right? So in what encoding is string data stored in TStringField now?

-Laco.
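The 4*[max. number of characters] estimate in this message can be checked with a small worked example. The Python sketch below treats a "character" as one Unicode code point, which is an assumption (combining sequences are ignored), and the helper names are hypothetical.

```python
# A few code points from different ranges of Unicode.
SAMPLES = {
    "ascii letter": "a",
    "latin letter with caron": "\u017e",
    "euro sign": "\u20ac",
    "emoji outside the BMP": "\U0001F600",
}


def encoded_sizes(ch: str) -> tuple:
    """Bytes needed for one code point in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # explicit byte order, so no BOM is added
        len(ch.encode("utf-32-le")),
    )


def worst_case_bytes(max_chars: int) -> int:
    """Upper bound for a field of max_chars code points in any of the three."""
    return 4 * max_chars


for label, ch in SAMPLES.items():
    print(label, encoded_sizes(ch))
```

UTF-8 needs one to four bytes per code point, UTF-16 two or four, UTF-32 always four. So the worst case is indeed 4 bytes per code point in all three, although typical text is much smaller in UTF-8 or UTF-16.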
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On Wed, 2011-01-12 at 14:59 +0100, LacaK wrote:
>> No. It is mandatory that you send/receive UTF-8 to/from the GUI LCL elements.
>
> As the LCL elements use the TStringField.Text property, this property should return a UTF8String (not an AnsiString in the ANSI code page), right? If yes, then TStringField must also store data internally in some Unicode format (so as not to lose any characters), right? So it can be UTF-8, UTF-16 or UTF-32 ... in all cases we must allocate space of 4*[max. number of characters in field], right? So in what encoding is string data stored in TStringField now?

The encoding you've specified, in the connection string or some other database-server-dependent setting. Note that when you want to use UTF-16 (or 32) you have to use TWideStringFields.

Joost.
Re[2]: [fpc-devel] String and UnicodeString and UTF8Stringt
Hello FPC,

Wednesday, January 12, 2011, 2:59:53 PM, you wrote:

L>> L> but of the db client library API which is used by the SQLConnector to
L>> L> retrieve data.
L>> How can a UTF-8 SQLConnector retrieve UTF-8 data from a field defined as binary?
L> It can't.
L> Here I am speaking about TStringField, which IMHO is designed for
L> character data; TBinaryField is designed for binary data.

And a binary field is a string without encoding, collation and other text-specific attributes.

L>> That is a problem of ODBC, but see: http://web.datadirect.com/resources/odbc/unicode/unix.html
L> Yes, in the UNIX world it may be so (I do not know),
L> but in Windows ODBC we have no such possibility AFAIK.

Quote from Microsoft: "The ODBC 3.5 (or higher) Driver Manager supports both ANSI and Unicode versions of all functions that accept pointers to character strings or SQLPOINTER in their arguments. The Unicode functions are implemented as functions (with a suffix of W), not as macros. The ANSI functions (which can be called with or without a suffix of A) are identical to the current ODBC API functions."

ODBC 3.5 was launched around 2000-2001.

L> So it can be UTF-8, UTF-16 or UTF-32 ... in all cases we must allocate
L> space of 4*[max. number of characters in field], right?
L> So in what encoding is string data stored in TStringField now?

In the same format in which the database delivers it. The database returns a bunch of bytes and a description of those bytes.

--
Best regards, José
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
LacaK wrote:
>> ...: the new ansistring type has a hidden element size field (in addition to the reference count, length and codepage), and from what I can see on page 10 of http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf, Delphi 2009's unicodestring is simply an ansistring(1200).
>
> So it seems that if we have a GenericString with the properties reference count, size, character width and codepage, then all other string types can be based on this string type. The other strings would then be only shortcuts and internally use the same structure:
> AnsiString = GenericString(with the actual system ANSI code page (0) ... or ... without any explicit codepage ($))
> UTF8String = GenericString(with UTF-8 encoding)
> UnicodeString = GenericString(with UTF-16 encoding)

Nice from a management view, but resulting in an ugly implementation. Apart from the generic form of the (internal) subroutines, we still need explicit code for most variations. Also, translation tables for *all* codepages must become part of every executable.

A true polymorphic string class (or equivalent) would be more performant, and would allow adding only the codepages actually used to the application. Such an implementation could add another VMT pointer to the string prefix, and the UnicodeString could be implemented by a simple type cast from any (generic) string reference into a class reference.

> Where there is no agreement is what the default string encoding should be (AnsiString($) or UTF-8 or UTF-16 or UTF-32).

The default (internal) string type must be a UTF type, else losses are inevitable during (implicit) conversions. This means that an SBCS AnsiString can never become the default encoding.

The default type could be made platform dependent, so that UTF-16 would be used on Windows and UTF-8 on Linux platforms. But this will cause problems with code that assumes exactly one of these encodings and uses indexed access to characters, when such code is recompiled for a platform with a different default encoding.

The introduction of another type, OSString or TFileName, can eliminate many implicit conversions when passing such strings to subroutines, but OTOH can slow down all other operations with that string type.

I'd ban indexed access altogether in the future, unless the default encoding is UTF-32; else the user has to accept a possibly more or less significant slowdown of his code, which stands in contrast to the *intended* optimization by direct (indexed) access to the string content.

Delphi has eliminated that discussion by declaring the (default) UnicodeString fixed to UTF-16, for all targets. The only remaining question is whether this was the best choice at all.

> P.S. I still do not understand how things can work correctly if the LCL expects that all AnsiStrings (String) are UTF8Strings, but the RTL/FCL does not strictly follow this (at least on Windows)?

Right, UTF8String should be really different from AnsiString, so that the compiler can insert any conversions that may be required.

DoDi
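The portability problem with indexed access raised in this message can be made concrete. The Python sketch below lists the code units of the same three-character string in UTF-8 and UTF-16 and reads the third unit of each, which corresponds to `s[3]` with Pascal's 1-based indexing. The helper names are hypothetical.

```python
import struct


def utf8_units(text: str) -> list:
    """Code units (bytes) of the UTF-8 form."""
    return list(text.encode("utf-8"))


def utf16_units(text: str) -> list:
    """Code units (16-bit words) of the UTF-16 form."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


TEXT = "a\u017eb"  # "a", a letter outside ASCII, "b"

# Third code unit in each representation (index 2 here, s[3] in Pascal).
third_in_utf8 = utf8_units(TEXT)[2]    # second byte of the two-byte letter
third_in_utf16 = utf16_units(TEXT)[2]  # the letter "b"
```

Code that was written against one encoding and indexes "the third character" therefore reads a different element once the default encoding changes. Even UTF-16 only hides the problem until a surrogate pair appears.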
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
In our previous episode, Michael Schnell said:
> I had hoped that using the dynamically encoded string type nearly everywhere would allow for a great lot of non-OS-specific code in the VCL (and LCL) without the need for excessive conversions, maintaining the system's coding (UTF-16 or UTF-8) in and out with GUI-centric user code.

That was our original idea. But it also required the input granularity (1, 2, maybe 4) to be a variable.

> I thought this would have been the main reason for introducing the additional complexity of the dynamically encoded string type.

Embarcadero however decided otherwise and kept a wall between the 1- and 2-byte types. So at least the 1- and 2-byte types as base type are different targets.

I still have to study Jonas' last message. It seems to indicate that I misunderstood what rawbytestring is. If that is true, Jonas is right: separating the targets will result in two targets (rawbytestring and unicodestring).
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On 11 Jan 2011, at 10:47, Marco van de Voort wrote:
> I still have to study Jonas' last message. It seems to indicate that I misunderstood what rawbytestring is. If that is true, Jonas is right: separating the targets will result in two targets (rawbytestring and unicodestring).

Here's a nice explanation of how rawbytestring behaves in practice: http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/

And here's an answer by Barry Kelly to a post about rawbytestring, explaining what the purpose of the type is (similar to what I said): http://www.codegod.de/WebAppCodeGod/Delphi-2009-RawByteString-vagaries-QID85470.aspx

He mentions using them as parameter types to reduce the number of overloads, but I'm still wondering about var-parameters in particular. I would guess that it may very well be forbidden to pass an ansistring(0) to a rawbytestring var-parameter, so it would still not solve everything in that case (and if it's not forbidden, I'm curious how you can obtain the statically defined codepage of the ansistring(0) on the callee side in case the input string was empty).

Jonas
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On 01/11/2011 10:47 AM, Marco van de Voort wrote:
> But it also required the input granularity (1, 2, maybe 4) to be a variable.

Sorry, I don't understand what you mean by this.

> Embarcadero however decided otherwise and kept a wall between the 1- and 2-byte types. So at least the 1- and 2-byte types as base type are different targets.

Unfortunately I don't have Delphi 2007. From what I read, I understand that the dynamically coded string type can hold 1-, 2- and 4-byte (maybe even more) codes for its elements (denoted in one control value), each of those (theoretically) in different coding schemes (denoted in another control value), allowing e.g. for UTF-8, UTF-16, UCS4, German ANSI, raw byte, string.

Each assignment would automatically recode the string if necessary. I suppose that s1 := s2 would not do any recoding, but s1 := s2 + s3; would automatically synchronize the coding.

I suppose there are ways to define the coding (and force recoding), maybe similar to setlength(s, 10).

-Michael
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On 01/11/2011 11:11 AM, Jonas Maebe wrote:
> in case the input string was empty).

As the coding scheme and element size are control-block variables, it seems that even an empty string should have the appropriate definitions.

-Michael
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
In our previous episode, Michael Schnell said:
> Sorry, I don't understand what you mean by this.
>
>> Embarcadero however decided otherwise and kept a wall between the 1- and 2-byte types. So at least the 1- and 2-byte types as base type are different targets.
>
> Unfortunately I don't have Delphi 2007. From what I read, I understand that the dynamically coded string type can hold 1-, 2- and 4-byte (maybe even more) codes for its elements (denoted in one control value), each of those (theoretically) in different coding schemes (denoted in another control value), allowing e.g. for UTF-8, UTF-16, UCS4, German ANSI, raw byte, string.

That is wrong. Better read up on that.
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On 01/11/2011 02:05 PM, Marco van de Voort wrote:
> That is wrong. Better read up on that.

AFAIK, this is what they announced some time ago. Seemingly it turned out to be done some other way... Nonetheless, FPC seems to intend to offer something like this (right now in an experimental branch).

-Michael
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
On 11 Jan 2011, at 10:47, Marco van de Voort wrote:
> Embarcadero however decided otherwise and kept a wall between the 1- and 2-byte types. So at least the 1- and 2-byte types as base type are different targets.

I'm actually not sure about that: the new ansistring type has a hidden element size field (in addition to the reference count, length and codepage), and from what I can see on page 10 of http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf, Delphi 2009's unicodestring is simply an ansistring(1200).

Jonas
Re: [fpc-devel] String and UnicodeString and UTF8Stringt
> ...: the new ansistring type has a hidden element size field (in addition to the reference count, length and codepage), and from what I can see on page 10 of http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf, Delphi 2009's unicodestring is simply an ansistring(1200).

So it seems that if we have a GenericString with the properties reference count, size, character width and codepage, then all other string types can be based on this string type. The other strings would then be only shortcuts and internally use the same structure:
AnsiString = GenericString(with the actual system ANSI code page (0) ... or ... without any explicit codepage ($))
UTF8String = GenericString(with UTF-8 encoding)
UnicodeString = GenericString(with UTF-16 encoding)

So it seems to me that there is agreement on adding character width and codepage to the internal string record structure and providing conversions where needed, isn't there? (More or less the same approach as in Delphi.)

Where there is no agreement is what the default string encoding should be (AnsiString($) or UTF-8 or UTF-16 or UTF-32).

So, to return to my original question ... is there any agreement on some points related to the future of the String type?

P.S. I still do not understand how things can work correctly if the LCL expects that all AnsiStrings (String) are UTF8Strings, but the RTL/FCL does not strictly follow this (at least on Windows)?

-Laco.
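The GenericString idea in this message can be modelled as a small data structure. The sketch below is Python and purely illustrative: it is neither the FPC nor the Delphi string layout, it leaves out the reference count, and the `CODEPAGES` table, `make` and `convert` are hypothetical names. The code page numbers 1250 (Central European ANSI), 65001 (UTF-8) and 1200 (UTF-16, as in the ansistring(1200) remark quoted above) are the usual Windows identifiers.

```python
from dataclasses import dataclass

# code page number -> (Python codec name, element size in bytes)
CODEPAGES = {
    1250: ("cp1250", 1),
    65001: ("utf-8", 1),
    1200: ("utf-16-le", 2),
}


@dataclass(frozen=True)
class GenericString:
    """One record layout for all string flavours (reference count omitted)."""
    codepage: int
    element_size: int
    payload: bytes

    @property
    def length(self) -> int:
        # Length in elements (code units), not in characters.
        return len(self.payload) // self.element_size


def make(text: str, codepage: int) -> GenericString:
    codec, size = CODEPAGES[codepage]
    return GenericString(codepage, size, text.encode(codec))


def convert(s: GenericString, codepage: int) -> GenericString:
    """Recode only when source and target code pages differ."""
    if s.codepage == codepage:
        return s
    codec, _ = CODEPAGES[s.codepage]
    return make(s.payload.decode(codec), codepage)
```

For example, `convert(make("\u017e", 1250), 65001)` yields a string of length 2, because the single ANSI byte becomes two UTF-8 code units. In this model "AnsiString", "UTF8String" and "UnicodeString" are just different values of the `codepage` and `element_size` fields.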