Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread Sven Barth

On 12.01.2011 07:16, LacaK wrote:

> P.S. I still do not understand how things can work correctly if the LCL
> expects all AnsiStrings (String) to be UTF8Strings, but the RTL/FCL does
> not strictly follow this (at least on Windows)?


LCL uses SysToUTF8 and UTF8ToSys if it uses the RTL (and the FCL). This 
is often done with wrappers that wrap the RTL method and do the 
conversion (e.g. FileExistsUTF8, etc.).
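
For illustration only, the wrapper pattern can be sketched in Python rather than Pascal. Everything named here is an assumption made for the sketch: `utf8_to_sys`, `sys_to_utf8`, `rtl_file_exists` and `file_exists_utf8` merely mimic the roles of UTF8ToSys, SysToUTF8, an RTL routine and FileExistsUTF8, and cp1250 stands in for whatever ANSI code page the system really uses. It is not the LCL implementation.

```python
# Hypothetical sketch of the "convert at the RTL boundary" pattern.
# SYSTEM_CODEPAGE is an assumption; a real system reports its own code page.
SYSTEM_CODEPAGE = "cp1250"

def utf8_to_sys(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes into the (assumed) system code page."""
    return data.decode("utf-8").encode(SYSTEM_CODEPAGE)

def sys_to_utf8(data: bytes) -> bytes:
    """Re-encode system code page bytes into UTF-8."""
    return data.decode(SYSTEM_CODEPAGE).encode("utf-8")

# Stand-in for an RTL routine that only understands system-encoded names.
KNOWN_FILES = {"žltý.txt".encode(SYSTEM_CODEPAGE)}

def rtl_file_exists(name_sys: bytes) -> bool:
    return name_sys in KNOWN_FILES

def file_exists_utf8(name_utf8: bytes) -> bool:
    """Wrapper: accept UTF-8 from the GUI side, convert, then call the stand-in."""
    return rtl_file_exists(utf8_to_sys(name_utf8))

name = "žltý.txt".encode("utf-8")
print(file_exists_utf8(name))   # the wrapper converts before calling
print(rtl_file_exists(name))    # raw UTF-8 bytes do not match the stored name
```

The point of the sketch is only that the GUI side can stay UTF-8 throughout as long as every call into system-encoded code goes through such a wrapper.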


Regards,
Sven
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread LacaK

Sven Barth wrote:

> On 12.01.2011 07:16, LacaK wrote:
>
>> P.S. I still do not understand how things can work correctly if the LCL
>> expects all AnsiStrings (String) to be UTF8Strings, but the RTL/FCL does
>> not strictly follow this (at least on Windows)?
>
> LCL uses SysToUTF8 and UTF8ToSys if it uses the RTL (and the FCL).
> This is often done with wrappers that wrap the RTL method and do the
> conversion (e.g. FileExistsUTF8, etc.).

As I wrote in one of my previous messages, AFAIK this is not true for
fcl-db and the Lazarus data-aware components such as TDBGrid, TDBEdit ...
They use the TField.Text: String property to get the string content of a
field and display it.
AFAIU the LCL expects that TField.Text always returns a UTF-8 encoded
string (because no conversion (SysToUTF8) is done in dbgrids.pas or
dbedit.inc), but this is not always true.


So where is the error?
1. Is it a wrong expectation by the LCL that TField.Text is always a
UTF-8 string?
-or-
2. Is it an error in the implementation of the TSQLConnectors, which write
data into the record buffer (of TStringField) and do not always convert it
into UTF-8?
(If the data should always be in UTF-8, then it would be good to redefine
the TField.Text property as property Text: UTF8String, to make clear that
we always work with UTF-8 strings.)
-or-
3. Did I miss something? ;-)

-Laco.


Re[2]: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread José Mejuto
Hello FPC,

Wednesday, January 12, 2011, 9:45:47 AM, you wrote:

L> 2. Is it an error in the implementation of the TSQLConnectors, which write
L> data into the record buffer (of TStringField) and do not always convert it
L> into UTF-8?

Do you set the CHARSET field in your TSQLConnector to UTF-8? Do you
define the right code page for each field of your database?

-- 
Best regards,
 José



Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread LacaK



>> 2. Is it an error in the implementation of the TSQLConnectors, which write
>> data into the record buffer (of TStringField) and do not always convert it
>> into UTF-8?
>
> Do you set the CHARSET field in your TSQLConnector to UTF-8?

Not all connectors support the CharSet property. Looking at the sources,
only MySQL and IB support it (SQLite always returns UTF-8 encoded data ...
ODBC, PostgreSQL and Oracle ignore it).

> Do you define the right code page in each field of your database?

Yes, but this is primarily a question not of the database side but of the
db client library API, which the SQLConnector uses to retrieve data.
For example, in ODBC we use SQLGetData in the LoadField method to retrieve
data from the ODBC interface.
And in the case of MS SQL Server, for example, character data are retrieved
in the current ANSI code page (on Windows of course; it may be that on
*nix data are retrieved in UTF-8 naturally).
(AFAIK there is no universal way to explicitly request a character
encoding from the ODBC interface.)

So is it true that every SQL connector must write character data in UTF-8?
Or can it write in some native format (ANSI, UTF-16) ... in which case it
must store additional information about the actual encoding somewhere?


-Laco.



Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread Martin Schreiber
On Wednesday, 12. January 2011 09.45:47 LacaK wrote:

> So where is the error?
> 1. Is it a wrong expectation by the LCL that TField.Text is always a
> UTF-8 string?
> -or-
> 2. Is it an error in the implementation of the TSQLConnectors, which write
> data into the record buffer (of TStringField) and do not always convert it
> into UTF-8?
> (If the data should always be in UTF-8, then it would be good to redefine
> the TField.Text property as property Text: UTF8String, to make clear that
> we always work with UTF-8 strings.)
> -or-
> 3. Did I miss something? ;-)

The MSEgui sqldb version converts between UTF-16 and either the system
encoding or UTF-8 (selectable by option properties) and uses the FPC 16-bit
UnicodeString to store string field values in the dataset; tmsestringfield
returns UnicodeString values. So one can use either UTF-8 encoded database
connections or connections with the current system encoding.
MSEgui uses the 16-bit UnicodeString everywhere; the conversion from/to the
system encoding is done transparently by the FPC unicode/widestring manager
if necessary.
This is a solution that works now; no additional, complicated and possibly
less performant codepage- and encoding-aware string type is necessary...

Martin


Re[2]: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread José Mejuto
Hello FPC,

Wednesday, January 12, 2011, 11:02:00 AM, you wrote:

L>> 2. Is it an error in the implementation of the TSQLConnectors, which write
L>> data into the record buffer (of TStringField) and do not always convert it
L>> into UTF-8?
>> Do you set the CHARSET field in your TSQLConnector to UTF-8?
L> Not all connectors support the CharSet property. Looking at the sources,
L> only MySQL and IB support it (SQLite always returns UTF-8 encoded data ...
L> ODBC, PostgreSQL and Oracle ignore it).

So partially it is a lack of support in TSQLConnector. Also, UTF-8 in
Firebird does not work as expected due to a design decision (I think).

L> Yes, but this is primarily a question not of the database side

Oh yes it is! If you miss any of the three steps, it will fail:
1) Database field
2) SQLConnector and client DLL/so
3) GUI

L> but of the db client library API, which the SQLConnector uses to
L> retrieve data.

How can a UTF-8 SQLConnector retrieve UTF-8 data from a field defined
as binary? The client libraries have all the resources needed to handle
the database; whether SQLConnector implements them, and does it right,
is a different matter.

L> For example, in ODBC we use SQLGetData in the LoadField
L> method to retrieve data from the ODBC interface. And in the
L> case of MS SQL Server, for example, character data are retrieved in
L> the current ANSI code page (on Windows of course; it may be that on
L> *nix data are retrieved in UTF-8 naturally).

Via ODBC?

L> (AFAIK there is no universal way to explicitly request a
L> character encoding from the ODBC interface.)

That is a problem of ODBC, but see:

http://web.datadirect.com/resources/odbc/unicode/unix.html

L> So is it true that every SQL connector must write character
L> data in UTF-8?

No. It is mandatory that you send/receive UTF-8 to/from the LCL GUI
elements. If you are using a DBF, for example, which does not carry
encoding information, you can use the transliterate facility of the
dataset, but it is a bit awful.

-- 
Best regards,
 José



Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread LacaK

Martin Schreiber wrote:

> On Wednesday, 12. January 2011 09.45:47 LacaK wrote:
>
>> So where is the error?
>> 1. Is it a wrong expectation by the LCL that TField.Text is always a
>> UTF-8 string?
>> -or-
>> 2. Is it an error in the implementation of the TSQLConnectors, which write
>> data into the record buffer (of TStringField) and do not always convert it
>> into UTF-8?
>> (If the data should always be in UTF-8, then it would be good to redefine
>> the TField.Text property as property Text: UTF8String, to make clear that
>> we always work with UTF-8 strings.)
>> -or-
>> 3. Did I miss something? ;-)
>
> The MSEgui sqldb version converts between UTF-16 and either the system
> encoding or UTF-8 (selectable by option properties) and uses the FPC 16-bit
> UnicodeString to store string field values in the dataset; tmsestringfield
> returns UnicodeString values. So one can use either UTF-8 encoded database
> connections or connections with the current system encoding.
> MSEgui uses the 16-bit UnicodeString everywhere; the conversion from/to the
> system encoding is done transparently by the FPC unicode/widestring manager
> if necessary.
> This is a solution that works now; no additional, complicated and possibly
> less performant codepage- and encoding-aware string type is necessary...

Yes, that sounds logical to me.
So do you propose the same approach for TStringField? (Store internally as
UTF-16 UnicodeString, and have TStringField.Text return UnicodeString
instead of String? ... What will happen in the LCL when a visual component
reads a UTF-16 string: will it be translated into UTF-8 automagically?)

-Laco.


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread Joost van der Sluis
On Wed, 2011-01-12 at 09:45 +0100, LacaK wrote:
> Sven Barth wrote:
>> On 12.01.2011 07:16, LacaK wrote:
>>> P.S. I still do not understand how things can work correctly if the LCL
>>> expects all AnsiStrings (String) to be UTF8Strings, but the RTL/FCL does
>>> not strictly follow this (at least on Windows)?
>>
>> LCL uses SysToUTF8 and UTF8ToSys if it uses the RTL (and the FCL).
>> This is often done with wrappers that wrap the RTL method and do the
>> conversion (e.g. FileExistsUTF8, etc.).
> As I wrote in one of my previous messages, AFAIK this is not true for
> fcl-db and the Lazarus data-aware components such as TDBGrid, TDBEdit ...
> They use the TField.Text: String property to get the string content of a
> field and display it.
> AFAIU the LCL expects that TField.Text always returns a UTF-8 encoded
> string (because no conversion (SysToUTF8) is done in dbgrids.pas or
> dbedit.inc), but this is not always true.
>
> So where is the error?
> 1. Is it a wrong expectation by the LCL that TField.Text is always a
> UTF-8 string?
> -or-
> 2. Is it an error in the implementation of the TSQLConnectors, which write
> data into the record buffer (of TStringField) and do not always convert it
> into UTF-8?
> (If the data should always be in UTF-8, then it would be good to redefine
> the TField.Text property as property Text: UTF8String, to make clear that
> we always work with UTF-8 strings.)
> -or-
> 3. Did I miss something? ;-)

Didn't I explain this to you and others a few times?

The database components themselves are encoding-agnostic. This means:
encoding in = encoding out.

So it is up to the developer which codepage to use. TField.Text
can have the encoding _you_ want.

So, if you want to work with Lazarus, which uses UTF-8, you have to use
UTF-8 encoded strings in your database.

If there is some strange reason why you don't want the strings in your
database to be UTF-8 encoded, you have to convert the strings from the
encoding your database uses to UTF-8 while reading data from the
database.
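
The two options above ("encoding in = encoding out" versus converting while reading) can be sketched as follows. This is a minimal Python illustration, not fcl-db code; `read_field_raw`, `read_field_utf8`, `is_valid_utf8` and the cp1250 database encoding are all assumptions chosen for the example.

```python
# Sketch: an encoding-agnostic read versus a read that converts to UTF-8.
DB_ENCODING = "cp1250"   # assumed encoding delivered by the database client

def read_field_raw(stored: bytes) -> bytes:
    """Encoding-agnostic read: the bytes come out exactly as they went in."""
    return stored

def read_field_utf8(stored: bytes, db_encoding: str = DB_ENCODING) -> bytes:
    """Read with conversion: re-encode from the database encoding to UTF-8."""
    return stored.decode(db_encoding).encode("utf-8")

def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

stored = "Košice".encode(DB_ENCODING)

raw = read_field_raw(stored)
converted = read_field_utf8(stored)

print(is_valid_utf8(raw))        # cp1250 bytes handed to a UTF-8 consumer
print(is_valid_utf8(converted))  # same text after conversion on read
```

A UTF-8 consumer such as a GUI control would only get usable text from the second path, unless the connection itself is already configured to deliver UTF-8.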

Luckily, you can specify the encoding of strings you want to use for
most databases. Not only the encoding in which the strings are stored,
but also the encoding which has to be used when you send and retrieve
data from the database. And you can set this for each connection made.

I.e., you can resolve the problem by changing the connection string, or by
adding a connection parameter.

There's also another solution you can find on the forum and other
places. You can convert the strings to UTF-8 not only when they are read
from the database, but also when they are read from the internal memory.
There's a hook for that.

Joost.



Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread Joost van der Sluis
On Wed, 2011-01-12 at 11:02 +0100, LacaK wrote:
> Yes, but this is primarily a question not of the database side but of the
> db client library API, which the SQLConnector uses to retrieve data.
> For example, in ODBC we use SQLGetData in the LoadField method to retrieve
> data from the ODBC interface.
> And in the case of MS SQL Server, for example, character data are retrieved
> in the current ANSI code page (on Windows of course; it may be that on
> *nix data are retrieved in UTF-8 naturally).
> (AFAIK there is no universal way to explicitly request a character
> encoding from the ODBC interface.)

Almost every DB server has a setting to specify the encoding, which has
to be added to the connection string.

> So is it true that every SQL connector must write character data in UTF-8?
> Or can it write in some native format (ANSI, UTF-16) ... in which case it
> must store additional information about the actual encoding somewhere?

If you add a hook that converts this data, yes. (I wouldn't do that; use
the database server's functionality instead.)

Joost.



Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread Martin Schreiber
On Wednesday, 12. January 2011 14.27:14 LacaK wrote:

> Yes, that sounds logical to me.
> So do you propose the same approach for TStringField? (Store internally as
> UTF-16 UnicodeString, and have TStringField.Text return UnicodeString
> instead of String?

That is how it is done in the MSEgui fork of sqldb.
In case you don't know MSEide+MSEgui, it is here:
http://developer.berlios.de/projects/mseide-msegui/

> ... What will happen in the LCL when a visual component reads a UTF-16
> string: will it be translated into UTF-8 automagically?)

It works for MSEgui, where all strings are UTF-16 FPC UnicodeStrings. It
does not work for Lazarus with its UTF-8 encoded ansistrings.

Martin




Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread LacaK



>> but of the db client library API, which the SQLConnector uses to
>> retrieve data.
>
> How can a UTF-8 SQLConnector retrieve UTF-8 data from a field defined
> as binary?

It can't.
Here I am speaking about TStringField, which IMHO is designed for
character data; TBinaryField is designed for binary data.

>> For example, in ODBC we use SQLGetData in the LoadField
>> method to retrieve data from the ODBC interface. And in the
>> case of MS SQL Server, for example, character data are retrieved in
>> the current ANSI code page (on Windows of course; it may be that on
>> *nix data are retrieved in UTF-8 naturally).
>
> Via ODBC?
>
>> (AFAIK there is no universal way to explicitly request a
>> character encoding from the ODBC interface.)
>
> That is a problem of ODBC, but see:
>
> http://web.datadirect.com/resources/odbc/unicode/unix.html

Yes, in the UNIX world it may be so (I do not know),
but in Windows ODBC we have no such possibility AFAIK.


>> So is it true that every SQL connector must write character
>> data in UTF-8?
>
> No. It is mandatory that you send/receive UTF-8 to/from the LCL GUI
> elements.

As the LCL elements use the TStringField.Text property, this property
should return a UTF8String (not an AnsiString in the ANSI code page), right?
If so, then TStringField must also store its data internally in some
Unicode format (so as not to lose any characters), right?
So it can be UTF-8, UTF-16 or UTF-32 ... and in all cases we must allocate
4*[max. number of characters in field] bytes, right?

So in what encoding are string data stored now in TStringField ?
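
The sizing estimate above can be checked with a short Python sketch. It assumes that "character" means a Unicode code point; the sample characters and the helper name `encoded_sizes` are chosen only for this illustration.

```python
# Bytes needed per code point in each Unicode encoding form.
# Samples: U+0041, U+010D, U+20AC and U+1F600 (outside the BMP).
samples = ["A", "\u010d", "\u20ac", "\U0001F600"]

def encoded_sizes(ch: str) -> dict:
    """Return the encoded length in bytes of one code point per encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),   # -le: no byte order mark
        "utf-32": len(ch.encode("utf-32-le")),
    }

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Largest size seen across all samples and encodings.
worst_case = max(size for ch in samples for size in encoded_sizes(ch).values())
print(worst_case)
```

For these samples no encoding needs more than 4 bytes per code point, which matches the 4*[max. number of characters] upper bound; typical text in UTF-8 or UTF-16 needs considerably less.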

-Laco.


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread Joost van der Sluis
On Wed, 2011-01-12 at 14:59 +0100, LacaK wrote:

>> No. It is mandatory that you send/receive UTF-8 to/from the LCL GUI
>> elements.
> As the LCL elements use the TStringField.Text property, this property
> should return a UTF8String (not an AnsiString in the ANSI code page), right?
> If so, then TStringField must also store its data internally in some
> Unicode format (so as not to lose any characters), right?
> So it can be UTF-8, UTF-16 or UTF-32 ... and in all cases we must allocate
> 4*[max. number of characters in field] bytes, right?
> So in what encoding are string data stored now in TStringField?

The encoding you've specified, in the connection string or some other
database-server-dependent setting.

Note that when you want to use UTF-16 (or UTF-32) you have to use
TWideStringFields.

Joost.



Re[2]: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread José Mejuto
Hello FPC,

Wednesday, January 12, 2011, 2:59:53 PM, you wrote:

L>> but of the db client library API, which the SQLConnector uses to
L>> retrieve data.
>> How can a UTF-8 SQLConnector retrieve UTF-8 data from a field defined
>> as binary?
L> It can't.
L> Here I am speaking about TStringField, which IMHO is designed for
L> character data; TBinaryField is designed for binary data.

And a binary field is a string without encoding, collation and other
text-specific attributes.

>> That is a problem of ODBC, but see:
>> http://web.datadirect.com/resources/odbc/unicode/unix.html
L> Yes, in the UNIX world it may be so (I do not know),
L> but in Windows ODBC we have no such possibility AFAIK.

Quote from Microsoft:
"The ODBC 3.5 (or higher) Driver Manager supports both ANSI and
Unicode versions of all functions that accept pointers to character
strings or SQLPOINTER in their arguments. The Unicode functions are
implemented as functions (with a suffix of W), not as macros. The ANSI
functions (which can be called with or without a suffix of A) are
identical to the current ODBC API functions."

ODBC 3.5 was launched around 2000-2001.

L> So it can be UTF-8, UTF-16 or UTF-32 ... and in all cases we must allocate
L> 4*[max. number of characters in field] bytes, right?
L> So in what encoding are string data stored now in TStringField?

In the same format in which the database delivers them. The database
returns a bunch of bytes and a description of those bytes.

-- 
Best regards,
 José



Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-12 Thread Hans-Peter Diettrich

LacaK wrote:

>> ...: the new ansistring type has a hidden element size field (in
>> addition to the reference count, length and codepage), and from what I
>> can see at page 10 of
>> http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf,
>> Delphi 2009's unicodestring is simply an ansistring(1200).
> So it seems that if we had a GenericString with the properties
> reference count, size, character width and codepage, then all other
> string types could be based on it. The other strings would then be
> only shortcuts, and internally would use the same structure:
> AnsiString = GenericString(with actual system ANSI code page (0) ... or
> ... without any explicit codepage ($))
>
> UTF8String = GenericString(with UTF-8 encoding)
> UnicodeString = GenericString(with UTF-16 encoding)

Nice from a management view, but resulting in an ugly implementation.
Apart from the generic form of (internal) subroutines we still need
explicit code for most variations. Also, translation tables for *all*
codepages must become part of every executable.

A true polymorphic string class (or equivalent) would be more
performant, and would allow adding only the codepages actually used to
the applications. Such an implementation could add another VMT pointer to
the string prefix, and the UnicodeString could be implemented by a
simple type cast from any (generic) string reference into a class reference.



> Where there is no agreement is on what the default string
> encoding should be (AnsiString($) or UTF-8 or UTF-16 or UTF-32).


The default (internal) string type must be a UTF type, else losses are 
inevitable during (implicit) conversions. This means that SBCS 
AnsiString never can become the default encoding.
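
A small Python sketch of why a single-byte code page cannot serve as the lossless default. The sample word and the helper name `round_trip` are assumptions for the example; the code pages named are the usual Windows ones.

```python
# U+010D (c with caron) exists in Windows-1250 but not in Windows-1252.
text = "\u010daj"

def round_trip(s: str, encoding: str) -> str:
    """Encode to the given encoding (replacing unmappable characters), then decode."""
    return s.encode(encoding, errors="replace").decode(encoding)

print(round_trip(text, "cp1252"))  # the first character is replaced by '?'
print(round_trip(text, "cp1250"))  # survives, but only on this one code page
print(round_trip(text, "utf-8"))   # survives regardless of the system locale
```

Whether an implicit conversion loses data therefore depends on the code page in use, which is the argument for a UTF type as the internal default.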


The default type could be made platform dependent, so that UTF-16 would 
be used for Windows and UTF-8 for Linux platforms. But this will cause 
problems with code that assumes exactly one of these encodings, and uses 
indexed access to characters, when such code is recompiled for a 
platform with a different default encoding. The introduction of another 
type OSString or TFileName can eliminate many implicit conversions in 
passing such strings to subroutines, but OTOH can cause slowdown of all 
other operations with that string type.


I'd ban indexed access altogether in the future, unless the default
encoding is UTF-32; otherwise the user has to accept a possibly significant
slowdown of his code, which stands in contrast to the *intended*
optimization by direct (indexed) access to the string content.
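
The cost difference can be sketched in Python: finding the n-th code point in UTF-8 requires scanning from the start, while in UTF-32 it is a fixed offset. Both function names are hypothetical and exist only for this illustration.

```python
def nth_code_point_utf8(data: bytes, n: int) -> str:
    """Return the n-th (0-based) code point by scanning lead bytes: O(n)."""
    if n < 0:
        raise IndexError(n)
    index = -1
    start = None
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:      # not a continuation byte: new code point
            index += 1
            if index == n:
                start = pos
            elif index == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")

def nth_code_point_utf32(data: bytes, n: int) -> str:
    """Fixed width: the n-th code point sits at byte offset 4*n, O(1)."""
    if not 0 <= 4 * n < len(data):
        raise IndexError(n)
    return data[4 * n:4 * n + 4].decode("utf-32-le")

s = "a\u010du"
print(nth_code_point_utf8(s.encode("utf-8"), 1))
print(nth_code_point_utf32(s.encode("utf-32-le"), 1))
```

With UTF-16 the same scan is needed as soon as surrogate pairs may occur, so plain `s[i]` indexing on code units is only an optimization if code units are what the caller actually wants.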


Delphi has eliminated that discussion by declaring the (default) 
UnicodeString fixed to UTF-16, for all targets. The only remaining 
question is, whether this was the best choice at all.



> P.S. I still do not understand how things can work correctly if the LCL
> expects all AnsiStrings (String) to be UTF8Strings, but the RTL/FCL does
> not strictly follow this (at least on Windows)?


Right, UTF8String should be really different from AnsiString, so that 
all eventually required conversions can be inserted by the compiler.


DoDi



Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-11 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> I had hoped that using the dynamically encoded string type nearly
> everywhere would allow for a great deal of non-OS-specific code in the
> VCL (and LCL) without the need for excessive conversions, maintaining the
> system's coding (UTF-16 or UTF-8) in and out with GUI-centric user code.

That was our original idea. But it also required the input granularity (1, 2,
maybe 4) to be a variable.

> I thought this would have been the main reason for introducing the
> additional complexity of the dynamically encoded string type.

Embarcadero however decided otherwise and kept a wall between the 1- and
2-byte types. So at least the 1- and 2-byte types as base type are different
targets.

I still have to study Jonas' last message. It seems to indicate that I
misunderstood what rawbytestring is. If that is true, Jonas is right:
separating the targets will result in two targets (rawbytestring and
unicodestring).
unicodestring)


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-11 Thread Jonas Maebe


On 11 Jan 2011, at 10:47, Marco van de Voort wrote:


> I still have to study Jonas' last message. It seems to indicate that I
> misunderstood what rawbytestring is. If that is true, Jonas is right:
> separating the targets will result in two targets (rawbytestring and
> unicodestring).


Here's some nice explanation about how rawbytestring behaves in  
practice: http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/


And here's an answer by Barry Kelly to a post about rawbytestring  
explaining what the purpose of the type is (similar to what I said): http://www.codegod.de/WebAppCodeGod/Delphi-2009-RawByteString-vagaries-QID85470.aspx


He mentions using them as parameter types to reduce the number of  
overloads, but I'm still wondering about var-parameters in particular.  
I would guess that it may very well be forbidden to pass an  
ansistring(0) to a rawbytestring var-parameter, so it would still not  
solve everything in that case (and if it's not forbidden, I'm curious  
how you can obtain the statically defined codepage of the  
ansistring(0) at the callee side in case the input string was empty).



Jonas


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-11 Thread Michael Schnell

On 01/11/2011 10:47 AM, Marco van de Voort wrote:

> But it also required the input granularity (1, 2,
> maybe 4) to be a variable.

Sorry, I don't understand what you mean by this.

> Embarcadero however decided otherwise and kept a wall between the 1- and
> 2-byte types. So at least the 1- and 2-byte types as base type are different
> targets.

Unfortunately I don't have Delphi 2007. From what I read, I understand
that the dynamically coded string type can hold 1-, 2- and 4-byte (maybe
even more) codes for its elements (denoted in one control value), and
each of those (theoretically) in different coding schemes (denoted in
another control value), allowing e.g. for UTF-8, UTF-16, UCS4, German
ANSI, raw Byte, string


Each assignment would automatically recode the string if necessary. I
suppose that s1 := s2 would not do any recoding, but s1 := s2 + s3; would
automatically synchronize the coding.


I suppose there are ways to define the coding (and force recoding),
maybe similar to setlength(s, 10).


-Michael


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-11 Thread Michael Schnell

On 01/11/2011 11:11 AM, Jonas Maebe wrote:


> ... in case the input string was empty).

As the coding scheme and element size are control-block-variables it 
seems that even an empty string should have the appropriate definitions.


-Michael


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-11 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
> Sorry, I don't understand what you mean by this.
>> Embarcadero however decided otherwise and kept a wall between the 1- and
>> 2-byte types. So at least the 1- and 2-byte types as base type are
>> different targets.
> Unfortunately I don't have Delphi 2007. From what I read, I understand
> that the dynamically coded string type can hold 1-, 2- and 4-byte (maybe
> even more) codes for its elements (denoted in one control value), and
> each of those (theoretically) in different coding schemes (denoted in
> another control value), allowing e.g. for UTF-8, UTF-16, UCS4, German
> ANSI, raw Byte, string

That is wrong. Better read up on that.
 


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-11 Thread Michael Schnell

On 01/11/2011 02:05 PM, Marco van de Voort wrote:


> That is wrong. Better read up on that.

AFAIK, this is what they announced some time ago. Seemingly it turned
out to be done some other way...


Nonetheless FPC seems to intend to offer something like this (right now
in an experimental branch).


-Michael


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-11 Thread Jonas Maebe


On 11 Jan 2011, at 10:47, Marco van de Voort wrote:

> Embarcadero however decided otherwise and kept a wall between the 1- and
> 2-byte types. So at least the 1- and 2-byte types as base type are different
> targets.


I'm actually not sure about that: the new ansistring type has a hidden
element size field (in addition to the reference count, length and
codepage), and from what I can see at page 10 of
http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf,
Delphi 2009's unicodestring is simply an ansistring(1200).



Jonas


Re: [fpc-devel] String and UnicodeString and UTF8Stringt

2011-01-11 Thread LacaK




> ...: the new ansistring type has a hidden element size field (in
> addition to the reference count, length and codepage), and from what I
> can see at page 10 of
> http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf,
> Delphi 2009's unicodestring is simply an ansistring(1200).

So it seems that if we had a GenericString with the properties
reference count, size, character width and codepage, then all other
string types could be based on it. The other strings would then be
only shortcuts, and internally would use the same structure:
AnsiString = GenericString(with actual system ANSI code page (0) ... or
... without any explicit codepage ($))

UTF8String = GenericString(with UTF-8 encoding)
UnicodeString = GenericString(with UTF-16 encoding)

So it seems to me that there is agreement on adding character width and
codepage to the internal string record structure and providing conversions
where needed, isn't there? (More or less the same approach as in Delphi.)


Where there is no agreement is on what the default string
encoding should be (AnsiString($) or UTF-8 or UTF-16 or UTF-32).


So, to return to my original question ... is there any agreement on
some points related to the future of the String type?


P.S. I still do not understand how things can work correctly if the LCL
expects all AnsiStrings (String) to be UTF8Strings, but the RTL/FCL does
not strictly follow this (at least on Windows)?


-Laco.