Re: [fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-12-21 Thread Zaher Dirkey
2008/11/21 Graeme Geldenhuys graemeg.li...@gmail.com:
Memo1.Lines.SaveToFile('test.txt',  TEncoding.Unicode);

I don't agree with making TStrings responsible for converting to another
encoding. It is enough for us to save and load the text as it is within the
same application (TStringList).
We only need to convert when we read text produced by another application or
send text to one; for that you need some conversion functions that take the
TStrings as a parameter.
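
The standalone-helper approach described above could look roughly like the
sketch below. It assumes Delphi 2009 or later (TEncoding from SysUtils); the
procedure name SaveStringsEncoded and the file name export.txt are made up for
illustration and are not part of any RTL.

```pascal
program ConvertHelperDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: TStrings itself knows nothing about encodings;
// the conversion happens here, just before the text leaves the application.
procedure SaveStringsEncoded(AStrings: TStrings; const AFileName: string;
  AEncoding: TEncoding);
var
  Stream: TFileStream;
  Preamble, Bytes: TBytes;
begin
  Preamble := AEncoding.GetPreamble;           // byte order mark, may be empty
  Bytes := AEncoding.GetBytes(AStrings.Text);  // convert the whole text at once
  Stream := TFileStream.Create(AFileName, fmCreate);
  try
    if Length(Preamble) > 0 then
      Stream.WriteBuffer(Preamble[0], Length(Preamble));
    if Length(Bytes) > 0 then
      Stream.WriteBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('first line');
    List.Add('second line');
    // The list stays in its native string type; only the file is UTF-8.
    SaveStringsEncoded(List, 'export.txt', TEncoding.UTF8);
  finally
    List.Free;
  end;
end.
```

With this shape the list class stays encoding-agnostic, and every place where
text crosses an application boundary names its encoding explicitly.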
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


[fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Graeme Geldenhuys
On Fri, Nov 21, 2008 at 11:08 PM, Graeme Geldenhuys wrote:

 I thought you guys might find this interesting. It's a new 27 page
 document describing Unicode support in D2009.

 http://dn.codegear.com/article/38980

Seeing that I don't own D2009 and have only read about its Unicode support,
I found some of the information interesting, and it covers things we
argued about on this mailing list.

For example:

1...
  Length() returns the number of bytes for UTF8String,
  but Length() returns the number of elements (what we know as characters)
  for String or UTF16 strings.
  Length() also returns the number of bytes for AnsiString.


var
  str8: Utf8String;
  str16: string;
begin
  str8 := 'Cantù';
  Memo1.Lines.Add ('UTF-8');
  Memo1.Lines.Add('Length: ' + IntToStr (Length (str8)));
  // the final "ù" takes two bytes in UTF-8, at positions 5 and 6
  Memo1.Lines.Add('5: ' + IntToStr (Ord (str8[5])));
  Memo1.Lines.Add('6: ' + IntToStr (Ord (str8[6])));
  str16 := str8;  // implicit conversion from UTF-8 to UTF-16
  Memo1.Lines.Add ('UTF-16');
  Memo1.Lines.Add('Length: ' + IntToStr (Length (str16)));
  Memo1.Lines.Add('5: ' + IntToStr (Ord (str16[5])));
end;

As you might expect, the str8 string has a length of 6 (meaning 6 bytes),
while the str16 string has a length of 5 (meaning 10 bytes, though). Notice
that Length invariably returns the number of string elements, which in the
case of variable-length representations doesn't match the number of Unicode
code points represented by the string. This is the output of the program:
UTF-8
Length: 6
5: 195
6: 185
UTF-16
Length: 5
5: 249
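
To make the difference between elements and code points concrete, here is a
small console sketch that counts both for the same text. CountUtf8CodePoints
is a hypothetical helper written for this illustration; it assumes well-formed
UTF-8 and a compiler that has the UTF8String and UnicodeString types (Delphi
2009 or later, or Free Pascal 3.x).

```pascal
program LengthVsCodePoints;
{$ifdef FPC}{$mode delphi}{$else}{$APPTYPE CONSOLE}{$endif}

// Hypothetical helper: counts code points in well-formed UTF-8 by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
function CountUtf8CodePoints(const S: UTF8String): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  str16: UnicodeString;
  str8: UTF8String;
begin
  // "Cantù", built so that the source file encoding does not matter
  str16 := 'Cant' + WideChar($00F9);
  str8 := UTF8Encode(str16);
  WriteLn('UTF-16 elements: ', Length(str16));              // 5
  WriteLn('UTF-8 elements:  ', Length(str8));               // 6
  WriteLn('Code points:     ', CountUtf8CodePoints(str8));  // 5
end.
```

Length stays a cheap lookup of the stored element count, while a code point
count has to walk the whole string.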



2...   TStrings can now take an encoding parameter to specify how it
should load or save files.

-
STREAMING TSTRINGS
The LoadFromFile and SaveToFile methods of the TStrings class can be called
with an encoding. If you write a string list to a text file without providing
a specific encoding, the class will use TEncoding.Default, which in turn uses
the internal DefaultEncoding, taken from the current Windows code page the
first time it is needed. In other words, if you save a file you'll get the
same ANSI file as before.
Of course, you can also easily force the file to a different format, for
example the UTF-16 format:

Memo1.Lines.SaveToFile('test.txt',  TEncoding.Unicode);
-
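
As a sketch of the calls the excerpt describes, the program below writes the
same list three times and reads one file back. It assumes Delphi 2009 or
later, where TStrings.SaveToFile and LoadFromFile have overloads that accept a
TEncoding; the file names are placeholders.

```pascal
program SaveWithEncoding;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('Cant' + WideChar($00F9));
    // No encoding given: TEncoding.Default, i.e. the current ANSI code page
    List.SaveToFile('test_ansi.txt');
    // Explicit encodings; the preamble (byte order mark) is written first
    List.SaveToFile('test_utf16.txt', TEncoding.Unicode);
    List.SaveToFile('test_utf8.txt', TEncoding.UTF8);
    // Read one of them back, stating the encoding explicitly
    List.LoadFromFile('test_utf16.txt', TEncoding.Unicode);
    WriteLn('Lines read back: ', List.Count);
  finally
    List.Free;
  end;
end.
```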


Anyway, there are a lot more interesting facts in this document. It is well
worth reading to get a better understanding of Unicode.


Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


Re: [fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said:
  I thought you guys might find this interesting. It's a new 27 page
  document describing Unicode support in D2009.
 
  http://dn.codegear.com/article/38980
 
 Seeing that I don't own D2009 and have only read about its Unicode support,
 I found some of the information interesting, and it covers things we
 argued about on this mailing list.

All of this information has been on the blogs since July. Note that
TCharacter is a sealed class, something that FPC doesn't support yet.

The whole TEncoding/TCharacter bastard-class design seems to come from .NET
compatibility (as noted in the document), but Borland changed the course of its
.NET efforts after Tiburon. Sigh.


Re: [fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Luiz Americo Pereira Camara

Graeme Geldenhuys wrote:

On Fri, Nov 21, 2008 at 11:08 PM, Graeme Geldenhuys wrote:

I thought you guys might find this interesting. It's a new 27 page
document describing Unicode support in D2009.

http://dn.codegear.com/article/38980

Seeing that I don't own D2009 and have only read about its Unicode support,
I found some of the information interesting, and it covers things we
argued about on this mailing list.

For example:

1...
  Length() returns the number of bytes for UTF8String,
  but Length() returns the number of elements (what we know as characters)
  for String or UTF16 strings.

No. Length for String returns the number of code units (the number of
WideChar elements in the UnicodeString case). When there are surrogate pairs,
the number of code points (characters) differs from the number of code units.
See this excerpt:



A way to create a string with surrogate pairs is to use the
ConvertFromUtf32 function, which returns a string with the surrogate pair
(two WideChar) in the proper circumstances, like the following:

var
  str1: string;
begin
  str1 := 'Surr. ' + ConvertFromUtf32($1D11E);
end;

Now if you ask for the string length, you'll get 8, which is the number of
WideChar, but not the number of logical Unicode code points in the string.
If you print the string you get the proper effect (well, at least Windows
will generally show one square block as placeholder of the surrogate pair,
rather than two).
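
The gap between the 8 WideChar elements and the logical characters in the
excerpt can be checked with a sketch like the one below. Both functions are
hypothetical helpers written for this illustration: SupplementaryToUtf16
stands in for ConvertFromUtf32 and only handles code points above U+FFFF, and
CountCodePoints treats a high surrogate followed by a low surrogate as one
code point. It assumes a compiler with the UnicodeString type.

```pascal
program SurrogateCount;
{$ifdef FPC}{$mode delphi}{$else}{$APPTYPE CONSOLE}{$endif}

// Hypothetical stand-in for ConvertFromUtf32, limited to code points above
// U+FFFF: splits the value into a high and a low surrogate.
function SupplementaryToUtf16(CodePoint: Cardinal): UnicodeString;
begin
  CodePoint := CodePoint - $10000;
  Result := WideChar(Word($D800 + (CodePoint shr 10)));
  Result := Result + WideChar(Word($DC00 + (CodePoint and $3FF)));
end;

// Hypothetical helper: counts code points by stepping over surrogate pairs.
function CountCodePoints(const S: UnicodeString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if (Ord(S[I]) >= $D800) and (Ord(S[I]) <= $DBFF) and (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)   // high + low surrogate: two code units, one code point
    else
      Inc(I);     // BMP character (or an unpaired surrogate)
    Inc(Result);
  end;
end;

var
  str1: UnicodeString;
begin
  str1 := 'Surr. ' + SupplementaryToUtf16($1D11E);
  WriteLn('WideChar elements: ', Length(str1));           // 8
  WriteLn('Code points:       ', CountCodePoints(str1));  // 7
end.
```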






Re: [fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Sergei Gorelkin

Graeme Geldenhuys wrote:

On Fri, Nov 21, 2008 at 11:08 PM, Graeme Geldenhuys wrote:

I thought you guys might find this interesting. It's a new 27 page
document describing Unicode support in D2009.

http://dn.codegear.com/article/38980


Seeing that I don't own D2009 and have only read about its Unicode support,
I found some of the information interesting, and it covers things we
argued about on this mailing list.

Well, with the exception of the class helper for TStrings (notably, they
call it a hack themselves :) the design looks rather clean. Since each
string stores its element size, both ANSI and Unicode strings are probably
handled by a common set of procedures, avoiding RTL size bloat.
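
The per-string element size mentioned above can be inspected from code. The
sketch below assumes the StringElementSize and StringCodePage routines of the
System unit (Delphi 2009 or later; Free Pascal 3.x has routines of the same
name). It only shows the stored metadata and says nothing about how the RTL
shares its internal procedures.

```pascal
program ElementSizeDemo;
{$ifdef FPC}{$mode delphi}{$else}{$APPTYPE CONSOLE}{$endif}

var
  s16: UnicodeString;
  s8: UTF8String;
begin
  s16 := 'Cant' + WideChar($00F9);
  s8 := UTF8Encode(s16);
  // Each string reports how wide one element is, in bytes
  WriteLn('UnicodeString element size: ', StringElementSize(s16));
  WriteLn('UTF8String element size:    ', StringElementSize(s8));
  // Byte strings additionally carry the code page they are encoded in
  WriteLn('UTF8String code page:       ', StringCodePage(s8));
end.
```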


And they explain why there is no compiler option for switching back and 
forth.


Unfortunately, the article does not provide information about how things
like Pos() and Copy() work with UTF-8 strings. However, one may take the
words "UTF-8 support is more limited than UTF-16" to mean that they
continue to work with elements (bytes).


Regards,
Sergei


Re: [fpc-devel] Re: new 27 page document describing Unicode support in D2009

2008-11-21 Thread Luiz Americo Pereira Camara

Sergei Gorelkin wrote:


Well, with the exception of the class helper for TStrings (notably, they
call it a hack themselves :) the design looks rather clean. Since each
string stores its element size, both ANSI and Unicode strings are probably
handled by a common set of procedures, avoiding RTL size bloat.




I also like the design, since it is flexible enough to let the programmer
work with different encodings.


And they explain why there is no compiler option for switching back 
and forth.


Unfortunately, the article does not provide information about how
things like Pos() and Copy() work with UTF-8 strings.

There's an explanation of those functions here:
http://www.jacobthurman.com/?p=30 (see the comments). Basically, they handle
code units and not code points (characters).
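
A short sketch of what working in code units means in practice: the same
character is found at a different index in each encoding, and Copy can cut a
multi-byte character in half. This assumes a compiler with the UTF8String and
UnicodeString types and is an illustration, not taken from the linked post.

```pascal
program CodeUnitIndexing;
{$ifdef FPC}{$mode delphi}{$else}{$APPTYPE CONSOLE}{$endif}

var
  s16, sub16: UnicodeString;
  s8, sub8, cut8: UTF8String;
begin
  s16 := 'Cant' + WideChar($00F9) + '!';   // "Cantù!"
  sub16 := '!';
  s8 := UTF8Encode(s16);
  sub8 := UTF8Encode(sub16);

  // Pos reports a code unit index, so the same character sits at a
  // different position depending on the encoding.
  WriteLn('Pos in UTF-16: ', Pos(sub16, s16));   // 6
  WriteLn('Pos in UTF-8:  ', Pos(sub8, s8));     // 7

  // Copy counts code units too: five bytes cut the two-byte "ù" in half.
  cut8 := Copy(s8, 1, 5);
  WriteLn('Length of cut: ', Length(cut8));      // 5
  WriteLn('Last byte:     ', Ord(cut8[5]));      // 195, a lead byte with no continuation
end.
```

Code that slices UTF-8 strings by index therefore has to pick its cut points
on character boundaries itself.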


However, one may take the words "UTF-8 support is more limited than
UTF-16" to mean that they continue to work with elements (bytes).



Yes. IMO this is also a good decision.

Luiz