On 07/30/2013 04:29 AM, Noah Silva wrote:

No, UTF16 only needs more memory if most of the text is ASCII. It actually uses less than UTF8 in the average case for Japanese, for example.
Of course you are right here.
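
For what it's worth, the size difference is easy to check. Below is a minimal sketch in Python (not FPC code); the two sample strings and the helper name encoded_sizes are my own assumptions, chosen only for illustration, and the byte counts exclude any BOM.

```python
# Compare how many bytes the same text needs in UTF-8 and in UTF-16.
# The sample strings are illustrative assumptions, not taken from this thread.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

ascii_text = "Hello, world"      # pure ASCII: 1 byte per character in UTF-8
japanese_text = "こんにちは世界"    # kana and kanji: 3 bytes each in UTF-8

for label, sample in (("ASCII", ascii_text), ("Japanese", japanese_text)):
    utf8_len, utf16_len = encoded_sizes(sample)
    print(f"{label}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

For these samples UTF-16 doubles the ASCII text but needs only two thirds of the UTF-8 size for the Japanese text. Real-world text mixes both kinds of characters, so actual ratios will vary.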

    Linux OS API in most cases is 8 Bit,


I assume by 8bit, you mean variable byte encoding like UTF8.
Yep.

    Conversions are very expensive.


This is not as bad as some people make it out to be. You have to be converting a *lot* of data for it to be noticeable.
That is why I pointed out that the choice of encoding depends on how many "calculations" are done on the strings.

But in fact I tend to agree, even though exactly this was the argument the Lazarus team gave for doing the LCL API in UTF-8 when converting to Unicode (while MSE chose UTF-16 for the same purpose). I never felt comfortable with that, BTW.


> I suppose this is bound to change once fpc has completed the move to "new Delphi Strings".

I really don't think so, the reasons are even well detailed in the Wiki.
I was always told that Delphi compatibility is the primary driving force behind any modifications. That necessarily suggests this move (which is not possible before fpc provides "new Delphi Strings"). But there might be multiple opinions.

In fact my primary intentions with Lazarus / fpc are not to do my own generic projects, but to help my colleagues to move their huge Delphi XE program system to Linux. This in fact needs complete support for "new Delphi Strings".

From what I understand, the plan is for strings to store their codepage as an attribute internally along with their length, and since the compiler/runtime library will know their codepage, it can convert as necessary.
That is already usable in svn and is exactly the said "new Delphi Strings", and, when activated, it is completely compatible with Delphi XE. It's rather nice and fast, but Delphi lacks a _completely_dynamic_encoding_ type with auto-conversion only when necessary. (IMHO rather easily doable by compiler magic, but "forgotten" in Delphi XE)
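
To make the idea of a string that carries its encoding and converts only when necessary more concrete, here is a rough sketch in Python. It is not FPC or Delphi code and does not show how the compiler or RTL implements it; the class TaggedString and its method as_encoding are hypothetical names invented for this illustration, and encoding names are compared as plain strings for simplicity.

```python
class TaggedString:
    """Hypothetical model: raw bytes plus the name of their encoding."""

    def __init__(self, data, encoding):
        self.data = data          # raw bytes, stored as-is
        self.encoding = encoding  # e.g. "utf-8" or "utf-16-le"

    def as_encoding(self, target):
        """Return this string in the target encoding, converting only if needed."""
        if target == self.encoding:
            return self           # tags match: no conversion, no copy
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(target), target)

    def __add__(self, other):
        # The result keeps the left operand's encoding; the right operand
        # is converted only when its tag differs.
        right = other.as_encoding(self.encoding)
        return TaggedString(self.data + right.data, self.encoding)


a = TaggedString("Grüße, ".encode("utf-8"), "utf-8")
b = TaggedString("世界".encode("utf-16-le"), "utf-16-le")
c = a + b   # b is converted to UTF-8 here; a is used unchanged
print(c.encoding, c.data.decode(c.encoding))
```

The point of the sketch is only that a container holding such values would not need to force one encoding on its contents; each conversion happens at the moment two differently tagged strings meet.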
Either way, you can make your own StringList variants for each type easily enough.
Not without compiler support (if you want auto-conversion when necessary).

In fact, I am fine with manual conversions, so long as 99% of everything "just works" with UTF8 and/or UTF16.
I'm not fine with TStringList and friends forcing any predefined encoding. This in fact does work rather nicely without the application programmer even noticing it. But IMHO a cross-platform system like fpc can be expected to do better, doing away with Windows-specific leftovers from Delphi whenever possible.

-Michael
