Marco van de Voort wrote:
In our previous episode, Hans-Peter Diettrich said:
"non-native" strings, it can also be a performance win).
IMO a single encoding, i.e. UTF-8, can cover all cases.
Well, for starters, it doesn't cover the existing Delphi/unicode codebase.
Because it's bound to UTF-16? That's not a problem, because WideString will continue to exist, and according conversions are still inserted by the compiler.
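A minimal sketch of those compiler-inserted conversions, assuming FPC 3.x codepage-aware strings and a UTF-8 encoded source file:

  program ConvDemo;
  {$mode objfpc}{$H+}
  {$codepage utf8}
  var
    U8: UTF8String;
    W: WideString;
  begin
    U8 := 'Ähm';  // stored as UTF-8 bytes
    W := U8;      // compiler inserts a UTF-8 -> UTF-16 conversion here
    U8 := W;      // ... and the reverse conversion here
  end.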

That is DIY compatibility, or, in other words, no compatibility.

I still don't understand the problem :-(

WideString will also grind the application to a halt, due to being COM-based on Windows.

How so?
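For context, the usual reason given: on Windows a WideString is a COM BSTR, which is not reference counted, so every assignment copies the whole payload, while UnicodeString assignments only bump a reference count. A minimal sketch, assuming FPC 3.x on Windows:

  program WideCost;
  {$mode objfpc}{$H+}
  var
    W1, W2: WideString;    // on Windows: a COM BSTR, not reference counted
    U1, U2: UnicodeString; // reference-counted UTF-16
  begin
    SetLength(W1, 1000000);
    W2 := W1;              // full copy of all 2 MB (SysAllocStringLen)
    SetLength(U1, 1000000);
    U2 := U1;              // only increments the reference count
  end.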


When the system encoding changes with the target platform, indexed access to such strings can lead to different results. Unless the compiler can read the coder's mind...

You don't have to. The Delphi model provides a string type for the system
encoding, so all strings coming from the system can be labeled as such. For
other string types, the necessary conversions can then be emitted.

Indexed string access produces different results under an ANSI and a UTF-8 system encoding. Such code is not portable, and neither is the data (INI files). Allowing UTF-8 as the system encoding will frustrate Windows users (dunno whether Windows allows such a system encoding), and Linux users are frustrated when UTF-8 is disallowed.
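A minimal sketch of the byte-indexing problem (the literal is written as explicit byte codes, so the source file's codepage does not matter):

  program IndexDemo;
  {$mode objfpc}{$H+}
  var
    S: AnsiString;
  begin
    // The text 'Ähm' under two system encodings:
    //   CP1252: bytes C4 68 6D    -> S[1] = 'Ä', S[2] = 'h'
    //   UTF-8:  bytes C3 84 68 6D -> S[1] and S[2] are the two bytes of 'Ä'
    S := #$C3#$84'hm';   // the UTF-8 form
    WriteLn(Ord(S[2]));  // prints 132 ($84), not the 'h' a CP1252 user expects
  end.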

The only solution: using the OS encoding restricts the code to running on a single machine, or on similarly configured machines.

The group of users who accept this restriction will be happy with a single AnsiString type and no implicit conversions. Without implicit conversions, such a string type can hold UTF-8 as well.
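That is roughly what FPC 3.x's RawByteString (an AnsiString with codepage CP_NONE) offers: assignments into it keep the byte content as-is, without an implicit codepage conversion. A sketch:

  program RawDemo;
  {$mode objfpc}{$H+}
  var
    Raw: RawByteString;
  begin
    Raw := #$68#$C3#$A9'llo';  // the UTF-8 bytes of 'héllo', kept verbatim
    WriteLn(Length(Raw));      // 6: the 'é' occupies two bytes
  end.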


Likewise, e.g. Win32 console routines can be labeled with OEMString (since
Windows uses a different default encoding for the console).

This implies either OEM encoding as the system encoding of Win32 console applications, or the use of multiple codepages, as before. But IMO the Win32 console also implements a "W" interface, so it's up to the user to use whatever is more appropriate for his code.
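A sketch of such labeling, assuming FPC 3.x codepage-aware AnsiStrings (OEMString as written here is a hypothetical name, not an existing RTL type):

  program OEMDemo;
  {$mode objfpc}{$H+}
  type
    OEMString = type AnsiString(CP_OEMCP);
  var
    S: OEMString;
  begin
    S := 'console output';  // assignments convert into the OEM codepage
    // Handing S to an "A" console routine now matches the console's encoding.
  end.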

The RTL has to distinguish between the system-wide "filesystem" and "GUI" encodings in file handling (CreateFile...).
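A sketch of the Windows side, where the "W" file API takes UTF-16 independently of any ANSI or OEM codepage (OpenForReading is a hypothetical helper, not an RTL routine):

  uses Windows;

  function OpenForReading(const FileName: UnicodeString): THandle;
  begin
    // CreateFileW bypasses the ANSI codepage entirely
    Result := CreateFileW(PWideChar(FileName), GENERIC_READ, FILE_SHARE_READ,
      nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  end;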


Why spend time in the design of multiple RTL/LCL versions, when a single version will be perfectly sufficient?
Why spend 13 years being compatible when you can throw it away in a
second?
It's sufficient to throw away what's no more needed :-)

The previous message from Jeff shows that even ShortString is still in major
production use. Nothing is so unused that it can be clipped without a
long-winded transition, or Delphi 2009-like painful breaks.

It's all about the well-known dilemma:
- force (possibly many) implicit conversions, or
- supply multiple RTL/LCL versions, or
- break legacy user code by moving to a different (but again unique) string type.

Moreover, these discussions are useless, since you know as well as I do that
no single string type will ever satisfy everybody. So IMHO it is time to
draw the consequences from the 500 posts on the Unicode subject on this and
other FPC/Lazarus lists, and start thinking about solutions to manage that,
instead of reiterating the "one type to rule them all" mantra ad infinitum.

The discussion is only about the pros and cons of the various possible solutions, i.e. it should reveal the critical cases and consequences that have to be considered and handled in every implementation.

The implementation can choose any model. Different models can be implemented as well, so that the final decision about the new standard can be delayed until the models have been tested in real-world applications.

One model has already been implemented: UTF-8. It may need some additions/improvements, like a *hard* separation of AnsiString from UTF8String, and nothing has to be thrown away.
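A sketch of what such a hard separation could look like, with explicit conversions only (the helper names are assumptions, not existing RTL functions):

  function AnsiToUTF8Strict(const S: AnsiString): UTF8String;
  begin
    Result := UTF8Encode(UnicodeString(S));  // via UTF-16 as a pivot
  end;

  function UTF8ToAnsiStrict(const S: UTF8String): AnsiString;
  begin
    Result := AnsiString(UTF8Decode(S));
  end;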

DoDi
