On Wed, 4 Oct 2017 13:10:02 +0100 Tony Whyman <tony.why...@mccallumwhyman.com> wrote:
> Unicode Character String handling is a question that keeps coming up on
> the Free Pascal Mailing lists and, empirically, it is hard to avoid the
> conclusion that there is something wrong with the way these character
> string types are handled. Otherwise, why does this issue keep arising?

Mixing string types, mixing encodings, mixing legacy code, confusing UCS-2
with UTF-16, ...

>[...]
> Another problem is that there is no character type for a Unicode
> Character.

I'm curious: which languages have such a type?

> The built-in type “WideChar” is only two bytes and cannot
> hold a UTF-16 code point comprising two surrogate pairs. There is no
> char type for a UTF-8 character and, while UCS4Char exists, the Lazarus
> UTF-8 utilities use “cardinal” as the type for a code point (not exactly
> strong typing).

That should be remedied.

>[...]
> Let the programmer worry about the algorithm and the compiler worry about
> the best implementation.

A UTF-32 string type is seldom the best choice for memory and/or speed.

>[...]
> I want to propose a new character type called “UniChar” - short for
> Unicode Character, along with a new string type “UniString” and a new
> collection “TUniStrings”. I have presented my thoughts here in a
> detailed paper
>
> see https://mwasoftware.co.uk/docs/unistringproposal.pdf
>
> This is intended to be a fully worked proposal and I have circulated it
> to provoke discussion and in the hope that it may be useful.

Adding another string type without disabling some of the old string types
will increase the confusion.
Please provide a proposal for disabling old string types.

Also keep in mind that there is still no UTF-16 RTL, even though many
people need one for Delphi compatibility. Starting yet another RTL, this
time for UTF-32, would need some seriously dedicated programmers.

Mattias