On 2010-06-08 09:22:02 -0400, Ruslan Nikolaev <nruslan_de...@yahoo.com> said:

> you don't need to provide instances for every other character type, and at the same time - use native character encoding available on system.

My opinion is that thinking this will work is a fallacy. Here's why...

Linux systems generally use UTF-8, so I guess the "system encoding" there will be UTF-8. But if you start to use Qt you have to use UTF-16, and you might have to intermix UTF-8 to work with other libraries in the backend (libraries which are not necessarily D libraries, nor system libraries). So you may have a UTF-8 backend (such as the MySQL library), UTF-8 "system encoding" glue code, and UTF-16 GUI code (Qt). That might be a good or a bad choice, depending on various factors, such as whether the glue code sends more strings to the backend or to the GUI.

Now try to port the thing to Windows, where you define the "system encoding" as UTF-16. You still have the same UTF-8 backend and the same UTF-16 GUI code, but for some reason you're now changing the glue code in the middle to UTF-16. Sure, it can be made to work, but all the string conversions will start to happen elsewhere, which may change the performance characteristics and add some potential for bugs, and all this for no real reason.
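
To make the conversion point concrete, here is a minimal D sketch of that glue layer; fetchTitleFromDb and setWindowTitle are invented stand-ins for the MySQL and Qt sides, not real bindings:

    import std.conv : to;

    // Invented stand-ins for the two sides of the glue layer.
    string fetchTitleFromDb() { return "café"; }    // UTF-8 result, as from MySQL
    void setWindowTitle(wstring title) { }          // UTF-16 call, as into Qt

    void main()
    {
        string raw = fetchTitleFromDb();    // the glue code keeps UTF-8...
        setWindowTitle(to!wstring(raw));    // ...and transcodes only at the GUI boundary
    }

Port that to Windows and nothing forces the glue code itself to switch to UTF-16; the only real question is on which side of which boundary the conversion call sits.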

The problem is that what you call the "system encoding" is only the encoding used by the system frameworks. It is relevant when working with the system frameworks, but when you're working with any other API, you'll probably want to use the same character type as that API, not necessarily the "system encoding". Not all programs are based on extensive use of the system frameworks. In some situations you'll want to use UTF-16 on Linux, or UTF-8 on Windows, because you're dealing with libraries that expect that (Qt, MySQL).

A compiler switch is a poor choice here, because you can't mix libraries compiled with different compiler switches when that switch changes the default character type.
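
A small illustration of why (assuming such a hypothetical switch existed): the two candidate defaults are genuinely distinct types with distinct mangled names, so the symbols inside a precompiled library can only ever match one of them.

    import std.stdio : writeln;

    // immutable(char)[] and immutable(wchar)[] mangle differently, so a
    // function compiled against one "default string type" cannot be linked
    // against a call site compiled expecting the other.
    void main()
    {
        writeln(string.mangleof);     // mangling of immutable(char)[]
        writeln(wstring.mangleof);    // mangling of immutable(wchar)[] -- not the same
    }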

In most cases, it's much better in my opinion if the programmer just uses the same character type as one of the libraries he uses, sticks to that, and is aware of what he's doing. If someone really wants to deal with the complexity of supporting both character types depending on the environment the program runs on, it's easy to create "tchar" and "tstring" aliases that depend on whether it's Windows or Linux, or on a custom version flag from a compiler switch (sketched below), but then it's his choice and his responsibility to make everything work. But I think in this case a better option might be to abstract all those strings under a single type that works with all UTF encodings (something like [mtext]).

[mtext]: http://www.dprogramming.com/mtext.php
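
For reference, a minimal sketch of the tchar/tstring idea; the names and the choice of UTF-16 on Windows are just one convention a programmer might pick, not anything the language mandates:

    import std.conv : to;

    version (Windows)
        alias wchar tchar;    // UTF-16, to match the Win32 API
    else
        alias char tchar;     // UTF-8, to match typical Linux libraries
    // (or key this on a custom -version=... flag instead of the OS)

    alias immutable(tchar)[] tstring;

    // Whoever opts in also owns the conversions at every library boundary:
    void passToBackend(tstring s)
    {
        string utf8 = to!string(s);    // e.g. for a UTF-8 backend like MySQL
        // ... hand utf8 to the backend ...
    }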

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
