On 2010-06-08 09:22:02 -0400, Ruslan Nikolaev <nruslan_de...@yahoo.com> said:

> you don't need to provide instances for every other character type, and at the same time - use native character encoding available on system.

My opinion is that thinking this will work is a fallacy. Here's why...

Linux systems generally use UTF-8, so I guess the "system encoding" there will be UTF-8. But if you start to use Qt you have to use UTF-16, and you might have to intermix UTF-8 to work with other libraries in the backend (libraries which are not necessarily D libraries, nor system libraries). So you may have a UTF-8 backend (such as the MySQL library), UTF-8 "system encoding" glue code, and UTF-16 GUI code (Qt). That might be a good or a bad choice, depending on various factors, such as whether the glue code sends more strings to the backend or to the GUI.

Now try to port the thing to Windows, where you define the "system encoding" as UTF-16. You still have the same UTF-8 backend and the same UTF-16 GUI code, but for some reason you're now changing the glue code in the middle to UTF-16. Sure, it can be made to work, but all the string conversions will start to happen elsewhere, which may change the performance characteristics and add some potential for bugs, and all this for no real reason.
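
To make the conversion point concrete, here is a minimal D sketch of that glue layer; fetchTitleFromDb and setWindowTitle are invented stand-ins for the MySQL and Qt sides, not real bindings:

    import std.conv : to;

    // Invented stand-ins for the two sides of the glue layer.
    string fetchTitleFromDb() { return "café"; }    // UTF-8 result, as from MySQL
    void setWindowTitle(wstring title) { }          // UTF-16 call, as into Qt

    void main()
    {
        string raw = fetchTitleFromDb();    // the glue code keeps UTF-8...
        setWindowTitle(to!wstring(raw));    // ...and transcodes only at the GUI boundary
    }

Port that to Windows and nothing forces the glue code itself to switch to UTF-16; the only real question is on which side of which boundary the conversion call sits.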

The problem is that what you call the "system encoding" is only the encoding used by the system frameworks. It is relevant when working with the system frameworks, but when you're working with any other API, you'll probably want to use the same character type as that API, not necessarily the "system encoding". Not all programs are based on extensive use of the system frameworks. In some situations you'll want to use UTF-16 on Linux, or UTF-8 on Windows, because you're dealing with libraries that expect that (Qt, MySQL).

A compiler switch is a poor choice here, because you can't mix libraries compiled with different compiler switches when that switch changes the default character type.
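
A small illustration of why (assuming such a hypothetical switch existed): the two candidate defaults are genuinely distinct types with distinct mangled names, so the symbols inside a precompiled library can only ever match one of them.

    import std.stdio : writeln;

    // immutable(char)[] and immutable(wchar)[] mangle differently, so a
    // function compiled against one "default string type" cannot be linked
    // against a call site compiled expecting the other.
    void main()
    {
        writeln(string.mangleof);     // mangling of immutable(char)[]
        writeln(wstring.mangleof);    // mangling of immutable(wchar)[] -- not the same
    }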

In most cases, it's much better in my opinion if the programmer just uses the same character type as one of the libraries he uses, sticks to that, and is aware of what he's doing. If someone really wants to deal with the complexity of supporting both character types depending on the environment the program runs on, it's easy to create "tchar" and "tstring" aliases that depend on whether it's Windows or Linux, or on a custom version flag from a compiler switch (sketched below), but then it's his choice and his responsibility to make everything work. But I think in this case a better option might be to abstract all those strings under a single type that works with all UTF encodings (something like [mtext]).

[mtext]: http://www.dprogramming.com/mtext.php
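
For reference, a minimal sketch of the tchar/tstring idea; the names and the choice of UTF-16 on Windows are just one convention a programmer might pick, not anything the language mandates:

    import std.conv : to;

    version (Windows)
        alias wchar tchar;    // UTF-16, to match the Win32 API
    else
        alias char tchar;     // UTF-8, to match typical Linux libraries
    // (or key this on a custom -version=... flag instead of the OS)

    alias immutable(tchar)[] tstring;

    // Whoever opts in also owns the conversions at every library boundary:
    void passToBackend(tstring s)
    {
        string utf8 = to!string(s);    // e.g. for a UTF-8 backend like MySQL
        // ... hand utf8 to the backend ...
    }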

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
