Those who've been following SDWF will by now have realised that it abuses char[] for ANSI
strings, whereas D strings are meant to be in Unicode. It's high time I did something
about this.
When I started on it, I was still using Windows 98, which has very limited Unicode
support. But that was years ago now. And it must be coming on 7 years now since MS
discontinued support for it. So I might as well drop Windows 9x support, just like
16-bit support was dropped with the creation of D (which was only 4 years after Windows
95, after all).
I therefore plan to change SDWF to work in Unicode, probably using UTF-16 internally, but
possibly giving the programmer the choice between UTF-8 and UTF-16.
But this raises the question of what to do with the existing char-based API. Possibilities
I've thought of:
(a) Just get rid of it. Programmers upgrading to the new SDWF version will be forced to
change instances of char to wchar; what more there is to do depends on what else the
program does with character/string data.
(b) Keep functions that take a char or char[] parameter, make them convert from ANSI to
UTF-16, but deprecate them. Thinking about it now, there are problems:
- To offer both an ANSI-returning and a Unicode-returning version of each function, I
would need to name them differently (D can't overload on return type alone), which could
get ugly.
- When returning ANSI, what would happen to characters outside the code page?
- Mixing ANSI and Unicode could also have adverse effects on the interpretation of string
literals.
So maybe this isn't a good plan at all.
(c) Use versioning to give the programmer the choice of an ANSI API or a UTF-16 API,
rather like the WindowsAPI bindings themselves.
(d) Change char functions to use UTF-8. This would break any code that relies on the
characters being ANSI, or even that manipulates text on a one-character, one-byte basis.
As with (c), versioning could be used to give a choice between UTF-8 and UTF-16.
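To illustrate the kind of breakage (d) would cause, here is a minimal sketch. It assumes Phobos's std.utf.toUTF16 and std.stdio.writefln; countCodePoints is a made-up helper for this example, not anything in SDWF. In a Western European ANSI code page the string below is 5 bytes, one per character, but as UTF-8 it is 6 code units:

```d
import std.stdio : writefln;
import std.utf : toUTF16;

// Hypothetical helper: counts code points by letting foreach
// decode the UTF-8 sequence one dchar at a time.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // "naive" with a diaeresis on the i; U+00EF takes two UTF-8 code units.
    string utf8 = "na\u00EFve";
    auto utf16 = toUTF16(utf8);

    writefln("UTF-8 code units:  %s", utf8.length);            // 6
    writefln("UTF-16 code units: %s", utf16.length);           // 5
    writefln("code points:       %s", countCodePoints(utf8));  // 5
}
```

So code that indexes utf8[2] expecting the whole accented character would instead get the first byte of a two-byte sequence.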
If path (b) or (c) is taken, the ANSI API could later be removed. Once this is done, or
if path (a) is taken, we could add UTF-8 support, thereby ending up at (d).
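For the versioning idea in (c) and (d), something along these lines is what I have in mind. This is only a rough sketch: the version identifier SDWF_Unicode and the alias tchar are names invented for this example, not existing SDWF or bindings identifiers.

```d
import std.stdio : writefln;

// SDWF_Unicode and tchar are hypothetical names for illustration.
// Compile with -version=SDWF_Unicode to select the UTF-16 API.
version (SDWF_Unicode)
{
    alias wchar tchar;   // UTF-16 code units
}
else
{
    alias char tchar;    // ANSI (or, under option (d), UTF-8) code units
}

void main()
{
    // Reports which character type the build selected.
    writefln("tchar is %s byte(s) wide", tchar.sizeof);
}
```

Library functions would then be declared in terms of tchar[], so one set of names serves both builds, at the cost that a program and the library must be compiled with the same setting.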
It's early days yet, but the thread I started a few hours ago ("D1, D2 and the future of
libraries") could still lead to my migrating SDWF to D2. If it does, I'll likely combine
the migration to Unicode with this.
Thoughts?
Stewart.