On Friday, 24 October 2014 at 22:53:15 UTC, Jonathan M Davis via
Digitalmars-d-learn wrote:
> Also, given how DirEntry works internally, I'd definitely be inclined to argue
> that it would be too much of a mess to support wstring unless it's by simply
> converting the name to a wstring when requested (which is kind of pointless,
> since you can just do to!wstring on the name if that's what you want). Making
> it support wstring directly would involve a lot of code duplication, and it
> would increase the memory footprint, because the structs involved would then
> have to hold the path and whatnot as both a string and wstring. So, I question
> that it's at all worth it to try and make dirEntries support wstring.
I would suggest that the string be kept as wstring inside the
DirEntry structure, rather than converting twice as you suggest.
Then a decision can be made as to whether .name() returns a
string or wstring. If backwards compatibility is a concern, then
it could be converted to a string on that call. That would break
the nothrow promise, though, since the conversion can throw.
Adding something like .wname() would work here for getting the
native wstring, I suppose.
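To make that concrete, here is a minimal sketch of the idea. WideEntry is a hypothetical stand-in, not the actual DirEntry from Phobos, and the property names simply follow the .name/.wname naming used above:

```d
import std.conv : to;

// Hypothetical stand-in for DirEntry; not the Phobos implementation.
struct WideEntry
{
    private wstring _wname; // path kept in its native UTF-16 form

    this(wstring path) { _wname = path; }

    // Native form: no conversion happens, so this can be nothrow.
    @property wstring wname() const nothrow { return _wname; }

    // Backwards-compatible form: transcodes on every call and can
    // throw on invalid UTF-16, so it is not marked nothrow.
    @property string name() const { return _wname.to!string; }
}
```

With this layout, code that only talks to the Windows API would use .wname and never pay for a conversion, while existing code calling .name keeps working at the cost of one conversion per call.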
Another alternative is to have a union of string and wstring, and
a bool indicating how the string is stored internally. Of course,
the .name and .wname properties would need to check it and
convert depending on how it is stored. It's not pretty, but it's
just another possibility.
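A rough sketch of the union variant, again with purely illustrative names (EitherEntry and its fields are assumptions, not anything in Phobos):

```d
import std.conv : to;

// Hypothetical layout; struct and field names are illustrative only.
struct EitherEntry
{
    union
    {
        string  _narrow; // valid when _isWide is false
        wstring _wide;   // valid when _isWide is true
    }
    bool _isWide; // records which union member is currently valid

    this(string path)  { _narrow = path; _isWide = false; }
    this(wstring path) { _wide = path;   _isWide = true;  }

    // Returns the stored string directly, or converts from UTF-16.
    @property string name() const
    {
        if (_isWide)
            return _wide.to!string;
        return _narrow;
    }

    // Returns the stored wstring directly, or converts from UTF-8.
    @property wstring wname() const
    {
        if (_isWide)
            return _wide;
        return _narrow.to!wstring;
    }
}
```

Only one copy of the path is held, so the memory footprint stays the same as today; the price is a branch in every accessor.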
The whole point is that a lot of time is wasted on UTF-16 to
UTF-8 conversions when using these library functions.
> And we definitely don't want to encourage the use of wstring. It's there for
> when you need it (which is great), but programs really should be using string
> if they don't actually need to use wstring or dstring.
I get that wstring on the whole is ugly, but it's the native
Unicode string type on Windows. If someone is doing serious work
on Windows, wstring will eventually need to be used. It'd be nice
to keep the abstraction of string at every level of a program,
but on Windows it's impossible. The standard library, even if it
were comprehensive enough, will never cover every corner case
where strings are needed. Whether using the Windows API, COM, or
interfacing with other Windows libraries, wstring will still rear
its ugly head.
But, idealism aside, there are good reasons for keeping the
pathname in its native format on Windows:
- If a program is processing lots of files, a lot of cycles are
wasted on those wstring->string conversions.
- Doing anything more with the files besides listing them will
probably result in a string->wstring conversion during a call to
Windows to open or query information about the file, which means
more cycles wasted.
- Additionally, Windows has a peculiar way of handling long
pathnames that requires a "\\?\" prefix and only works with the
Unicode versions of its functions. This also makes the pathname
uniquely OS-specific.
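As an illustration of the last two points, this is roughly what a caller has to do today before handing a name from dirEntries to a Unicode Windows function. toExtendedPath is a hypothetical helper, and the sketch assumes the path is already absolute, uses backslashes, and is not a UNC path:

```d
import std.algorithm.searching : startsWith;
import std.conv : to;

// Hypothetical helper: converts a UTF-8 path to UTF-16 and prepends
// the long-path prefix. Assumes an absolute, backslash-separated,
// non-UNC path; no normalization is attempted.
wstring toExtendedPath(string absPath)
{
    enum prefix = `\\?\`w;
    auto wide = absPath.to!wstring; // the string->wstring conversion
    return wide.startsWith(prefix) ? wide : prefix ~ wide;
}
```

If the entry already held a wstring, the to!wstring step would disappear and only the prefix check would remain.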
Anyway, some things to think about.