On Wednesday, September 02, 2015 11:47:11 drug via Digitalmars-d-learn wrote:
> On 02.09.2015 11:30, FreeSlave wrote:
> >> I see, thanks. So I should always treat char[] as UTF in D itself, but
> >> because I need to pass char[], wchar[] or dchar[] to a C library, I
> >> should treat it not as UTF but as a sequence of ubytes, ushorts or
> >> uints, just to pass it correctly, right?
> >
> > You should just keep in mind that strings returned by Phobos are UTF
> > encoded. Does your C library have UTF support? Is it relevant at all?
> > Maybe it just treats char array as binary data. But if it does some
> > non-trivial string and character manipulations or talks to file system,
> > then it surely should expect strings in some specific encoding, and if
> > it's not UTF, you should re-encode the data before passing it from D to
> > this library.
> >
> > Also, C does not have wchar and dchar, but it has wchar_t, whose size is
> > not fixed and depends on the particular platform.
> Well, I think it's not a simple question. The C library I use is the hdf5
> lib, and it stores data without processing, in general. In particular
> cases I need to evaluate the situation concretely, I guess.
> Thanks, all, for the answers.

Yeah. char in C is often used for what D would use ubyte for, so just because
a C function uses char doesn't mean that it has anything to do with strings, let alone
UTF. The correct way to deal with a C function depends on the C function,
and that requires that you understand enough about what it's doing to know
whether you're really dealing with a string or just bytes.
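To make the bytes-versus-string distinction concrete, here is a minimal C sketch. `store_bytes` and `length_as_string` are hypothetical helpers, not part of hdf5 or any real library: the first copies a pointer-plus-length `char` buffer verbatim, the way a library that stores data without processing might, while the second treats the same buffer as a zero-terminated string and so silently stops at the first zero byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical storage-style API: pointer plus explicit length.
 * The bytes are copied verbatim and may contain embedded zero bytes;
 * nothing here assumes the data is text. Returns the bytes copied. */
size_t store_bytes(char *dst, size_t cap, const char *src, size_t len)
{
    size_t n = len < cap ? len : cap; /* never write past dst */
    memcpy(dst, src, n);
    return n;
}

/* Treating the same buffer as a C string: strlen stops at the first
 * zero byte, so binary data gets truncated without any error. */
size_t length_as_string(const char *src)
{
    return strlen(src);
}
```

For a buffer holding `'a', 'b', '\0', 'c', 'd'`, `store_bytes` reports all 5 bytes while `length_as_string` reports 2. On the D side, data like this is more naturally held as a `ubyte[]` than as a `string`.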

Fortunately, most of the time (in *nix-land, anyway), when char* is treated
as string data, it's either ASCII or UTF-8. On Windows, however, it often isn't
(it's typically in the current ANSI code page), and the situation gets far
less pleasant, though if you're dealing with strings with the Windows API,
you should almost always be using UTF-16 and avoid that whole issue altogether.
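When a C library hands back char data that is only *usually* UTF-8, one cheap safeguard before treating it as a D `string` is to check that it is well-formed UTF-8 first (on the D side, Phobos' `std.utf.validate` performs such a check). The following is a minimal C sketch, assuming the data arrives as a pointer plus length; `is_valid_utf8` is a hypothetical helper, not from any library mentioned in this thread. It follows the RFC 3629 rules: no overlong encodings, no UTF-16 surrogates, nothing above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) is well-formed UTF-8. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;         /* number of continuation bytes expected */
        unsigned long cp; /* code point being decoded */

        if (c < 0x80) { i++; continue; } /* plain ASCII */
        if (c >= 0xC2 && c <= 0xDF)      { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return false; /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

        if (len - i <= n) return false; /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false; /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return false; /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false; /* UTF-16 surrogate, invalid in UTF-8 */
        if (cp > 0x10FFFF)
            return false; /* beyond the Unicode range */

        i += n + 1;
    }
    return true;
}
```

A Latin-1 `é` (the single byte `0xE9`) fails this check, while its UTF-8 form `0xC3 0xA9` passes, which is exactly the kind of difference that distinguishes "UTF-8 as usual" from a legacy code page.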

In any case, you have to be familiar with what the C function is doing and
whether it's operating on string data or not rather than just blindly seeing
char* and thinking that it's a zero-terminated string.

- Jonathan M Davis
