On Wednesday, September 02, 2015 11:47:11 drug via Digitalmars-d-learn wrote:
> On 02.09.2015 11:30, FreeSlave wrote:
> >> I see, thanks. So I should always treat char[] as UTF in D itself, but
> >> because I need to pass char[], wchar[] or dchar[] to a C library I
> >> should treat it as not UTF but ubytes sequence or ushort or uint
> >> sequence - just to pass it correctly, right?
> >
> > You should just keep in mind that strings returned by Phobos are UTF
> > encoded. Does your C library have UTF support? Is it relevant at all?
> > Maybe it just treats char array as binary data. But if it does some
> > non-trivial string and character manipulations or talks to file system,
> > then it surely should expect strings in some specific encoding, and if
> > it's not UTF, you should re-encode data before passing from D to this
> > library.
> >
> > Also C does not have wchar and dchar, but has wchar_t which size is not
> > fixed and depends on particular platform.
> Well, I think it's not simple question. The C library I used is hdf5 lib
> and it stores data without processing. In general. In particular I need
> to evaluate a situation concretely, I guess.
> Thanks all for answers.

Yeah. char in C is often used for what D uses ubyte for, so just because C uses a char doesn't mean that it even has anything to do with strings, let alone UTF. The correct way to deal with a C function depends on the C function, and that requires that you understand enough about what it's doing to know whether you're really dealing with a string or just bytes.

Fortunately, most of the time - in *nix-land anyway - when char* is treated as string data, it's either ASCII or UTF-8. However, on Windows, that's not the case, and the situation gets far less pleasant (though if you're dealing with strings in a Windows API, you should almost always be using UTF-16 and avoid that whole issue altogether). In any case, you have to be familiar with what the C function is doing and whether it's operating on string data or not rather than just blindly seeing char* and thinking that it's a zero-terminated string.

- Jonathan M Davis
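
To make the string-versus-bytes distinction concrete, here is a minimal sketch in C of two functions that both take a `const char *` but mean different things by it. Both functions (`text_length` and `count_zero_bytes`) are hypothetical names made up for illustration; they are not from hdf5 or any other real library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical API 1: treats the buffer as text. It relies on a zero
   terminator and stops at the first 0 byte it sees. */
size_t text_length(const char *s)
{
    return strlen(s);
}

/* Hypothetical API 2: treats the buffer as raw bytes. The caller passes
   an explicit length, and 0 bytes are ordinary data. */
size_t count_zero_bytes(const char *buf, size_t len)
{
    size_t zeros = 0;
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == 0) {
            ++zeros;
        }
    }
    return zeros;
}
```

From the D side, a function of the first kind would typically be given a zero-terminated copy of the string (for example via `std.string.toStringz`), while a function of the second kind would more likely be given the array's `.ptr` and `.length` with no terminator involved. Which of the two a given C function is can only be learned from its documentation or source.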