On Thu, 15 Sep 2011 13:52:12 -0400, Austin Clements <amdragon at mit.edu> wrote:
> On Tue, Sep 13, 2011 at 11:55 PM, Martin Owens <doctormo at gmail.com> wrote:
> > Hello Again,
> >
> > I notice in the lib code notmuch_database_open(),
> > notmuch_database_create() these functions use const char *path for the
> > directory path input. Is this unicode safe?
> >
> > The python bindings (and ctype docs) seem to suggest using something
> > called 'wchar_t *' for accepting unicode but that's for C not C++.
> >
> > Is this something that should be patched?
>
> char* is the correct type for paths on POSIX systems. The *meaning*
> of those bytes is a more complicated matter and depends on your locale
> settings. On old systems it was generally ASCII, on modern systems
> it's generally UTF-8, and it can be many other things. However, as a
> consequence of UNIX's C heritage, it is *always* terminated with a
> NULL byte and cannot contain embedded NULL's.
Right, that's what we are doing: passing UTF-8 encoded Unicode strings
in as char*, which should be just fine if that is what the underlying
OS uses.

> wchar_t is another matter entirely. wchar_t is the type used by C to
> represent wide strings internally, which generally (but not
> necessarily!) means it stores a Unicode code point. However, this
> isn't an encoding, and different compilers can give wchar_t different
> meanings, so wchar_t strings aren't generally appropriate for storing
> or sharing between processes or with the kernel.

Mmh, I remember I attempted to use wchar_t to pass in Unicode objects
directly and it failed miserably.

Sebastian
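
For illustration, here is a rough sketch of the approach discussed above: encode the path to UTF-8 bytes on the Python side and hand those bytes to a `const char *` parameter through ctypes. The helper name `encode_path` and the example path are made up for this sketch and are not part of the notmuch bindings; UTF-8 is assumed to be what the underlying system expects.

```python
import ctypes


def encode_path(path, encoding="utf-8"):
    """Return `path` as bytes suitable for a C `const char *` argument.

    Hypothetical helper for illustration, not part of the notmuch bindings.
    """
    raw = path if isinstance(path, bytes) else path.encode(encoding)
    # C strings end at the first NUL byte, so an embedded one would
    # silently truncate the path on the C side.
    if b"\0" in raw:
        raise ValueError("path must not contain an embedded NUL byte")
    return raw


# Example path with a non-ASCII character (U+00E4).
raw = encode_path("/home/user/M\u00e4il")

# This is the kind of value a `const char *path` parameter would receive.
c_path = ctypes.c_char_p(raw)

# The width of wchar_t depends on the platform and compiler, which is one
# reason it is a poor fit for passing paths around.
wchar_size = ctypes.sizeof(ctypes.c_wchar)
```

The bytes reach the C side unchanged, so whether they name the intended directory depends only on the encoding the filesystem and locale expect.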