Hi all, after I was notified that notmuch's Python bindings perform differently depending on whether we hand them (byte-based) ASCII strings or unicode, I tried to disentangle which encodings to expect from the library and which to send to it. The answer is that things are very implicit: notmuch.h speaks of strings but never mentions encodings, and the Xapian docs don't mention encodings either, but ojwb confirmed that Xapian expects UTF-8.
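To make the byte string versus unicode distinction concrete, here is a rough sketch of what a bindings layer could do at the boundary. The helper names `to_utf8_bytes` and `from_utf8_bytes` are made up for illustration and are not part of notmuch or its bindings. The sketch is written for Python 3, where `str` plays the role of Python 2's `unicode`, and it assumes the library wants UTF-8 on the way in, which is only confirmed for Xapian so far.

```python
def to_utf8_bytes(value):
    """Encode text to UTF-8 bytes; pass byte strings through unchanged.

    Hypothetical helper: assumes the C library expects UTF-8 input.
    """
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


def from_utf8_bytes(raw):
    """Decode bytes as UTF-8 text where possible.

    Hypothetical helper: if the bytes are not valid UTF-8 (e.g. an
    arbitrary header read straight from a file), return them untouched
    instead of guessing an encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw
```

With something like this, a value only becomes a text object when it really is valid UTF-8; anything else stays as raw bytes so the caller can decide what to do with it.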
So, can we document what encoding we are expected to pass in to the various APIs, and where we can guarantee that we actually return UTF-8 encoded strings? For some of the stuff we read directly from the files, e.g. arbitrary headers, we can probably be least sure, but are the returned tags, for example, always UTF-8? I would love to make the Python bindings use unicode() instances in cases where we can be sure to actually receive UTF-8 encoded strings. Encodings make my brain hurt. Unfortunately one cannot simply ignore them.

Sebastian