On Jun 9, 2014, at 9:40 PM, Christian K. <ckk...@hoc.net> wrote:

> On 09.06.14 16:00, paul_kon...@dell.com wrote:
>>
>> On Jun 9, 2014, at 2:53 PM, Christian K. <ckk...@hoc.net> wrote:
>>
>>> <Paul_Koning <at> Dell.com> writes:
>>>
>>>> On Jun 9, 2014, at 9:07 AM, Christian K. <ckkart <at> hoc.net> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I was very pleased to see that retrieving properties of a MAPI object
>>>>> yields either a <str> or <bytes> type depending on whether the _A or
>>>>> _W property was queried …
>>>>
>>>> Really? That seems strange. As I recall, the *_W APIs are “wide
>>>> character” ones. So in Python 3, they should both map to <str> type.
>>>> <bytes> applies only to non-text data.
>>>
>>> At least for text properties like e.g. PR_SUBJECT_A / _W the former
>>> returns an MBCS-encoded "string", i.e. of bytes type, and the latter a
>>> 2-byte unicode string. Binary properties are always returned as bytes,
>>> in contrast to earlier when using Python 2.
>>
>> Yes, “bytes” for binary values is clearly correct. But MBCS and “2 byte
>> Unicode” (more accurately called either UCS-2 or UCS-2 BMP subset, not
>> sure which) are both text strings. The different encoding in the API
>> doesn’t mean they should be different datatypes in Python 3; both cases
>> are properly mapped to “str”.
>
> No, this is not what I am seeing. MBCS-encoded properties, i.e. those
> ending in _A, are mapped to 'bytes' and the _W ones to 'str', which is
> consistent with the handling of unicode and encoded information in
> Python 3. And this is great indeed, because having to distinguish between
> strings which can be encoded or not while having the same type is really
> painful.

Perhaps I’m missing something. I’m used to Windows API calls that come in a foo_A and foo_W flavor, the only difference being that the _A flavor takes ANSI (code page) arguments and the _W flavor takes Unicode arguments (for those arguments that are, abstractly, strings).

In Python 3, the “str” type is an abstract string; its character repertoire is Unicode but it doesn’t have an encoding. Instead, encoding and decoding are done when it is converted to or from external interfaces: files, external API calls, etc.

So I would expect foo_A and foo_W to both take “str” arguments, and the interface machinery between Python 3 and those functions would run the appropriate encoding to generate the string representation each one expects. For example, if a given API wants strings in ASCII form, that would be str.encode("ascii") or perhaps str.encode("latin1"). If it wants MBCS data, the string would be encoded with that encoding. If it wants 2-byte Unicode, it would be encoded as UTF-16 (Python has no separate "ucs-2" codec; the name to use is "utf-16-le"). And so on. Ditto in the reverse direction, when strings are delivered by an external function: they would be decoded to “str” before Python code sees them.

I would only want or expect to see “bytes” types when the values in question are binary data streams, or of unknown format. But any time we’re dealing with text strings, the Python 3 approach is that the Python code sees “str” type, and questions of encoding have been handled at the edge. This is where Python 3 gets it right and Python 2 was a big muddle.

Mark, could you clarify how you would expect this to work?

paul

_______________________________________________
python-win32 mailing list
python-win32@python.org
https://mail.python.org/mailman/listinfo/python-win32
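
P.S. To make "encoding handled at the edge" concrete, here is a minimal sketch of what I have in mind. It is only an illustration, not how pywin32 actually works: the helper names to_api and from_api are hypothetical, and "cp1252" is an assumed stand-in for the ANSI code page so the example runs on any platform (on Windows the "mbcs" codec would play that role).

```python
# Hypothetical boundary helpers: illustration only, not part of pywin32.
ANSI_CODEC = "cp1252"     # assumed stand-in for the ANSI code page ("mbcs" on Windows)
WIDE_CODEC = "utf-16-le"  # 2-byte little-endian code units, as used by *_W APIs


def to_api(text: str, wide: bool) -> bytes:
    """Encode a str just before handing it to an external _A or _W function."""
    return text.encode(WIDE_CODEC if wide else ANSI_CODEC)


def from_api(raw: bytes, wide: bool) -> str:
    """Decode bytes delivered by an external _A or _W function back to str."""
    return raw.decode(WIDE_CODEC if wide else ANSI_CODEC)


subject = "Gr\u00fc\u00dfe"  # the same abstract text, whichever API flavor is used

ansi_raw = to_api(subject, wide=False)  # one byte per character in cp1252
wide_raw = to_api(subject, wide=True)   # two bytes per character in UTF-16-LE

# Both representations decode back to the same str; the calling code never
# has to know which encoding was used on the wire.
assert from_api(ansi_raw, wide=False) == subject
assert from_api(wide_raw, wide=True) == subject
```

With something like this in the interface layer, the caller works only with str for text, and bytes is left for values that really are binary.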