On Jun 9, 2014, at 9:40 PM, Christian K. <ckk...@hoc.net> wrote:

> On 09.06.14 16:00, paul_kon...@dell.com wrote:
>> 
>> On Jun 9, 2014, at 2:53 PM, Christian K. <ckk...@hoc.net> wrote:
>> 
>>> <Paul_Koning <at> Dell.com> writes:
>>> 
>>>> 
>>>> 
>>>> On Jun 9, 2014, at 9:07 AM, Christian K. <ckkart <at> hoc.net> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I was very pleased to see that retrieving properties of a MAPI object
>>>>> yields either a <str> or <bytes> type depending on whether the _A or _W
>>>>> property was queried …
>>>> 
>>>> Really?  That seems strange.  As I recall, the *_W APIs are “wide
>>>> character” ones.  So in Python 3, they should both map to <str> type.
>>>> <bytes> applies only to non-text data.
>>> 
>>> At least for text properties such as PR_SUBJECT_A / _W, the former returns
>>> an MBCS-encoded "string", i.e. of bytes type, and the latter a 2-byte
>>> Unicode string. Binary properties are always returned as bytes, in contrast
>>> to earlier behavior under Python 2.
>> 
>> Yes, “bytes” for binary values is clearly correct.  But MBCS and “2 byte 
>> Unicode” (more accurately called either UCS-2 or UCS-2 BMP subset, not sure 
>> which) are both text strings.  The different encoding in the API doesn’t 
>> mean they should be different datatypes in Python 3; both cases are properly 
>> mapped to “str”.
> 
> No, this is not what I am seeing. MBCS encoded properties, i.e. those 
> terminating with _A are mapped to 'bytes' and the _W ones to 'str' which is 
> consistent with the handling of unicode and encoded information in python3. 
> And this is great indeed because having to distinguish between strings which 
> can be encoded or not while having the same type is really painful.

Perhaps I’m missing something.

I’m used to Windows API calls that come in a foo_A and foo_W flavor, the only 
difference being that the _A flavor has ANSI (code page) arguments and the _W 
flavor has Unicode arguments (for those arguments that are, abstractly, 
strings).

In Python 3, the “str” type is an abstract string; its character repertoire is 
Unicode but it doesn’t have an encoding.  Instead, encoding and decoding are 
done when it crosses an external interface: files, external API calls, etc.

So... I would expect foo_A and foo_W to have “str” arguments, and the interface 
machinery between Python3 and those functions would run the appropriate 
encoding to generate the string representation expected.

For example, if a given API wants strings in ASCII form, it would be 
str.encode("ascii") or perhaps str.encode("latin1").  If it wants MBCS data, it 
would encode to that encoding.  If 2-byte Unicode, it would encode to UCS-2 
(in Python the closest codec is "utf-16-le").  And so on.  Ditto in the reverse 
direction, when strings are delivered by an external function.
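
To make the encoding step above concrete, here is a minimal sketch using only 
standard Python codecs.  The sample string and the choice of "cp1252" as a 
stand-in for the ANSI code page are my assumptions for illustration; on Windows 
the "mbcs" codec would track the actual system code page.

```python
# Sketch: one abstract str, encoded differently depending on what the
# external API expects.  "cp1252" stands in for the ANSI code page here
# (an assumption; the Windows-only "mbcs" codec follows the real one).
subject = "Grüße"

as_ansi = subject.encode("cp1252")     # one byte per character in this case
as_wide = subject.encode("utf-16-le")  # two bytes per BMP character

# Plain ASCII cannot represent this text, so encoding fails loudly.
try:
    subject.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The reverse direction: bytes delivered by an external function are
# decoded back to the same abstract str.
round_trip_ansi = as_ansi.decode("cp1252")
round_trip_wide = as_wide.decode("utf-16-le")
```

Both byte strings decode back to the same str, which is the point: the 
encoding is a property of the interface, not of the text.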

I would only want/expect to see “bytes” types when the values in question are 
binary data streams, or unknown format.  But anytime we’re dealing with text 
strings, the Python 3 approach is that the Python code sees “str” type, and 
questions of encoding have been handled at the edge.  This is where Python 3 
gets it right and Python 2 was a big muddle.
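
As a rough sketch of what I mean by handling encoding at the edge: the function 
below is purely hypothetical (it is not pywin32's API, and the "kind" labels 
and the "cp1252" code page are assumptions for illustration).  Text comes back 
as str whichever flavor supplied it; only binary data stays bytes.

```python
def decode_property(raw: bytes, kind: str):
    """Hypothetical edge conversion, not part of pywin32.

    kind is an illustrative label: "ansi" for _A text, "wide" for _W text,
    anything else is treated as binary data.
    """
    if kind == "ansi":
        # Assumed code page; on Windows "mbcs" would follow the system setting.
        return raw.decode("cp1252")
    if kind == "wide":
        return raw.decode("utf-16-le")
    # Binary or unknown format: leave the bytes untouched.
    return raw


subject_from_a = decode_property(b"Hello", "ansi")
subject_from_w = decode_property("Hello".encode("utf-16-le"), "wide")
attachment = decode_property(b"\x00\x01\x02", "binary")
```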

Mark, could you clarify how you would expect this to work?

        paul

_______________________________________________
python-win32 mailing list
python-win32@python.org
https://mail.python.org/mailman/listinfo/python-win32
