Re: [python-win32] python3 and extended mapi

Tim Roberts Tue, 10 Jun 2014 10:02:34 -0700

paul_kon...@dell.com wrote:
> Perhaps I’m missing something.

I'm not sure you're missing anything.  You're simply describing another
implementation choice that could have been made.  Both your scheme and
the actual scheme have their merits.



> I’m used to Windows API calls that come in a foo_A and foo_W flavor, the only 
> difference being that the _A flavor has ASCII arguments and the _W flavor has 
> Unicode arguments (for those arguments that are, abstractly, strings).

Technically speaking, the _A flavor is MBCS.  8-bit entities, with an
unknown encoding, where a single character can span multiple bytes.
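
To illustrate, here is a minimal sketch, assuming Python 3's bundled
codecs and picking code page 932 (Japanese Shift-JIS) purely as an
example of a multi-byte code page:

```python
# cp932 is only an illustrative choice of code page.
ascii_letter = "A".encode("cp932")
kanji = "\u65e5".encode("cp932")   # U+65E5, the CJK character for "sun/day"

# One character each, but a different number of 8-bit units.
widths = (len(ascii_letter), len(kanji))
print(widths)
```

So the byte length of an _A string does not tell you how many
characters it holds.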


> In Python 3, the “str” type is an abstract string; its character repertoire 
> is Unicode but it doesn’t have an encoding.  Instead, encoding and decoding 
> is done when it is converted to/from external interfaces — files, external 
> API calls, etc.

It doesn't have an encoding because it doesn't NEED an encoding.


> So... I would expect foo_A and foo_W to have “str” arguments, and the 
> interface machinery between Python3 and those functions would run the 
> appropriate encoding to generate the string representation expected.

The big problem here is determining what is "the appropriate encoding". 
I'm not convinced there is any way for the Python COM machinery to know
that definitively.  It could make a guess, but a guess is always going
to be wrong sometimes.  Absent that confidence, it seems to me that the
correct solution is to deliver the MBCS string exactly as it arrived,
and that's what the current implementation does.  Leave it to the
application to figure out how to decode it.
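
A rough sketch of why a guess in the middle layer is worse than no
guess, assuming for illustration that the bytes happen to be cp1252
and that the hypothetical middleware guesses ASCII with replacement:

```python
# Stand-in for an 8-bit string coming back from an _A style call.
raw = "caf\u00e9".encode("cp1252")

# Hypothetical middleware guess: assume ASCII, substitute what doesn't fit.
guessed = raw.decode("ascii", errors="replace")

# The application, which may know the real code page, decodes the raw bytes.
decoded_by_app = raw.decode("cp1252")

print(repr(guessed), repr(decoded_by_app))
```

Once the guess has been applied, the original byte is gone and the
application cannot undo it.  With the bytes passed through untouched,
the decision stays with the code that has the context to make it.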


> For example, if a given API wants strings in ASCII form, it would be 
> str.encode (“ascii”) or perhaps str.encode (“latin1”).

Assuming the API wants Latin-1.  Does it?  You don't know that.  It
varies from machine to machine, and even from run to run.  That's the
problem.
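
One machine-dependent piece can at least be inspected.  The sketch
below assumes the Win32 GetACP() call reached through ctypes, and
falls back to the locale module elsewhere; ansi_code_page_name is a
made-up helper name:

```python
import locale
import sys

def ansi_code_page_name() -> str:
    """Best-effort name of the encoding this process treats as 'ANSI'."""
    if sys.platform == "win32":
        import ctypes
        # GetACP() returns the numeric ANSI code page, e.g. 1252.
        return "cp%d" % ctypes.windll.kernel32.GetACP()
    # Not Windows: report the locale's preferred encoding instead.
    return locale.getpreferredencoding(False)

print(ansi_code_page_name())
```

Knowing the local code page still says nothing about which encoding
was used for data that was written somewhere else.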


> If it wants MBCS data, it would be encode to that encoding.

I assume you understand that all 8-bit strings in Windows are MBCS. 
Latin-1 is just another MBCS.


> If 2-byte Unicode, it would be encode to ucs-2.

UTF-16; that's what the Windows Unicode encoding is.
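
The difference shows up outside the Basic Multilingual Plane, where
UTF-16 needs a surrogate pair.  A small sketch using Python's
"utf-16-le" codec (little-endian, no byte order mark), with U+1F600
chosen only as an example:

```python
bmp_char = "A".encode("utf-16-le")
astral_char = "\U0001F600".encode("utf-16-le")   # a code point above U+FFFF

# Number of 16-bit code units needed for each single character.
code_units = (len(bmp_char) // 2, len(astral_char) // 2)
print(code_units)
```

A fixed two-bytes-per-character scheme like UCS-2 cannot represent
the second character at all.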


> I would only want/expect to see “bytes” types when the values in question are 
> binary data streams, or unknown format.  But anytime we’re dealing with text 
> strings, the Python 3 approach is that the Python code sees “str” type, and 
> questions of encoding have been handled at the edge.  This is where Python 3 
> gets it right and Python 2 was a big muddle.

The muddle is not the fault of Python.  It is the fault of the character
encoding decisions made by Microsoft in the mists of antiquity.  When
you get an 8-bit string that includes the byte 0x9F, there is absolutely
no way for you to answer the question "what character is that?"  If that
question cannot be answered, middleware should not be making a guess.
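
A quick sketch of that ambiguity, decoding the single byte 0x9F under
a handful of code pages (the list is an arbitrary sample, not
anything the COM layer is known to use):

```python
byte = b"\x9f"
readings = {}
for code_page in ("cp1252", "cp1251", "cp437", "latin-1", "cp932"):
    try:
        readings[code_page] = byte.decode(code_page)
    except UnicodeDecodeError:
        # Under this code page the byte is not a complete character.
        readings[code_page] = None

for code_page, text in readings.items():
    print(code_page, ascii(text))
```

Each code page gives a different answer, and under cp932 the byte is
only the first half of a two-byte character.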

-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.

_______________________________________________
python-win32 mailing list
python-win32@python.org
https://mail.python.org/mailman/listinfo/python-win32
