Hi Steve,

do you know of a definitive resource for Windows code pages
on MSDN or another official MS website?

I tried to find some links, but only got these ancient
ones:

https://msdn.microsoft.com/en-us/library/cc195054.aspx

(this version of cp1252 doesn't even have the euro sign yet)

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

On 19.01.2018 18:17, M.-A. Lemburg wrote:
> On 19.01.2018 17:24, Random832 wrote:
>> On Fri, Jan 19, 2018, at 08:30, M.-A. Lemburg wrote:
>>>> Someone did discover that Microsoft's current implementations of the
>>>> windows-* encodings match the WHATWG spec, rather than the Unicode
>>>> spec that Microsoft originally wrote.
>>>
>>> No, MS implements something called "best fit encodings"
>>> and these are different from what WHATWG uses.
>>
>> NO. I made this absolutely clear in my previous message: best fit mappings
>> can be clearly distinguished from regular mappings by the behavior of the
>> native conversion functions with certain argument flags. The mapping of 0xA0
>> to some private use character in cp932, for example, is a best-fit mapping
>> in the decoding direction (but is treated as a regular mapping for encoding
>> purposes), whereas the mapping of 0x81 to U+0081 in cp1252 etc. is NOT a
>> best fit mapping or in any way different from the rest of the mappings.
>>
>> We are not talking about implementing the best fit mappings. We are talking 
>> about real regular mappings that actually exist in these codepages that were 
>> for some unknown reason not included in the files published by Unicode.
> 
> I only know the best fit encoding maps that are available
> on the Unicode site.
> 
> If I read your comment correctly, you are saying that MS has
> moved away from the standard code pages towards something
> else - perhaps even something other than the best fit encodings
> listed on the Unicode site?
> 
> Do you have some references for this?
> 
> Note that the Windows code page codecs implemented in Python
> are all based on the Unicode mapping files and those were
> created by MS.
> 
>>> https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130%28v=vs.85%29.aspx
>>>
>>> unfortunately uses the above-mentioned best fit encodings,
>>> but this can and should be switched off by specifying the
>>> WC_NO_BEST_FIT_CHARS flag for anything that requires validation
>>> or needs to be interoperable:
>>
>> Specifying this flag (and MB_ERR_INVALID_CHARS in the other direction) in 
>> fact does not disable the mappings we are discussing.
> 
> Interesting. The CP1252 mapping clearly defines 0x81 to map
> to undefined, whereas the bestfit1252 maps it to U+0081:
> 
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
> http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
> 
> Same for the example you gave for CP932:
> 
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
> http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit932.txt
> 
> So at least following the documentation you'd expect the function
> to implement the regular mappings.
> 
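
For reference, a minimal Python sketch of the difference under discussion.
It assumes Python's built-in cp1252 codec, which follows CP1252.TXT and so
rejects the five bytes that file leaves undefined (0x81, 0x8D, 0x8F, 0x90,
0x9D). The helper names below are hypothetical and only illustrate mapping
those bytes to the C1 control characters of the same value (0x81 to U+0081);
this is not an existing codec and makes no claim about what the Windows
conversion functions actually do.

```python
# Bytes that Python's cp1252 codec (built from CP1252.TXT) treats as undefined.
UNDEFINED_CP1252_BYTES = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_cp1252_with_c1(data: bytes) -> str:
    """Hypothetical helper: decode as cp1252, but map each undefined byte
    to the C1 control character with the same value (0x81 -> U+0081)."""
    chars = []
    for byte in data:
        if byte in UNDEFINED_CP1252_BYTES:
            # Assumption for illustration: pass the byte value through as-is.
            chars.append(chr(byte))
        else:
            # Every other byte is defined in the regular cp1252 table.
            chars.append(bytes([byte]).decode("cp1252"))
    return "".join(chars)


def strict_cp1252_rejects(data: bytes) -> bool:
    """Return True if the standard cp1252 codec refuses to decode data."""
    try:
        data.decode("cp1252")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    sample = b"\x80\x81"  # euro sign byte followed by an undefined byte
    print(strict_cp1252_rejects(sample))
    print(ascii(decode_cp1252_with_c1(sample)))
```

The strict codec raises on 0x81, while the sketch turns it into U+0081 and
still decodes 0x80 as the euro sign.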

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
