eryksun added the comment:

It seems VC 14 has a bug here. In the new C runtime, strftime is implemented by calling wcsftime as follows:

    size_t const result = _Wcsftime_l(wstring.get(), maxsize, wformat.get(), timeptr, lc_time_arg, locale);
    if (result == 0)
        return 0;

    // Copy output from wide char string
    if (!WideCharToMultiByte(lc_time_cp, 0, wstring.get(), -1, string, static_cast<int>(maxsize), nullptr, nullptr))
    {
        __acrt_errno_map_os_error(GetLastError());
        return 0;
    }

    return result;

The WideCharToMultiByte call returns the number of bytes in the converted string, but strftime doesn't update the value of "result".

This worked correctly in the old CRT. For example, in 3.4 built with VC 10:

    >>> sys.version_info[:2]
    (3, 4)
    >>> locale.setlocale(locale.LC_ALL, 'kor_kor')
    'Korean_Korea.949'
    >>> time.strftime('%a')
    '\ud654'

Here's an overview of the problem in 3.5, stepped through in the debugger:

    >>> sys.version_info[:2]
    (3, 5)
    >>> locale.setlocale(locale.LC_ALL, 'ko')
    'ko'
    >>> time.strftime('%a')
    Breakpoint 0 hit
    ucrtbase!Wcsftime_l:
    000007fe`e9e6fd74 48895c2410 mov qword ptr [rsp+10h],rbx ss:00000000`003df6d8=0000000000666ce0

wcsftime returns the output buffer length in wide characters:

    0:000> pt; r rax
    rax=0000000000000001

WideCharToMultiByte is called to convert the wide-character string to the locale encoding:

    0:000> pc
    ucrtbase!Strftime_l+0x17f:
    000007fe`e9e6c383 ff15dfa00200 call qword ptr [ucrtbase!_imp_WideCharToMultiByte (000007fe`e9e96468)] ds:000007fe`e9e96468={KERNELBASE!WideCharToMultiByte (000007fe`fd631be0)}
    0:000> p
    ucrtbase!Strftime_l+0x185:
    000007fe`e9e6c389 85c0 test eax,eax

This returns the length of the converted string (including the null):

    0:000> r rax
    rax=0000000000000003

But strftime ignores this value, and instead returns the wide-character string length, which gets passed to PyUnicode_DecodeLocaleAndSize:

    0:000> bp python35!PyUnicode_DecodeLocaleAndSize
    0:000> g
    Breakpoint 1 hit
    python35!PyUnicode_DecodeLocaleAndSize:
    00000000`5ec15160 4053 push rbx
    0:000> r rdx
    rdx=0000000000000001

U+D654 was converted correctly to '\xc8\xad' (code page 949):

    0:000> db @rcx l3
    00000000`007e5d20 c8 ad 00 ...

However, since (str[len] != '\0'), PyUnicode_DecodeLocaleAndSize errors out as follows:

    0:000> bd 0,1; g
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: embedded null byte

It works as expected if the length is manually changed to 2:

    >>> time.strftime('%a')
    Breakpoint 1 hit
    python35!PyUnicode_DecodeLocaleAndSize:
    00000000`5ec15160 4053 push rbx
    0:000> r rdx=2
    0:000> g
    '\ud654'

The string is null-terminated, so can time_strftime simply substitute PyUnicode_DecodeLocale in place of PyUnicode_DecodeLocaleAndSize?

----------
components: +Windows
nosy: +eryksun, paul.moore, steve.dower, tim.golden, zach.ware

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25023>
_______________________________________
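
The length mismatch described in the comment above can be reproduced without a debugger. The following is only a rough model in pure Python, not the CRT or CPython source: decode_and_size and decode_null_terminated are hypothetical stand-ins that mimic just the (str[len] != '\0') check and the null-terminated alternative mentioned in the comment, and the cp949 codec is assumed to match code page 949 for this character.

```python
# Rough model of the length mismatch; names below are hypothetical helpers,
# not functions from the CRT or from CPython.

wide = "\ud654"                          # the '%a' result as a wide-character string
buf = wide.encode("cp949") + b"\x00"     # converted bytes plus the terminating null

wide_len = len(wide)                     # length in wide characters (what strftime returns)
byte_len = len(buf) - 1                  # length in bytes, excluding the null


def decode_and_size(data: bytes, size: int, encoding: str = "cp949") -> str:
    """Mimic only the check that data[size] is the terminating null."""
    if data[size] != 0:
        raise ValueError("embedded null byte")
    return data[:size].decode(encoding)


def decode_null_terminated(data: bytes, encoding: str = "cp949") -> str:
    """Decode up to the first null, ignoring any caller-supplied length."""
    return data.split(b"\x00", 1)[0].decode(encoding)
```

Passing wide_len to decode_and_size raises ValueError because the byte at that index is the second byte of the character rather than the null, while passing byte_len, or using decode_null_terminated, recovers the original string.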