STINNER Victor added the comment:

Extract of nfc_nfkc():

/* Hangul Composition. We don't need to check for <LV,T>
   pairs, since we always have decomposed data. */
code = PyUnicode_READ(kind, data, i);
if (LBase <= code && code < (LBase+LCount) &&
    i + 1 < len &&
    VBase <= PyUnicode_READ(kind, data, i+1) &&
    PyUnicode_READ(kind, data, i+1) <= (VBase+VCount)) {
    int LIndex, VIndex;
    LIndex = code - LBase;
    VIndex = PyUnicode_READ(kind, data, i+1) - VBase;
    code = SBase + (LIndex*VCount+VIndex)*TCount;
    i+=2;
    if (i < len &&
        TBase <= PyUnicode_READ(kind, data, i) &&
        PyUnicode_READ(kind, data, i) <= (TBase+TCount)) {
        code += PyUnicode_READ(kind, data, i)-TBase;
        i++;
    }
    output[o++] = code;
    continue;
}

With the input string (1101, 116e, 11a7), we get:

* LIndex = 1
* VIndex = 13

code = SBase + (LIndex*VCount+VIndex)*TCount + (ch3 - TBase)
     = 0xAC00 + (1 * 21 + 13) * 28 + 0
     = 0xafb8

Constants:

* LBase = 0x1100, LCount = 19
* VBase = 0x1161, VCount = 21
* TBase = 0x11A7, TCount = 28
* SBase = 0xAC00

The problem may be that we consume the 3rd character even though (ch3 - TBase) is equal to 0: the character is dropped from the output without changing the composed code point.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26917>
_______________________________________
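
To make the arithmetic above easy to replay, here is a minimal Python sketch of the Hangul step only. It is not the code in unicodedata.c. The function name compose_hangul and its strict flag are hypothetical, introduced for illustration. With strict=False it mirrors the bounds checks of the extract. With strict=True it uses exclusive bounds (VIndex < VCount and 0 < TIndex < TCount), which is my reading of the Hangul composition algorithm in the Unicode Standard, not necessarily the fix that should be applied.

```python
# Constants as listed in the message.
S_BASE = 0xAC00
L_BASE, L_COUNT = 0x1100, 19
V_BASE, V_COUNT = 0x1161, 21
T_BASE, T_COUNT = 0x11A7, 28


def compose_hangul(cps, strict=False):
    """Compose <L, V> and <L, V, T> jamo sequences in a list of code points.

    strict=False: same comparisons as the C extract (inclusive upper
    bounds, and T_BASE itself accepted as a trailing consonant).
    strict=True: assumed stricter bounds, with T_BASE excluded.
    """
    out, i, n = [], 0, len(cps)
    while i < n:
        code = cps[i]
        if L_BASE <= code < L_BASE + L_COUNT and i + 1 < n:
            v = cps[i + 1]
            if strict:
                v_ok = V_BASE <= v < V_BASE + V_COUNT
            else:
                v_ok = V_BASE <= v <= V_BASE + V_COUNT
            if v_ok:
                l_index = code - L_BASE
                v_index = v - V_BASE
                code = S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
                i += 2
                if i < n:
                    t = cps[i]
                    if strict:
                        t_ok = T_BASE < t < T_BASE + T_COUNT
                    else:
                        t_ok = T_BASE <= t <= T_BASE + T_COUNT
                    if t_ok:
                        # When t == T_BASE this adds 0 but still consumes t.
                        code += t - T_BASE
                        i += 1
                out.append(code)
                continue
        out.append(code)
        i += 1
    return out


sample = [0x1101, 0x116E, 0x11A7]
print([hex(c) for c in compose_hangul(sample)])
# ['0xafb8']            (the third code point is swallowed)
print([hex(c) for c in compose_hangul(sample, strict=True)])
# ['0xafb8', '0x11a7']  (the third code point is left in place)
```

With the extract's comparisons, U+11A7 is consumed while contributing 0 to the result, which matches the suspicion stated above.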