New submission from Ma Lin <malin...@163.com>:
CJK encode/decode functions only have three error-handler fast paths:

    replace
    ignore
    strict

See the code: [1][2]

If any other built-in error handler is used, the codec has to get the error-handler object and call it with a UnicodeEncodeError or UnicodeDecodeError argument. See the code: [3]

However, the error-handler object is not cached. It is looked up in a dict every time an error occurs, which is very inefficient.

Another possible optimization is to write fast paths for commonly used error handlers. Python has these built-in error handlers:

    strict
    replace
    ignore
    backslashreplace
    xmlcharrefreplace
    namereplace
    surrogateescape
    surrogatepass (only for the utf-8/utf-16/utf-32 family)

For example, `xmlcharrefreplace` may be heavily used in web applications. It could be implemented as a fast path, so that there is no need to call the error-handler object every time, just like the `xmlcharrefreplace` fast path in `PyUnicode_EncodeCharmap` [4].

[1] encode function: https://github.com/python/cpython/blob/v3.9.0b4/Modules/cjkcodecs/multibytecodec.c#L192
[2] decode function: https://github.com/python/cpython/blob/v3.9.0b4/Modules/cjkcodecs/multibytecodec.c#L347
[3] `call_error_callback` function: https://github.com/python/cpython/blob/v3.9.0b4/Modules/cjkcodecs/multibytecodec.c#L82
[4] `xmlcharrefreplace` fast path in `PyUnicode_EncodeCharmap`: https://github.com/python/cpython/blob/v3.9.0b4/Objects/unicodeobject.c#L8662

----------
components: Unicode
messages: 373871
nosy: ezio.melotti, malin, vstinner
priority: normal
severity: normal
status: open
title: Inefficient error-handle for CJK encodings
type: performance
versions: Python 3.10

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41330>
_______________________________________
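A rough Python-level sketch of the two ideas above (caching the handler, and a dedicated `xmlcharrefreplace` fast path). It assumes a stateless CJK codec such as gb2312 and, for simplicity, encodes one character at a time, whereas the real C code in multibytecodec.c works on buffers. `encode_generic` and `encode_xmlcharref_fast` are hypothetical helper names, not CPython APIs; `codecs.lookup_error` stands in for the per-error registry lookup described in the report.

```python
import codecs


def encode_generic(text: str, encoding: str, errors: str,
                   cache_handler: bool = True) -> bytes:
    """Hypothetical model of the generic (non-fast-path) branch: on each
    unencodable character, build a UnicodeEncodeError and pass it to the
    named error handler. With cache_handler=False the handler is looked up
    in the codec registry again for every error, as the report describes."""
    out = bytearray()
    handler = None
    pos = 0
    while pos < len(text):
        try:
            out += text[pos].encode(encoding, "strict")
            pos += 1
            continue
        except UnicodeEncodeError:
            pass
        if handler is None or not cache_handler:
            handler = codecs.lookup_error(errors)  # registry dict lookup
        exc = UnicodeEncodeError(encoding, text, pos, pos + 1,
                                 "illegal multibyte sequence")
        replacement, newpos = handler(exc)  # the "strict" handler raises here
        # Encode handlers may return str (re-encoded here) or bytes.
        if isinstance(replacement, str):
            replacement = replacement.encode(encoding, "strict")
        out += replacement
        # A negative position is interpreted relative to the end of the text.
        pos = newpos + len(text) if newpos < 0 else newpos
    return bytes(out)


def encode_xmlcharref_fast(text: str, encoding: str) -> bytes:
    """Hypothetical fast path for "xmlcharrefreplace": write &#NNNN;
    directly, with no exception object and no handler call."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding, "strict")
        except UnicodeEncodeError:
            out += b"&#%d;" % ord(ch)  # decimal character reference
    return bytes(out)


text = "中文\U0001F600abc"  # U+1F600 is not in GB2312
print(encode_xmlcharref_fast(text, "gb2312"))
print(encode_generic(text, "gb2312", "xmlcharrefreplace"))
```

Both functions should produce the same bytes as `text.encode("gb2312", "xmlcharrefreplace")`; the fast path simply skips the exception construction and the handler call, which is where the report expects the savings to come from.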