STINNER Victor <victor.stin...@haypocalc.com> added the comment:

utf_8_java.patch: Implement "utf-8-java" encoding.
 * It has no alias
 * 'a\0b'.encode('utf-8-java') returns b'a\xc0\x80b'
 * b'a\xc0\x80b'.decode('utf-8-java') returns 'a\x00b'
 * I added some tests to the utf-8 codec (test_invalid, test_null_byte)
 * I added many tests for the utf-8-java codec
 * I chose to copy utf8_code_length as utf8java_code_length instead of adding some "if" tests, to avoid slowing down the UTF-8 codec
 * Decoder: 2-byte sequences may be *a little bit* slower for UTF-8: "if ((s[1] & 0xc0) != 0x80)" is replaced by "if ((ch <= 0x007F && (ch != 0x0000 || !java)) || ch > 0x07FF)"
 * Encoder: encoding chars in U+0000-U+007F may be *a little bit* slower for UTF-8: I added a (ch == 0x00 && java) test

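To make the behaviour in the list above concrete, here is a minimal pure-Python sketch of the NUL handling only. It is not the C code from the patch: encode_utf8_java and decode_utf8_java are hypothetical helper names built on the standard "utf-8" codec, and accepting a raw 0x00 byte when decoding is an assumption, since the comment does not say what the patched decoder does with it.

```python
# Hypothetical helpers, not part of utf_8_java.patch: they only mimic the
# NUL handling described in the comment, using the standard UTF-8 codec.


def encode_utf8_java(text: str) -> bytes:
    """Encode as UTF-8, but write U+0000 as the two bytes C0 80."""
    # In valid UTF-8 the byte 0x00 can only come from U+0000, so a plain
    # byte replacement is enough.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_utf8_java(data: bytes) -> str:
    """Decode UTF-8 in which the sequence C0 80 stands for U+0000."""
    # 0xC0 never appears in well-formed UTF-8, so mapping C0 80 back to 00
    # cannot alter any other valid sequence.
    # Assumption: a raw 0x00 byte in the input is accepted as U+0000 too.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


encoded = encode_utf8_java("a\0b")
print(encoded)
print(repr(decode_utf8_java(encoded)))
```

The strict "utf-8" codec raises UnicodeDecodeError on b'a\xc0\x80b', because C0 80 is an overlong encoding of U+0000, which is why the decoder has to treat this one sequence specially.
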
For the doc, I just added a "utf-8-java" line to the codec list, but I did not add a paragraph explaining how this codec differs from utf-8. Does anyone have a suggestion?

----------
keywords: +patch
Added file: http://bugs.python.org/file21965/utf_8_java.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2857>
_______________________________________