Tobias Oberstein <[email protected]> added the comment:

The JSON that Python's `json.dumps` produces here is invalid: it is not valid UTF-8. Reproducing this behavior in PyPy would propagate that bug.
It is invalid because the octet `\xc0` never occurs in well-formed UTF-8: it is not a valid start byte, and continuation octets lie in the range `\x80` to `\xbf` (see the DFA in http://bjoern.hoehrmann.de/utf-8/decoder/dfa/). Here is the proof:

$ python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from autobahn.utf8validator import Utf8Validator
>>> v = Utf8Validator()
>>> v.validate("hello")
(True, True, 5, 5)
>>> v.reset()
>>> v.validate("\xc0")
(False, False, 0, 0)
>>> import json
>>> json.dumps("\xc0", ensure_ascii = False)
'"\xc0"'
>>> "\xc0".decode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 0: invalid start byte
>>> json.dumps("hello", ensure_ascii = False)
'"hello"'
>>>

----------
nosy: +oberstet

________________________________________
PyPy bug tracker <[email protected]>
<https://bugs.pypy.org/issue1627>
________________________________________
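
For comparison, here is a minimal sketch of the same check without the autobahn validator, assuming Python 3 rather than the Python 2.7 session above. The helper names `is_valid_utf8` and `dumps_utf8` are hypothetical and not part of any library. In Python 3, `"\xc0"` is the text character U+00C0 rather than a raw byte, so this only illustrates the validity check itself and what a well-formed encoding of that character looks like; it does not reproduce the Python 2 behavior reported here.

```python
import json


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def dumps_utf8(obj) -> bytes:
    """Serialize `obj` to JSON text, then encode it as UTF-8 (hypothetical helper)."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


if __name__ == "__main__":
    # A lone 0xC0 octet is rejected by the strict decoder.
    print(is_valid_utf8(b"hello"))
    print(is_valid_utf8(b"\xc0"))
    # U+00C0 serialized as text and encoded afterwards yields the octets C3 80.
    print(dumps_utf8("\xc0"))
    print(is_valid_utf8(dumps_utf8("\xc0")))
```

Encoding after serialization, as `dumps_utf8` does, is one way to guarantee that the output octets are well-formed UTF-8; whether PyPy should reject or pass through an invalid byte string is the question this issue leaves open.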
