On 22/03/18 20:46, Tobiah wrote:
> I was reading, though, that JSON files must be encoded with UTF-8. So
> should I be doing string.decode('latin-1').encode('utf-8')? Or does
> the json module do that for me when I give it a unicode object?
Definitely not. In fact, that won't even work:

>>> import json
>>> s = 'déjà vu'.encode('latin1')
>>> s
b'd\xe9j\xe0 vu'
>>> json.dumps(s.decode('latin1').encode('utf8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable
>>>

You should make sure that either the file you're writing to is opened
as UTF-8 text, or the ensure_ascii parameter of dumps() or dump() is
set to True (the default) -- and then write the data in ASCII or any
ASCII-compatible encoding (e.g. UTF-8).

Basically, the default behaviour of the json module means you don't
really have to worry about encodings at all once your original data is
in unicode strings.

--
Thomas
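For what it's worth, here is a quick sketch of both options described above: decode the latin-1 bytes to a str first, then either rely on the default ensure_ascii=True (pure-ASCII output with \uXXXX escapes), or open the file as UTF-8 text and pass ensure_ascii=False (the file name 'out.json' is just for illustration):

```python
import json

s = b'd\xe9j\xe0 vu'        # latin-1 encoded bytes, as in the session above
text = s.decode('latin-1')  # decode to a unicode str before serializing

# Option 1: the default ensure_ascii=True escapes non-ASCII characters,
# so the result can be written in any ASCII-compatible encoding.
print(json.dumps(text))  # "d\u00e9j\u00e0 vu"

# Option 2: open the file as UTF-8 text and write the characters as-is.
with open('out.json', 'w', encoding='utf-8') as f:
    json.dump(text, f, ensure_ascii=False)
```

Either way, json only ever sees a str, never bytes, which is what the TypeError above was complaining about.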