Rémi Lapeyre <remi.lape...@henki.fr> added the comment:

> in real-life that b-prefixed string is just not readable by another program
> in an easy way
If another program opens this CSV file, it will read the string "b'A'", which is exactly what this field contains. Any field that is not a string or a number is simply converted to a string with str():

In [1]: import collections, dataclasses, random, secrets, io, csv
   ...:
   ...: Point = collections.namedtuple('Point', 'x y')
   ...:
   ...: @dataclasses.dataclass
   ...: class Valar:
   ...:     name: str
   ...:     age: int
   ...:
   ...: a = Point(1, 2)
   ...: b = Valar('Melkor', 2900)
   ...: c = secrets.token_bytes(4)
   ...:
   ...: out = io.StringIO()
   ...: f = csv.writer(out)
   ...: f.writerow((a, b, c))
   ...:
   ...: out.seek(0)
   ...: print(out.read())
"Point(x=1, y=2)","Valar(name='Melkor', age=2900)",b'\x95g6\xa2'

Another program reading this file would find three fields, all of them strings: "Point(x=1, y=2)", "Valar(name='Melkor', age=2900)" and "b'\x95g6\xa2'". Would you expect to get actual objects instead of strings when reading the first two fields?

> Incase it fails to decode using that, then it will throw a UnicodeDecodeError

I read your PR, but successfully decoding the bytes does not mean the result is correct:

In [4]: b'r\xc3\xa9sum\xc3\xa9'.decode('latin')
Out[4]: 'rÃ©sumÃ©'

It worked, but is it the appropriate encoding? Probably not:

In [5]: b'r\xc3\xa9sum\xc3\xa9'.decode('utf8')
Out[5]: 'résumé'

If you want to be able to save bytes, the best way is to use a format that can round-trip them, such as Parquet:

In [18]: df = pd.DataFrame.from_dict({'a': [b'a']})

In [19]: df.to_parquet('foo.parquet')

In [20]: type(pd.read_parquet('foo.parquet')['a'][0])
Out[20]: bytes

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40762>
_______________________________________
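A minimal, self-contained sketch of the two points in the comment, using only the standard-library csv and io modules. The sample values are illustrative, and the last step (decoding as UTF-8 before writing) assumes the writer actually knows the bytes are UTF-8; it is one possible approach, not the behaviour proposed in the PR.

```python
import csv
import io

# Write a row containing a bytes object: csv converts it with str(),
# so the file holds the literal text b'A', not the byte 0x41.
out = io.StringIO()
csv.writer(out).writerow([b'A', 'plain text'])
text = out.getvalue()
print(repr(text))  # "b'A',plain text\r\n"

# Any reader gets back plain strings; the first field is the text "b'A'".
row = next(csv.reader(io.StringIO(text)))
print(row)  # ["b'A'", 'plain text']

# The same bytes decode without error under more than one codec, so a
# successful decode does not tell you which encoding was intended.
raw = b'r\xc3\xa9sum\xc3\xa9'
print(raw.decode('latin-1'))  # rÃ©sumÃ©
print(raw.decode('utf-8'))    # résumé

# If the caller knows the encoding, decoding explicitly before writing
# produces a field other programs can read as ordinary text.
out = io.StringIO()
csv.writer(out).writerow([raw.decode('utf-8')])
print(repr(out.getvalue()))  # 'résumé\r\n'
```

The choice of encoding stays with the caller, which is the only place that knows how the bytes were produced.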