> Are you sure that "strings in an unknown encoding" are conceptually
> strings and not rather bytes?

For file names, most definitely. For command line arguments, I am
fairly sure: the argc/argv calling convention does not allow for
arbitrary bytes.

> And what if we skillfully conserve unknown bytes in a private use or
> surrogate area and the application author actually knows the encoding
> and wants correctly decoded strings?

They can easily roundtrip that then to the encoding that it should
have:

    good_string = sys.argv[bad_string_index].\
        encode(sys.argv_encoding, "pua-replace").decode(real_encoding)

However, we are talking about borderline cases here - in most cases,
Python will just do the right thing. Special cases aren't special
enough to break the rules.

Regards,
Martin
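For illustration, the roundtrip described above can be sketched in a form that runs on current Python. This is only a sketch under stated assumptions: `sys.argv_encoding` and the "pua-replace" error handler are names from this discussion, not existing APIs, so the example substitutes the standard `surrogateescape` error handler, which smuggles undecodable bytes through lone surrogates rather than a private use area. The helper name `recover` and the sample encodings are hypothetical.

```python
def recover(bad_string: str, assumed_encoding: str, real_encoding: str) -> str:
    """Re-decode a string that was decoded with the wrong encoding.

    Assumes bad_string was produced by decoding bytes with
    assumed_encoding and the "surrogateescape" error handler, so that
    undecodable bytes survive as lone surrogates.
    """
    # Step 1: turn the string back into the exact original bytes.
    raw = bad_string.encode(assumed_encoding, "surrogateescape")
    # Step 2: decode those bytes with the encoding the author knows is right.
    return raw.decode(real_encoding)


# Simulate an argument whose bytes are Latin-1 but which was decoded as UTF-8.
original_bytes = "café".encode("latin-1")
bad_string = original_bytes.decode("utf-8", "surrogateescape")

good_string = recover(bad_string, "utf-8", "latin-1")
```

The point of the sketch is that no information is lost by the first, wrong decoding step: the application author who knows the real encoding can always get back to the original bytes and decode them properly.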
