> Are you sure that "strings in an unknown encoding" are conceptually
> strings and not rather bytes?

For file names, most definitely. For command line arguments, I am
fairly sure: the argc/argv calling convention does not allow for
arbitrary bytes.

> And what if we skillfully conserve unknown bytes in a private use or
> surrogate area and the application author actually knows the encoding
> and wants correctly decoded strings?

They can then easily round-trip it to the encoding it should have had:

good_string = sys.argv[bad_string_index].\
   encode(sys.argv_encoding, "pua-replace").decode(real_encoding)
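For illustration, the same round trip can be sketched with an error handler that exists today. This is only a sketch under assumptions: `sys.argv_encoding` and `"pua-replace"` above are names from this proposal, whereas the code below substitutes the standard `"surrogateescape"` handler, and the helper `redecode` and the sample bytes are hypothetical.

```python
def redecode(bad_string, assumed_encoding, real_encoding):
    """Undo a lossless-but-wrong decode, then decode with the right codec.

    Hypothetical helper: "surrogateescape" stands in for the proposed
    "pua-replace" handler, which is not an existing codec error handler.
    """
    # Turn the escaped code points back into the original raw bytes.
    raw = bad_string.encode(assumed_encoding, "surrogateescape")
    # Decode those bytes with the encoding the application knows is correct.
    return raw.decode(real_encoding)


# Bytes that are really Latin-1 but were decoded as UTF-8 on the way in.
original_bytes = "café".encode("latin-1")
bad_string = original_bytes.decode("utf-8", "surrogateescape")

good_string = redecode(bad_string, "utf-8", "latin-1")
```

The point is only that no information is lost by the first, wrong decode, so an application that knows the real encoding can always recover the intended string.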

However, we are talking about borderline cases here - in most cases,
Python will just do the right thing. Special cases aren't special enough
to break the rules.

Regards,
Martin
_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000