R. David Murray <rdmur...@bitdance.com> added the comment:

Pretty close. I'd do the check for us_ascii first, and only do the encode test/switch to utf-8 if that's the charset. The reason is that if a charset has been specified, we don't waste time doing an unnecessary encoding (and the ascii codec is very fast, which you can't say about all the codecs).
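A minimal sketch of that ordering, for illustration only. The helper name `choose_charset` and its signature are hypothetical; this is not the patch under review and not the email package's API. It assumes the charset arrives as a plain string and that both the `us_ascii` and `us-ascii` spellings mean ASCII.

```python
def choose_charset(text, charset="us-ascii"):
    """Return the charset to use for text (hypothetical helper)."""
    # Check the charset first: if the caller specified anything other
    # than us-ascii, trust it and skip the trial encode entirely.
    if charset.lower().replace("_", "-") != "us-ascii":
        return charset
    # Only for us-ascii do we pay for an encode test; the ascii codec
    # is fast, so this is cheap.
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII characters present: switch to utf-8.
        return "utf-8"
    return charset
```

With this ordering, a call such as `choose_charset(body, "iso-8859-1")` returns immediately without encoding `body`, which is the saving described above.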
Now, what would be *really* nice is to also try latin-1 before falling back to utf-8, but I wouldn't want to make that the default behavior for performance reasons. I'm planning to add support for that at some point, but I haven't decided exactly how (policy setting? New optional setting in the alias structure?)

There seem to be unrelated changes to torture_test in your patch?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14380>
_______________________________________