On Tue, Dec 7, 2010 at 12:06 AM, Nick Coghlan <[email protected]> wrote:
> On Tue, Dec 7, 2010 at 2:46 PM, Alexander Belopolsky
> <[email protected]> wrote:
>> Having all encodings accessible in a str method only promotes a
>> programming style where bytes objects can contain differently encoded
>> strings in different parts of the program. Instead, well-written
>> programs should decode bytes on input, do all processing with str type
>> and encode on output. When strings need to be passed to char* C APIs,
>> they should be encoded in UTF-8. Many C APIs originally designed for
>> ASCII actually produce meaningful results when given UTF-8 bytes.
>> (Supporting such usage was one of the design goals of UTF-8.)
>
> This world sounds nice, but it isn't the one that exists right now.
> Practicality beats purity and all that :)

... and the default encoding being fixed as UTF-8 already goes 99% of
the way to that world. As long as I can call encode/decode without an
argument, it does not bother me much that they can take one. These
methods are also much easier to ignore than the transform/untransform
pair, simply because there is only one of them per class.
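
For illustration, a minimal sketch of that style, assuming Python 3,
where str.encode() and bytes.decode() default to UTF-8. The sample
text and the helper name shout() are made up for the example:

```python
# Hypothetical input: UTF-8 bytes as they might arrive from a file or socket.
raw = "na\u00efve caf\u00e9".encode()      # no argument: UTF-8 by default

def shout(data: bytes) -> bytes:
    """Decode on input, process as str, encode on output."""
    text = data.decode()                  # bytes -> str, UTF-8 by default
    text = text.upper()                   # all processing happens on str
    return text.encode()                  # str -> bytes, UTF-8 by default

result = shout(raw)
print(result.decode())
```

The encoding name never appears, yet the bytes at both ends are
UTF-8; the optional argument only matters at the rare boundary that
needs something else.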

transform/untransform have a much larger mental footprint, not only
because there are two of them in both str and bytes, but also because
both str and bytes already have a similarly named translate method.
With 43 non-special methods, the str interface is already huge. The
transform() method with a suitable set of codecs could possibly
replace things like expandtabs() or swapcase(), but that would be like
writing x.transform('exp') and x.untransform('exp') instead of
math.exp(x) and math.log(x).
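
A rough way to check the size of the interface and to see the
math.exp/math.log pairing in action. This is only a sketch: the
method count depends on the Python version, so it is printed rather
than assumed.

```python
import math

# Public ("non-special") str methods; the exact count varies by Python version.
public = [name for name in dir(str) if not name.startswith("_")]
print(len(public))

# Dedicated, separately named inverses: no codec-name string to look up.
x = 2.0
roundtrip = math.log(math.exp(x))
print(math.isclose(roundtrip, x))
```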
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev