On Wed, 16 Jul 2014 13:46:45 +0300, Marko Rauhamaa wrote:

> Python 3 really is on a mission to elevate text into the mainstream at
> the expense of bytes. I'm guessing this is done primarily to promote the
> cross-platform transparency of Python code.

Ahead of bytes? Possibly. At the expense of bytes? Certainly not. If
there is anything that you cannot conveniently do with bytes, that you
could do in Python 2, it's likely a bug, or at least an obviously
missing feature. The core devs recognise that they missed some use-cases
(e.g. mixed bytes and text) which are now harder than they should be,
and are on a mission to rectify that as much as possible within the
constraints of backward compatibility.

E.g. having b"abc"[0] return 97 instead of b"a" was probably a mistake,
but there are four versions of Python 3.x that do it that way and it's
too late to change until Python 5000. (Python 4 is unlikely to break
backwards compatibility in a big way.)

> For me, a linux system and network programmer, that layer of frosting
> only gets in my way and I need to wash it off.

Linux, like all Unixes, is primarily a text-based platform. With a few
exceptions, /etc is filled with text files, not binary files, and half
the executables on the system are text (Python, Perl, bash, sh, awk,
etc.).

www.catb.org/esr/writings/taoup/html/ch05s01.html

To say that *dealing with text* gets in your way on a Linux system is
rather like saying that you love Mac OS X except for its gosh-awful GUI
and APIs.

Of course, as a network programmer, you have to deal with bytes, so I'll
give you a bit of leeway.

>> Most programming languages I know of default to opening files in text
>> mode, not binary mode, and I don't see any strong reason for Python to
>> go against the tide there.
>
> In unix and linux, there never was a separate text mode for files. When
> you open a file, you open a file -- and stuff bytes in it. There is no
> commonly accepted text file encoding. UTF-8 comes close to being a
> standard, but I know somebody who sticks to an ISO-8859-1 locale.

And they should be dragged out into the street and beaten with a Clue
Stick. They're the sort of people who are holding us back from the
shining utopia of UTF-8 everywhere!

(only half joking)

But seriously, I cannot imagine any *rational* reason for using a legacy
encoding, but I'm willing to give this person the benefit of the doubt
that he's not a raving lunatic or old West European-centric curmudgeon
trying to deny the existence of the rest of the world.

http://i.imgur.com/UeZan.jpg

That being the case, then good luck to him. As for everyone else:

http://www.utf8everywhere.org/

>> Having len('λ') == 1 is not an advanced text processing feature.
>
> There are (relative rare) occasions where you'd like to treat text as
> text.

o_O

Relatively rare. Like, um, email, news, html, Unix config files, Windows
ini files, source code in just about every language ever, SMSes, XML,
JSON, YAML, instant messenger apps, word processors... even *graphic*
applications invariably have a text tool.

Now, it may be true that some of those things may not use text under the
hood, but even so, text is ubiquitous. Even binary protocols often
include chunks of recognisable human-readable text in them:

[steve@ando Pictures]$ hexdump -n 64 -C picture.jpg
00000000  ff d8 ff e0 00 10 4a 46  49 46 00 01 01 00 00 01  |......JFIF......|
00000010  00 01 00 00 ff e2 0f 38  49 43 43 5f 50 52 4f 46  |.......8ICC_PROF|
00000020  49 4c 45 00 01 01 00 00  0f 28 61 70 70 6c 02 10  |ILE......(appl..|
00000030  00 00 6d 6e 74 72 52 47  42 20 58 59 5a 20 07 de  |..mntrRGB XYZ ..|
00000040

> Then, it's nice to be able to move the data on the operating table
> with .decode() and when the patient has been sewn back together, you can
> release them with .encode().
>
> More often, len(b'λ') is what I want.

Oh really? Are you sure? What exactly is b'λ'?

I couldn't have made up a better example of the confusion between bytes
and text if I had tried. Thank you.

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list
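
For anyone who wants to check the bytes-versus-text points in this post, here is a minimal sketch, assuming a Python 3 interpreter. The variable and helper names (first_as_int, latin1_can_encode, bytes_literal_compiles and so on) are made up for illustration and do not come from the thread. It shows that indexing bytes gives an int, that the byte length of 'λ' depends on the encoding chosen, and that b'λ' is not even a valid literal.

```python
# Indexing bytes yields an int in Python 3; slicing yields bytes.
data = b"abc"
first_as_int = data[0]       # an int, the value of the first byte
first_as_bytes = data[0:1]   # a length-1 bytes object

# len() of a str counts code points; the byte count depends on the encoding.
text = "λ"
text_len = len(text)
utf8_len = len(text.encode("utf-8"))        # UTF-8 needs two bytes for U+03BB
greek_len = len(text.encode("iso-8859-7"))  # the Greek legacy codec needs one


def latin1_can_encode(s):
    """Return True if s can be represented in ISO-8859-1 (hypothetical helper)."""
    try:
        s.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


def bytes_literal_compiles(source):
    """Return True if source compiles as a Python expression (hypothetical helper)."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print(first_as_int, first_as_bytes)
    print(text_len, utf8_len, greek_len)
    print(latin1_can_encode(text))
    print(bytes_literal_compiles("b'λ'"))
```

The last check is the point of the closing question: Python 3 rejects b'λ' at compile time because bytes literals may only contain ASCII characters, so "the length of b'λ'" has no answer until an encoding has been chosen.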