"Stephen J. Turnbull" <[EMAIL PROTECTED]> wrote: > > >>>>> "Josiah" == Josiah Carlson <[EMAIL PROTECTED]> writes: > > Josiah> The question remains: is str.decode() returning a string > Josiah> or unicode depending on the argument passed, when the > Josiah> argument quite literally names the codec involved, > Josiah> difficult to understand? I don't believe so; am I the > Josiah> only one? > > Do you do any of the user education *about codec use* that you > recommend? The people I try to teach about coding invariably find it > difficult to understand. The problem is that the near-universal > intuition is that for "human-usable text" is pretty much anything *but > Unicode* will do. This is a really hard block to get them past. > There is very good reason why Unicode is plain text ("original" in > MAL's terms) and everything else is encoded ("derived"), but students > new to the concept often take a while to "get" it.

I've not been teaching Python; when I was still a TA, it was strictly
algorithms and data structures.  Of the people I have had the
opportunity to entice into Python, I've not followed up on their
progress, so I don't know whether they had any issues.

I try to internalize it by thinking of strings not as encoded data but
as binary data, and of unicode as text.  I then remind myself that
unicode isn't native on disk or across the network (disks and networks
store and transport bytes, not characters), so one needs to encode it
as binary data.  It's a subtle difference, but it has worked so far for
me.

In my experience, at least among English-only speakers, most people
don't even get to unicode.  I didn't touch it until I was already well
versed in encoding and decoding all kinds of binary data; then a
half-dozen international users (China, Japan, Russia, ...) requested
support for it in my source editor, so I added it.  Supporting it
properly hasn't been very difficult, and the only real nit I have
experienced is supporting the encoding line just after the #! line for
arbitrary codecs (sometimes saving a file in a particular encoding
dies).

I notice that you seem to be in Japan, so teaching unicode is a must.
If you are using the "unicode is text" and "strings are data" framing
and they still aren't getting it, then I don't know.

> Maybe it's just me, but whether it's the teacher or the students, I am
> *not* excited about the education route.  Martin's simple rule *is*
> simple, and the exceptions for using a "nonexistent" method mean I
> don't have to reinforce---the students will be able to teach each
> other.  The exceptions also directly help reinforce the notion that
> text == Unicode.

Are you sure that they would help?  If .encode() and .decode() are
dropped from strings and unicode (respectively), users get an
AttributeError.  That's almost useless.
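
To make the contrast concrete, here is a minimal sketch of a bare
AttributeError next to a more descriptive failure.  It is not code from
this thread.  It assumes Python 3 naming, where bytes plays the role of
the 2.x str ("data") and str the role of the 2.x unicode ("text"), and
both the checked_decode helper and its DECODE_TYPES table are
hypothetical names invented for the illustration; only codecs.decode is
a real library call.

```python
import codecs

# Hypothetical table of what each codec converts from and to.  Only three
# codecs are listed, purely for illustration.  Python 3 names are used:
# bytes stands in for the 2.x str, str for the 2.x unicode.
DECODE_TYPES = {
    "utf-8": (bytes, str),
    "latin-1": (bytes, str),
    "base64": (bytes, bytes),
}


def checked_decode(obj, codec):
    """Decode obj with codec, raising a descriptive TypeError on misuse.

    Hypothetical helper, not part of the codecs or encodings modules.
    """
    try:
        expected_in, expected_out = DECODE_TYPES[codec]
    except KeyError:
        raise LookupError("codec %r is not in the illustrative table" % codec)
    if not isinstance(obj, expected_in):
        raise TypeError(
            "You are trying to decode from an incompatible type. "
            "expected: %s->%s got: %s"
            % (expected_in.__name__, expected_out.__name__, type(obj).__name__)
        )
    return codecs.decode(obj, codec)


# What a removed method gives the user: only the name of the missing attribute.
try:
    "already text".decode("utf-8")
except AttributeError as exc:
    print("bare:", exc)

# What a descriptive check gives the user: the expected and actual types.
try:
    checked_decode("already text", "utf-8")
except TypeError as exc:
    print("descriptive:", exc)

# Correct use still works, including the data-to-data base64 case.
print(checked_decode(b"aGVsbG8=", "base64"))
```

The table-driven check is only one way to do this; the point of the
sketch is that the second message tells the user which types the codec
expects, which the AttributeError cannot.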

Raising a better exception (with more information) would be an
improvement in that case, but losing the functionality that either
method offers seems unnecessary, which is why I had suggested some of
the other method names.  Perhaps something like: "This method was
removed because it confused users.  Use help(str.encode) (or
unicode.decode) to find out how you can do the equivalent, or do what
you *really* wanted to do."

> I grant the point that .decode('base64') is useful, but I also believe
> that "education" is a lot more easily said than done in this case.

What I meant by "education" is 'better documentation' and 'better
exception messages'.  I didn't learn Python by sitting in a class; I
learned it by going through the tutorial over a weekend as a
second-year undergrad and writing software which could do what I
wanted/needed.  Compared to the compiler messages I'd been seeing from
CodeWarrior and MSVC 6, Python exceptions were like an oracle.

I can understand how first-time programmers can have issues with *some*
Python exception messages, which is why I think that we could use
better ones.  There is also the other issue that sometimes people fail
to actually read the messages.

Again, I don't believe that an AttributeError is any better than an
"ordinal not in range(128)", but "You are trying to encode/decode
to/from incompatible types. expected: a->b got: x->y" is better.  Some
of those can be done *very soon*, given the capabilities of the
encodings module, and they could likely be easily migrated, regardless
of the decisions with .encode()/.decode().

 - Josiah

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev