Martin Blais wrote:
> Hi.
>
> Like a lot of people (or so I hear in the blogosphere...), I've been
> experiencing some friction in my code with unicode conversion
> problems. Even when being super extra careful with the types of str's
> or unicode objects that my variables can contain, there is always some
> case or oversight where something unexpected happens which results in
> a conversion which triggers a decode error. str.join() of a list of
> strs, where one unicode object appears unexpectedly, and voila!
> exception galore. Sometimes the problem shows up late because your
> test code doesn't always contain accented characters. I'm sure many
> of you experienced that or some variant at some point.
>
> I came to realize recently that this problem shares strong similarity
> with the problem of implicit type conversions in C++, or at least it
> feels the same: Stuff just happens implicitly, and it's hard to track
> down where and when it happens by just looking at the code. Part of
> the problem is that the unicode object acts a lot like a str, which is
> convenient, but...
I agree. I think it was a mistake to implicitly convert mixed string
expressions to unicode.

> What if we could completely disable the implicit conversions between
> unicode and str? In other words, if you would ALWAYS be forced to
> call either .encode() or .decode() to convert between one and the
> other... wouldn't that help a lot deal with that issue?

Perhaps.

> How hard would that be to implement?

Not hard. We considered doing it for Zope 3, but ...

> Would it break a lot of code?

Yes.

> Would some people want that?

No, I wouldn't want lots of code to break. ;)

> (I know I would, at least for some of my
> code.) It seems to me that this would make the code more explicit and
> force the programmer to become more aware of those conversions. Any
> opinions welcome.

I think it's too late to change this. I wish it had been done
differently. (OTOH, I'm very happy we have Unicode support, so I'm not
really complaining. :)

I'll note that this hasn't been that much of a problem for us in Zope.
We follow the strategy:

Antoine Pitrou wrote:
...
> A good rule of thumb is to convert to unicode everything that is
> semantically textual, and to only use str for what is to be semantically
> treated as a string of bytes (network packets, identifiers...). This is
> also, AFAIU, the semantic model which is favoured for a hypothetical
> future version of Python.

This approach has worked pretty well for us. Still, when there is a
problem, it's a real pain to debug because the error occurs too late,
as you point out.

Jim

--
Jim Fulton           Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org
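For illustration, here is a minimal sketch of what disabling the implicit str/unicode conversion discussed in this thread looks like in practice. It assumes Python 3 semantics, where text (str) and bytes are distinct types that are never converted into each other implicitly; the helper names `mixes_fail` and `join_as_text` are hypothetical. Mixing the two types fails immediately with a TypeError, even for pure-ASCII data, while an explicit `.decode()` at the boundary follows the rule of thumb quoted above.

```python
# Minimal sketch assuming Python 3 semantics: text (str) and bytes are
# distinct types and are never converted into each other implicitly.
# The helper names below are hypothetical, for illustration only.

def mixes_fail(parts):
    """Return True if joining `parts` as bytes raises TypeError."""
    try:
        b"".join(parts)
    except TypeError:
        return True
    return False

def join_as_text(parts, encoding="utf-8"):
    """Decode bytes items explicitly, then join everything as text."""
    return "".join(
        p.decode(encoding) if isinstance(p, bytes) else p for p in parts
    )

# The mix is rejected even when the data is pure ASCII, so tests without
# accented characters still catch it.
print(mixes_fail([b"plain", " ascii"]))          # True
print(mixes_fail([b"caf\xc3\xa9", " au lait"]))  # True

# With an explicit .decode() at the boundary the result is well defined.
# ascii() keeps the printed output ASCII-only regardless of terminal encoding.
print(ascii(join_as_text([b"caf\xc3\xa9", " au lait"])))  # 'caf\xe9 au lait'
```

Because the failure no longer depends on whether the data happens to contain non-ASCII characters, it tends to surface at the point where the types are mixed rather than late, which is the debugging pain described above.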