Hi Chris,

On Wednesday, 4 December 2013 10:20:31 Chris Angelico wrote:
> On Wed, Dec 4, 2013 at 9:32 AM, Hans-Peter Jansen <h...@urpla.net> wrote:
> > I'm experiencing strange behavior with attached code, that differs
> > depending on sys.setdefaultencoding being set or not. If it is set, the
> > code works as expected, if not - what should be the usual case - the code
> > fails with some non-sensible traceback.
>
> Interesting. You're mixing str and unicode objects a lot here. The
> cleanest solution, IMO, would be to either switch to Python 3 or add
> this to the top of your code:
>
>     from __future__ import unicode_literals
>
> Either way, you'll have all your quoted strings be Unicode, rather
> than byte, strings. Then take away the requirement that Unicode
> strings contain non-ASCII characters, and let everything go through
> that code branch.
>
> Looking at this line in reprstr():
>
>     s = "u'%s'" % s.replace("'", "\\'")
>
> Two potential problems with that. Firstly, the representation is
> flawed: a backslash in the input string won't be changed, so it's not
> a true repr; but if this is just for debugging output, that's not a
> big deal. Secondly, this code might produce either a str or a unicode,
> depending on the type of s. That may cause messes later; since you
> seem to be mostly working with the unicode type after that, it'd
> probably be simpler/safer to make that always return one:
The code serves three purposes: make simple strings more readable, document
the others as being unicode, and display those correctly ;)

>     s = u"u'%s'" % s.replace("'", "\\'")
>
> But the actual problem, I think, is that repr() guarantees to return a
> str, and you're trying to return a unicode. Here's an illustration:
>
>     # -*- coding: utf-8 -*-
>     class Foo(object):
>         def __repr__(self):
>             return u'äöü'
>
>     foo = Foo()
>     print(foo.__repr__())
>     print(repr(foo))
>
> The first one succeeds, because building up that string isn't at all a
> problem. The second one then tries to turn the return value of
> __repr__ into a string using the default encoding - which defaults to
> 'ascii', hence the problem you're seeing.
>
> Solution 1: Switch to Python 3, in which this will work fine (because
> repr() in Py3 returns a Unicode string, since _everything_ is
> Unicode).
>
> Solution 2: Explicitly encode in frec, or at the end of Record.__repr__():
>
>     def __repr__(self):
>         s = u'%s(\n%s\n)' % (self.__class__.__name__,
>                              frec(self.__dict__))
>         return s.encode("utf-8")
>
> (that could be a one-liner, but it's already pushing 80-chars, so if
> you have a length limit, breaking it helps)
>
> Solution 3: Don't use __repr__ here, but simply have your frec
> function intelligently handle Record types. Effectively, you have your
> own method of generating a debug description of a Record, which could
> then return a unicode instead of a str.

Thanks for all your considerations, they are very helpful indeed. Even more
helpful is that I now understand the issue in question. I will get some rest
and then decide what to do about this, with your valuable help.

> I personally recommend switching to Python 3 :) But presumably that's
> not an option, or you'd already have considered it.

You nailed it ;)

The amount of special unicode handling code that is necessary to keep
Python 2 happy makes proceeding with it no real fun in the long term.
And the biggest argument for hacking in Python IS the fun part of it. Then
come productivity, elegance, ..., you name it.

Have-a-good-day-ly y'rs,
Pete
-- 
https://mail.python.org/mailman/listinfo/python-list
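As a rough illustration of Solution 2 from the thread, here is a minimal sketch that should run on both Python 2 and Python 3. The Record class below is only a hypothetical stand-in for the one in the attached code (which is not part of this message), and describe() and the PY2 flag are names made up for this example. The idea is to build the description as text everywhere and to encode it explicitly only where Python 2's repr() insists on a byte string.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import sys

# True when running under Python 2, where repr() must return a byte string.
PY2 = sys.version_info[0] == 2


class Record(object):
    """Hypothetical stand-in for the Record class discussed in the thread."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def describe(self):
        """Return a text description (unicode on Python 2, str on Python 3)."""
        body = ", ".join(
            "%s=%s" % (name, value)
            for name, value in sorted(self.__dict__.items())
        )
        return "%s(%s)" % (self.__class__.__name__, body)

    def __repr__(self):
        text = self.describe()
        # On Python 2, encode explicitly instead of relying on the implicit
        # ASCII default encoding; on Python 3, text is returned unchanged.
        return text.encode("utf-8") if PY2 else text


rec = Record(name="äöü", city="Köln")
print(repr(rec))
```

Keeping the text-building step in a separate method also leaves room for Solution 3: code that wants a unicode description can call describe() directly and bypass __repr__ altogether.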