On Sun, Feb 24, 2013 at 2:45 AM, Paul Moore <p.f.mo...@gmail.com> wrote:
> At the moment, I'm using
>
>     encoded = json.dumps([ord(c) for c in json.dumps(obj)])
>     decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
>
> The double-encoding ensures that non-ASCII characters don't make it into
> the result.
>
> This works fine, but is there something simpler (i.e., less of a hack!)
> that I could use? (Base64 and the like don't work because they encode
> bytes->strings, not strings->strings.)
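To spell out what that double-encoding is doing (it does work, just verbosely) - here's the round trip as a runnable sketch; `obj` is a stand-in sample value, not anything from your code:

```python
import json

obj = {"name": "café", "n": 42}  # sample value with a non-ASCII character

# Encode: JSON-serialise, then wrap each character's code point in a
# second JSON list, guaranteeing the outer result is pure ASCII.
encoded = json.dumps([ord(c) for c in json.dumps(obj)])

# Decode: unwrap the code points, rebuild the string, parse the inner JSON.
decoded = json.loads(''.join(chr(n) for n in json.loads(encoded)))

assert all(ord(c) < 128 for c in encoded)
assert decoded == obj
```

Note the inner json.dumps already escapes non-ASCII by default (ensure_ascii=True), which is why the outer wrapping feels redundant.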
Hmm. How likely is it that you'll have non-ASCII characters in the input?
If they're fairly uncommon, you could use UTF-7 - it's fairly
space-efficient when the input is mostly ASCII, but inefficient on other
characters.

Not sure what the problem is with bytes vs strings; you can always do an
encode("ascii") or decode("ascii") to convert 7-bit strings between those
types. With that covered, I'd just go with a single JSON packaging, and
work with the resulting Unicode string.

Python 2.6:

>>> s = u"asdf\u1234zxcv"
>>> s.encode("utf-7").decode("ascii")
u'asdf+EjQ-zxcv'

Python 3.3:

>>> s = u"asdf\u1234zxcv"
>>> s.encode("utf-7").decode("ascii")
'asdf+EjQ-zxcv'

Another option would be to JSON-encode in pure-ASCII mode:

>>> json.dumps([s], ensure_ascii=True)
'["asdf\\u1234zxcv"]'

Would that cover it?

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list
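P.S. The sessions above only show the encoding direction; for completeness, here's both suggestions round-tripping in Python 3 (`obj` is just a sample value standing in for whatever you're serialising):

```python
import json

obj = {"name": "café", "n": 42}  # sample value with a non-ASCII character

# Option 1: UTF-7. Output is pure ASCII and compact for mostly-ASCII text;
# decode("ascii") / encode("ascii") hops between str and bytes as needed.
u7 = json.dumps(obj, ensure_ascii=False).encode("utf-7").decode("ascii")
assert all(ord(c) < 128 for c in u7)
assert json.loads(u7.encode("ascii").decode("utf-7")) == obj

# Option 2: let json escape non-ASCII itself (ensure_ascii is the default),
# so the plain dumps/loads pair is already a strings->strings round trip.
esc = json.dumps(obj, ensure_ascii=True)
assert all(ord(c) < 128 for c in esc)
assert json.loads(esc) == obj
```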