Hi,

I am rather new to Python, and am currently struggling with some encoding issues. I have some UTF-8-encoded text which I need to encode as ISO-2022-JP before sending it out to the world. I am using Python's encode method:

--
var = var.encode("iso-2022-jp", "replace")
print var
--

I am using the 'replace' argument because there seem to be a couple of Japanese characters in the UTF-8 text which Python can't convert to ISO-2022-JP. The output looks like this:

↓東京???日比谷線?北千住行

However, if I use Perl's Encode module to re-encode the exact same bit of text:

--
$var = encode("iso-2022-jp", decode("utf8", $var));
print $var;
--

I get proper output (no unsightly question marks):

↓東京メトロ日比谷線・北千住行

So, what's the deal? Why can't Python properly encode some of these characters? I know there are a host of different ISO-2022-JP variants; could Python be using a different one than I think (the default)?

I'm quite liking Python at the moment for a variety of reasons (I suspect Perl will forever win when it comes to regular expressions, but everything else is pretty darn nice), but this is a bit worrying.

-Joe

--
http://mail.python.org/mailman/listinfo/python-list
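
One possible explanation, sketched here in Python 3 rather than the Python 2 of the snippet above. It assumes, without confirmation from the post, that the four characters that turn into question marks are half-width katakana (ﾒﾄﾛ) and the half-width middle dot (･). Python's plain iso-2022-jp codec has no JIS X 0201 katakana, so it cannot encode them, which would be consistent with the full-width メトロ and ・ in the Perl output. The function name to_iso2022jp and the sample string are hypothetical, chosen for illustration only.

```python
import unicodedata


def to_iso2022jp(raw_utf8: bytes) -> bytes:
    """Decode UTF-8 bytes, fold compatibility characters, encode as ISO-2022-JP.

    Hypothetical helper: NFKC normalization maps half-width katakana to their
    full-width forms, which the plain iso-2022-jp codec can encode.
    """
    text = raw_utf8.decode("utf-8")                # bytes -> str, like Perl's decode("utf8", ...)
    folded = unicodedata.normalize("NFKC", text)   # e.g. U+FF92 (half-width me) -> U+30E1 (full-width me)
    return folded.encode("iso-2022-jp")            # strict: raises if anything is still unencodable


# Assumed input: the same sentence, but with half-width katakana and middle dot.
sample = "↓東京ﾒﾄﾛ日比谷線･北千住行".encode("utf-8")

# Without normalization, the unencodable characters are replaced by '?'.
lossy = sample.decode("utf-8").encode("iso-2022-jp", "replace")
print(lossy.decode("iso-2022-jp"))

# With normalization, the text encodes cleanly and round-trips.
print(to_iso2022jp(sample).decode("iso-2022-jp"))
```

Note that NFKC also folds other compatibility characters (full-width Latin letters, circled digits and so on), so it is only appropriate if that is acceptable for the text being sent. If the half-width forms must be preserved instead, Python also ships an iso2022_jp_ext codec that includes JIS X 0201 katakana, though whether the receiving side accepts that variant is a separate question.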