Aha! A superset. That explains it! I've changed my call to pass 'cp1252' instead of 'iso-8859-1' and gotten rid of the replace call, and it seems to be working right now.
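
For anyone finding this thread later, here is a minimal sketch of the change, assuming the uploaded bytes really are CP-1252. It is written for Python 3, where the bytes are decoded with `.decode("cp1252")`; on the Python 2 runtime used in the original message the equivalent would be `unicode(raw, 'cp1252')`. The helper name `decode_windows_text` and the sample bytes are made up for illustration.

```python
# The byte-to-character table from the replaceWindowsCodes helper quoted
# below, written with the same octal escapes as Python 3 bytes literals.
WINDOWS_CODES = {
    b"\205": "\u2026",  # horizontal ellipsis
    b"\221": "\u2018",  # left single quotation mark
    b"\222": "\u2019",  # right single quotation mark
    b"\223": "\u201c",  # left double quotation mark
    b"\224": "\u201d",  # right double quotation mark
    b"\225": "\u2022",  # bullet
    b"\226": "\u2013",  # en dash
    b"\227": "\u2014",  # em dash
    b"\240": "\u00a0",  # no-break space
}


def decode_windows_text(data):
    """Decode bytes from a Windows program, assuming they are CP-1252.

    Hypothetical helper name; not part of any library.
    """
    return data.decode("cp1252")


# Each hand-written substitution is already covered by the cp1252 codec.
for raw, expected in WINDOWS_CODES.items():
    assert decode_windows_text(raw) == expected

# Made-up sample input standing in for the uploaded request body.
sample = b"It\222s \223quoted\224 text\205"
text = decode_windows_text(sample)
```

One caveat: Python's cp1252 codec leaves a handful of bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) unassigned, so a strict decode raises `UnicodeDecodeError` if the file contains them. Passing `errors="replace"` to `.decode` is one way to tolerate that.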
Thanks so much!!!

-Joshua

On May 18, 2011, at 11:58 PM, Geoffrey Spear wrote:

> I'd guess the original encoding is CP-1252, although \222 shouldn't
> correspond to an em dash in that encoding (of course, it *does*
> correspond to the U+2019, a right single quote mark, as you map it).
> The iso-8859-1 conversion "works" except for the characters you're
> replacing (well; you're missing some...) because CP-1252 is a superset
> of iso-8859-1.
>
> On May 17, 6:00 pm, Joshua Smith <joshuaesm...@charter.net> wrote:
>> I have a file which is full of codes like \222 (for an em dash), produced by
>> a Windows program.
>>
>> I'm uploading this to my app using curl (on my mac) so the bytes are not
>> being transcoded by my uploader, and I've found that I can get the right
>> unicode version if I do this nastiness:
>>
>> def replaceWindowsCodes(s):
>>     s = re.sub('\205',u'\u2026',s)
>>     s = re.sub('\221',u'\u2018',s)
>>     s = re.sub('\222',u'\u2019',s)
>>     s = re.sub('\223',u'\u201c',s)
>>     s = re.sub('\224',u'\u201d',s)
>>     s = re.sub('\225',u'\u2022',s)
>>     s = re.sub('\226',u'\u2013',s)
>>     s = re.sub('\227',u'\u2014',s)
>>     s = re.sub('\240',u'\u00a0',s)
>>     return s
>>
>> in my post handler:
>>
>> t.text = replaceWindowsCodes(unicode(self.request.get("text"),
>> 'iso-8859-1'))
>>
>> I'm guessing that there is a better way to do this. Can someone who
>> understands text encodings and python's various string types get me pointed
>> in the right direction?
>>
>> -Joshua
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to google-appengine@googlegroups.com.
> To unsubscribe from this group, send email to
> google-appengine+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.