I have a file which is full of codes like \222 (for an em dash), produced by a 
Windows program.

I'm uploading this to my app using curl (on my mac) so the bytes are not being 
transcoded by my uploader, and I've found that I can get the right unicode 
version if I do this nastiness:

def replaceWindowsCodes(s):
  s = re.sub('\205',u'\u2026',s)
  s = re.sub('\221',u'\u2018',s)
  s = re.sub('\222',u'\u2019',s)
  s = re.sub('\223',u'\u201c',s)
  s = re.sub('\224',u'\u201d',s)
  s = re.sub('\225',u'\u2022',s)
  s = re.sub('\226',u'\u2013',s)
  s = re.sub('\227',u'\u2014',s)
  s = re.sub('\240',u'\u00a0',s)
  return s

in my post handler:

    t.text = replaceWindowsCodes(unicode(self.request.get("text"), 
'iso-8859-1'))

I'm guessing that there is a better way to do this.  Can someone who 
understands text encodings and python's various string types get me pointed in 
the right direction?

-Joshua

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.

Reply via email to