Aha!  A superset.  That explains it!  I've changed my call to pass 'cp1252' 
instead of 'iso-8859-1' and gotten rid of the replace call, and it seems to be 
working correctly now.
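
A minimal sketch of that change, assuming Python 3 and raw bytes straight from the upload. The helper name `decode_windows_text` and the sample input are hypothetical, not code from the app:

```python
def decode_windows_text(raw):
    """Decode bytes produced by a Windows program, assuming they are CP-1252."""
    # Hypothetical helper: one decode call replaces the chain of re.sub calls.
    return raw.decode('cp1252')


# Octal escapes as in the original post: \223 and \224 are curly double quotes,
# \222 is a right single quote, \227 is an em dash, \205 is an ellipsis.
sample = b'\223It\222s here\224 \227 finally\205'
text = decode_windows_text(sample)
```

One caveat: Python's cp1252 codec rejects the five byte values that CP-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D), so passing errors='replace' to decode may be worth considering if the input is not trusted.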

Thanks so much!!!

-Joshua

On May 18, 2011, at 11:58 PM, Geoffrey Spear wrote:

> I'd guess the original encoding is CP-1252, although \222 doesn't
> correspond to an em dash in that encoding (it *does* correspond to
> U+2019, the right single quotation mark, as you map it; the em dash
> is \227). The iso-8859-1 conversion "works" except for the characters
> you're replacing (well, you're missing some...) because CP-1252 is a
> superset of iso-8859-1: the two differ only in the 0x80-0x9F range,
> where iso-8859-1 has control characters.
> 
> On May 17, 6:00 pm, Joshua Smith <joshuaesm...@charter.net> wrote:
>> I have a file which is full of codes like \222 (for an em dash), produced by 
>> a Windows program.
>> 
>> I'm uploading this to my app using curl (on my mac) so the bytes are not 
>> being transcoded by my uploader, and I've found that I can get the right 
>> unicode version if I do this nastiness:
>> 
>> import re
>> 
>> def replaceWindowsCodes(s):
>>   s = re.sub('\205',u'\u2026',s)
>>   s = re.sub('\221',u'\u2018',s)
>>   s = re.sub('\222',u'\u2019',s)
>>   s = re.sub('\223',u'\u201c',s)
>>   s = re.sub('\224',u'\u201d',s)
>>   s = re.sub('\225',u'\u2022',s)
>>   s = re.sub('\226',u'\u2013',s)
>>   s = re.sub('\227',u'\u2014',s)
>>   s = re.sub('\240',u'\u00a0',s)
>>   return s
>> 
>> in my post handler:
>> 
>>     t.text = replaceWindowsCodes(unicode(self.request.get("text"), 'iso-8859-1'))
>> 
>> I'm guessing that there is a better way to do this.  Can someone who 
>> understands text encodings and python's various string types get me pointed 
>> in the right direction?
>> 
>> -Joshua
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Google App Engine" group.
> To post to this group, send email to google-appengine@googlegroups.com.
> To unsubscribe from this group, send email to 
> google-appengine+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/google-appengine?hl=en.
> 
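
The superset point in the quoted reply can be checked directly. The sketch below, written for Python 3, compares the two codecs byte by byte. It assumes Python's 'latin-1' codec (which maps 0x80-0x9F to C1 control characters) is what iso-8859-1 means here; the variable names are illustrative only.

```python
# Compare how each single byte decodes under the two codecs.
differing = []
for value in range(256):
    byte = bytes([value])
    latin1_char = byte.decode('latin-1')  # never fails: byte N maps to U+00NN
    try:
        cp1252_char = byte.decode('cp1252')
    except UnicodeDecodeError:
        cp1252_char = None  # byte is unassigned in Python's cp1252 codec
    if cp1252_char != latin1_char:
        differing.append(value)

# Every disagreement falls in the 0x80-0x9F range, where iso-8859-1 has
# control characters and CP-1252 has printable punctuation (or nothing).
all_in_c1_range = all(0x80 <= value <= 0x9F for value in differing)
```

This also suggests why the \240 substitution in the original function had no effect: 0xA0 decodes to U+00A0 under both codecs.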

