Re: Devnagari Unicode Conversion Issues

Dave Angel Thu, 27 Jun 2013 09:33:10 -0700

On 06/27/2013 11:39 AM, darpan6aya wrote:

That worked out. I was trying to encode it the entire time.
Now I realise how silly I am.


Thanks MRAB. Once Again. :D

You're not silly; it's a complex question. MRAB is good at guessing which part is messing you up.

However, when you're writing a real Python program with a real text editor, and when you're not using a newsgroup in between to mangle or unmangle things, you have a few things to match up to get it right.

The file is just a bunch of bytes. Those bytes are put in there by your editor and interpreted by the compiler. So if you have a non-ASCII character on your keyboard and you hit it, the editor will encode it (from Unicode to byte(s)) and put it in the file. If you tell the editor to use utf-8, then you also want to tell the compiler to decode it using utf-8.
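
To make the "just a bunch of bytes" point concrete, here is a rough sketch of one character turning into different bytes depending on the encoding the editor uses. The character é is only an illustration (it isn't from the original question), and it is written as an escape so the snippet itself stays pure ASCII.

```python
ch = u"\u00e9"  # the character e-acute, chosen purely as an example

utf8_bytes = ch.encode("utf-8")      # the bytes a utf-8 editor would write
latin1_bytes = ch.encode("latin-1")  # the bytes an ISO 8859-1 editor would write

# Reading utf-8 bytes with the wrong codec does not raise an error here;
# it quietly produces two wrong characters instead of one right one.
misread = utf8_bytes.decode("latin-1")

print(repr(utf8_bytes), repr(latin1_bytes), len(misread))
```

The same character is two bytes in one encoding and one byte in the other, which is why the editor and the compiler have to agree.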


The most polite way to do that is a comment on the first or second line of the file, in the general form
# -*- coding: <encoding-name> -*-
which for utf-8 becomes
# -*- coding: utf-8 -*-

http://docs.python.org/release/2.7.5/reference/lexical_analysis.html#encoding-declarations
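
As a minimal sketch, a source file saved by the editor as utf-8 might look like the following. The Devanagari word is just a sample I picked (I can't vouch for it being good text), and the file is assumed to really be saved as utf-8.

```python
# -*- coding: utf-8 -*-
# The declaration above only helps if the editor actually saved the file as utf-8.

text = u"नमस्ते"  # sample Devanagari literal, stored in the file as utf-8 bytes

# The literal is a sequence of code points, not bytes.
code_points = len(text)

# Encoding it back shows how many bytes the editor wrote for it.
byte_count = len(text.encode("utf-8"))

print(code_points, byte_count)
```

Each of these characters takes three bytes in utf-8, so the byte count comes out at three times the number of code points.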

Once you've got that straight, you don't need to explicitly decode byte strings. You can just use

  u"This is my string"

with whatever characters you need. As long as the declarations match, this should "just work." If the data comes from a byte string other than a literal string, you might need the more verbose form.

Your original message was sent in Western (ISO 8859-1), and MRAB's response was in utf-8, and my mail program decoded the string the same way. However, I don't know anything about Devnagari, so I can't say if it looked reasonable here.
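
Taking "the more verbose form" to mean an explicit decode, a sketch for bytes that arrive from somewhere other than a literal (a file, a socket) could look like this. The sample bytes are made up for illustration, and the name raw is mine.

```python
# Hypothetical bytes as they might arrive from a file or a socket;
# they happen to be the utf-8 encoding of two Devanagari letters.
raw = b"\xe0\xa4\xa8\xe0\xa4\xae"

# Explicit decode: you have to know (or be told) which encoding produced the bytes.
text = raw.decode("utf-8")

print(len(raw), len(text))
```

The encoding name passed to decode has to match whatever produced the bytes, the same way the coding declaration has to match the editor.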



--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list
