Evren Esat Ozkan wrote:

> Ok, I think this code snippet enough to show what i said;
>
> ===================================
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> #Change utf-8 to latin-1
> #Or move variable decleration to another file than import it
>
> val='00090±NO:±H±H±H±H±'
>
> from urllib import urlencode
>
> data={'key':val}
>
> print urlencode(data)
>
> ===================================

did you cut and paste this into your mail program? because the file I got was ISO-8859-1 encoded:

    Content-Type: text/plain; charset="iso-8859-1"

and uses a single byte to store each "±", and produces

    key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1

when I run it, which is the expected result.

I think you're still not getting what's going on here, so let's try again:

- the urlencode function doesn't care about encodings; it translates the bytes it gets one by one. if you pass in chr(0xB1), you get %B1 in the output.

- it's your editor that decides how the "±" characters you typed in the original script are stored on disk; it may use one ISO-8859-1 byte, two UTF-8 bytes, or something else.

- the coding directive doesn't affect non-Unicode string literals in Python. in an 8-bit string, Python only sees a number of bytes.

- the urlencode function only cares about the bytes.

since you know that you want to use ISO-8859-1 encoding for your URL, and you seem to insist on typing the "±" characters in your code, the most portable (and editor-independent) way to write your code is to use Unicode literals when building the string, and explicitly convert to ISO-8859-1 on the way out.

    # build the URL as a Unicode string
    val = u'00090±NO:±H±H±H±H±'

    # encode as 8859-1 (latin-1)
    val = val.encode("iso-8859-1")

    from urllib import urlencode
    data={'key':val}
    print urlencode(data)

    key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1

this will work the same way no matter what character set you use to store the Python source file, as long as the coding directive matches what your editor is actually doing.

if you want to make your code 100% robust, forget the idea of putting non-ascii characters in string literals, and use \xB1 instead:

    val = '00090\xb1NO:\xb1H\xb1H\xb1H\xb1H\xb1'

    # no need to encode, since the byte string is already iso-8859-1

    from urllib import urlencode
    data={'key':val}
    print urlencode(data)

    key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1

hope this helps!

</F>
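
for comparison, here is a minimal sketch of why the stored bytes matter: the same "±" text run through urlencode once as ISO-8859-1 bytes and once as UTF-8 bytes. the names text, latin1 and utf8 are made up for this illustration, and the try/except import is an assumption added so the snippet also runs on Python 3, where urlencode lives in urllib.parse. the "±" is written as \xb1 so the result doesn't depend on how the file is saved.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3 (assumed location)

# build the value as Unicode text; \xb1 is the "±" code point
text = u'00090\xb1NO:\xb1H\xb1H\xb1H\xb1H\xb1'

# one byte per "±" in ISO-8859-1, two bytes per "±" in UTF-8
latin1 = urlencode({'key': text.encode('iso-8859-1')})
utf8 = urlencode({'key': text.encode('utf-8')})

print(latin1)  # key=00090%B1NO%3A%B1H%B1H%B1H%B1H%B1
print(utf8)    # key=00090%C2%B1NO%3A%C2%B1H%C2%B1H%C2%B1H%C2%B1H%C2%B1
```

the %C2%B1 pairs in the second line are what you would expect to see if an editor saved the original script as UTF-8 and the 8-bit string literal was passed to urlencode unchanged.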