[EMAIL PROTECTED] wrote:
> Hi,
>
> I'm totally new to Python so please bear with me.
>
> Data is entered into my program using the following code -
>
> str = raw_input(command)
> words = str.split()
>
> for word in words:
>     word = unicode(word,'latin-1')
>     word.encode('utf8')

The above statement produces a string in utf8 and then throws it away.
It does not update "word". To retain the utf8 string, you would have to
do
    word = word.encode('utf8')
and in any case that won't update the original list.

*** missing source code line(s) here ***

>
> This gives an error:

*** missing traceback lines here ***

>   File "C:\Python25\lib\encodings\cp850.py", line 12, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character u'\x94' in position 0: character maps to <undefined>

No, it doesn't. You must have put "print word" to get the error that you
did. *Please* when you are asking a question, copy/paste (1) the exact
source code that you ran (2) the exact traceback that you got.

>
> but the following works.

What do you mean by "works"? It may not have triggered an error, but on
the other hand it doesn't do anything useful.

>
> str = raw_input(command)
> words = str.split()
>
> for word in words:
>     uni = u""

Above line is pointless. Removing it will have no effect.

>     uni = unicode(word,'latin-1')
>     uni.encode('utf8')

Same problem as above -- utf8 string is produced and then thrown away.

>
> so the problem is that I want to replace my list with unicode variables.
> Or maybe I should create a new list.
>
> I also tried this:
>
> for word in words[:]:
>     word = u""
>     word = unicode(word,'latin-1')

You got the error on the above statement because you are trying
(pointlessly) to decode the value u"". Decoding means to convert from
some encoding to unicode, and u"" is already unicode.

>     word.encode('utf8')

Again, utf8 straight down the gurgler.

>     print word

This (if executed) will try to print the UNICODE version, and die [as in
the 1st example] encoding the unicode in cp850, which is the encoding
for your Windows command console.

>
> but got TypeError: decoding Unicode is not supported.
>
> What should I be doing?

(1) Reading the Unicode howto: http://www.amk.ca/python/howto/
(2) Writing some code like this:

| >>> strg = "\x94 foo bar zot"
| >>> words = strg.split()
| >>> words
| ['\x94', 'foo', 'bar', 'zot']
| >>> utf8words = [unicode(word, 'latin1').encode('utf8') for word in words]
| >>> utf8words
| ['\xc2\x94', 'foo', 'bar', 'zot']
| >>>

HTH,
John
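
For reference, here is a minimal sketch that pulls the three points above into one script: the discarded encode() result, building a new list, and why printing the unicode value dies on a cp850 console. It is written with a bytes literal and the .decode() method rather than unicode(), on the assumption that you want it to run on Python 2.6+ and Python 3 as well. The names raw, utf8_words and cp850_failed are made up for illustration.

```python
# Illustrative sketch only; variable names are hypothetical.
# Bytes as they might arrive from a latin-1 source (stand-in for raw_input).
raw = b"\x94 foo bar zot"
words = raw.split()

# 1. encode() returns a NEW string. If the result is not assigned to
#    anything it is discarded, and rebinding the loop variable would not
#    change the list anyway.
for word in words:
    text = word.decode("latin-1")
    text.encode("utf-8")  # result thrown away

# The list is exactly what it was before the loop.
assert words == [b"\x94", b"foo", b"bar", b"zot"]

# 2. Build a new list instead: decode from latin-1, then encode as UTF-8.
utf8_words = [w.decode("latin-1").encode("utf-8") for w in words]
assert utf8_words == [b"\xc2\x94", b"foo", b"bar", b"zot"]

# 3. Printing the unicode value on a cp850 console implicitly encodes it
#    to cp850, which has no mapping for U+0094. Doing the encode
#    explicitly reproduces the UnicodeEncodeError from the traceback.
cp850_failed = False
try:
    u"\x94".encode("cp850")
except UnicodeEncodeError:
    cp850_failed = True
assert cp850_failed
```

On Python 2.5 itself the b"" prefix is not available; drop it there and the rest behaves the same way.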