On Dec 29, 5:06 pm, "Mark Tolonen" <metolone+gm...@gmail.com> wrote:
> "zxo102" <zxo...@gmail.com> wrote in message
> news:2560a6e0-c103-46d2-aa5a-8604de4d1...@b38g2000prf.googlegroups.com...
> > I have a list in a dictionary and want to insert it into the html
> > file. I tested it with the following scripts: CASE 1, CASE 2 and
> > CASE 3. I can see "中文" in CASE 1, but that is not what I want.
> > CASE 2 does not show the correct output.
> > So, in CASE 3, I hacked the script of CASE 2 with a function,
> > conv_list2str(), to 'convert' the list into a string. CASE 3 can show
> > me "中文". I don't know what is wrong with CASE 2 and what is right
> > with CASE 3.
> >
> > Without knowing why, I have just hard-coded my Python application
> > following CASE 3 to display Chinese characters from a list in a
> > dictionary in my web application.
> >
> > Any ideas?
>
> See below each case... Happy New Year!
>
> > Happy New Year 2009
> >
> > ouyang
> >
> > CASE 1:
> > ########################################################
> > f=open('test.html','wt')
> > f.write('''<html><head>
> > <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> > <title>test</title>
> > <script language=javascript>
> > var test = ['\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']
> > alert(test[0])
> > alert(test[1])
> > alert(test[2])
> > </script>
> > </head>
> > <body></body></html>''')
> > f.close()
>
> In CASE 1, the *4 bytes* D6 D0 CE C4 are written to the file, which is the
> correct gb2312 encoding for 中文.
>
> > CASE 2:
> > #######################################################
> > mydict = {}
> > mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
> > f_str = '''<html><head>
> > <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> > <title>test</title>
> > <script language=javascript>
> > var test = %(JUNK)s
> > alert(test[0])
> > alert(test[1])
> > alert(test[2])
> > </script>
> > </head>
> > <body></body></html>'''
> >
> > f_str = f_str%mydict
> > f=open('test02.html','wt')
> > f.write(f_str)
> > f.close()
>
> In CASE 2, the *16 characters* "\xd6\xd0\xce\xc4" are written to the file,
> which is NOT the correct gb2312 encoding for 中文, and will be interpreted
> however JavaScript pleases. This is because the str() representation of
> mydict['JUNK'] in Python 2.x is the characters "['\xd6\xd0\xce\xc4',
> '\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']".
>
> > CASE 3:
> > ###################################################
> > mydict = {}
> > mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
> >
> > f_str = '''<html><head>
> > <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> > <title>test</title>
> > <script language=javascript>
> > var test = %(JUNK)s
> > alert(test[0])
> > alert(test[1])
> > alert(test[2])
> > </script>
> > </head>
> > <body></body></html>'''
> >
> > import string
> >
> > def conv_list2str(value):
> >     list_len = len(value)
> >     list_str = "["
> >     for ii in range(list_len):
> >         list_str += '"'+string.strip(str(value[ii])) + '"'
> >         if ii != list_len-1:
> >             list_str += ","
> >     list_str += "]"
> >     return list_str
> >
> > mydict['JUNK'] = conv_list2str(mydict['JUNK'])
> >
> > f_str = f_str%mydict
> > f=open('test03.html','wt')
> > f.write(f_str)
> > f.close()
>
> CASE 3 works because you build your own, correct, gb2312 representation of
> mydict['JUNK'] (value[ii] above is the correct 4-byte sequence for 中文).
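The difference between CASE 2 and CASE 3 can be reproduced in a few lines. This is only a rough sketch, and it assumes Python 3, where the byte strings of the original scripts have to be written as b'...' literals; the names raw, items, as_repr and as_joined are made up for the illustration.

```python
# Python 3 sketch. The b'...' literals stand in for the Python 2
# byte strings used in CASE 2 and CASE 3.
raw = b'\xd6\xd0\xce\xc4'   # the 4 gb2312 bytes for 中文
items = [raw, raw, raw]

# CASE 2: turning the whole list into a string calls repr() on each
# element, so every byte becomes a 4-character escape such as \xd6.
as_repr = str(items)

# CASE 3: joining the elements by hand keeps the real bytes intact.
as_joined = b'["' + b'","'.join(items) + b'"]'

print(as_repr)                     # shows backslash escapes as text
print(as_joined.decode('gb2312'))  # shows the Chinese characters
```

When the escaped form reaches the browser, JavaScript reads \xd6 as the single character U+00D6 rather than as half of a gb2312 pair, which is why CASE 2 shows the wrong text.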
>
> That said, learn to use Unicode strings by trying the following program, but
> set the first line to the encoding *your editor* saves files in. You can
> use the actual Chinese characters instead of escape codes this way. The
> encoding used for the source code and the encoding used for the html file
> don't have to match, but the charset declared in the file and the encoding
> used to write the file *do* have to match.
>
> # coding: utf8
>
> import codecs
>
> mydict = {}
> mydict['JUNK'] = [u'中文',u'中文',u'中文']
>
> def conv_list2str(value):
>     return u'["' + u'","'.join(value) + u'"]'
>
> f_str = u'''<html><head>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> <title>test</title>
> <script language=javascript>
> var test = %s
> alert(test[0])
> alert(test[1])
> alert(test[2])
> </script>
> </head>
> <body></body></html>'''
>
> s = conv_list2str(mydict['JUNK'])
> f=codecs.open('test04.html','wt',encoding='gb2312')
> f.write(f_str % s)
> f.close()
>
> -Mark
>
> P.S. Python 3.0 makes this easier for what you want to do, because the
> representation of strings inside a list changes: non-ASCII characters are
> no longer shown as escape codes. You'll be able to skip the
> conv_list2str() function, and all strings are Unicode by default.
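For comparison, here is roughly what the P.S. could look like in Python 3. Treat it as a sketch under assumptions: the file name test05.html and the temporary directory are invented for the example, and json.dumps is my own substitution for conv_list2str(), not something suggested in the thread.

```python
import json
import os
import tempfile

words = ['中文', '中文', '中文']

# In Python 3, str() of a list keeps printable non-ASCII characters.
js_from_str = str(words)

# json.dumps also escapes embedded quotes and backslashes;
# ensure_ascii=False keeps the characters instead of \uXXXX escapes.
js_from_json = json.dumps(words, ensure_ascii=False)

page = '''<html><head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
<title>test</title>
<script language=javascript>
var test = %s
alert(test[0])
</script>
</head>
<body></body></html>''' % js_from_json

# Hypothetical output location; the encoding passed to open() matches
# the charset declared in the META tag.
path = os.path.join(tempfile.mkdtemp(), 'test05.html')
with open(path, 'w', encoding='gb2312') as f:
    f.write(page)

# Read the file back as bytes to check what was actually written.
with open(path, 'rb') as f:
    data = f.read()
```

Both js_from_str and js_from_json are valid JavaScript array literals for this data; the json.dumps form is the safer choice if a string might contain a quote character.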
Thanks for your comments, Mark. I understand it now. The list of escape
codes, ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4'], comes
from a PostgreSQL database via a "select" statement. I will check the
PostgreSQL database configuration and see if it is possible to return
['中文','中文','中文'] directly from the "select" statement.

ouyang
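If the query keeps returning encoded byte strings, decoding them on the Python side is one possible fallback. The sketch below makes no database call at all: rows is hard-coded sample data, decode_all is a hypothetical helper, and it assumes Python 3 and that the bytes really are gb2312. Whether the database can hand back text directly depends on the driver and on settings such as PostgreSQL's client_encoding, which this sketch does not touch.

```python
# Assumption: the driver returned gb2312-encoded byte strings like these.
rows = [b'\xd6\xd0\xce\xc4', b'\xd6\xd0\xce\xc4', b'\xd6\xd0\xce\xc4']

def decode_all(values, encoding='gb2312'):
    """Hypothetical helper: turn encoded byte strings into text."""
    return [v.decode(encoding) for v in values]

words = decode_all(rows)
print(words)
```

Once the values are text, they can be written out with whatever encoding the page declares.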