Dear newsgroup,

I have written a CGI script in Python, and it has worked fine for some time. Now the installed Python version has been upgraded to 2.4.1, and I am having problems with non-ASCII characters.
The core of the problem is as follows:

 1. The webpage contains a text field where the user enters her/his name.
 2. My CGI script uses the 'cgi' module to extract the name the user has entered.
 3. The CGI script writes the name the user has given to a file.

Now, the webpage in question is in Norway, and many Norwegian names are not 7-bit clean, i.e. they contain characters which cannot be represented in 7-bit ASCII. As a consequence I get *either* a UnicodeEncodeError *or* just blanks when writing the name to file.

Simplest case:
--------------

    name = "Åse"
    fileH = open("/tmp/namelist.txt","w")
    fileH.write(name)

In this case the first character in the name variable is outside plain 7-bit ASCII. The code above runs without errors or warnings, but the problematic character is simply replaced by a space in the file '/tmp/namelist.txt'.

More complicated case:
----------------------

The application uses the SOAP protocol via the ZSI module to communicate with some other site. The SOAP call returns a variable, and when this variable is combined with the name variable above I get the UnicodeEncodeError:

    name = "ÅSE"
    ref = SOAP_return_value()
    fileH = open("/tmp/namelist.txt","a")
    fileH.write("name:%s ref:%s \n" % (name,ref))
    fileH.close()

This bombs with:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 45: ordinal not in range(128)

The variable 'ref' returned from the SOAP interaction is (seemingly ...) pure 7-bit ASCII.

Any suggestions greatly appreciated.

Joakim Hove

-- 
Joakim Hove
hove AT ntnu.no              /
Tlf: +47 (73 5)9 34 27       /  Stabburveien 18
Fax: .................       /  N-5231 Paradis
http://www.ift.uib.no/~hove/ /  55 91 28 18 / 92 68 57 04
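
For reference, one common way to handle this kind of mixed byte-string/unicode write is to convert everything to unicode first and let the file object do the encoding. The following is only a minimal sketch, not verified against the ZSI setup described above. It assumes the form data arrives as ISO-8859-1 bytes and that a UTF-8 output file is acceptable; both are guesses. The helper names `to_text` and `append_entry` and the value `u"abc123"` standing in for the SOAP return value are hypothetical. It uses `codecs.open` so that it should run on old Python 2 releases as well as on Python 3.

```python
import codecs
import os
import tempfile

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def to_text(value, input_encoding):
    """Return value as a text (unicode) string, decoding byte strings."""
    if isinstance(value, TEXT_TYPE):
        return value
    return value.decode(input_encoding)


def append_entry(path, name, ref, input_encoding="iso-8859-1"):
    """Append one 'name/ref' line to path, always written as UTF-8.

    input_encoding is an assumption about how the form data is encoded.
    """
    line = u"name:%s ref:%s \n" % (to_text(name, input_encoding),
                                  to_text(ref, input_encoding))
    # codecs.open returns a file object that encodes unicode on write.
    fileH = codecs.open(path, "a", encoding="utf-8")
    try:
        fileH.write(line)
    finally:
        fileH.close()


# Usage: a Latin-1 byte string (as the form might deliver it) combined
# with a unicode value standing in for the SOAP return value.
demo_path = os.path.join(tempfile.mkdtemp(), "namelist.txt")
append_entry(demo_path, u"\xc5se".encode("iso-8859-1"), u"abc123")
```

The point of the design is that no implicit 'ascii' conversion is left for Python to attempt: byte strings are decoded explicitly on the way in, and the text is encoded explicitly on the way out.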