> > There is some amount of data in a database (PG) that must be inserted > > into Excel sheet and emailed. Nothing special, everything works. > > Except that non-ascii characters are not displayed properly. > > The data is stored as XML into a text field. > > This sentence doesn't make much sense. Explain.
Sorry, I meant: I use field of the type 'text' in a Postgres table to store my data. The data is a XML string. > Instead of "print data", do "print repr(data)" and show us what you > get. What *you* see on the screen is not much use for diagnosis; it's > the values of the bytes in the file that matter. Thanks for this valuable tip. I take letter "Ã" as an example. "print repr(data)" shows this: u'\xe9' > Open the spreadsheet with Microsoft Excel, copy-and-paste some data to > a Notepad window, save the Notepad file as Unicode type named (say) > "junk.u16" then at the Python interactive prompt do this: > > file("junk.u16", "rb").read().decode("utf16") > > and show us what you get. (I am on a Mac so I used Textedit to create a UTF-16 encoded file, right?) The result from Python is: u'\u0439' In Excel sheet it is shown as: Ð (Russian again?!) > > > and Python email package to email it... and the resulting sheet > > is not good: > > E-mailed how? To whom? [I.e. what country / what cultural background / > on what machine / what operating system / viewed using what software] Emailed with Python, please see the code at the end of the message. The receiving system is OS X with languages priority: Dutch, English, German, Russian and Hebrew. Viewer: MS Office 2004. > You are saying (in effect) U+0413 (Cyrillic upper case letter GHE) is > displayed instead of U+00FC (Latin small letter U with diaeresis). > > OK, we'd already guessed your background from your name :-) :-) > > However, what you see isn't necessarily what you've got. How do you > know it's not U+0393 (Greek capital letter GAMMA) or something else > that looks the same? Could even be from a line-drawing set (top left > corner of a box). What you need to do is find out the ordinal of the > character being displayed. > > This type of problem arises when a character is written in one encoding > and viewed using another. I've had a quick look through various > likely-suspect 8-bit character sets (e.g. Latin1, KOI-8, cp1251, > cp1252, various DOS (OEM) code-pages) and I couldn't see a pair of > encodings that would reproduce anything like your "umlauted-u becomes > gamma-or-similar" problem. Please supply more than 1 example. Thank you very much for your help and explanation! The "Ã" letter is what I could find till now, see examples above.. The following code fragment is used for creating Excel sheet and sending email: # ################## # Create Excel sheet # f = StringIO() workbook = xl.Writer(f) worksheet = workbook.add_worksheet() bold = workbook.add_format(bold=1) border = workbook.add_format(border=1) worksheet.write_row('A1', ['First Name', 'Last Name'], border) i = 1 for row in result: print repr(row['firstname']) datarow = [row.get('firstname'), row.get('surname')] i += 1 worksheet.write_row('A%s' % i, datarow) workbook.close() #################### # Create email message # Create the container (outer) email message. msg = MIMEMultipart() # Attach Excel sheet xls = MIMEBase('application', 'vnd.ms-excel') xls.set_payload(f.getvalue()) Encoders.encode_base64(xls) xls.add_header('Content-Disposition', 'attachment', filename='some file name %s-%s-%s.xls' % (today.day, today.month, today.year)) msg.attach(xls) msg['Subject'] = subject msg['From'] = fromaddr msg['To'] = toaddr # Guarantees the message ends in a newline msg.epilogue = '' # Send message s = smtplib.SMTP(smtp_host) s.sendmail(fromaddr, to_list, msg.as_string()) -- Ksenia -- http://mail.python.org/mailman/listinfo/python-list