mysterious unicode
I'm using pyExcelerator and xlrd to read and write data from and to two spreadsheets. I created the "read" spreadsheet by importing a text file, and I had no unicode aspirations.

When I read a cell, it appears to be unicode: u'Q1', say. I can try cleaning it, like this:

    try:
        s.encode("ascii", "replace")
    except AttributeError:
        pass

which seems to work. Here's the mysterious part (aside from why anything was unicode in the first place):

    print >> debug, "c=", col, "r=", row, "v=", value, "qno=", qno
    tuple = (qno, family)
    try:
        data[tuple].append(value)
    except:
        data[tuple] = [value]
    print >> debug, "!!!", col, row, qno, family, tuple, value, data[tuple]

which produces:

    c= 1 r= 3 v= 4 qno= Q1
    !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]

where qno seems to be a vanilla Q1, but a tuple using qno is (u'Q1', ...).

Can somebody help me out?

--
http://mail.python.org/mailman/listinfo/python-list
Re: mysterious unicode
On Tue, 20 Mar 2007 19:35:00 -0300, Gerry [EMAIL PROTECTED] wrote:

> which seems to work. Here's the mysterious part (aside from why
> anything was unicode in the first place):
>
>     print >> debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >> debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
>     c= 1 r= 3 v= 4 qno= Q1
>     !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple using qno is
> (u'Q1', ...).

I bet qno was unicode from the start. When you print a unicode object, you get the unadorned contents. When you print a tuple, it uses repr() on each item.

    py> qno = u"Q1"
    py> qno
    u'Q1'
    py> print qno
    Q1
    py> print (qno,2)
    (u'Q1', 2)

--
Gabriel Genellina
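A minimal sketch of the print-versus-repr() distinction described above. It is written for Python 3 as an assumption; the thread itself uses Python 2. Python 3 no longer shows a u prefix, but the mechanism is the same: printing a string shows its bare contents, while printing a tuple shows the repr() of each item, quotes included.

```python
# Python 3 sketch; in the Python 2 of this thread the item's repr would be u'Q1'.
qno = "Q1"

as_printed = str(qno)        # the text that print(qno) writes
in_tuple = str((qno, 2))     # the text that print((qno, 2)) writes

print(as_printed)
print(in_tuple)

# The tuple's text is assembled from repr() of each item, not str().
assert in_tuple == "(" + repr(qno) + ", " + repr(2) + ")"
```

So a value can look "vanilla" when printed on its own and still show its type markers once it is printed inside a tuple, list, or dict.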
Re: mysterious unicode
On Mar 20, 7:29 pm, Gabriel Genellina [EMAIL PROTECTED] wrote:
> On Tue, 20 Mar 2007 19:35:00 -0300, Gerry [EMAIL PROTECTED] wrote:

Thanks! - that helps a lot. I'm still mystified why:

    qno was ever unicode, and
    why qno.encode("ascii", "replace") is still unicode.

Gerry

> py> qno = u"Q1"
> py> qno
> u'Q1'
> py> print qno
> Q1
> py> print (qno,2)
> (u'Q1', 2)
>
> --
> Gabriel Genellina
Re: mysterious unicode
On Tue, 20 Mar 2007 20:47:22 -0300, Gerry [EMAIL PROTECTED] wrote:

> Thanks! - that helps a lot. I'm still mystified why:
>
> qno was ever unicode, and why

I can't tell...

> qno.encode("ascii", "replace") is still unicode.

That *returns* a string, but you are discarding the return value. It should be

    qno = qno.encode(...)

It's similar to lower(), for example.

--
Gabriel Genellina
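The point about the discarded return value can be sketched as follows. This is a Python 3 illustration under stated assumptions: the cell value and the `to_ascii` helper are hypothetical, and in Python 3 encode() returns a bytes object where Python 2 returned a byte string.

```python
# Strings are immutable: encode() builds a new object and leaves qno alone.
qno = "Q1\u00e9"                          # hypothetical cell value with a non-ASCII character

qno.encode("ascii", "replace")            # return value discarded, nothing changes
unchanged = qno

cleaned = qno.encode("ascii", "replace")  # keep the return value instead


def to_ascii(value):
    """Hypothetical helper: encode text cells, pass other cell types through."""
    try:
        return value.encode("ascii", "replace")
    except AttributeError:                # e.g. numeric cells have no encode()
        return value


print(repr(unchanged), repr(cleaned), repr(to_ascii(4)))
```

The try/except from the original question fits naturally inside such a helper, as long as the result is returned and assigned rather than thrown away.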
Re: mysterious unicode
On Tuesday 20 March 2007 18:35, Gerry wrote:
> I'm using pyExcelerator and xlrd to read and write data from and to
> two spreadsheets. I created the "read" spreadsheet by importing a
> text file, and I had no unicode aspirations.
>
> When I read a cell, it appears to be unicode: u'Q1', say. I can try
> cleaning it, like this:
>
>     try:
>         s.encode("ascii", "replace")
>     except AttributeError:
>         pass
>
> which seems to work. Here's the mysterious part (aside from why
> anything was unicode in the first place):
>
>     print >> debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >> debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
>     c= 1 r= 3 v= 4 qno= Q1
>     !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple using qno is
> (u'Q1', ...).
>
> Can somebody help me out?

I have been getting the same thing using SQLite3 when extracting data from an SQLite3 database. I take the database info, which is in a list, and do

    name = str.record[0]

rather than

    name = record[0]

So far, I haven't had any problems. For some reason the unicode u is removed. I haven't wanted to spend the time to figure out why.

jim-on-linux
http://www.inqvista.com
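A small sketch of the SQLite case, assuming Python 3 and its standard-library sqlite3 module; the table name and value are hypothetical. It shows that text columns come back as Unicode text objects, which in Python 3 is the plain str type.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")          # hypothetical table
conn.execute("INSERT INTO people VALUES (?)", ("Q1",))
record = conn.execute("SELECT name FROM people").fetchone()
conn.close()

name = record[0]                 # already a text object, no str() call needed
print(type(name).__name__, repr(name))
```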
Re: mysterious unicode
On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> I'm still mystified why:
> qno was ever unicode,

Thus quoth http://www.lexicon.net/sjmachin/xlrd.html: "This module presents all text strings as Python unicode objects."

-Carsten
Re: mysterious unicode
On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> I have been getting the same thing using SQLite3 when extracting data
> from an SQLite3 database.

Many APIs that exchange data choose to exchange text in Unicode because that eliminates encoding uncertainty. Whether an API uses Unicode would probably be noted somewhere in its documentation.

> I take the database info, which is in a list, and do
>
>     name = str.record[0]

You probably mean str(record[0]).

> rather than
>
>     name = record[0]
>
> So far, I haven't had any problems. For some reason the unicode u is
> removed. I haven't wanted to spend the time to figure out why.

As a software engineer, I'd get worried if I didn't know why the code I wrote works. Maybe that's just me.

Unicode is not rocket science. I suggest you read http://www.amk.ca/python/howto/unicode to demystify what Unicode objects are and do.

With str(), you're asking the Unicode object for its byte string interpretation, which causes the Unicode object to give you its encoding in the system default encoding. The default encoding is normally ascii. That can be tweaked for your particular Python installation, but if you need an encoding other than ascii, it's recommended that you explicitly encode and decode from and to Unicode, lest you risk writing non-portable code.

Using str() coercion of Unicode objects will work well enough until you run into a string that contains characters that can't be represented in the default encoding. Once that happens, you're better off explicitly encoding the Unicode object into a well-defined encoding on input, or, even better, just work with Unicode objects internally and only encode to byte strings when absolutely necessary, such as when outputting to a file or to the console.

Hope this helps,

Carsten.
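The advice above can be sketched roughly as follows, assuming Python 3 (where the text type is str and encoding is always explicit, so the implicit str() coercion of Python 2 does not apply). The sample value is hypothetical. The sketch shows an ascii encode failing on a non-ASCII character, and an explicit, well-defined encoding round-tripping cleanly.

```python
name = "Zo\u00eb"                      # hypothetical text with a non-ASCII character

# Encoding to ascii fails once a character falls outside the ASCII range.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Keep text as Unicode internally; encode explicitly only at the boundary.
utf8_bytes = name.encode("utf-8")      # e.g. just before writing to a file
round_trip = utf8_bytes.decode("utf-8")

print(ascii_ok, utf8_bytes, round_trip == name)
```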
Re: mysterious unicode
On Tuesday 20 March 2007 21:17, Carsten Haese wrote:
> On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> > I have been getting the same thing using SQLite3 when extracting
> > data from an SQLite3 database.
>
> Many APIs that exchange data choose to exchange text in Unicode
> because that eliminates encoding uncertainty. Whether an API uses
> Unicode would probably be noted somewhere in its documentation.
>
> > I take the database info, which is in a list, and do
> >
> >     name = str.record[0]
>
> You probably mean str(record[0]).

Yes.

> > rather than
> >
> >     name = record[0]
> >
> > So far, I haven't had any problems. For some reason the unicode u
> > is removed. I haven't wanted to spend the time to figure out why.
>
> As a software engineer, I'd get worried if I didn't know why the code
> I wrote works. Maybe that's just me.

I don't disagree, but sometimes, depending on the situation, time to investigate is a luxury. However (if you don't have the time to do it right the first time, when will you have the time to fix it?).

> Unicode is not rocket science. I suggest you read
> http://www.amk.ca/python/howto/unicode to demystify what Unicode
> objects are and do.
>
> With str(), you're asking the Unicode object for its byte string
> interpretation, which causes the Unicode object to give you its
> encoding in the system default encoding. The default encoding is
> normally ascii. That can be tweaked for your particular Python
> installation, but if you need an encoding other than ascii, it's
> recommended that you explicitly encode and decode from and to
> Unicode, lest you risk writing non-portable code.
>
> Using str() coercion of Unicode objects will work well enough until
> you run into a string that contains characters that can't be
> represented in the default encoding.

Right. Even though None or null are not strings, they are common enough to cause a problem. Try to run a loop through a list with None or null in it. For example,

    x = str(list[2])

when list[2] is null or None: problems. Easy to fix, but more work. I'll check the web site out.

Thanks for the update,
Jim-on-linux

> Once that happens, you're better off explicitly encoding the Unicode
> object into a well-defined encoding on input, or, even better, just
> work with Unicode objects internally and only encode to byte strings
> when absolutely necessary, such as when outputting to a file or to
> the console.
>
> Hope this helps,
>
> Carsten.
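One way the None case mentioned above might be handled, sketched in Python 3. The `to_text` helper and the sample row are hypothetical and not part of any library. Note that str(None) does not raise; it quietly produces the text "None", which is usually not what a missing field should become.

```python
def to_text(value, default=""):
    """Hypothetical helper: convert a field to text, mapping None to a default."""
    if value is None:
        return default
    return str(value)


row = ["Q1", 4, None]                       # hypothetical record with a missing field

naive = [str(item) for item in row]         # None silently becomes the text "None"
cleaned = [to_text(item) for item in row]   # None becomes an empty string instead

print(naive)
print(cleaned)
```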