mysterious unicode

2007-03-20 Thread Gerry
I'm using pyExcelerator and xlrd to read and write data from and to
two spreadsheets.

I created the read spreadsheet by importing a text file - and I had
no unicode aspirations.

When I read a cell, it appears to be unicode: u'Q1', say.

I can try cleaning it, like this:


try:
    s.encode('ascii', 'replace')
except AttributeError:
    pass


which seems to work.  Here's the mysterious part (aside from why
anything was unicode in the first place):

print >> debug, "c=", col, "r=", row, "v=", value, "qno=", qno
tuple = (qno, family)
try:
    data[tuple].append(value)
except:
    data[tuple] = [value]
print >> debug, "!!!", col, row, qno, family, tuple, value, data[tuple]

which produces:

c= 1 r= 3 v= 4 qno= Q1
!!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]

where qno seems to be a vanilla Q1, but a tuple using qno is
(u'Q1', ...).

Can somebody help me out?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: mysterious unicode

2007-03-20 Thread Gabriel Genellina
On Tue, 20 Mar 2007 19:35:00 -0300, Gerry [EMAIL PROTECTED] wrote:

> which seems to work.  Here's the mysterious part (aside from why
> anything was unicode in the first place):
>
> print >> debug, "c=", col, "r=", row, "v=", value, "qno=", qno
> tuple = (qno, family)
> try:
>     data[tuple].append(value)
> except:
>     data[tuple] = [value]
> print >> debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple using qno is
> (u'Q1', ...).

I bet qno was unicode from the start. When you print a unicode object, you
get the unadorned contents. When you print a tuple, it uses repr() on
each item.

py> qno = u"Q1"
py> qno
u'Q1'
py> print qno
Q1
py> print (qno,2)
(u'Q1', 2)
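
For readers on Python 3, where every text string is already Unicode and the u prefix no longer appears in repr(), the same str-versus-repr effect can be sketched as follows. This is an illustration only and not part of the original exchange; the names bare and in_tuple are made up for the example.

```python
# Python 3 sketch: print() uses str() on a bare string,
# but the str() of a tuple is built from repr() of each item.
qno = "Q1"

bare = str(qno)            # the text print(qno) would show
in_tuple = str((qno, 2))   # the text print((qno, 2)) would show

print(bare)
print(in_tuple)
```

The quotes that appear around Q1 in the second line come from repr(), just as the u'...' decoration did in the Python 2 session.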

-- 
Gabriel Genellina



Re: mysterious unicode

2007-03-20 Thread Gerry
On Mar 20, 7:29 pm, Gabriel Genellina [EMAIL PROTECTED] wrote:
> On Tue, 20 Mar 2007 19:35:00 -0300, Gerry [EMAIL PROTECTED] wrote:

Thanks! - that helps a lot.

I'm still mystified why:
   qno was ever unicode, and why
   qno.encode('ascii', 'replace') is still unicode.

Gerry


Re: mysterious unicode

2007-03-20 Thread Gabriel Genellina
On Tue, 20 Mar 2007 20:47:22 -0300, Gerry [EMAIL PROTECTED] wrote:

> Thanks! - that helps a lot.
>
> I'm still mystified why:
>    qno was ever unicode, and why

I can't tell...

>    qno.encode('ascii', 'replace') is still unicode.

That *returns* a string, but you are discarding the return value. It should
be qno = qno.encode(...)
It's similar to lower(), for example.
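
A minimal sketch of the point about discarded return values, written for Python 3 as an assumption (there, encode() turns a str into bytes, where in Python 2 it turned a unicode into a str). The names still_text, encoded, and lowered are hypothetical.

```python
# Strings are immutable: encode() and lower() return new objects
# and never modify the string they are called on.
qno = "Q1"

qno.encode("ascii", "replace")            # result discarded, qno unchanged
still_text = isinstance(qno, str)

encoded = qno.encode("ascii", "replace")  # keep the return value
is_bytes = isinstance(encoded, bytes)

name = "ABC"
name.lower()                              # discarded in the same way
lowered = name.lower()                    # rebinding keeps the result
```

Only the lines that assign the result to a name have any lasting effect.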

-- 
Gabriel Genellina



Re: mysterious unicode

2007-03-20 Thread jim-on-linux
On Tuesday 20 March 2007 18:35, Gerry wrote:
> I'm using pyExcelerator and xlrd to read and
> write data from and to two spreadsheets.
>
> I created the read spreadsheet by importing a
> text file - and I had no unicode aspirations.
>
> When I read a cell, it appears to be unicode:
> u'Q1', say.
>
> I can try cleaning it, like this:
>
> try:
>     s.encode('ascii', 'replace')
> except AttributeError:
>     pass
>
> which seems to work.  Here's the mysterious
> part (aside from why anything was unicode in
> the first place):
>
> print >> debug, "c=", col, "r=", row, "v=", value, "qno=", qno
> tuple = (qno, family)
> try:
>     data[tuple].append(value)
> except:
>     data[tuple] = [value]
> print >> debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple
> using qno is (u'Q1', ...).
>
> Can somebody help me out?


I have been getting the same thing using SQLite3
when extracting data from an SQLite3 database.  I
take the database info which is in a list and do

name = str.record[0]
rather than
name = record[0]

So far, I haven't had any problems.
For some reason the unicode u is removed.
I haven't wanted to spend the time to figure out
why.

jim-on-linux
http://www.inqvista.com


Re: mysterious unicode

2007-03-20 Thread Carsten Haese
On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> I'm still mystified why:
>    qno was ever unicode,

Thus quoth http://www.lexicon.net/sjmachin/xlrd.html: "This module
presents all text strings as Python unicode objects."

-Carsten




Re: mysterious unicode

2007-03-20 Thread Carsten Haese
On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> I have been getting the same thing using SQLite3
> when extracting data from an SQLite3 database.

Many APIs that exchange data choose to exchange text in Unicode because
that eliminates encoding uncertainty. Whether an API uses Unicode would
probably be noted somewhere in its documentation.

> I take the database info which is in a list and do
>
> name = str.record[0]

You probably mean str(record[0]).

> rather than
> name = record[0]
>
> So far, I haven't had any problems.
> For some reason the unicode u is removed.
> I haven't wanted to spend the time to figure out
> why.

As a software engineer, I'd get worried if I didn't know why the code I
wrote works. Maybe that's just me.

Unicode is not rocket science. I suggest you read
http://www.amk.ca/python/howto/unicode to demystify what Unicode objects
are and do.

With str(), you're asking the Unicode object for its byte string
interpretation, which causes the Unicode object to give you its encoding
in the system default encoding. The default encoding is normally ascii.
That can be tweaked for your particular Python installation, but if you
need an encoding other than ascii it's recommended that you explicitly
encode and decode from and to Unicode, lest you risk writing
non-portable code.

Using str() coercion of Unicode objects will work well enough until you
run into a string that contains characters that can't be represented in
the default encoding. Once that happens, you're better off explicitly
encoding the Unicode object into a well-defined encoding on input, or,
even better, just work with Unicode objects internally and only encode
to byte strings when absolutely necessary, such as when outputting to a
file or to the console.
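
The advice above might look like this in practice. This is a sketch in Python 3, which is an assumption: there is no implicit ASCII coercion there, so the encode step that str() performed silently in Python 2 is written out explicitly. to_ascii_bytes is a hypothetical helper, not part of any library.

```python
def to_ascii_bytes(text, errors="strict"):
    """Encode text to ASCII bytes explicitly (hypothetical helper)."""
    return text.encode("ascii", errors)

# Pure-ASCII text encodes without trouble.
plain = to_ascii_bytes("Q1")

# A character outside ASCII makes the strict encode fail.
try:
    to_ascii_bytes("caf\u00e9")
    failed = False
except UnicodeEncodeError:
    failed = True

# Lossy fallback: unencodable characters become "?".
replaced = to_ascii_bytes("caf\u00e9", errors="replace")

# Preferred: keep text as Unicode internally and encode to a
# well-defined encoding only at the output boundary.
utf8 = "caf\u00e9".encode("utf-8")
```

The strict call is the analogue of str() working "well enough" until the first non-ASCII character arrives; the last line shows the encode-at-the-boundary approach.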

Hope this helps,

Carsten.




Re: mysterious unicode

2007-03-20 Thread jim-on-linux
On Tuesday 20 March 2007 21:17, Carsten Haese wrote:
> On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> > I have been getting the same thing using
> > SQLite3 when extracting data from an SQLite3
> > database.
>
> Many APIs that exchange data choose to exchange
> text in Unicode because that eliminates
> encoding uncertainty. Whether an API uses
> Unicode would probably be noted somewhere in
> its documentation.
>
> > I take the database info which is in a list
> > and do
> >
> > name = str.record[0]
>
> You probably mean str(record[0]).

Yes.

> > rather than
> > name = record[0]
> >
> > So far, I haven't had any problems.
> > For some reason the unicode u is removed.
> > I haven't wanted to spend the time to figure
> > out why.
>
> As a software engineer, I'd get worried if I
> didn't know why the code I wrote works. Maybe
> that's just me.

I don't disagree, but sometimes, depending on the
situation, time to investigate is a luxury.
However: if you don't have the time to do it right
the first time, when will you have the time to fix
it?

> Unicode is not rocket science. I suggest you
> read http://www.amk.ca/python/howto/unicode to
> demystify what Unicode objects are and do.
>
> With str(), you're asking the Unicode object
> for its byte string interpretation, which
> causes the Unicode object to give you its
> encoding in the system default encoding. The
> default encoding is normally ascii. That can be
> tweaked for your particular Python
> installation, but if you need an encoding other
> than ascii it's recommended that you explicitly
> encode and decode from and to Unicode, lest you
> risk writing non-portable code.
>
> Using str() coercion of Unicode objects will
> work well enough until you run into a string
> that contains characters that can't be
> represented in the default encoding.

Right.
Even though None and null are not strings, they are
common enough to cause a problem.
Try running a loop over a list with None or
null in it.
For example,
x = str(list[2])
causes problems when list[2] is null or None.
Easy to fix, but more work.
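
One possible way to handle the None case, sketched in Python 3 under the assumption that a missing value should become an empty string. cell_to_text and its missing parameter are hypothetical names. Note that str(None) does not raise; it quietly yields the text 'None', which is usually not what is wanted.

```python
def cell_to_text(value, missing=""):
    """Return text for a value, mapping None to `missing` (hypothetical helper)."""
    if value is None:
        return missing
    return str(value)

record = ["Q1", 4, None]

# Plain str() turns None into the literal text "None".
naive = [str(v) for v in record]

# The helper substitutes the chosen placeholder instead.
cleaned = [cell_to_text(v) for v in record]
```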

I'll check the web site out.

Thanks for the update,
Jim-on-linux