Re: [sqlite] Unicode Again... Sti ll Stuck... A Challenge... Store and retrieve the word résumé with out using a unicode string literal

2007-07-31 Thread wcmadness

Folks:

Thanks so much for your replies.  I have absorbed a lot of information about
code pages and unicode in the last couple of days.  My understanding is far
from complete, but I'm ahead of where I was...

In the end, my best answer was to set the text_factory property of the
connection object to str.  That way, there is no translation of the bytes in
memory to the database and back out of the database.  But, it also assumes
(I suppose) that the code page in use by the workstation machine that posted
the page will match the code page selected by my server machine.  Since that
should almost always be the case for me (at least in the near term), I have
decided to just set text_factory to str.  I am now able to store résumé to
my Sqlite database and faithfully have it returned in the same condition
(which was the challenge that brought me to you in the first place).

Thanks again for all your help and insight!

Doug

wcmadness wrote:
> 
> Surely there is an answer to this question...
> 
> I'm using Python and PySqlite.  I'm trying to store the word résumé to a
> text field.  I'm really doing this as a test to see how to handle
> diacritical letters, such as umlaut characters (from German) or accented
> characters (from French).  I can produce French é on my keyboard with
> Alt-130...
> 
> If I were coding a string literal, I would send through the data as
> unicode, as in: u'résumé'.  But, I'm not that lucky.  The data is coming
> from an HTML form or from a flat file.  It will take on the default codec
> used on my machine (latin-1).  If I just send it through as is, it has
> problems either when I fetchall or when I try to print what I've fetched. 
> So, for example:
> 
> Insert Into tblTest (word) values ('résumé')
> 
> will cause problems.
> 
> I know that Sqlite stores text data as utf-8.  I know that in Python (on
> my machine, at least) strings are stored as latin-1.  So, for example, in
> Python code:
> 
> v = 'résumé'
> 
> v would be of type str, using latin-1 encoding.
> 
> So, I have tried sending through my data as follows:
> 
> cur.execute("Insert Into tblTest (word) values (?)",
> ("résumé".decode("latin-1").encode("utf-8"),))
> 
> That stores the data just fine, but when I fetchall, I still have
> problems.  Say, I select * from tblTest and then do:
> 
> l = cur.fetchall()
> 
> Doing print l[0][1]  (to print the word résumé) will give a nasty message
> about ascii codec can't convert character \x082 (or some variation of that
> message).
> 
> I've tried doing:
> 
> print l[0][1].decode('utf-8').encode('latin-1')
> 
> But to no avail.
> 
> The simple question is this:
> 
> How do I store the word résumé to a Sqlite DB without using a unicode
> literal (e.g. u'résumé'), such that printing the results retrieved from
> fetchall will not crash?
> 
> Surely someone is doing this... Say you get data from an HTML page that
> contains diacritical characters.  You need to store it to Sqlite and
> retrieve it back out for display.  What do you do???
> 
> I'm stuck!
> 
> Doug
> 

-- 
View this message in context: 
http://www.nabble.com/Unicode-Again...-Still-Stuck...-A-Challenge...-Store-and-retrieve-the-word-r%C3%A9sum%C3%A9-without-using-a-unicode-string-literal-tf4190926.html#a11938827
Sent from the SQLite mailing list archive at Nabble.com.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Unicode Again... Sti ll Stuck... A Challenge... Store and retrieve the word résumé with out using a unicode string literal

2007-07-31 Thread wcmadness

Well, I have a solution to my own problem, and I wanted to post it for two
reasons: First, it might help someone; second, I'm wondering if someone can
explain it to me...

Here's the scoop...

I'm on a Windows machine.  It turns out that the default code page on
Windows is cp437.  So, in my Python code, if I type:

s = 'résumé' (with the French e s), it is stored as: 'r\x82sum\x82' because
hex 82 (decimal 130) is the code for French e in code page 437 (used by
Windows)...

OK.  So, now that I now my data comes to me from the HTML form (or in a flat
file) in code page 437 on a Windows machine, I can do the following when I
send the data to the database:

f = cgi.FieldStorage()
cur.execute("insert into test values (?,?)",
(f['txtName'].value.decode('cp437')))

The decode method after the incoming form data will force a translation from
code page 437 to unicode (from 1 byte per character according to extended
ascii set code page 437 to 2 bytes per character -- unicode).  That's all
fine.

Now, when I get the data with:

cur.execute("select * from test")
mylist = cur.fetchall()

I would expect that I would need to encode the unicode data coming from
Sqlite to get back to my original code page 437 (of course, I could also
just use the data as unicode).  So, I would expect to do this:

(say that row one, column one has the value of résumé)

In that case, the following should return me exactly to the original
'r\x82sum\x82'

mylist[0][0].encode('cp437')

But it doesn't!!! (Wacky)!

Rather, it gives me this: 'r\xe9sum\xe9'

Interestingly, that's almost the same as what I get with a unicode literal. 
In other words, if I write this Python code:

x = u'résumé'

and then type x in the shell to see what it is, I get this:

u'r\xe9sum\xe9'

The only difference is that the latter is unicode and the former
('r\xe9sum\xe9') is not.

So, to get back where I started, I do the fetchall and then this wacky
thing:

eval("u'" + mylist[0][0].encode('cp437') + "'").encode('cp437')

In other words, I say: OK, you're almost there.  Now, convert to unicode by
evaluating the string as a unicode literal and then encode the unicode back
to the code page 437.

What a kludge.  It seems like an awefully lot of work to get back to the
original data that was stored to the database.  And why?  Does anyone know
what's going on here???

Thanks.


wcmadness wrote:
> 
> Surely there is an answer to this question...
> 
> I'm using Python and PySqlite.  I'm trying to store the word résumé to a
> text field.  I'm really doing this as a test to see how to handle
> diacritical letters, such as umlaut characters (from German) or accented
> characters (from French).  I can produce French é on my keyboard with
> Alt-130...
> 
> If I were coding a string literal, I would send through the data as
> unicode, as in: u'résumé'.  But, I'm not that lucky.  The data is coming
> from an HTML form or from a flat file.  It will take on the default codec
> used on my machine (latin-1).  If I just send it through as is, it has
> problems either when I fetchall or when I try to print what I've fetched. 
> So, for example:
> 
> Insert Into tblTest (word) values ('résumé')
> 
> will cause problems.
> 
> I know that Sqlite stores text data as utf-8.  I know that in Python (on
> my machine, at least) strings are stored as latin-1.  So, for example, in
> Python code:
> 
> v = 'résumé'
> 
> v would be of type str, using latin-1 encoding.
> 
> So, I have tried sending through my data as follows:
> 
> cur.execute("Insert Into tblTest (word) values (?)",
> ("résumé".decode("latin-1").encode("utf-8"),))
> 
> That stores the data just fine, but when I fetchall, I still have
> problems.  Say, I select * from tblTest and then do:
> 
> l = cur.fetchall()
> 
> Doing print l[0][1]  (to print the word résumé) will give a nasty message
> about ascii codec can't convert character \x082 (or some variation of that
> message).
> 
> I've tried doing:
> 
> print l[0][1].decode('utf-8').encode('latin-1')
> 
> But to no avail.
> 
> The simple question is this:
> 
> How do I store the word résumé to a Sqlite DB without using a unicode
> literal (e.g. u'résumé'), such that printing the results retrieved from
> fetchall will not crash?
> 
> Surely someone is doing this... Say you get data from an HTML page that
> contains diacritical characters.  You need to store it to Sqlite and
> retrieve it back out for display.  What do you do???
> 
> I'm stuck!
> 
> Doug
> 

-- 
View this message in context: 
http://www.nabble.com/Unicode-Again...-Still-Stuck...-A-Challenge...-Store-and-retrieve-the-word-r%C3%A9sum%C3%A9-without-using-a-unicode-string-literal-tf4190926.html#a11918870
Sent from the SQLite mailing list archive at Nabble.com.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Unicode

2007-07-29 Thread wcmadness

I'm stuck on this.  I'm writing a data layer that potentially needs to handle
diacritical (sp?) characters, such a French accented é characters or German
umlauted characters (sp?).  It should be rare that I would run into
something like this, but the data layer should handle it nevertheless.  For
example, it would certainly be expected to handle something as simple as the
word résumé or the name Réggé.

I've tried quite a few things now, and I just can't get to a solid solution. 
The data gets stored to Sqlite, but when I try to select it, I have
problems.  Here's a sample of the error I get from the Python shell trying
to select data with accented characters:

>>> import sqlite3
>>> con = sqlite3.connect('test.db')
>>> cur = con.cursor()
>>> cur.execute("select * from test order by name")

>>> l = cur.fetchall()
Traceback (most recent call last):
  File "", line 1, in 
sqlite3.OperationalError: Could not decode to UTF-8 column 'name' with text
'RΘs
umΘ'
>>>

Now, I'll post the code that created it.  It tried to store résumé as name
(even though that's not really a name)... The data will typically be
collected from an HTML page, so I am posting the HTML first and then the
Python code that handles it.  I have tried using the Python unicode()
function and the Python decode() function before the data goes into the
database...  In any event, though, storing the data is not the problem...
And, indeed, the data ALWAYS seems to get stored to Sqlite as unicode.  But,
even stored as unicode, pysqlite has problems fetching the data.  In fact,
my problem may lay with pysqlite...  In any case, here's the code.  Any
insight would be most welcome.  Please reach me by e-mail at
[EMAIL PROTECTED]  Thanks...

Oh, the structure of the table (called test) is simply (name, birthdate).

Here's the HTML:






Name: 
B-Dt: 





Here's cptest2.py (the program that the HTML posts to):

import cgi
import sqlite3
f = cgi.FieldStorage()

def StripNonAlpha(pstrValue):
  lstrRetVal = ''
  for s in pstrValue:
if 'a' <= s.lower() <= 'z':
  lstrRetVal += s
  return lstrRetVal

TypeName = StripNonAlpha(str(type(f['txtName'].value)))
TypeBirthDate = StripNonAlpha(str(type(f['txtBirthDate'].value)))

#uName = unicode(f['txtName'].value,"Latin-1")
#uBirthDate = unicode(f['txtBirthDate'].value,"Latin-1")
#uTypeName = StripNonAlpha(str(type(unicode(f['txtName'].value,"Latin-1"
#uTypeBirthDate =
StripNonAlpha(str(type(unicode(f['txtBirthDate'].value,"Latin-1"

print """Content-type: text/html





name: %s
name type: %s
b-dt: %s
b-dt type: %s


""" % (f['txtName'].value,TypeName,f['txtBirthDate'].value,TypeBirthDate)

con = sqlite3.connect('test.db')
cur = con.cursor()
cur.execute("insert into test values (?,?)",
(f['txtName'].value,f['txtBirthDate'].value))
con.commit()
cur.close()
con.close()
-- 
View this message in context: 
http://www.nabble.com/Unicode-tf4167305.html#a11856263
Sent from the SQLite mailing list archive at Nabble.com.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Custom Aggregation Not Working

2007-06-18 Thread wcmadness

Hey, Folks: 

I'm writing a financial application and MUST have exact math decimals (no
floats). So, I'm using Python's decimal module. 

My database is Sqlite (and my language is Python with Pysqlite); Sqlite
doesn't offer a non-floating point decimal type. But, it does have adapters
and converters to store data as a native Sqlite type (string / text) in the
database and bring it out of the database and into memory as something else
(Python Decimal, in my case). That works great, but it does NOT seem to
apply to aggregation operations. I need it to. 

So, I tried using another Sqlite feature, custom aggregation functions, but
to no avail. Does anyone know how to fix this? What am I doing wrong? 

I am submitting all of my test code, below. Following that, I am submitting
my results. 

Thanks for your time in helping with this! 

Here's the code: 

import sqlite3 
import decimal 

# This way will store the value as float, potentially losing
precision. 

print '-' * 25 
print 'Testing native data types, no adapters / converters.' 
con = sqlite3.connect('test1.db') 
cur = con.cursor() 
cur.execute("create table test (pkey integer primary key,somenumber
Decimal);") 
cur.execute("insert into test values (null,.1);") 
cur.execute("insert into test values (null,.2);") 
cur.execute("select * from test;") 
rows = cur.fetchall() 
for row in rows: 
print row[0], type(row[0]) 
print row[1], type(row[1]) 
cur.close() 
con.close() 
print '-' * 25 

# This way will store the value as decimal, keeping exact precision. 

def AdaptDecimal(pdecValue): 
return str(pdecValue) 
def ConvertDecimal(pstrValue): 
return decimal.Decimal(pstrValue) 

decimal.getcontext().precision = 50 
sqlite3.register_adapter(decimal.Decimal, AdaptDecimal) 
sqlite3.register_converter("Decimal", ConvertDecimal) 

print 'Testing data type with adapters / converters. Decimal numbers should
be Python Decimal types.' 
con = sqlite3.connect('test2.db',detect_types = sqlite3.PARSE_DECLTYPES) 
cur = con.cursor() 
cur.execute("create table test (pkey integer primary key,somenumber
Decimal);") 
cur.execute("insert into test values (null,.1);") 
cur.execute("insert into test values (null,.2);") 
cur.execute("select * from test;") 
rows = cur.fetchall() 
for row in rows: 
print row[0], type(row[0]) 
print row[1], type(row[1]) 
cur.close() 
con.close() 
print '-' * 25 

# OK. That works. Now for the real test. Let's try an equality test. 
# Classic float equality failure .1 + .1... 10 times should NOT
equal 1. 
# As predicted, this will FAIL the equality test 

print 'Testing Sum aggregation on native data types. Should be float and
should fail equality test.' 
con = sqlite3.connect('test3.db') 
cur = con.cursor() 
cur.execute("create table test (pkey integer primary key,somenumber
Decimal);") 
for x in range(10): 
cur.execute("insert into test values (null,.1);") 
cur.execute("select sum(somenumber) as total from test;") 
rows = cur.fetchall() 
print rows[0][0], type(rows[0][0]) 
if rows[0][0] == 1: 
print 'equal' 
else: 
print 'NOT equal' 
cur.close() 
con.close() 
print '-' * 25 

# Now, we try the exact same equality test, using adapters and
converters, substituting 
# the Python exact precision decimal type for float. 

# Probably don't need to re-register. We did that above. We probably just
need to parse declared types when 
# we open the connection. 
# H... This fails whether I re-register or not. 
# sqlite3.register_adapter(decimal.Decimal, AdaptDecimal) 
# sqlite3.register_converter("Decimal", ConvertDecimal) 
print "Testing Sum aggregation with adapters / converters registered. Result
SHOULD BE Python Decimal type, but is NOT. Should PASS equality test, but
doesn't." 
con = sqlite3.connect('test4.db',detect_types = sqlite3.PARSE_DECLTYPES) 
cur = con.cursor() 
cur.execute("create table test (pkey integer primary key,somenumber
Decimal);") 
for x in range(10): 
cur.execute("insert into test values (null,.1);") 
cur.execute("select sum(somenumber) as total from test;") 
rows = cur.fetchall() 
print rows[0][0], type(rows[0][0]) 
if rows[0][0] == 1: 
print 'equal' 
else: 
print 'NOT equal' 
cur.close() 
con.close() 
print '-' * 25 

## OK. Let's try the exact same equality test, using manual
summation. First for floats. 

print 'Testing manual summation against native data types, no adapters
converters. Should FAIL equality test, because the sum is a float.' 
con = sqlite3.connect('test5.db') 
cur = con.cursor() 
cur.execute("create table test (pkey integer primary key,somenumber
Decimal);") 
for x in range(10): 
cur.execute("insert into test values (null,.1);") 
cur.execute("select * from test;") 
rows = cur.fetchall() 
total = 0.0 
for row in rows: 
total += row[1] 
print total,type(total) 
if total == 1: 
print 'equal' 
else: 
print 'NOT equal' 
cur.close() 
con.close() 
print '-' * 25 

# Now, using adapters and converters with manual summation. 

print 'Testing manual