Re: [sqlite] Unicode Again... Sti ll Stuck... A Challenge... Store and retrieve the word résumé with out using a unicode string literal
Folks: Thanks so much for your replies. I have absorbed a lot of information about code pages and unicode in the last couple of days. My understanding is far from complete, but I'm ahead of where I was... In the end, my best answer was to set the text_factory property of the connection object to str. That way, there is no translation of the bytes in memory to the database and back out of the database. But, it also assumes (I suppose) that the code page in use by the workstation machine that posted the page will match the code page selected by my server machine. Since that should almost always be the case for me (at least in the near term), I have decided to just set text_factory to str. I am now able to store résumé to my Sqlite database and faithfully have it returned in the same condition (which was the challenge that brought me to you in the first place). Thanks again for all your help and insight! Doug wcmadness wrote: > > Surely there is an answer to this question... > > I'm using Python and PySqlite. I'm trying to store the word résumé to a > text field. I'm really doing this as a test to see how to handle > diacritical letters, such as umlaut characters (from German) or accented > characters (from French). I can produce French é on my keyboard with > Alt-130... > > If I were coding a string literal, I would send through the data as > unicode, as in: u'résumé'. But, I'm not that lucky. The data is coming > from an HTML form or from a flat file. It will take on the default codec > used on my machine (latin-1). If I just send it through as is, it has > problems either when I fetchall or when I try to print what I've fetched. > So, for example: > > Insert Into tblTest (word) values ('résumé') > > will cause problems. > > I know that Sqlite stores text data as utf-8. I know that in Python (on > my machine, at least) strings are stored as latin-1. So, for example, in > Python code: > > v = 'résumé' > > v would be of type str, using latin-1 encoding. > > So, I have tried sending through my data as follows: > > cur.execute("Insert Into tblTest (word) values (?)", > ("résumé".decode("latin-1").encode("utf-8"),)) > > That stores the data just fine, but when I fetchall, I still have > problems. Say, I select * from tblTest and then do: > > l = cur.fetchall() > > Doing print l[0][1] (to print the word résumé) will give a nasty message > about ascii codec can't convert character \x082 (or some variation of that > message). > > I've tried doing: > > print l[0][1].decode('utf-8').encode('latin-1') > > But to no avail. > > The simple question is this: > > How do I store the word résumé to a Sqlite DB without using a unicode > literal (e.g. u'résumé'), such that printing the results retrieved from > fetchall will not crash? > > Surely someone is doing this... Say you get data from an HTML page that > contains diacritical characters. You need to store it to Sqlite and > retrieve it back out for display. What do you do??? > > I'm stuck! > > Doug > -- View this message in context: http://www.nabble.com/Unicode-Again...-Still-Stuck...-A-Challenge...-Store-and-retrieve-the-word-r%C3%A9sum%C3%A9-without-using-a-unicode-string-literal-tf4190926.html#a11938827 Sent from the SQLite mailing list archive at Nabble.com. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Unicode Again... Sti ll Stuck... A Challenge... Store and retrieve the word résumé with out using a unicode string literal
Well, I have a solution to my own problem, and I wanted to post it for two reasons: First, it might help someone; second, I'm wondering if someone can explain it to me... Here's the scoop... I'm on a Windows machine. It turns out that the default code page on Windows is cp437. So, in my Python code, if I type: s = 'résumé' (with the French e s), it is stored as: 'r\x82sum\x82' because hex 82 (decimal 130) is the code for French e in code page 437 (used by Windows)... OK. So, now that I now my data comes to me from the HTML form (or in a flat file) in code page 437 on a Windows machine, I can do the following when I send the data to the database: f = cgi.FieldStorage() cur.execute("insert into test values (?,?)", (f['txtName'].value.decode('cp437'))) The decode method after the incoming form data will force a translation from code page 437 to unicode (from 1 byte per character according to extended ascii set code page 437 to 2 bytes per character -- unicode). That's all fine. Now, when I get the data with: cur.execute("select * from test") mylist = cur.fetchall() I would expect that I would need to encode the unicode data coming from Sqlite to get back to my original code page 437 (of course, I could also just use the data as unicode). So, I would expect to do this: (say that row one, column one has the value of résumé) In that case, the following should return me exactly to the original 'r\x82sum\x82' mylist[0][0].encode('cp437') But it doesn't!!! (Wacky)! Rather, it gives me this: 'r\xe9sum\xe9' Interestingly, that's almost the same as what I get with a unicode literal. In other words, if I write this Python code: x = u'résumé' and then type x in the shell to see what it is, I get this: u'r\xe9sum\xe9' The only difference is that the latter is unicode and the former ('r\xe9sum\xe9') is not. So, to get back where I started, I do the fetchall and then this wacky thing: eval("u'" + mylist[0][0].encode('cp437') + "'").encode('cp437') In other words, I say: OK, you're almost there. Now, convert to unicode by evaluating the string as a unicode literal and then encode the unicode back to the code page 437. What a kludge. It seems like an awefully lot of work to get back to the original data that was stored to the database. And why? Does anyone know what's going on here??? Thanks. wcmadness wrote: > > Surely there is an answer to this question... > > I'm using Python and PySqlite. I'm trying to store the word résumé to a > text field. I'm really doing this as a test to see how to handle > diacritical letters, such as umlaut characters (from German) or accented > characters (from French). I can produce French é on my keyboard with > Alt-130... > > If I were coding a string literal, I would send through the data as > unicode, as in: u'résumé'. But, I'm not that lucky. The data is coming > from an HTML form or from a flat file. It will take on the default codec > used on my machine (latin-1). If I just send it through as is, it has > problems either when I fetchall or when I try to print what I've fetched. > So, for example: > > Insert Into tblTest (word) values ('résumé') > > will cause problems. > > I know that Sqlite stores text data as utf-8. I know that in Python (on > my machine, at least) strings are stored as latin-1. So, for example, in > Python code: > > v = 'résumé' > > v would be of type str, using latin-1 encoding. > > So, I have tried sending through my data as follows: > > cur.execute("Insert Into tblTest (word) values (?)", > ("résumé".decode("latin-1").encode("utf-8"),)) > > That stores the data just fine, but when I fetchall, I still have > problems. Say, I select * from tblTest and then do: > > l = cur.fetchall() > > Doing print l[0][1] (to print the word résumé) will give a nasty message > about ascii codec can't convert character \x082 (or some variation of that > message). > > I've tried doing: > > print l[0][1].decode('utf-8').encode('latin-1') > > But to no avail. > > The simple question is this: > > How do I store the word résumé to a Sqlite DB without using a unicode > literal (e.g. u'résumé'), such that printing the results retrieved from > fetchall will not crash? > > Surely someone is doing this... Say you get data from an HTML page that > contains diacritical characters. You need to store it to Sqlite and > retrieve it back out for display. What do you do??? > > I'm stuck! > > Doug > -- View this message in context: http://www.nabble.com/Unicode-Again...-Still-Stuck...-A-Challenge...-Store-and-retrieve-the-word-r%C3%A9sum%C3%A9-without-using-a-unicode-string-literal-tf4190926.html#a11918870 Sent from the SQLite mailing list archive at Nabble.com. - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Unicode
I'm stuck on this. I'm writing a data layer that potentially needs to handle diacritical (sp?) characters, such a French accented é characters or German umlauted characters (sp?). It should be rare that I would run into something like this, but the data layer should handle it nevertheless. For example, it would certainly be expected to handle something as simple as the word résumé or the name Réggé. I've tried quite a few things now, and I just can't get to a solid solution. The data gets stored to Sqlite, but when I try to select it, I have problems. Here's a sample of the error I get from the Python shell trying to select data with accented characters: >>> import sqlite3 >>> con = sqlite3.connect('test.db') >>> cur = con.cursor() >>> cur.execute("select * from test order by name") >>> l = cur.fetchall() Traceback (most recent call last): File "", line 1, in sqlite3.OperationalError: Could not decode to UTF-8 column 'name' with text 'RΘs umΘ' >>> Now, I'll post the code that created it. It tried to store résumé as name (even though that's not really a name)... The data will typically be collected from an HTML page, so I am posting the HTML first and then the Python code that handles it. I have tried using the Python unicode() function and the Python decode() function before the data goes into the database... In any event, though, storing the data is not the problem... And, indeed, the data ALWAYS seems to get stored to Sqlite as unicode. But, even stored as unicode, pysqlite has problems fetching the data. In fact, my problem may lay with pysqlite... In any case, here's the code. Any insight would be most welcome. Please reach me by e-mail at [EMAIL PROTECTED] Thanks... Oh, the structure of the table (called test) is simply (name, birthdate). Here's the HTML: Name: B-Dt: Here's cptest2.py (the program that the HTML posts to): import cgi import sqlite3 f = cgi.FieldStorage() def StripNonAlpha(pstrValue): lstrRetVal = '' for s in pstrValue: if 'a' <= s.lower() <= 'z': lstrRetVal += s return lstrRetVal TypeName = StripNonAlpha(str(type(f['txtName'].value))) TypeBirthDate = StripNonAlpha(str(type(f['txtBirthDate'].value))) #uName = unicode(f['txtName'].value,"Latin-1") #uBirthDate = unicode(f['txtBirthDate'].value,"Latin-1") #uTypeName = StripNonAlpha(str(type(unicode(f['txtName'].value,"Latin-1" #uTypeBirthDate = StripNonAlpha(str(type(unicode(f['txtBirthDate'].value,"Latin-1" print """Content-type: text/html name: %s name type: %s b-dt: %s b-dt type: %s """ % (f['txtName'].value,TypeName,f['txtBirthDate'].value,TypeBirthDate) con = sqlite3.connect('test.db') cur = con.cursor() cur.execute("insert into test values (?,?)", (f['txtName'].value,f['txtBirthDate'].value)) con.commit() cur.close() con.close() -- View this message in context: http://www.nabble.com/Unicode-tf4167305.html#a11856263 Sent from the SQLite mailing list archive at Nabble.com. - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Custom Aggregation Not Working
Hey, Folks: I'm writing a financial application and MUST have exact math decimals (no floats). So, I'm using Python's decimal module. My database is Sqlite (and my language is Python with Pysqlite); Sqlite doesn't offer a non-floating point decimal type. But, it does have adapters and converters to store data as a native Sqlite type (string / text) in the database and bring it out of the database and into memory as something else (Python Decimal, in my case). That works great, but it does NOT seem to apply to aggregation operations. I need it to. So, I tried using another Sqlite feature, custom aggregation functions, but to no avail. Does anyone know how to fix this? What am I doing wrong? I am submitting all of my test code, below. Following that, I am submitting my results. Thanks for your time in helping with this! Here's the code: import sqlite3 import decimal # This way will store the value as float, potentially losing precision. print '-' * 25 print 'Testing native data types, no adapters / converters.' con = sqlite3.connect('test1.db') cur = con.cursor() cur.execute("create table test (pkey integer primary key,somenumber Decimal);") cur.execute("insert into test values (null,.1);") cur.execute("insert into test values (null,.2);") cur.execute("select * from test;") rows = cur.fetchall() for row in rows: print row[0], type(row[0]) print row[1], type(row[1]) cur.close() con.close() print '-' * 25 # This way will store the value as decimal, keeping exact precision. def AdaptDecimal(pdecValue): return str(pdecValue) def ConvertDecimal(pstrValue): return decimal.Decimal(pstrValue) decimal.getcontext().precision = 50 sqlite3.register_adapter(decimal.Decimal, AdaptDecimal) sqlite3.register_converter("Decimal", ConvertDecimal) print 'Testing data type with adapters / converters. Decimal numbers should be Python Decimal types.' con = sqlite3.connect('test2.db',detect_types = sqlite3.PARSE_DECLTYPES) cur = con.cursor() cur.execute("create table test (pkey integer primary key,somenumber Decimal);") cur.execute("insert into test values (null,.1);") cur.execute("insert into test values (null,.2);") cur.execute("select * from test;") rows = cur.fetchall() for row in rows: print row[0], type(row[0]) print row[1], type(row[1]) cur.close() con.close() print '-' * 25 # OK. That works. Now for the real test. Let's try an equality test. # Classic float equality failure .1 + .1... 10 times should NOT equal 1. # As predicted, this will FAIL the equality test print 'Testing Sum aggregation on native data types. Should be float and should fail equality test.' con = sqlite3.connect('test3.db') cur = con.cursor() cur.execute("create table test (pkey integer primary key,somenumber Decimal);") for x in range(10): cur.execute("insert into test values (null,.1);") cur.execute("select sum(somenumber) as total from test;") rows = cur.fetchall() print rows[0][0], type(rows[0][0]) if rows[0][0] == 1: print 'equal' else: print 'NOT equal' cur.close() con.close() print '-' * 25 # Now, we try the exact same equality test, using adapters and converters, substituting # the Python exact precision decimal type for float. # Probably don't need to re-register. We did that above. We probably just need to parse declared types when # we open the connection. # H... This fails whether I re-register or not. # sqlite3.register_adapter(decimal.Decimal, AdaptDecimal) # sqlite3.register_converter("Decimal", ConvertDecimal) print "Testing Sum aggregation with adapters / converters registered. Result SHOULD BE Python Decimal type, but is NOT. Should PASS equality test, but doesn't." con = sqlite3.connect('test4.db',detect_types = sqlite3.PARSE_DECLTYPES) cur = con.cursor() cur.execute("create table test (pkey integer primary key,somenumber Decimal);") for x in range(10): cur.execute("insert into test values (null,.1);") cur.execute("select sum(somenumber) as total from test;") rows = cur.fetchall() print rows[0][0], type(rows[0][0]) if rows[0][0] == 1: print 'equal' else: print 'NOT equal' cur.close() con.close() print '-' * 25 ## OK. Let's try the exact same equality test, using manual summation. First for floats. print 'Testing manual summation against native data types, no adapters converters. Should FAIL equality test, because the sum is a float.' con = sqlite3.connect('test5.db') cur = con.cursor() cur.execute("create table test (pkey integer primary key,somenumber Decimal);") for x in range(10): cur.execute("insert into test values (null,.1);") cur.execute("select * from test;") rows = cur.fetchall() total = 0.0 for row in rows: total += row[1] print total,type(total) if total == 1: print 'equal' else: print 'NOT equal' cur.close() con.close() print '-' * 25 # Now, using adapters and converters with manual summation. print 'Testing manual