Re: [sqlalchemy] SQLite: OperationalError when decoding 0x92 in TEXT column

2014-02-09 Thread Erich Blume
Thanks for the help on this, everyone! I found two ways to deal with this
and figured I should share in case it comes up in the future.

The first approach, the one I went with (because in my case, fidelity was
not as important) was to alter the 'text_factory' the sqlite3 uses. One
trick here was that to access this parameter for a sqlalchemy connection
object 'conn' as returned from an engine, I had to do it like so:

conn.connection.connection.text_factory

The extra indirection on .connection* was because the connection's internal
dbapi connection was in fact a proy '_ConnectionFairy' instance. A small
issue but one that's difficult to notice at first as it raises no error to
set a new property on an object that will never use that property.

The second approach I found was more involved but would be more appropriate
for large projects that didn't want to have this behavior on every column.
In this approach you still change the text_factory of sqlite3's connection,
but instead change it to `bytes` (or perhaps an identity? ie lambda x: x -
not sure what is best). This will cause sqlite3 to return encoded utf-8
bytes instead of unicode strings.

Then, you have to tell SQLAlchemy to convert these strings to unicode. I
did not persue this approach far enough to find the right set of arguments
but I imagine this would be very simple - set 'force_unicode' to True, I
suspect, would be all you would need.

Finally, for the column with the invalid utf-8 sequences, just also set the
`unicode_error` to your preferred resolution strategy - usually 'ignore' or
'replace'.

I suppose it is possible that this could incur a performance penalty - the
sqlite3 de/encoding process is done in a compiled C module and as such
could possibly be faster than using native python for the task. I suspect
though that the module is just calling to the usual Python library
functions for encoding/decoding (but did not check). So that may be of a
concern if you follow this approach.

Thanks again!


On Thu, Feb 6, 2014 at 4:19 PM, Michael Bayer mike...@zzzcomputing.comwrote:


 On Feb 6, 2014, at 6:59 PM, Erich Blume blume.er...@gmail.com wrote:

  Hmm, this one has me stumped. As best I can tell after poking at it
 using the column_reflect event, a custom dialect, etc. - the issue here is
 that in pysqlite.py we (in my Python 3.3 install) are selecting
 `sqlite3.dbapi2` as the dbapi interface, but we aren't telling sqlite3
 anything about how to treat unicode errors. From what I am reading (but it
 seems inconsistent, maybe?) sqlite3 automatically decodes all database
 retrieved values from their bytes for text fields, returning unicode
 strings. Except... that doesn't always seem to be true. I hex-edited a db
 file to change the utf-8 string hello to hell + 0x92 and sqlite3
 switched from returning hello to bhell\x92, or something like that -
 I've been poking at this for so long I've lost track of that transcript.
 
  One can override sqlite3's text factory, apparently, with (for instance)
 `sqlite3.text_factory = lambda x: x.decode('utf-8', errors='ignore')`.
 Maybe the key is to try and find a way to trigger that from sqlalchemy? I
 tried and failed, maybe someone else can point me back to the path?
 
  Just to re-summarize the problem: In python 3, I'm getting errors trying
 to read a row from a sqlite database that has a TEXT column with an invalid
 utf-8 sequence (specifically, the singleton bye '0x92'). I'd love to just
 have sqlalchemy move along and ignore the byte, but I'm not clear how to do
 that.

 Pysqlite (e.g. sqlite3 module) returns TEXT as Python unicode out of the
 gate.  That exception message is being raised by sqlite3 itself, SQLAlchemy
 is just a pass-through, as the string type knows on sqlite that the value
 is already unicode.

 you might need to CAST() the value as BINARY perhaps, not sure.   You’d
 first want to get a plain sqlite3 script to do what you want.   Setting a
 “text_factory” at the module level of sqlite3 is certainly easy enough but
 that seems way too broad.   Ideally you’d want to be able to get the value
 on a per-column basis.




-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [sqlalchemy] SQLite: OperationalError when decoding 0x92 in TEXT column

2014-02-06 Thread Erich Blume
Hmm, this one has me stumped. As best I can tell after poking at it using 
the column_reflect event, a custom dialect, etc. - the issue here is that 
in pysqlite.py we (in my Python 3.3 install) are selecting `sqlite3.dbapi2` 
as the dbapi interface, but we aren't telling sqlite3 anything about how to 
treat unicode errors. From what I am reading (but it seems inconsistent, 
maybe?) sqlite3 automatically decodes all database retrieved values from 
their bytes for text fields, returning unicode strings. Except... that 
doesn't always seem to be true. I hex-edited a db file to change the utf-8 
string hello to hell + 0x92 and sqlite3 switched from returning hello 
to bhell\x92, or something like that - I've been poking at this for so 
long I've lost track of that transcript.

One can override sqlite3's text factory, apparently, with (for instance) 
`sqlite3.text_factory = lambda x: x.decode('utf-8', errors='ignore')`. 
Maybe the key is to try and find a way to trigger that from sqlalchemy? I 
tried and failed, maybe someone else can point me back to the path?

Just to re-summarize the problem: In python 3, I'm getting errors trying to 
read a row from a sqlite database that has a TEXT column with an invalid 
utf-8 sequence (specifically, the singleton bye '0x92'). I'd love to just 
have sqlalchemy move along and ignore the byte, but I'm not clear how to do 
that.

On Tuesday, February 4, 2014 4:33:46 AM UTC-8, Simon King wrote:

 I've not done much with reflection, but perhaps you could use the 
 column_reflect event: 

snip 

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


[sqlalchemy] SQLite: OperationalError when decoding 0x92 in TEXT column

2014-02-04 Thread Erich Blume
I am working on a binding to a SQLite database that I do not control the 
creation of, with the aid of reflection. I'm running in to what I believe 
are very basic UTF-8 decoding errors. For instance, a TEXT cell has the 
byte '0x92' in it and is causing an OperationalError. Presumably, this is 
because 0x92 (by itself) is not a valid encoding for any Unicode code 
point. I would prefer that the decoding from UTF-8 to be forced, perhaps by 
dropping the bad byte. How can I do this?

The database has a table with a column called 'description', which is of 
type TEXT. The PRAGMA encoding is left at 'UTF-8', thank goodness. One of 
the rows, however, contains within its otherwise ascii byte contents the 
singleton byte '0x92'. Based on the context of the sentence, it seems that 
this was intended to be encoded as a single quotation mark, some googling 
suggests 'RIGHT SINGLE QUOTATION MARK' in unicode, which is '0xE2 0x80 
0x99'. I gather that MSSQL (which was the original source of the data in 
this database) uses Microsofts' infernal web encodings sometimes and that 
is probably the source of this byte.

The issue is this: I really need to read this data! It would be *ideal* to 
have the aid of something like python's 'replace' decoding handler but 
failing that just eliding the byte would do fine in a pinch.

When fetching this row in Python 3.3 with SQLAlchemy 0.9.1 my session looks 
vaguely like this (with the text and stack trace truncated out for brevity).

  File 
/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/result.py,
 
line 760, in listcomp
return [process_row(metadata, row, processors, keymap)
  sqlalchemy.exc.OperationalError: (OperationalError) Could not decode 
to UTF-8 column 'description' with text ...

Is there some way to accomplish this?

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [sqlalchemy] SQLite: OperationalError when decoding 0x92 in TEXT column

2014-02-04 Thread Erich Blume
Thanks Simon,

Do you know how I might use that with reflection? There's several hundred
of these columns, I'd hate to have to override each one individually - that
sort of defeats the purpose of reflection.

One thought I just had was perhaps I could subclass the Text type and then
override the ischema_names for SQLite for TEXT type. That'd do the trick, I
suspect!


On Tue, Feb 4, 2014 at 3:26 AM, Simon King si...@simonking.org.uk wrote:

 On Tue, Feb 4, 2014 at 10:15 AM, Erich Blume blume.er...@gmail.com
 wrote:
  I am working on a binding to a SQLite database that I do not control the
  creation of, with the aid of reflection. I'm running in to what I believe
  are very basic UTF-8 decoding errors. For instance, a TEXT cell has the
 byte
  '0x92' in it and is causing an OperationalError. Presumably, this is
 because
  0x92 (by itself) is not a valid encoding for any Unicode code point. I
 would
  prefer that the decoding from UTF-8 to be forced, perhaps by dropping the
  bad byte. How can I do this?
 
  The database has a table with a column called 'description', which is of
  type TEXT. The PRAGMA encoding is left at 'UTF-8', thank goodness. One
 of
  the rows, however, contains within its otherwise ascii byte contents the
  singleton byte '0x92'. Based on the context of the sentence, it seems
 that
  this was intended to be encoded as a single quotation mark, some googling
  suggests 'RIGHT SINGLE QUOTATION MARK' in unicode, which is '0xE2 0x80
  0x99'. I gather that MSSQL (which was the original source of the data in
  this database) uses Microsofts' infernal web encodings sometimes and
 that is
  probably the source of this byte.
 
  The issue is this: I really need to read this data! It would be *ideal*
 to
  have the aid of something like python's 'replace' decoding handler but
  failing that just eliding the byte would do fine in a pinch.
 
  When fetching this row in Python 3.3 with SQLAlchemy 0.9.1 my session
 looks
  vaguely like this (with the text and stack trace truncated out for
 brevity).
 
File
 
 /usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/result.py,
  line 760, in listcomp
  return [process_row(metadata, row, processors, keymap)
sqlalchemy.exc.OperationalError: (OperationalError) Could not
 decode
  to UTF-8 column 'description' with text ...
 
  Is there some way to accomplish this?
 

 The String-related column types have a unicode_error parameter which
 sounds like it might be what you want:


 http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.String.params.unicode_error

 Note the various warnings around it though...

 Hope that helps,

 Simon

 --
 You received this message because you are subscribed to a topic in the
 Google Groups sqlalchemy group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/sqlalchemy/T--Ftk5EVZg/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 sqlalchemy+unsubscr...@googlegroups.com.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 Visit this group at http://groups.google.com/group/sqlalchemy.
 For more options, visit https://groups.google.com/groups/opt_out.


-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


[sqlalchemy] SQLite dialect missing data type affinities, causes problems with reflection (DOUBLE, LONGTEXT, etc.)

2014-01-28 Thread Erich Blume
I am somewhat new to SQLAlchemy, but as far as I can tell there is no 
existing facility in the SA SQLite dialect 
(sqlalchemy.dialects.sqlite.pysqlite.SQLiteDialect_pysqlite) to resolve 
SQLite's data type affinities, as shown in section 2.2 of the SQLite data 
type docs: http://www.sqlite.org/datatype3.html - and this omission is 
causing reflection to fail for me on a specific sqlite database file.

Here is an example session (apologies if this is poorly formatted, I'm not 
used to using the google groups UI for technical discussion):

 import sqlalchemy as sa
  eng = sa.create_engine('sqlite:tmp/eve-asset-db.sqlite')
  meta = sa.MetaData()
  meta.reflect(bind=eng)
 [OMITTED]/python3.3/site-packages/sqlalchemy/dialects/sqlite/base.py:808: 
 SAWarning: Did not recognize type 'DOUBLE' of column 'sizeFactor'
   default, primary_key))

... (Many additional similar warnings suppressed)

By inspecting base.py I see that indeed only the core types are supported, 
and 'Affinity Types' like DOUBLE and LONGTEXT will cause this error. 
One question I have that I can't find an answer to is why does this SQLite 
database have a column marked as a DOUBLE? Shouldn't it have been converted 
to a REAL when it was created? I'm not familiar enough with SQLite to know 
the answer to that question.

Regardless of the answer to that question though, it seems reasonable to me 
that the reflection should succeed and automatically convert the columns to 
REAL/TEXT/etc.

Here is a link to a third-party website which has created the sqlite 
database I am trying to reflect. Please note that the remote source updates 
this file every few months and so viewers from the future might not get the 
same behavior: https://www.fuzzwork.co.uk/dump/sqlite-latest.sqlite.bz2 
[341MB uncompressed]

Here are my environment details:

Python 3.3
SQLAlchemy 0.9.1

I don't know what version of SQLite was used to create the database. If 
there's a way to find out, let me know and I will update.

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


[sqlalchemy] Re: SQLite dialect missing data type affinities, causes problems with reflection (DOUBLE, LONGTEXT, etc.)

2014-01-28 Thread Erich Blume
Some additional information - a GUI tool for inspecting sqlite databases 
tells me that the following is the CREATE syntax for such a table as I 
mention above:

CREATE TABLE invTypes (
   typeID integer NOT NULL,
   groupID integer DEFAULT NULL,
   typeName varchar(200) DEFAULT NULL,
   description varchar(6000) DEFAULT NULL,
   mass double DEFAULT NULL,
   volume double DEFAULT NULL,
   capacity double DEFAULT NULL,
   portionSize integer DEFAULT NULL,
   raceID integer  DEFAULT NULL,
   basePrice decimal(19,4) DEFAULT NULL,
   published integer DEFAULT NULL,
   marketGroupID integer DEFAULT NULL,
   chanceOfDuplicating double DEFAULT NULL,
   PRIMARY KEY (typeID)
 )


I should stress that I don't know how the interface got this information 
other than that it opened the database file.

On Tuesday, January 28, 2014 6:36:26 PM UTC-8, Erich Blume wrote:

 I am somewhat new to SQLAlchemy, but as far as I can tell there is no 
 existing facility in the SA SQLite dialect 
 (sqlalchemy.dialects.sqlite.pysqlite.SQLiteDialect_pysqlite) to resolve 
 SQLite's data type affinities, as shown in section 2.2 of the SQLite data 
 type docs: http://www.sqlite.org/datatype3.html - and this omission is 
 causing reflection to fail for me on a specific sqlite database file.

 Here is an example session (apologies if this is poorly formatted, I'm not 
 used to using the google groups UI for technical discussion):

  import sqlalchemy as sa
  eng = sa.create_engine('sqlite:tmp/eve-asset-db.sqlite')
  meta = sa.MetaData()
  meta.reflect(bind=eng)
 [OMITTED]/python3.3/site-packages/sqlalchemy/dialects/sqlite/base.py:808: 
 SAWarning: Did not recognize type 'DOUBLE' of column 'sizeFactor'
   default, primary_key))

 ... (Many additional similar warnings suppressed)

 By inspecting base.py I see that indeed only the core types are supported, 
 and 'Affinity Types' like DOUBLE and LONGTEXT will cause this error. 
 One question I have that I can't find an answer to is why does this SQLite 
 database have a column marked as a DOUBLE? Shouldn't it have been converted 
 to a REAL when it was created? I'm not familiar enough with SQLite to know 
 the answer to that question.

 Regardless of the answer to that question though, it seems reasonable to 
 me that the reflection should succeed and automatically convert the columns 
 to REAL/TEXT/etc.

 Here is a link to a third-party website which has created the sqlite 
 database I am trying to reflect. Please note that the remote source updates 
 this file every few months and so viewers from the future might not get the 
 same behavior: https://www.fuzzwork.co.uk/dump/sqlite-latest.sqlite.bz2[341MB 
 uncompressed]

 Here are my environment details:

 Python 3.3
 SQLAlchemy 0.9.1

 I don't know what version of SQLite was used to create the database. If 
 there's a way to find out, let me know and I will update.

 Thanks!


-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [sqlalchemy] SQLite dialect missing data type affinities, causes problems with reflection (DOUBLE, LONGTEXT, etc.)

2014-01-28 Thread Erich Blume
Section 2.1 ( http://www.sqlite.org/datatype3.html ) has a deterministic 
algorithm for transforming an unknown column type in to it's 
inverse-affinity real type. It seems to me that we could modify the code in 
base.py around 820 (I'm at home and working from memory) to implement that 
algorithm if the chosen schema is sqlite and if the column type is not 
found in ischema_names. I'd be happy to cook up a patch for that if it's 
not too objectionable to anyone.

I'll try and see if I can find out how the author created this database 
with the incorrect column type names. If s/he used an official binary to 
create it then I feel it's probably important for SQLAlchemy to support it.

Thanks for the workaround in the mean time!

On Tuesday, January 28, 2014 7:53:02 PM UTC-8, Michael Bayer wrote:

 these types as far as sqlite are concerned are kind of semi-“fake”, 
 they’re just lots of names that all end up having the same effect of an 
 “affinity”, and even that is not very strong as you can still put a text 
 string into these columns.  Sqlite gives us no way to just get at the 
 simple “pragma rules” associated with the type names, the docs just have 
 this vague statement This table shows only a small subset of the datatype 
 names that SQLite will accept.” - OK great wheres *all* the names, or a 
 function that can give us the affinity rule number?  there is none.  so we 
 are stuck guessing (unless someone wants to read the C source).

 so anyway, “DOUBLE” should be added however you can add this on your own 
 like this:

 from sqlalchemy.dialects.sqlite import base
 from sqlalchemy.types import DOUBLE

 base.ischema_names[‘DOUBLE’] = DOUBLE

 On Jan 28, 2014, at 9:38 PM, Erich Blume blume...@gmail.com javascript: 
 wrote:

 Some additional information - a GUI tool for inspecting sqlite databases 
 tells me that the following is the CREATE syntax for such a table as I 
 mention above:

 CREATE TABLE invTypes (
   typeID integer NOT NULL,
   groupID integer DEFAULT NULL,
   typeName varchar(200) DEFAULT NULL,
   description varchar(6000) DEFAULT NULL,
   mass double DEFAULT NULL,
   volume double DEFAULT NULL,
   capacity double DEFAULT NULL,
   portionSize integer DEFAULT NULL,
   raceID integer  DEFAULT NULL,
   basePrice decimal(19,4) DEFAULT NULL,
   published integer DEFAULT NULL,
   marketGroupID integer DEFAULT NULL,
   chanceOfDuplicating double DEFAULT NULL,
   PRIMARY KEY (typeID)
 )


 I should stress that I don't know how the interface got this information 
 other than that it opened the database file.

 On Tuesday, January 28, 2014 6:36:26 PM UTC-8, Erich Blume wrote:

 I am somewhat new to SQLAlchemy, but as far as I can tell there is no 
 existing facility in the SA SQLite dialect 
 (sqlalchemy.dialects.sqlite.pysqlite.SQLiteDialect_pysqlite) to resolve 
 SQLite's data type affinities, as shown in section 2.2 of the SQLite data 
 type docs: http://www.sqlite.org/datatype3.html - and this omission is 
 causing reflection to fail for me on a specific sqlite database file.

 Here is an example session (apologies if this is poorly formatted, I'm 
 not used to using the google groups UI for technical discussion):

  import sqlalchemy as sa
  eng = sa.create_engine('sqlite:tmp/eve-asset-db.sqlite')
  meta = sa.MetaData()
  meta.reflect(bind=eng)
 [OMITTED]/python3.3/site-packages/sqlalchemy/dialects/sqlite/base.py:808: 
 SAWarning: Did not recognize type 'DOUBLE' of column 'sizeFactor'
   default, primary_key))

 ... (Many additional similar warnings suppressed)

 By inspecting base.py I see that indeed only the core types are 
 supported, and 'Affinity Types' like DOUBLE and LONGTEXT will cause 
 this error. One question I have that I can't find an answer to is why does 
 this SQLite database have a column marked as a DOUBLE? Shouldn't it have 
 been converted to a REAL when it was created? I'm not familiar enough with 
 SQLite to know the answer to that question.

 Regardless of the answer to that question though, it seems reasonable to 
 me that the reflection should succeed and automatically convert the columns 
 to REAL/TEXT/etc.

 Here is a link to a third-party website which has created the sqlite 
 database I am trying to reflect. Please note that the remote source updates 
 this file every few months and so viewers from the future might not get the 
 same behavior: 
 https://www.fuzzwork.co.uk/dump/sqlite-latest.sqlite.bz2[341MB uncompressed]

 Here are my environment details:

 Python 3.3
 SQLAlchemy 0.9.1

 I don't know what version of SQLite was used to create the database. If 
 there's a way to find out, let me know and I will update.

 Thanks!


 -- 
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to sqlalchemy+...@googlegroups.com javascript:.
 To post to this group, send email to sqlal...@googlegroups.comjavascript:
 .
 Visit this group at http