Re: [sqlalchemy] Performance questions on semi-large result sets

2015-10-22 Thread Gaëtan de Menten
On Wed, Oct 21, 2015 at 11:43 PM, Mike Bayer wrote:

> If, OTOH, we set use_native_unicode=False and use the Unicode type for
> the columns we care about, that seems a lot faster, e.g. changing the
> mapping to:

[...]

> When we use the Unicode type and set use_native_unicode=False,
> SQLAlchemy's own use of Python unicode codecs takes over; this happens
> within the C extensions and has been highly optimized for speed in
> the way that the codec is invoked.  We've already observed that
> cx_Oracle's native unicode methodologies turn out to be much slower than
> SQLAlchemy's, which was unexpected since cx_Oracle is pure C code, and
> in this case psycopg2 is also pure C code, so it is again surprising that
> SQLAlchemy's approach is a little faster.

Back when I created the C extensions, I found that out as well:
virtually all backends were slower at decoding unicode than our own
unicode handling. At the time, there was no easy way (that I knew of,
at least) to turn off the DBAPI's native decoding via SQLAlchemy.
I am glad to see that it now exists!
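
For anyone looking for that switch: with the psycopg2 dialect it is the
use_native_unicode flag to create_engine(), paired with Unicode-typed
columns. A minimal sketch (the URL is the one from this thread; the model
is illustrative):

from sqlalchemy import Column, Integer, Unicode, create_engine
from sqlalchemy.ext.declarative import declarative_base

# Turn off psycopg2's own unicode handling; columns typed Unicode or
# UnicodeText are then decoded by SQLAlchemy's C extensions instead.
engine = create_engine("postgresql:///xmsdb", use_native_unicode=False)

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(255))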

Congratulations to Mike for his dedication and endurance in the
project. You never cease to impress.

Gaëtan



Re: [sqlalchemy] Performance questions on semi-large result sets

2015-10-21 Thread Mike Bayer


On 10/21/15 4:52 PM, Uri Okrent wrote:
> Hello, I'm trying to understand the various performance implications of
> working with the ORM layer, vs the SQL language layer, vs a raw dbapi
> connection.  Ideally, I would like to stick with the ORM layer.
> 
> First, the set up:
> CentOS 6
> SQLAlchemy 0.9.4
> Psycopg2 2.5.3
> Postgresql 9.3.9
> 
> My test is a simple query including a join on three columns, guid, name,
> and index -- guid and name are Text and index is an Integer.  Note, in
> all cases, I'm querying for individual columns, not ORM objects.
> 
> connection setup:
> 
> |
> engine = sqlalchemy.engine.create_engine("postgresql:///xmsdb", echo=True)
> Session = sqlalchemy.orm.sessionmaker(bind=engine)
> session = Session()
> |
> 
> 
> the query:
> 
> |
> q = session.query(Volume.guid, Volume.name, Volume.index)
> |
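> 
> (profiled() here is a small helper around the stdlib cProfile; roughly
> something like the following, assuming the cumulative-time sorting seen
> in the output below:)
> 
> |
> import cProfile
> import contextlib
> import pstats
> import StringIO
> 
> @contextlib.contextmanager
> def profiled():
>     # Profile the enclosed block and print a report ordered by
>     # cumulative time, as in the listings below.
>     pr = cProfile.Profile()
>     pr.enable()
>     try:
>         yield
>     finally:
>         pr.disable()
>         s = StringIO.StringIO()
>         pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats()
>         print(s.getvalue())
> |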
> 
> 
> First, query using the ORM layer:
> 
> |
> with profiled():
> ...     r = q.all()
> ...
> 2015-10-21 20:24:20,539 INFO sqlalchemy.engine.base.Engine SELECT
> volumes.guid AS volumes_guid, mom_objects.name AS mom_objects_name,
> mom_objects.index AS mom_objects_index
> FROM mom_objects JOIN volumes ON mom_objects.guid = volumes.guid
> 2015-10-21 20:24:20,539 INFO sqlalchemy.engine.base.Engine {}
> 
> 147901 function calls (147890 primitive calls) in 0.396 CPU seconds
> 
> Ordered by: cumulative time
> 
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      1    0.007    0.007    0.396    0.396 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:2286(all)
>   8191    0.055    0.000    0.357    0.000 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/loading.py:26(instances)
>  24570    0.060    0.000    0.123    0.000 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:3468(proc)
>      1    0.000    0.000    0.093    0.093 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/result.py:778(fetchall)
>      1    0.000    0.000    0.089    0.089 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/result.py:747(_fetchall_impl)
>      1    0.028    0.028    0.089    0.089 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
>   8190    0.043    0.000    0.085    0.000 /usr/lib64/python2.6/site-packages/sqlalchemy/util/_collections.py:56(__new__)
>  24579    0.045    0.000    0.063    0.000 /usr/lib64/python2.6/site-packages/sqlalchemy/sql/annotation.py:85(__hash__)
>  16380    0.033    0.000    0.061    0.000 /usr/lib64/python2.6/encodings/utf_8.py:15(decode)
>      1    0.000    0.000    0.032    0.032 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:2399(__iter__)
>      1    0.000    0.000    0.031    0.031 /usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py:2413(_execute_and_instances)
>      1    0.000    0.000    0.031    0.031 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:652(execute)
>      1    0.000    0.000    0.031    0.031 /usr/lib64/python2.6/site-packages/sqlalchemy/sql/elements.py:316(_execute_on_connection)
>      1    0.000    0.000    0.031    0.031 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:781(_execute_clauseelement)
>      1    0.000    0.000    0.030    0.030 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:868(_execute_context)
>      1    0.000    0.000    0.030    0.030 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py:434(do_execute)
>      1    0.030    0.030    0.030    0.030 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
>  16380    0.028    0.000    0.028    0.000 {_codecs.utf_8_decode}
>  24579    0.018    0.000    0.018    0.000 {hash}
>   8195    0.016    0.000    0.016    0.000 {method 'update' of 'dict' objects}
>   8191    0.015    0.000    0.015    0.000 {zip}
> |
> 
> 
> 
> The same query executed directly on the engine:
> 
> |
> with profiled():
> ...     r = engine.execute(str(q)).fetchall()
> ...
> 2015-10-21 20:29:32,336 INFO sqlalchemy.engine.base.Engine SELECT
> volumes.guid AS volumes_guid, mom_objects.name AS mom_objects_name,
> mom_objects.index AS mom_objects_index
> FROM mom_objects JOIN volumes ON mom_objects.guid = volumes.guid
> 2015-10-21 20:29:32,337 INFO sqlalchemy.engine.base.Engine {}
> 
> 33314 function calls (33303 primitive calls) in 0.116 CPU seconds
> 
> Ordered by: cumulative time
> 
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      1    0.000    0.000    0.092    0.092 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/result.py:778(fetchall)
>      1    0.000    0.000    0.087    0.087 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/result.py:747(_fetchall_impl)
>      1    0.028    0.028    0.087    0.087 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}
>  16380    0.032    0.000    0.059    0.000 /usr/lib64/python2.6/encodings/utf_8.py:15(decode)
>  16380    0.027    0.000    0.027    0.000 {_codecs.utf_8_decode}
>      1    0.000    0.000    0.023    0.023 /usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py:1665(execute)
> 

Re: [sqlalchemy] Performance questions on semi-large result sets

2015-10-21 Thread Mike Bayer


On 10/21/15 7:27 PM, Uri Okrent wrote:
> Interesting...
> 
> On Wednesday, October 21, 2015 at 5:43:22 PM UTC-4, Michael Bayer wrote:
> 
> class Customer(Base):
> __tablename__ = "customer"
> id = Column(Integer, primary_key=True)
> name = Column(Unicode(255))
> description = Column(Unicode(255))
> 
> 
> My declarative classes use Text() for all string columns.  This is
> because I *know* my backend is postgres and that's sort of what they
> recommend (the "tip":
> http://www.postgresql.org/docs/9.3/static/datatype-character.html).
>  However, I neglected to consider the consequences of that decision on
> the ORM.

Use UnicodeText() then.  On PG, VARCHAR / CHAR / TEXT are all identical;
it doesn't matter.  Also, the only difference between Unicode and
UnicodeText on that platform is what DDL is emitted in CREATE TABLE.  At
the level of DML and queries, the data is the same across these types.
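
A sketch of what that mixed mapping could look like (illustrative names,
simplified from the joined mapping in this thread):

from sqlalchemy import Column, Integer, Text, UnicodeText, create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Volume(Base):
    __tablename__ = "volumes"
    id = Column(Integer, primary_key=True)
    guid = Column(Text)          # ASCII-only; skip unicode decoding
    name = Column(UnicodeText)   # decoded by SQLAlchemy's C extensions

# psycopg2 dialect option; with it off, only Unicode/UnicodeText
# columns get decoded, by SQLAlchemy rather than by the DBAPI.
engine = create_engine("postgresql:///xmsdb", use_native_unicode=False)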


> 
> It sounds like your recommendation is to disable unicode decoding at the
> engine level with use_native_unicode=False, and instead explicitly call
> out only those columns that contain unicode, using a mixture of (in the
> postgres case) Text() and UnicodeText() columns, so that SQLAlchemy
> handles the decoding for just those.

If you're really looking to save 15 ms per 100K rows, it seems to have
that effect for now.
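
If you want to check whether it matters for your own data, a rough
wall-clock sketch (reusing the column query from earlier in the thread;
not a rigorous benchmark):

import time

def time_fetch(session, n=5):
    # Run the query a few times and report the best wall-clock time;
    # rerun with use_native_unicode on and off to compare.
    best = None
    for _ in range(n):
        start = time.time()
        session.query(Volume.guid, Volume.name, Volume.index).all()
        elapsed = time.time() - start
        best = elapsed if best is None else min(best, elapsed)
    print("best of %d runs: %.3f sec" % (n, best))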


> 
> Although, reading the docs, I got the feeling that you were discouraging
> people from using sqlalchemy's built-in decoding facilities in favor of
> the native facilities provided by the dbapi.

Well, originally we did it all ourselves when we didn't have C
extensions, and the DBAPIs barely did it.  Then the DBAPIs started
supplying it natively, and especially with the coming of Python 3 they
all had to; compared to SQLAlchemy doing it all in pure Python, there
was no contest.  But then our C extensions came along and sped things
up, and we started doing things like caching the codec object, which is
probably what the DBAPIs aren't doing yet, and gained even more speed.
So it seems we've surpassed the DBAPIs in just this one area, which is
ridiculous because the pure C DBAPIs are so much faster in every other
way; it's quite annoying that this weird incongruity exists.
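
The codec-caching idea, roughly (not SQLAlchemy's actual code): look up
the codec once and reuse the bound decode function, instead of paying a
codec-registry lookup on every value.

import codecs

# One-time lookup; returns the C-level decode function
# (effectively _codecs.utf_8_decode, as seen in the profiles above).
decoder = codecs.getdecoder("utf-8")

def decode_values(values):
    # decoder(b) returns (unicode_string, bytes_consumed); reusing the
    # cached function avoids a registry lookup per value.
    return [decoder(v)[0] for v in values]

print(decode_values([b"caf\xc3\xa9", b"volume-1"]))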

Under Python 3 we generally don't have an option here; the DBAPIs now
all do unicode conversion automatically.  So the architectures have been
pushed to assume that's the default, but in the case of cx_Oracle, and
now apparently psycopg2, we're seeing that under Python 2 their unicode
facilities still seem to perform worse than SQLAlchemy's.  There's no
clear answer.  I'd prefer if the DBAPIs that are written in pure C
anyway, like psycopg2, could just make their performance faster here;
I'd maybe report it to them.






Re: [sqlalchemy] Performance questions on semi-large result sets

2015-10-21 Thread Uri Okrent
Interesting...

On Wednesday, October 21, 2015 at 5:43:22 PM UTC-4, Michael Bayer wrote:
>
> class Customer(Base): 
> __tablename__ = "customer" 
> id = Column(Integer, primary_key=True) 
> name = Column(Unicode(255)) 
> description = Column(Unicode(255)) 
>

My declarative classes use Text() for all string columns.  This is because 
I *know* my backend is postgres and that's sort of what they recommend (the 
"tip": http://www.postgresql.org/docs/9.3/static/datatype-character.html). 
 However, I neglected to consider the consequences of that decision on the 
ORM.

It sounds like your recommendation is to disable unicode decoding at the 
engine level with use_native_unicode=False, and instead explicitly call 
out only those columns that contain unicode, using a mixture of (in the 
postgres case) Text() and UnicodeText() columns, so that SQLAlchemy 
handles the decoding for just those.

Although, reading the docs, I got the feeling that you were discouraging 
people from using sqlalchemy's built-in decoding facilities in favor of the 
native facilities provided by the dbapi.
