Re: [sqlalchemy] Baked queries vs before_execute hook

2017-10-12 Thread Martijn van Oosterhout
On Thursday, October 12, 2017 at 5:47:53 PM UTC+2, Mike Bayer wrote:
>
>
> It sounds like you are getting back inconsistent SQL for the same 
> query based on some external context that is not being considered as 
> part of the cache key.  This would indicate that you are probably 
> modifying the select() object *in place* inside your before_execute 
> hook. If your before_execute() hook returns a *new* select() 
> object, it would not pollute the cache with your late-modified value 
> against the cache keys. 
>
> That is, it's the difference between calling 
> select.append_whereclause() and select.where(). The 
> before_execute() hook would need to be set up with retval=True and 
> return the new statement and parameters. 
>
>
Bingo! Looking at the code it has append_from() and append_whereclause() 
calls, so it's probably modifying in place. Sigh. That probably means this 
is going to break the caching in even more spectacular ways which we 
haven't yet spotted. The action of the hook is indeed dependent on 
something that is not part of the query, namely the "perms" field which 
only exists in our own CheckedQuery class.

The concept of the hook is pretty simple. It looks through the query for 
which tables it uses and if it finds a table marked as "special" it adds a 
filter and possibly some joins. I'm fairly sure this could be done safely 
using the Visitor pattern, but in practice it's one big ball of spaghetti 
no one wants to touch. Essentially it looks for a table and replaces it 
with a subquery, but it works by looping/recursing through the fields of 
the query itself. Ugh.
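For what it's worth, a non-mutating rewrite along those lines could look roughly like this. This is a stdlib-only sketch: the dict-based query tree, `replace_special_tables`, and the "perms-check" subquery are all hypothetical stand-ins for a real SQLAlchemy select() and its traversal.

```python
# Hedged sketch of the visitor idea: walk a toy query tree and replace any
# "special" table node with a filtered subquery, returning a NEW tree
# instead of mutating the original in place.
def replace_special_tables(node, special, make_subquery):
    if isinstance(node, dict):
        if node.get("type") == "table" and node.get("name") in special:
            return make_subquery(node["name"])
        return {k: replace_special_tables(v, special, make_subquery)
                for k, v in node.items()}
    if isinstance(node, list):
        return [replace_special_tables(v, special, make_subquery) for v in node]
    return node

query = {"type": "select", "from": [{"type": "table", "name": "docs"}]}
rewritten = replace_special_tables(
    query, {"docs"},
    lambda name: {"type": "subquery", "of": name, "filter": "perms-check"},
)

# The original query object is untouched; only the returned tree differs.
assert query["from"][0]["type"] == "table"
assert rewritten["from"][0]["type"] == "subquery"
```

Because the original tree is never mutated, a cache keyed on the input query object stays valid.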

I think we're going to have to drop the idea of the hook in the long term, 
and at least short-circuit it for baked queries, putting the query 
rewriting in our wrapper; then it can be cached like the rest. Especially 
since in 1.2 lazy loading is going to trigger this (though we'll probably 
disable lazy loading in most places).

Caching the query rewriting isn't a bad plan either. But it looks like our 
query rewriting is more of a liability than I thought.

Thanks for the help!

Have a nice day,
-- 
Martijn

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Baked queries vs before_execute hook

2017-10-12 Thread Mike Bayer
On Thu, Oct 12, 2017 at 10:54 AM, Martijn van Oosterhout
 wrote:
> Hi,
>
> Recently we've been looking into the baked query feature as a method of
> speeding up query compilation. We also use a before_execute hook to modify
> the query before execution to handle permission related stuff. One thing
> that turned up was that when using a baked query that it cached the final
> string version.
>
> What this means is that the baked query framework caches the results of the
> before_execute hook meaning that queries occasionally produce the wrong
> output in situations where the before_execute hook would do something
> different. I'm not clear if this is a bug or a "you break it you get to keep
> both pieces".
>
> We worked around this (yes, before_execute hooks are evil) but this became
> more urgent when an old product accidentally got SQLAlchemy 1.2.0b where
> baked queries are used for lazy loading, which caused all sorts of funky
> errors. Whoops!

So there are two hooks where the pre-compiled SQL can be modified such
that it will still get cached: the Query-level
"before_compile" hook, and the Engine-level "before_execute" hook.
Both of these operate before the SQL string is generated, which
ultimately is cached based on the identity of the Core select() object
itself.

It sounds like you are getting back inconsistent SQL for the same
query based on some external context that is not being considered as
part of the cache key.  This would indicate that you are probably
modifying the select() object *in place* inside your before_execute
hook. If your before_execute() hook returns a *new* select()
object, it would not pollute the cache with your late-modified value
against the cache keys.

That is, it's the difference between calling
select.append_whereclause() and select.where(). The
before_execute() hook would need to be set up with retval=True and
return the new statement and parameters.

This would of course defeat part of the caching, unless you could
organize your before_execute() hook such that the *same* select()
object is returned each time given the same input select().  That is,
you might want to build your own local "cache of our modified
select()" objects so that the caching of generated SQL still takes
place, if that makes sense. If not, provide a short runnable
example of how your before_execute() hook works and I can attempt to
demonstrate.
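To illustrate the distinction with a toy model (not SQLAlchemy's actual cache; `FakeStatement` and `compile_stmt` are invented here), consider a compiled-SQL cache keyed on the statement object's identity, much as the baked-query cache keys on the select():

```python
# A stand-in "statement" and a cache of its "compiled" form, keyed on
# object identity -- the same idea as caching generated SQL per select().
class FakeStatement:
    def __init__(self, sql):
        self.sql = sql

compiled_cache = {}

def compile_stmt(stmt):
    key = id(stmt)
    if key not in compiled_cache:
        compiled_cache[key] = stmt.sql.upper()  # stand-in for SQL compilation
    return compiled_cache[key]

stmt = FakeStatement("select * from t")

# In-place mutation (like append_whereclause): same key, stale cached value.
compile_stmt(stmt)                       # caches "SELECT * FROM T"
stmt.sql = "select * from t where x=1"   # mutated in place
assert compile_stmt(stmt) == "SELECT * FROM T"  # stale result served

# Returning a new object (like .where()): new key, cache stays consistent.
new_stmt = FakeStatement(stmt.sql)
assert compile_stmt(new_stmt) == "SELECT * FROM T WHERE X=1"
```

The mutated statement keeps its old cache key, so the stale SQL keeps being served; the fresh object gets its own key and its own cached SQL.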





>
> I'm wondering if there is a way of at least detecting this? Such that if a
> before_execute hook changes a query that the result is automatically not
> cached. That would at least prevent things from breaking unexpectedly. But
> long term, caching the compilation is really nice and so we'd like to be
> able to keep that feature. Our hook is predictable such that with the same
> input query and a parameters which is stored in the Query object you always
> get the same result. So it would in theory be possible to work with the
> baked query framework, but I'm totally not clear how that would work.
>
> Any ideas?
>
> As an aside, we worked around a few things by creating a WrappedBakedQuery
> class, which allowed us to do things like:
>
> baked_query += lambda q: q.filter(Table.col == bind_param('foo'))
> baked_query.set_param('foo', 1)
>
> Which worked better in our setup.
>
> Have a nice day,
> --
> Martijn
>



[sqlalchemy] Baked queries vs before_execute hook

2017-10-12 Thread Martijn van Oosterhout
Hi,

Recently we've been looking into the baked query feature as a method of 
speeding up query compilation. We also use a before_execute hook to modify 
the query before execution to handle permission related stuff. One thing 
that turned up was that when using a baked query that it cached the final 
string version.

What this means is that the baked query framework caches the results of the 
before_execute hook, meaning that queries occasionally produce the wrong 
output in situations where the before_execute hook would do something 
different. I'm not clear if this is a bug or a "you break it you get to 
keep both pieces".

We worked around this (yes, before_execute hooks are evil) but this became 
more urgent when an old product accidentally got SQLAlchemy 1.2.0b where 
baked queries are used for lazy loading, which caused all sorts of funky 
errors. Whoops!

I'm wondering if there is a way of at least detecting this, such that if a 
before_execute hook changes a query, the result is automatically not 
cached. That would at least prevent things from breaking unexpectedly. But 
long term, caching the compilation is really nice and so we'd like to be 
able to keep that feature. Our hook is predictable: given the same input 
query and parameters (which are stored in the Query object) you always 
get the same result. So it would in theory be possible to work with the 
baked query framework, but I'm totally not clear how that would work.

Any ideas?

As an aside, we worked around a few things by creating a WrappedBakedQuery 
class, which allowed us to do things like:

baked_query += lambda q: q.filter(Table.col == bind_param('foo'))
baked_query.set_param('foo', 1)

Which worked better in our setup.
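For readers curious what such a wrapper might look like, here is a minimal sketch of the idea. Everything here is hypothetical: it mimics the `+=`/`set_param` interface described above in plain Python rather than driving a real `sqlalchemy.ext.baked` bakery.

```python
# Hypothetical sketch of the WrappedBakedQuery idea: accumulate build steps
# (lambdas) for the cacheable query shape, and keep the bind-parameter
# VALUES separate, since only the values vary between calls.
class WrappedBakedQuery:
    def __init__(self, initial):
        self._steps = [initial]   # lambdas building the query; cacheable
        self._params = {}         # per-call bind values; never cached

    def __iadd__(self, step):
        self._steps.append(step)
        return self

    def set_param(self, name, value):
        self._params[name] = value

    def build(self):
        # In the real class this would drive a baked-query bakery; here each
        # "query" is just a list of applied step descriptions.
        q = self._steps[0]()
        for step in self._steps[1:]:
            q = step(q)
        return q, dict(self._params)

bq = WrappedBakedQuery(lambda: ["base"])
bq += lambda q: q + ["filter col == :foo"]
bq.set_param("foo", 1)
query, params = bq.build()

assert query == ["base", "filter col == :foo"]
assert params == {"foo": 1}
```

The point of the split is that the lambdas define the cache key while the parameter dict stays out of it, matching how the baked-query extension separates query construction from bound values.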

Have a nice day,
-- 
Martijn



Re: [sqlalchemy] OpenStack Glance upgrade script breaks with SQLAlchemy 1.1.11 but not 1.0.12

2017-10-12 Thread Byron Yi
Thanks Mike. Glad to know it’s been fixed.

Best,
Bairen

> On 12 Oct 2017, at 21:54, Mike Bayer  wrote:
> 
> Your migrate is out of date.  this was fixed one year ago:
> 
> https://github.com/openstack/sqlalchemy-migrate/commit/e9175a37ce0b0b0e87ad728c8a6a10bed100065b
> 
> 
> 
> On Thu, Oct 12, 2017 at 9:51 AM, Mike Bayer  wrote:
>> On Thu, Oct 12, 2017 at 8:32 AM, Byron Yi  wrote:
>>> It appears that autoincrement=True is set by default for primary key when
>>> upgrading from 1.0 to 1.1, no matter if it is integer (which is not in
>>> glance; they use VARCHAR(36)).
>>> 
>>> See https://bugs.launchpad.net/glance/+bug/1723097
>> 
>> It defaults to True in 1.0, the change in 1.1 is that it defaults to "auto":
>> 
>> http://docs.sqlalchemy.org/en/rel_1_1/changelog/migration_11.html#no-more-generation-of-an-implicit-key-for-composite-primary-key-w-auto-increment
>> 
>> if that migrate operation is altering a column from Integer to String,
>> then it's likely this is confusing things.   So this is a
>> sqlalchemy-migrate bug.
>> 
>> 
>> 
>> 
>> 
>>> 
> 



Re: [sqlalchemy] OpenStack Glance upgrade script breaks with SQLAlchemy 1.1.11 but not 1.0.12

2017-10-12 Thread Mike Bayer
Your migrate is out of date.  this was fixed one year ago:

https://github.com/openstack/sqlalchemy-migrate/commit/e9175a37ce0b0b0e87ad728c8a6a10bed100065b



On Thu, Oct 12, 2017 at 9:51 AM, Mike Bayer  wrote:
> On Thu, Oct 12, 2017 at 8:32 AM, Byron Yi  wrote:
>> It appears that autoincrement=True is set by default for primary key when
>> upgrading from 1.0 to 1.1, no matter if it is integer (which is not in
>> glance; they use VARCHAR(36)).
>>
>> See https://bugs.launchpad.net/glance/+bug/1723097
>
> It defaults to True in 1.0, the change in 1.1 is that it defaults to "auto":
>
> http://docs.sqlalchemy.org/en/rel_1_1/changelog/migration_11.html#no-more-generation-of-an-implicit-key-for-composite-primary-key-w-auto-increment
>
> if that migrate operation is altering a column from Integer to String,
> then it's likely this is confusing things.   So this is a
> sqlalchemy-migrate bug.
>
>
>
>
>
>>



Re: [sqlalchemy] OpenStack Glance upgrade script breaks with SQLAlchemy 1.1.11 but not 1.0.12

2017-10-12 Thread Mike Bayer
On Thu, Oct 12, 2017 at 8:32 AM, Byron Yi  wrote:
> It appears that autoincrement=True is set by default for primary key when
> upgrading from 1.0 to 1.1, no matter if it is integer (which is not in
> glance; they use VARCHAR(36)).
>
> See https://bugs.launchpad.net/glance/+bug/1723097

It defaults to True in 1.0, the change in 1.1 is that it defaults to "auto":

http://docs.sqlalchemy.org/en/rel_1_1/changelog/migration_11.html#no-more-generation-of-an-implicit-key-for-composite-primary-key-w-auto-increment

if that migrate operation is altering a column from Integer to String,
then it's likely this is confusing things.   So this is a
sqlalchemy-migrate bug.
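The change in defaults can be sketched as a decision function. This is a simplified assumption based on the migration notes linked above, not SQLAlchemy's actual code; `wants_autoincrement` and its parameters are invented for illustration.

```python
# Simplified sketch (assumption): in 1.0 autoincrement effectively defaulted
# to True for primary key columns; in 1.1 the new default "auto" applies the
# implicit-autoincrement treatment only to integer primary key columns.
def wants_autoincrement(autoincrement, is_primary_key, type_name):
    if autoincrement is True:
        return is_primary_key
    if autoincrement == "auto":
        return is_primary_key and type_name == "Integer"
    return False

# Glance's VARCHAR(36) primary key no longer qualifies under "auto":
assert wants_autoincrement(True, True, "String")        # 1.0-era behavior
assert not wants_autoincrement("auto", True, "String")  # 1.1 default
assert wants_autoincrement("auto", True, "Integer")
```

Under this rule, a tool that assumes every primary key is autoincrementing (as old sqlalchemy-migrate apparently did) misbehaves once the column type is not Integer.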





>



Re: [sqlalchemy] How do Insert statements for single-table inheritance tables get batched together?

2017-10-12 Thread Mike Bayer
I'm actually happy with those results.  execute_batch is just a
drop-in, while execute_values is more complicated, since we have to
intercept specific kinds of statements (INSERTs that have exactly one
VALUES clause invoked against multiple parameter sets), and may still
be more error-prone.

A 7% slowdown for the 100,000 rows w/ ORM case (13.9 to 14.9 seconds)
is fairly negligible considering the speedup from 29 seconds.


On Thu, Oct 12, 2017 at 1:07 AM,   wrote:
> I ran some tests using raw psycopg2 on my local computer, and also ran some
> tests with SQLAlchemy. I misunderstood a few things in my tests above, but
> I've explained the tests below and think they are more accurate.
>
> Context:
> - Each test was run 5 times and averaged.
> - Both the python and database were running on the same computer (Macbook
> Pro (2015), using Postgres 9.5 and Docker)
>
> Raw psycopg2:
> # Rough code:
> extras.execute_values(cursor, 'INSERT INTO table_a (field1, field2) VALUES %s', [...])
> extras.execute_batch(cursor, 'INSERT INTO table_a (field1, field2) VALUES (%s, %s)', [...])
> cursor.executemany('INSERT INTO table_a (field1, field2) VALUES (%s, %s)', [...])
>
> # Inserting 1000 rows
> execute_values, elapsed time: 0.023967s
> execute_batch, elapsed time: 0.051530s
> executemany, elapsed time: 0.173563s
>
>
> # Inserting 10,000 rows
> execute_values, elapsed time: 0.268656s
> execute_batch, elapsed time: 0.488736s
> executemany, elapsed time: 2.017565s
>
>
> # Inserting 100,000 rows
> execute_values, elapsed time: 1.858675s
> execute_batch, elapsed time: 4.062823s
> executemany, elapsed time: 19.900875s
>
>
> SQLAlchemy layer:
> for field1, field2 in genome_infos(rows):
>     db.session.add(TableA(field1, field2))
>
> # Time this part:
> db.session.flush()
>
> I used the code you provided above (instead of the flag you recently
> pushed), which allowed me to test both execute_batch and execute_values.
> Here is what I used:
> @event.listens_for(Engine, "do_executemany")
> def do_executemany(cursor, statement, parameters, context):
>     context.dialect.supports_sane_multi_rowcount = False
>
>     # Option: executemany
>     cursor.executemany(statement, parameters)
>
>     # Option: execute_batch
>     extras.execute_batch(cursor, statement, parameters)
>
>     # Option: execute_values
>     statement = re.sub('VALUES.*', 'VALUES %s', statement)
>     parameters = [(info['field1'], info['field2']) for info in parameters]
>     extras.execute_values(cursor, statement, parameters)
>
>     return True
>
> Obviously only one option was used at a time :)
>
> SQLAlchemy Results:
> Inserting 1000 rows:
> execute_values: 0.083958s
> execute_batch: 0.110223s
> executemany: 0.276129s
>
> Inserting 10,000 rows:
> execute_values: 1.243230s
> execute_batch: 1.388278s
> executemany: 3.131808s
>
> Inserting 100,000 rows:
> execute_values: 13.909975s
> execute_batch: 14.942507s
> executemany: 29.671092s
>
>
> Conclusions:
>
> - execute_batch is a significant improvement over executemany (10x at the
>   psycopg2 layer).
> - Subtracting the 11-12 seconds of SQLAlchemy/Python overhead for inserting
>   100,000 rows gives roughly the psycopg2 times for each execute_ option.
> - The 5000% improvements stated in
>   https://github.com/psycopg/psycopg2/issues/491#issuecomment-276551038 are
>   probably exaggerated for most users - that was for "transatlantic network
>   connections", whereas I imagine most users have databases much closer to
>   their servers. However, it's still a 10x speed-up over executemany at the
>   psycopg2 layer, and a ~2-3x speed-up including SQLAlchemy overhead.
> - execute_values is still twice as fast as execute_batch at the psycopg2
>   layer (for this specific table), so incorporating that would be even
>   better!
> - execute_batch only helps with inserts to tables where all primary keys are
>   defined (as you noted). Thus, if users want to see improvements for tables
>   with auto-incrementing primary keys and relationships, they'll likely need
>   to combine this with something like the suggestion in
>   https://groups.google.com/forum/#!topic/sqlalchemy/GyAZTThJi2I
>
> Hopefully that was helpful :) Happy to help test in any other ways as well!
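[Editorially, the three call styles benchmarked above differ mainly in what gets sent to the server per round trip. A stdlib-only sketch of the assumed behavior, not psycopg2's actual wire handling:]

```python
# Roughly what each psycopg2 call style sends for
# INSERT INTO t (a, b) VALUES (%s, %s) with 3 rows (illustrative only).
rows = [(1, "x"), (2, "y"), (3, "z")]

def fmt(template, row):
    return template % tuple(repr(v) for v in row)

# executemany: one statement per row, one round trip each (slowest).
executemany_msgs = [fmt("INSERT INTO t (a, b) VALUES (%s, %s)", r) for r in rows]

# execute_batch: many statements joined with ';', sent in few round trips.
execute_batch_msg = "; ".join(executemany_msgs)

# execute_values: a single INSERT with one multi-row VALUES clause.
execute_values_msg = "INSERT INTO t (a, b) VALUES " + ", ".join(
    fmt("(%s, %s)", r) for r in rows
)
```

This matches the ordering of the timings above: fewer round trips and fewer statements to parse mean execute_values < execute_batch < executemany in elapsed time.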
>
> On Wednesday, October 11, 2017 at 7:25:06 AM UTC-7, Mike Bayer wrote:
>>
>> On Wed, Oct 11, 2017 at 3:02 AM,   wrote:
>> > Hey Mike,
>> >
>> > Thanks again for the detailed explanations!
>> >
>> > I went ahead and tested the code snippet you gave me - I'm not sure I
>> > totally understand when this will change behavior though.
>> >
>> > I tried the following code snippet:
>> > (Snippet #1)
>> > for i in xrange(10):
>> >     db.session.add(A(id=i))
>> > db.session.flush()
>> >
>> > This calls through to the "do_executemany" handler and only executes 1
>> > insert statement with multiple VALUES. However, this was already the
>> > existing behavior right?
>>
>> There's multiple forms of "one insert statement".
>>
>> There is traditional 

[sqlalchemy] OpenStack Glance upgrade script breaks with SQLAlchemy 1.1.11 but not 1.0.12

2017-10-12 Thread Byron Yi
It appears that autoincrement=True is set by default for the primary key when 
upgrading from 1.0 to 1.1, no matter whether it is an integer (which it is not 
in Glance; they use VARCHAR(36)).

See https://bugs.launchpad.net/glance/+bug/1723097
