subject:"Re\: \[sqlalchemy\] Caching intermediate results in baked queries"

Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-16 Thread Martijn van Oosterhout

On Thu, 15 Nov 2018 at 16:41, Mike Bayer  wrote:

> >
> > But then you lose the benefits of the cache, and the Bakery is only
> useful if you use the same Querys over and over. Unless I've completely
> misunderstood how it works.
>
> well the "anonymous" bindparam() still has a name.   you just use that
> name, and it gets converted to the "real" name when the statement is
> executed:
>
> >>> from sqlalchemy import *
> >>> b1 = bindparam(None)
> >>> b1.key
> '%(140062366895800 param)s'
> >>> s1 = select([b1])
> >>> print(s1)
> SELECT :param_1 AS anon_1
> >>> s1.compile().construct_params({'%(140062366895800 param)s': 5})
> {'param_1': 5}
>
>
>
Ah, understood. That's a pretty neat trick actually, might see if we can
incorporate that somewhere.

Thanks.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-15 Thread Mike Bayer

On Thu, Nov 15, 2018 at 10:17 AM Martijn van Oosterhout
 wrote:
>
>
>
> On Thu, 15 Nov 2018 at 14:55, Mike Bayer  wrote:
>>
>> > And have all the query compilation cached. All the query stuff can be put 
>> > near the model so your code isn't covered with fragments of SQLAlchemy. I 
>> > have no idea if this pattern is common, but it works well for us.
>>
>> filter_by_id and filter_by_name seem to be just occurrences of
>> "filter_by" with a single hardcoded keyword.   I'm not sure what the
>> bindparam() part is doing either because a Core binary expression "col
>> = 'name'" already genertes the bindparam for you.This helper does
>> not seem to be specific to "baked" query, as far as the calling API at
>> least ?
>>
>
> What it acheives is that:
>
> FooQuery(session).filter_by_id(1).all()
>
> and
>
> FooQuery(session).filter_by_id(2).all()
>
> Both compile to the same (cached) Query and thus the same query is sent to 
> the database server. Since the parameters are rarely all the same the 
> anonymous bind is not useful here. In fact, if you don't use a bindparam() 
> here it will give wrong results because the Bakery keys off the f.func_code 
> and that doesn't take into account the dependancy. You can deal with this by 
> using:
>
> self += ((lambda: q: Foo.id == id_), id)
>
> But then you lose the benefits of the cache, and the Bakery is only useful if 
> you use the same Querys over and over. Unless I've completely misunderstood 
> how it works.

well the "anonymous" bindparam() still has a name.   you just use that
name, and it gets converted to the "real" name when the statement is
executed:

>>> from sqlalchemy import *
>>> b1 = bindparam(None)
>>> b1.key
'%(140062366895800 param)s'
>>> s1 = select([b1])
>>> print(s1)
SELECT :param_1 AS anon_1
>>> s1.compile().construct_params({'%(140062366895800 param)s': 5})
{'param_1': 5}






>
> So this gives us all the benefits of baking while not being tied to constant 
> parameters. The API for the user does resemble what you do with normally 
> subclassing Query, which is nice because it means the users are getting baked 
> queries, without actually knowing they're getting them. The saved parameters 
> are automatically bound to the Result object on execution.
>
> Hope this is clearer.
>
> Have a nice day,
> --
> Martijn van Oosterhout  http://svana.org/kleptog/
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-15 Thread Martijn van Oosterhout

On Thu, 15 Nov 2018 at 14:55, Mike Bayer  wrote:

> > And have all the query compilation cached. All the query stuff can be
> put near the model so your code isn't covered with fragments of SQLAlchemy.
> I have no idea if this pattern is common, but it works well for us.
>
> filter_by_id and filter_by_name seem to be just occurrences of
> "filter_by" with a single hardcoded keyword.   I'm not sure what the
> bindparam() part is doing either because a Core binary expression "col
> = 'name'" already genertes the bindparam for you.This helper does
> not seem to be specific to "baked" query, as far as the calling API at
> least ?
>
>
What it acheives is that:

FooQuery(session).filter_by_id(1).all()

and

FooQuery(session).filter_by_id(2).all()

Both compile to the same (cached) Query and thus the same query is sent to
the database server. Since the parameters are rarely all the same the
anonymous bind is not useful here. In fact, if you don't use a bindparam()
here it will give wrong results because the Bakery keys off the f.func_code
and that doesn't take into account the dependancy. You can deal with this
by using:

self += ((lambda: q: Foo.id == id_), id)

But then you lose the benefits of the cache, and the Bakery is only useful
if you use the same Querys over and over. Unless I've completely
misunderstood how it works.

So this gives us all the benefits of baking while not being tied to
constant parameters. The API for the user does resemble what you do with
normally subclassing Query, which is nice because it means the users are
getting baked queries, without actually knowing they're getting them. The
saved parameters are automatically bound to the Result object on execution.

Hope this is clearer.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-15 Thread Mike Bayer

On Thu, Nov 15, 2018 at 12:27 AM Martijn van Oosterhout
 wrote:
>
> Hoi Mike,
>
> On Wednesday, November 14, 2018 at 11:30:18 PM UTC+1, Mike Bayer wrote:
>>
>> I have some big ideas for the baked concept which may or may not
>> impact this area as well.   But they are only ideas for now, I have
>> many explicit SQLAlchemy tasks that I have to get done before I can
>> try to explore that stuff.
>>
>
> Interesting, do you have anything written down?
>
> The baked query concept is a nice starting point but has some shortcomings, 
> in particular with respect to parameters. So we created a WrappedBakedQuery 
> to deal with this. To give an idea how this works:
>
> class FooQuery(WrappedBakedQuery):
> def initial_query(self, session):
> return session.query(Foo)
>
> def filter_by_id(self, id_):
> self += lambda q: q.filter(Foo.id==bindparam('foo_id'))
> self |= {'foo_id': id_}
> # Details omitted, we actually return a copy here
> return self
>
> def filter_by_name(self, name):
> self += lambda q: q.filter(Foo.name==bindparam('foo_name'))
> self |= {'foo_name': name}
> return self
>
>
> This allows us to write code like:
>
> query = FooQuery(session)
>
> if params['id'] is not None:
> query = query.filter_by_id(params['id'])
>
> if params['name'] is not None:
> query = query.filter_by_name(params['name'])
>
> query = query.join_to_bar()
>
> result = query.all()
>
> And have all the query compilation cached. All the query stuff can be put 
> near the model so your code isn't covered with fragments of SQLAlchemy. I 
> have no idea if this pattern is common, but it works well for us.

filter_by_id and filter_by_name seem to be just occurrences of
"filter_by" with a single hardcoded keyword.   I'm not sure what the
bindparam() part is doing either because a Core binary expression "col
= 'name'" already genertes the bindparam for you.This helper does
not seem to be specific to "baked" query, as far as the calling API at
least ?

>
> The naming of the bind parameters is a bit clunky though, haven't thought of 
> a good way around that. And you end up with lots of steps which may be a 
> performance issue (which motivated by original question) but we're going to 
> measure that before claiming it's an actual problem. The Bakery can cache 
> actual Query objects, that's what spoil() does, so it seems like a small step.

bindparam() has an anonymous naming feature that is usually what's
used, SQLAlchemy Core does this all for you when you pass a literal
value in.

>
> And exception handling is a bit annoying, with all errors appearing at the 
> point the query is executed, rather than where the fragment is added. OTOH, 
> it doesn't seem smart to try and to magic with tracebacks for something that 
> shouldn't really happen in production code.

that's going to be more specific to the "baked" part and that issue is
unfortunately not avoidable, if we're trying to defer the execution of
all those methods into a single pre-cache step.

>
> I'm not sure what the state of the art is w.r.t. baked queries, but I hope 
> this gives an idea of how we use it.
>
> Have a nice day,
> Martijn
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-14 Thread Martijn van Oosterhout

Hoi Mike,

On Wednesday, November 14, 2018 at 11:30:18 PM UTC+1, Mike Bayer wrote:
>
> I have some big ideas for the baked concept which may or may not 
> impact this area as well.   But they are only ideas for now, I have 
> many explicit SQLAlchemy tasks that I have to get done before I can 
> try to explore that stuff. 
>
>
Interesting, do you have anything written down? 

The baked query concept is a nice starting point but has some shortcomings, 
in particular with respect to parameters. So we created a WrappedBakedQuery 
to deal with this. To give an idea how this works:

class FooQuery(WrappedBakedQuery):
def initial_query(self, session):
return session.query(Foo)

def filter_by_id(self, id_):
self += lambda q: q.filter(Foo.id==bindparam('foo_id'))
self |= {'foo_id': id_}
# Details omitted, we actually return a copy here
return self

def filter_by_name(self, name):
self += lambda q: q.filter(Foo.name==bindparam('foo_name'))
self |= {'foo_name': name}
return self

This allows us to write code like:

query = FooQuery(session)

if params['id'] is not None:
query = query.filter_by_id(params['id'])

if params['name'] is not None:
query = query.filter_by_name(params['name'])

query = query.join_to_bar()

result = query.all()

And have all the query compilation cached. All the query stuff can be put 
near the model so your code isn't covered with fragments of SQLAlchemy. I 
have no idea if this pattern is common, but it works well for us.

The naming of the bind parameters is a bit clunky though, haven't thought 
of a good way around that. And you end up with lots of steps which may be a 
performance issue (which motivated by original question) but we're going to 
measure that before claiming it's an actual problem. The Bakery can cache 
actual Query objects, that's what spoil() does, so it seems like a small 
step.

And exception handling is a bit annoying, with all errors appearing at the 
point the query is executed, rather than where the fragment is added. OTOH, 
it doesn't seem smart to try and to magic with tracebacks for something 
that shouldn't really happen in production code.

I'm not sure what the state of the art is w.r.t. baked queries, but I hope 
this gives an idea of how we use it.

Have a nice day,
Martijn

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-14 Thread Mike Bayer

I have some big ideas for the baked concept which may or may not
impact this area as well.   But they are only ideas for now, I have
many explicit SQLAlchemy tasks that I have to get done before I can
try to explore that stuff.


On Wed, Nov 14, 2018 at 4:34 PM Martijn van Oosterhout
 wrote:
>
> Hoi Mike,
>
> You're right, we should probably measure it first. You state that the simple 
> joins and filters are probably cheap which I don't doubt. We also do 
> correlated subqueries and stuff. Actually, we also use the baking mechanism 
> to cache some of our own processing which goes into the construction of the 
> query.
>
> However, if the QueryContext is the important part then it may not matter. 
> We'll time it and see what happens. My question was mostly whether the "spoil 
> but not spoil" approach might be a quickwin.
>
> Have a nice day,
> Martijn
>
> On Wed, 14 Nov 2018 at 14:54, Mike Bayer  wrote:
>>
>>
>> The two things that the BakedQuery caches are the QueryContext, which
>> is the thing that Query creates when everything it needs is ready to
>> go and is by far the most expensive part of the Query, and then
>> indirectly it caches the string SQL which also cannot be generated
>> until everything is ready to go.   That is, there's not much to cache
>> before you're ready to generate the SQL.If you're looking to just
>> cache the overhead of calling the join() or filter() method, that is
>> something, but this would need some new mechanism of having that be
>> cached as well within the bakery, e.g.  a Query by itself that's the
>> result of some series of steps.   It might be something you'd want
>> to play with if you really want to hack on the internals of the
>> BakedQuery but I'd recommend doing some code profiling up front to see
>> potentially how much savings you'd really get, because I'm not sure
>> it's that much.   If you have a limited number of Ns, then it's
>> probably not worth it.
>>
>>
>>
> --
> Martijn van Oosterhout  http://svana.org/kleptog/
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-14 Thread Martijn van Oosterhout

Hoi Mike,

You're right, we should probably measure it first. You state that the
simple joins and filters are probably cheap which I don't doubt. We also do
correlated subqueries and stuff. Actually, we also use the baking mechanism
to cache some of our own processing which goes into the construction of the
query.

However, if the QueryContext is the important part then it may not matter.
We'll time it and see what happens. My question was mostly whether the
"spoil but not spoil" approach might be a quickwin.

Have a nice day,
Martijn

On Wed, 14 Nov 2018 at 14:54, Mike Bayer  wrote:

>
> The two things that the BakedQuery caches are the QueryContext, which
> is the thing that Query creates when everything it needs is ready to
> go and is by far the most expensive part of the Query, and then
> indirectly it caches the string SQL which also cannot be generated
> until everything is ready to go.   That is, there's not much to cache
> before you're ready to generate the SQL.If you're looking to just
> cache the overhead of calling the join() or filter() method, that is
> something, but this would need some new mechanism of having that be
> cached as well within the bakery, e.g.  a Query by itself that's the
> result of some series of steps.   It might be something you'd want
> to play with if you really want to hack on the internals of the
> BakedQuery but I'd recommend doing some code profiling up front to see
> potentially how much savings you'd really get, because I'm not sure
> it's that much.   If you have a limited number of Ns, then it's
> probably not worth it.
>
>
>
> --
Martijn van Oosterhout  http://svana.org/kleptog/

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-14 Thread Mike Bayer

On Tue, Nov 13, 2018 at 3:49 PM Martijn van Oosterhout
 wrote:
>
> Hoi,
>
> We're starting to do some pretty complicated things with the SQLAlchemy 
> bakery (because we have some pretty complicated queries) and I have something 
> I'd like to verify.
>
> We have queries that are constructed like so (each letter is a step):
>
> A B C .. M N0
> A B C .. M N1
> A B C .. M N2
> A B C .. M N3
>
> So we have lots of steps and only at the end does something change. Not all 
> the time, but much more often than the steps A to M. The way the bakery is 
> set up only the last step is actually cached, meaning that steps A to M get 
> done each time the last step changes.
>
> However, suppose we could, after step M, do a kind of checkpoint so that at 
> least that part is done only once. AFAICT it would work like the "spoil" 
> method, except leave spoil as False. So something like:
>
> class BakedQuery(object):
> def checkpoint(self):
> _check_point = self._clone()
> _check_point._cache_key += ('_query_only', )
> self.steps = [_check_point._retrieve_baked_query]
> return self
>
>
> It's not a great name I agree, but the alternatives I can think of are mark, 
> save, stuff like that.
>
> Anyway, AFAICT this should just work. The assumption is that the query 
> construction takes significant time, but I think we are sure of that. When an 
> N4 comes, after executing the checkpoint after M, the last step simply 
> extracts the cached intermediate step, applies N4 and we're done.

The two things that the BakedQuery caches are the QueryContext, which
is the thing that Query creates when everything it needs is ready to
go and is by far the most expensive part of the Query, and then
indirectly it caches the string SQL which also cannot be generated
until everything is ready to go.   That is, there's not much to cache
before you're ready to generate the SQL.If you're looking to just
cache the overhead of calling the join() or filter() method, that is
something, but this would need some new mechanism of having that be
cached as well within the bakery, e.g.  a Query by itself that's the
result of some series of steps.   It might be something you'd want
to play with if you really want to hack on the internals of the
BakedQuery but I'd recommend doing some code profiling up front to see
potentially how much savings you'd really get, because I'm not sure
it's that much.   If you have a limited number of Ns, then it's
probably not worth it.



>
> Am I missing something? Is this something that could be considered an 
> submittable improvement?
>
> Thanks in advance,
> Martijn
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Caching intermediate results in baked queries

Re: [sqlalchemy] Caching intermediate results in baked queries

Re: [sqlalchemy] Caching intermediate results in baked queries

Re: [sqlalchemy] Caching intermediate results in baked queries

Re: [sqlalchemy] Caching intermediate results in baked queries

Re: [sqlalchemy] Caching intermediate results in baked queries

Re: [sqlalchemy] Caching intermediate results in baked queries

Re: [sqlalchemy] Caching intermediate results in baked queries

8 matches

Site Navigation

Mail list logo

Footer information