Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-16 Thread Martijn van Oosterhout
On Thu, 15 Nov 2018 at 16:41, Mike Bayer  wrote:

> >
> > But then you lose the benefits of the cache, and the Bakery is only
> useful if you use the same queries over and over. Unless I've completely
> misunderstood how it works.
>
> well the "anonymous" bindparam() still has a name.   you just use that
> name, and it gets converted to the "real" name when the statement is
> executed:
>
> >>> from sqlalchemy import *
> >>> b1 = bindparam(None)
> >>> b1.key
> '%(140062366895800 param)s'
> >>> s1 = select([b1])
> >>> print(s1)
> SELECT :param_1 AS anon_1
> >>> s1.compile().construct_params({'%(140062366895800 param)s': 5})
> {'param_1': 5}
>
>
>
Ah, understood. That's a pretty neat trick actually; we might see if we can
incorporate it somewhere.
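
For the archives, a minimal sketch of the trick in isolation (plain Core,
with a made-up column; the cached part is the compiled statement, the
per-execution part is just the dict of values):

from sqlalchemy import bindparam, column, select

b1 = bindparam(None)
stmt = select([column('x')]).where(column('x') == b1)

compiled = stmt.compile()   # compile (and cache) this once
key = b1.key                # "anonymous", but stable for this statement

print(compiled.construct_params({key: 5}))   # e.g. {'param_1': 5}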

Thanks.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/



Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-15 Thread Martijn van Oosterhout
On Thu, 15 Nov 2018 at 14:55, Mike Bayer  wrote:

> > And have all the query compilation cached. All the query stuff can be
> put near the model so your code isn't covered with fragments of SQLAlchemy.
> I have no idea if this pattern is common, but it works well for us.
>
> filter_by_id and filter_by_name seem to be just occurrences of
> "filter_by" with a single hardcoded keyword.   I'm not sure what the
> bindparam() part is doing either because a Core binary expression "col
> = 'name'" already genertes the bindparam for you.This helper does
> not seem to be specific to "baked" query, as far as the calling API at
> least ?
>
>
What it achieves is that:

FooQuery(session).filter_by_id(1).all()

and

FooQuery(session).filter_by_id(2).all()

Both compile to the same (cached) Query, and thus the same query is sent to
the database server. Since the parameters are rarely all the same, the
anonymous bind is not useful here. In fact, if you don't use a bindparam()
here it will give wrong results, because the Bakery keys off the lambda's
func_code, and that doesn't take the captured value into account. You can
deal with this by using:

self += ((lambda q: q.filter(Foo.id == id_)), id_)

But then you lose the benefits of the cache, and the Bakery is only useful
if you use the same queries over and over. Unless I've completely
misunderstood how it works.

So this gives us all the benefits of baking while not being tied to
constant parameters. The API for the user resembles what you get when
normally subclassing Query, which is nice because it means the users are
getting baked queries without actually knowing they're getting them. The
saved parameters are automatically bound to the Result object on execution.
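
To make that concrete, here is a minimal sketch of how such a wrapper can
sit on top of the public baked API (a reconstruction for illustration, not
our actual WrappedBakedQuery; attribute names are assumed):

from sqlalchemy.ext import baked

bakery = baked.bakery()

class WrappedBakedQuery(object):
    def __init__(self, session):
        self._session = session
        self._params = {}
        # initial_query() is defined per subclass, so its code object acts
        # as a distinct cache key for each query class
        self._baked = bakery(self.initial_query)

    def __iadd__(self, fn):
        # extend the cached query; the bakery keys on fn's code object
        self._baked += fn
        return self

    def __ior__(self, params):
        # saved until execution; deliberately NOT part of the cache key
        self._params.update(params)
        return self

    def all(self):
        # Result.params() substitutes the saved values into the cached SQL
        return self._baked(self._session).params(**self._params).all()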

Hope this is clearer.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/



Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-14 Thread Martijn van Oosterhout
Hoi Mike,

On Wednesday, November 14, 2018 at 11:30:18 PM UTC+1, Mike Bayer wrote:
>
> I have some big ideas for the baked concept which may or may not 
> impact this area as well.   But they are only ideas for now, I have 
> many explicit SQLAlchemy tasks that I have to get done before I can 
> try to explore that stuff. 
>
>
Interesting, do you have anything written down? 

The baked query concept is a nice starting point but has some shortcomings, 
in particular with respect to parameters. So we created a WrappedBakedQuery 
to deal with this. To give an idea of how this works:

class FooQuery(WrappedBakedQuery):
    def initial_query(self, session):
        return session.query(Foo)

    def filter_by_id(self, id_):
        self += lambda q: q.filter(Foo.id == bindparam('foo_id'))
        self |= {'foo_id': id_}
        # Details omitted, we actually return a copy here
        return self

    def filter_by_name(self, name):
        self += lambda q: q.filter(Foo.name == bindparam('foo_name'))
        self |= {'foo_name': name}
        return self


This allows us to write code like:

query = FooQuery(session)

if params['id'] is not None:
    query = query.filter_by_id(params['id'])

if params['name'] is not None:
    query = query.filter_by_name(params['name'])

query = query.join_to_bar()

result = query.all()

And have all the query compilation cached. All the query stuff can be put 
near the model so your code isn't covered with fragments of SQLAlchemy. I 
have no idea if this pattern is common, but it works well for us.

The naming of the bind parameters is a bit clunky though; I haven't thought
of a good way around that. And you end up with lots of steps, which may be a
performance issue (which motivated my original question), but we're going to
measure that before claiming it's an actual problem. The Bakery can cache
actual Query objects, that's what spoil() does, so it seems like a small
step.

And exception handling is a bit annoying, with all errors appearing at the
point the query is executed, rather than where the fragment is added. OTOH,
it doesn't seem smart to try to do magic with tracebacks for something
that shouldn't really happen in production code.

I'm not sure what the state of the art is w.r.t. baked queries, but I hope 
this gives an idea of how we use it.

Have a nice day,
Martijn



Re: [sqlalchemy] Caching intermediate results in baked queries

2018-11-14 Thread Martijn van Oosterhout
Hoi Mike,

You're right, we should probably measure it first. You state that the
simple joins and filters are probably cheap, which I don't doubt, but we
also do correlated subqueries and the like. Actually, we also use the baking
mechanism to cache some of our own processing which goes into the
construction of the query.

However, if the QueryContext is the important part then it may not matter.
We'll time it and see what happens. My question was mostly whether the
"spoil but not spoil" approach might be a quick win.

Have a nice day,
Martijn

On Wed, 14 Nov 2018 at 14:54, Mike Bayer  wrote:

>
> The two things that the BakedQuery caches are the QueryContext, which
> is the thing that Query creates when everything it needs is ready to
> go and is by far the most expensive part of the Query, and then
> indirectly it caches the string SQL which also cannot be generated
> until everything is ready to go.   That is, there's not much to cache
> before you're ready to generate the SQL.If you're looking to just
> cache the overhead of calling the join() or filter() method, that is
> something, but this would need some new mechanism of having that be
> cached as well within the bakery, e.g.  a Query by itself that's the
> result of some series of steps.   It might be something you'd want
> to play with if you really want to hack on the internals of the
> BakedQuery but I'd recommend doing some code profiling up front to see
> potentially how much savings you'd really get, because I'm not sure
> it's that much.   If you have a limited number of Ns, then it's
> probably not worth it.
>
>
>
> --
Martijn van Oosterhout  http://svana.org/kleptog/



[sqlalchemy] Caching intermediate results in baked queries

2018-11-13 Thread Martijn van Oosterhout
Hoi,

We're starting to do some pretty complicated things with the SQLAlchemy 
bakery (because we have some pretty complicated queries) and I have 
something I'd like to verify.

We have queries that are constructed like so (each letter is a step):

A B C .. M N0
A B C .. M N1
A B C .. M N2
A B C .. M N3

So we have lots of steps, and only at the end does something change. Not all
the time, but much more often than the steps A to M. The way the bakery is
set up, the query is only cached as a whole, meaning that steps A to M get
redone each time the last step changes.

However, suppose we could, after step M, do a kind of checkpoint so that at 
least that part is done only once. AFAICT it would work like the "spoil"
method, except leaving _spoiled as False. So something like:

class BakedQuery(object):
    def checkpoint(self):
        _check_point = self._clone()
        _check_point._cache_key += ('_query_only', )
        self.steps = [_check_point._retrieve_baked_query]
        return self


It's not a great name, I agree, but the alternatives I can think of are
mark, save, and the like.

Anyway, AFAICT this should just work. The assumption is that the query
construction takes significant time, and I think we are sure of that. When
an N4 comes along, after executing the checkpoint after M, the last step
simply extracts the cached intermediate step, applies N4, and we're done.
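
Usage would look something like this (bq is a BakedQuery from the standard
baked.bakery() factory; step_a etc. stand in for criteria lambdas;
checkpoint() is the proposed method, not existing API):

bq = bakery(lambda session: session.query(Foo))
bq += step_a            # steps A .. M, rarely changing
# ...
bq += step_m
bq.checkpoint()         # cache the A..M intermediate Query as a unit
bq += step_n0           # only this step varies between calls
result = bq(session).all()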

Am I missing something? Is this something that could be considered a
submittable improvement?

Thanks in advance,
Martijn



Re: [sqlalchemy] Baked queries vs before_execute hook

2017-10-12 Thread Martijn van Oosterhout
On Thursday, October 12, 2017 at 5:47:53 PM UTC+2, Mike Bayer wrote:
>
>
> It sounds like you are getting back inconsistent SQL for the same 
> query based on some external context that is not being considered as 
> part of the cache key.  This would indicate that you are probably 
> modifying the select() object *in place* inside your before_execute 
> hook.If your before_execute() hook returns a *new* select() 
> object, it would not pollute the cache with your late-modified value 
> against the cache keys. 
>
> That is, it's the difference between calling 
> select.append_whereclause() and select.where().The 
> before_execute() hook would need to be set up with retval=True and 
> return the new statement and parameters. 
>
>
Bingo! Looking at the code it has append_from() and append_whereclause() 
calls, so it's probably modifying in place. Sigh. That probably means this 
is going to break the caching in even more spectacular ways which we 
haven't yet spotted. The action of the hook is indeed dependent on 
something that is not part of the query, namely the "perms" field which 
only exists in our own CheckedQuery class.

The concept of the hook is pretty simple. It looks through the query for 
which tables it uses, and if it finds a table marked as "special" it adds a 
filter and possibly some joins. I'm fairly sure this could be done safely 
using the Visitor pattern, but in practice it's one big ball of spaghetti 
no-one wants to touch. Essentially it looks for a table and replaces it 
with a subquery, but it works by looping/recursing through the fields of 
the query itself. Ugh.
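
For comparison, a minimal sketch of the non-mutating style Mike describes,
using retval=True (engine, special_table and uses_special_table() are
placeholders for our permission machinery):

from sqlalchemy import event
from sqlalchemy.sql import Select

@event.listens_for(engine, "before_execute", retval=True)
def add_permission_filter(conn, clauseelement, multiparams, params):
    if isinstance(clauseelement, Select) and uses_special_table(clauseelement):
        # .where() returns a *new* select(); the cached original stays intact
        clauseelement = clauseelement.where(special_table.c.perm == 'read')
    return clauseelement, multiparams, params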

I think we're going to have to drop the idea of the hook in the long term, 
and at least short-circuit it for baked queries, putting the query 
rewriting in our wrapper; then it can be cached like the rest. Especially 
since in 1.2 lazy loading is going to trigger this (though we'll probably 
disable lazy loading in most places).

Caching the query rewriting isn't a bad plan either. But it looks like our 
query rewriting is more of a liability than I thought.

Thanks for the help!

Have a nice day,
-- 
Martijn



[sqlalchemy] Baked queries vs before_execute hook

2017-10-12 Thread Martijn van Oosterhout
Hi,

Recently we've been looking into the baked query feature as a method of 
speeding up query compilation. We also use a before_execute hook to modify 
the query before execution to handle permission-related stuff. One thing 
that turned up was that a baked query caches the final string version.

What this means is that the baked query framework caches the results of the 
before_execute hook, meaning that queries occasionally produce the wrong 
output in situations where the before_execute hook would do something 
different. I'm not clear if this is a bug or a "you break it, you get to 
keep both pieces".

We worked around this (yes, before_execute hooks are evil) but this became 
more urgent when an old product accidentally got SQLAlchemy 1.2.0b where 
baked queries are used for lazy loading, which caused all sorts of funky 
errors. Whoops!

I'm wondering if there is a way of at least detecting this, such that if a 
before_execute hook changes a query, the result is automatically not 
cached. That would at least prevent things from breaking unexpectedly. But 
long term, caching the compilation is really nice and we'd like to be 
able to keep that feature. Our hook is predictable, in that with the same 
input query and parameters (which are stored in the Query object) you always 
get the same result. So it would in theory be possible to work with the 
baked query framework, but I'm totally not clear how that would work.

Any ideas?

As an aside, we worked around a few things by creating a WrappedBakedQuery 
class, which allowed us to do things like:

baked_query += lambda q: q.filter(Table.col == bindparam('foo'))
baked_query.set_param('foo', 1)

Which worked better in our setup.

Have a nice day,
-- 
Martijn



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-20 Thread Martijn van Oosterhout
Ok, so this is what I have for today. It works, and handles all kinds of 
corner cases and yet it's not quite what I want. It does everything as a 
joinedload. It's much easier to use now though.

You can do things like:

q = Session.query(Animal)

for animal in yielded_load(q, (joinedload(Animal.owner).joinedload(Human.family),
                               joinedload(Animal.species).joinedload(Species.phylum))):
    do_something(animal)

It says joinedload() but it doesn't actually pay attention to that; it just 
uses it to determine the path. It would be really nice to be able to 
specify that some things should be fetched using subqueryload(), but that 
would require unpacking/manipulating the Load objects and I don't think 
there's a supported interface for that. Additionally, it would be nice if it 
could notice that paths share a common prefix and only fetch those once. 
Still, for the amount of code it's pretty effective.

from itertools import groupby, islice
from sqlalchemy.orm import attributes, Load, aliased
from sqlalchemy import tuple_


def yielded_load(query, load_options, N=1000):
    # Note: query must return only a single object (for now anyway)
    main_query = query.yield_per(N)

    main_res = iter(main_query)

    while True:
        # Fetch block of results from query
        objs = list(islice(main_res, N))

        if not objs:
            break

        for load_option in load_options:
            # Get path of attributes to follow
            path = load_option.path
            pk = path[0].prop.parent.primary_key

            # Generate query that joins against original table
            child_q = main_query.session.query().order_by(*pk)

            for i, attr in enumerate(path):
                if i == 0:
                    # For the first relationship we add the target and the pkey columns
                    # Note: add_columns() doesn't work here? with_entities() does
                    next_table = attr.prop.target
                    child_q = child_q.join(next_table, attr)
                    child_q = child_q.with_entities(attr.prop.mapper).add_columns(*pk)
                    if attr.prop.order_by:
                        child_q = child_q.order_by(*attr.prop.order_by)
                    opts = Load(attr.prop.mapper)
                else:
                    # For all relationships after the first we can use contains_eager
                    # Note: The aliasing is to handle cases where the relationships loop
                    next_table = aliased(attr.prop.target)
                    child_q = child_q.join(next_table, attr, isouter=True)
                    opts = opts.contains_eager(attr, alias=next_table)

            child_q = child_q.options(opts)

            keys = [[getattr(obj, col.key) for col in pk] for obj in objs]

            child_q = child_q.filter(tuple_(*pk).in_(keys))

            # Here we use the fact that the first column is the target object
            collections = dict((k, [r[0] for r in v]) for k, v in groupby(
                child_q,
                lambda x: tuple([getattr(x, c.key) for c in pk])
            ))

            for obj in objs:
                # We can traverse many-to-one and one-to-many
                if path[0].prop.uselist:
                    attributes.set_committed_value(
                        obj,
                        path[0].key,
                        collections.get(
                            tuple(getattr(obj, c.key) for c in pk),
                            ())
                    )
                else:
                    attributes.set_committed_value(
                        obj,
                        path[0].key,
                        collections.get(
                            tuple(getattr(obj, c.key) for c in pk),
                            [None])[0]
                    )

        for obj in objs:
            yield obj


-- 
Martijn



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-20 Thread Martijn van Oosterhout
On 19 July 2016 at 23:22, Mike Bayer <mike...@zzzcomputing.com> wrote:

>
>
> On 07/19/2016 05:20 PM, Martijn van Oosterhout wrote:
>
>>
>>
>> Thanks. On the way home though I had a thought: wouldn't it be simpler
>> to run the original query with yield_per(), and then after each block
>> run the query with a filter on the primary keys returned, and add all
>> the joinedload/subqueryload/etc options to this query, run it and rely
>> on the identity map to fix it for the objects returned the first time.
>> Or is that something we cannot rely on?
>>
>
> it works for the loading you're doing, where the primary keys of what's
> been fetched are fed into the subsequent query.  But it doesnt work for
> current subquery loading which does not make use of those identifiers, nor
> for joined loading which does OUTER JOIN onto the original query at once
> (doing "joinedload" as a separate query is essentially what subquery
> loading already is).
>
>
Ah, good point. Pity. I like the whole generative interface for
joinedload/subqueryload/etc and would have liked to reuse that machinery
somehow. Given I'm trying to eager load several levels of relationships,
it'd be nice to automate that...

Have a nice day,
-- 
Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-19 Thread Martijn van Oosterhout
On 19 July 2016 at 18:42, Mike Bayer <mike...@zzzcomputing.com> wrote:

>
>
> On 07/19/2016 12:33 PM, Martijn van Oosterhout wrote:
>
>> On Sunday, July 17, 2016 at 8:47:11 AM UTC+2, Martijn van Oosterhout
>> wrote:
>>
>> But in any case, even this improves performance greatly.
>>
>
> this is adequate for a new recipe if you are interested in adding it. Also
> I think sqlalchemy-utils provides a feature along these lines but I'm not
> sure how well it works or handles those harder cases like chained
> relationships.
>
>
Thanks. On the way home though I had a thought: wouldn't it be simpler to
run the original query with yield_per(), and then after each block run the
query with a filter on the primary keys returned, and add all the
joinedload/subqueryload/etc options to this query, run it and rely on the
identity map to fix it for the objects returned the first time. Or is that
something we cannot rely on?
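
A rough sketch of what I mean (Foo, Foo.related and the chunking helper are
placeholders; and as Mike's follow-up above explains, this only helps
IN-style loading, not joined or subquery eager loading):

from sqlalchemy.orm import joinedload

for chunk in chunked(q.yield_per(1000), 1000):   # chunked() is hypothetical
    pks = [obj.id for obj in chunk]
    # Re-select the same rows with the eager options; the hope is that the
    # identity map merges .related into the objects already handed out.
    q.session.query(Foo).options(joinedload(Foo.related)) \
        .filter(Foo.id.in_(pks)).all()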

Have a nice day,
-- 
Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-19 Thread Martijn van Oosterhout
On Sunday, July 17, 2016 at 8:47:11 AM UTC+2, Martijn van Oosterhout wrote:
>
>
> I'll play a bit and see what I can get to work. Thanks again.
>
>
So, I had a chance to play and got something that actually works quite
nicely; see below. Two things:

- I switched to referencing the primary key of the original object 
directly, because some of our relationships are a bit more complex.
- Chained relationships don't work.

But in any case, even this improves performance greatly.


from itertools import groupby, islice
from sqlalchemy.orm import attributes, object_session
from sqlalchemy import tuple_


def yielded_load(query, attrs, N=1000):
    # Note: query must return only a single object (for now anyway)
    main_query = query.yield_per(N)

    main_res = iter(main_query)

    while True:
        # Fetch block of results from query
        objs = list(islice(main_res, N))

        if not objs:
            break

        for attr in attrs:
            target = attr.prop.mapper
            pk = attr.prop.parent.primary_key

            # Generate query that joins against original table
            child_q = object_session(objs[0]).query(target, *pk).order_by(*pk)
            if attr.prop.order_by:
                child_q = child_q.order_by(*attr.prop.order_by)

            keys = [[getattr(obj, col.key) for col in pk] for obj in objs]

            child_q = child_q.join(attr).filter(tuple_(*pk).in_(keys))

            collections = dict((k, [r[0] for r in v]) for k, v in groupby(
                child_q,
                lambda x: tuple([getattr(x, c.key) for c in pk])
            ))

            for obj in objs:
                attributes.set_committed_value(
                    obj,
                    attr.key,
                    collections.get(
                        tuple(getattr(obj, c.key) for c in pk),
                        ())
                )

        for obj in objs:
            yield obj

-- 
Martijn



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-17 Thread Martijn van Oosterhout
On 15 Jul 2016 23:57, "Mike Bayer"  wrote:
>
>
>
> OK, but the IN loading you're doing, that's good too, I'd like to add a
> loader which uses that someday, and you're right, if we did, we could make
> it work with yield_per too due to the nature of it.  If you want to use
> that approach take a look at
> https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/DisjointEagerLoading

Wow, that's another really nice example. Those usage examples really need
some more prominence in the Google search results.

The windowed approach has the advantage that the queries are simpler, but
it requires scanning the table twice (although the first scan might be
cheaper). The IN() approach is perhaps more general, but the queries would
be much longer, meaning more traffic to the server. It would be nice if it
could all be made to Just Work.

I'll play a bit and see what I can get to work. Thanks again.

Martijn



Re: [sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Martijn van Oosterhout
On 15 July 2016 at 18:46, Mike Bayer <mike...@zzzcomputing.com> wrote:


> Here's the problem that cant be solved:
>
> 1. fetch rows 0-1000
>
> 2. start loading Foo objects:
>
>  Foo(1) ->  eagerload Foo.related with 400 members
>  Foo(2) ->  eagerload Foo.related with 500 members
>  Foo(3) -> eagerload Foo.related with the first 100 of 250 members
>
> 3. yield.  That is, stop reading more rows.  Send out Foo(1), Foo(2),
> Foo(3).  The application now can go nuts with these.  It can read them,
> write them, iterate through .related.
>
> Right there, we're broken.  Foo(3) is out in the world with less than half
> of its ".related" collection - they are still sitting on the cursor!
>
>
Ok, I think I see what you're getting at, but I don't think that's a problem
here. It looks like you have a query which joins all the relationships in
one go. That's not what I'm doing; I'm relying on the fact that I can
interleave the queries. At the SQL level it would look like so:

DECLARE my_cursor CURSOR FOR SELECT * FROM bigtable;
FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (1,2,3,4,5,...)
SELECT * FROM related_table2 WHERE bigtable_id IN (1,2,3,4,5,...)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (1,2,3,4,5,...)

FETCH 1000 FROM my_cursor;
SELECT * FROM related_table WHERE bigtable_id IN (11,12,13,14,15,...)
SELECT * FROM related_table2 WHERE bigtable_id IN (11,12,13,14,15,...)
SELECT * FROM related_table JOIN second_order_relationship USING (pkey)
WHERE bigtable_id IN (11,12,13,14,15,...)

<... repeat ...>

(This may be a PostgreSQL-specific feature, not sure.)

Yes, this may mean that some of the related objects are fetched multiple
times, but that is not a problem for me (the whole thing is read-only
anyway). What I need, however, is for the secondary queries to populate the
relationships in the original BigTable objects.
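
(set_committed_value() is what eventually makes that possible, as the
follow-ups above show: it installs an already-fetched value on a
relationship without marking the object dirty. 'related' and related_list
are placeholders here.)

from sqlalchemy.orm import attributes

attributes.set_committed_value(obj, 'related', related_list)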

> Vastly simpler is to use the recommended window functions to do pagination
> of any scale with no chance of complex failures.
>
>
A bit of googling suggests you are referring to this:
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
which indeed looks very neat and much easier than what I was thinking. It
doesn't say so explicitly, but it looks like it will work transparently with
eager loading. It basically does the above, but skips the cursor and
replaces it with queries on ranges of the primary key (which is shorter and
probably more efficient to boot).
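
For reference, a much-simplified sketch of the windowed idea (the actual
recipe uses row_number() windows; this version just pages on primary-key
ranges, and each chunk is an ordinary query to which eager-load options
apply normally):

def windowed_chunks(session, entity, pk, size=1000):
    # Fetch the ordered primary keys once, then yield one query per range.
    pks = [value for (value,) in session.query(pk).order_by(pk)]
    for i in range(0, len(pks), size):
        window = pks[i:i + size]
        yield (session.query(entity)
                      .filter(pk.between(window[0], window[-1]))
                      .order_by(pk))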

Thanks for the tip!
-- 
Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/



[sqlalchemy] Combining yield_per and eager loading

2016-07-15 Thread Martijn van Oosterhout
Ok, so SQLAlchemy has this nice feature where you can eager load 
relationships to significantly reduce the number of queries during 
processing.

On the other hand, you can use yield_per() (on Postgres) to significantly
reduce memory usage by not loading the entire result set into memory at
once.

For very good reasons mentioned in the documentation you can't use both of 
these in the same query, yet that is kind of my goal. What I'd like to 
achieve, for a given query which goes over a big table:

while not end of resultset:
   take 1000 results
   eagerload all the relationships
   process them

Now, the eager loading part is posing difficulties (or I'm not reading the 
documentation carefully enough). I found the attributes.set_committed_value() 
function 
<http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.attributes.set_committed_value>
which solves half the problem, but I still need to generate the 
actual query to return the necessary objects. So perhaps (pseudo-code):

def eagerload_for_set(object_list, relationship):
    ids = set(o.get(relationship.left_id) for o in object_list)
    lookup = Query(relationship.right_table).filter_by(
        relationship.right_column.in_(ids)).all()
    for o in object_list:
        o.set_committed_value(o, relationship.left, lookup[relationship.left_id])

Before I go diving into the SQLAlchemy internals to make the above actually
work, does it seem reasonable? Are there any handy utils somewhere that
might help?

Thanks for any ideas,

Have a nice day,

-- 
Martijn van Oosterhout





[sqlalchemy] (Micro-)feature request: sessionmaker.configure(class_=foo)

2016-07-10 Thread Martijn van Oosterhout
Hoi,

See $SUBJECT.

This is a bit of a corner case, but sometimes I want to change the class_ 
used by a sessionmaker(). There is a sessionmaker.configure(), but it 
can configure everything except the class. You can easily recreate the 
sessionmaker, but since it is usually wrapped in a scoped_session and 
global, that leads to issues...

It's not a really important feature, but it would allow us to clean up some 
warts here and there.
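
Concretely (a sketch; the commented-out line is the requested feature, not
existing API):

from sqlalchemy.orm import Session, scoped_session, sessionmaker

class MySession(Session):
    pass

SessionMaker = sessionmaker()
ScopedSession = scoped_session(SessionMaker)   # global, handed out everywhere

SessionMaker.configure(expire_on_commit=False)  # works today
# SessionMaker.configure(class_=MySession)      # the requested feature

# Today you have to rebuild the factory instead, which doesn't help the
# already-created global ScopedSession:
SessionMaker = sessionmaker(class_=MySession)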

Have a nice day,
--
Martijn van Oosterhout



Re: [sqlalchemy] 1.1.0b1: construct has no default compilation handler.

2016-06-28 Thread Martijn van Oosterhout
On 28 June 2016 at 15:47, Mike Bayer <mike...@zzzcomputing.com> wrote:

>
>
> On 06/28/2016 09:09 AM, Martijn van Oosterhout wrote:
>
>>
>> That sqlite reference looks a bit weird, did I miss some initialisation
>> somewhere?
>>
>
> It looks like the @compiles system is in place, which SQLAlchemy does not
> use internally.  that wasn't in your code example but that's likely where
> the exception is raised from.
>
>
Bingo! This is some cruft from the prehistory of this project; I'm not sure
how exactly it gets loaded, but adding this to the script triggers it
reliably:

from sqlalchemy.ext.compiler import compiles

@compiles(ARRAY, "sqlite")
def compile_binary_sqlite(type_, compiler, **kw):
    return "STRING"

Now, in my eyes this should have no effect if you're not using sqlite, but
there is obviously something going on. I've verified that this does not
fail on 1.0.0. At least now I can continue testing 1.1.0.

Have a nice day,
-- 
Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/



Re: [sqlalchemy] 1.1.0b1: construct has no default compilation handler.

2016-06-28 Thread Martijn van Oosterhout
On 27 June 2016 at 16:22, Mike Bayer <mike...@zzzcomputing.com> wrote:

> Not really sure, that URL itself won't allow a psycopg2 connection to even
> occur.


Yeah, I noticed that. psql has sensible defaults here that psycopg2
apparently doesn't have. In this case I'd just stripped the credentials out
of the script, sorry.



> The error means that the construct is trying to be generated as a string,
> like in a print statement.   But 1.1 has a new feature that allows default
> stringification of pg.ARRAY and other constructs to actually work.
>
> Need a stack trace at the very least.
>
>
Good point. Attached. Just to get some more information I stopped a
debugger at the raise statement and got the following:

>
/home/martijn/virtualenv/eggs/SQLAlchemy-1.1.0b1-py2.7-linux-x86_64.egg/sqlalchemy/ext/compiler.py(459)__call__()
-> raise exc.CompileError(
(Pdb) l
454 if not fn:
455 try:
456 fn = self.specs['default']
457 except KeyError:
458 import pdb; pdb.set_trace()
459  -> raise exc.CompileError(
460 "%s construct has no default "
461 "compilation handler." % type(element))
462 return fn(element, compiler, **kw)
[EOF]
(Pdb) p locals()
{'self': ,
 'element': ARRAY(String()),
 'kw': {'type_expression': Column('reference', ARRAY(String()),
table=, primary_key=True, nullable=False,
server_default=DefaultClause('{}', for_update=False))},
 'pdb': ,
 'fn': None,
 'compiler': }
(Pdb) p self.specs
{'sqlite': }
(Pdb) p self.specs['default']
*** KeyError: KeyError('default',)

That sqlite reference looks a bit weird, did I miss some initialisation
somewhere?

Hope this helps,
-- 
Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/





Re: [sqlalchemy] Re: Tracking query performance per Session

2016-06-27 Thread Martijn van Oosterhout
Ah, thanks for the tip, the info fields work nicely as well.
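
For the archives, a minimal sketch of what using .info looks like
(QueryStats and SessionMaker are the names from our gist, assumed here;
the rest of the event wiring is elided):

from sqlalchemy import event

@event.listens_for(SessionMaker, 'after_begin')
def _attach_stats(session, transaction, connection):
    # Stash the per-Session stats object in the .info dicts so the
    # before/after_execute hooks can find it without setattr() hacks.
    stats = session.info.setdefault('query_stats', QueryStats())
    connection.info['query_stats'] = stats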

Have a nice day,

On Monday, June 27, 2016 at 4:09:54 PM UTC+2, Mike Bayer wrote:
>
>
> Hi Martijn - 
>
> Sorry I missed this post.Your solution is pretty good.  For the 
> setattr() stuff, both Connection and Session have an .info attribute 
> which is a dictionary you can use for memoizations. 
>
>
-- 
Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/



[sqlalchemy] 1.1.0b1: construct has no default compilation handler.

2016-06-27 Thread Martijn van Oosterhout
When trying out 1.1.0b1 (to look at the new events) I got the $SUBJECT 
error from our test cases.

I'm not sure what exactly is going on, because when I run the script below 
line-by-line in the django shell, it breaks, but if I run it from the 
command-line it works. But that's probably more to do with my local setup. 
With 1.0.0 it doesn't fail in either case.


from sqlalchemy import create_engine, MetaData
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy import String
from sqlalchemy.ext.declarative import declarative_base

metadata = MetaData()
metadata.bind = create_engine('postgresql:///')
Base = declarative_base(metadata=metadata)

class Test(Base):
    __tablename__ = 'test'
    reference = Column(ARRAY(String), nullable=False, server_default='{}',
                       primary_key=True)

metadata.create_all(bind=metadata.bind)

===

Hope this helps,
-- 
Martijn van Oosterhout



Re: [sqlalchemy] "RuntimeError: dictionary changed size during iteration" during configure_mappers

2016-06-27 Thread Martijn van Oosterhout
On Thursday, June 23, 2016 at 11:22:32 PM UTC+2, Chung wrote:
>
> Ah thanks Mike; that makes sense.
>
> What we have is actually hundreds of different databases with roughly the 
> same schema, with new databases getting created all the time live.  The 
> table schemas for each database are also being altered live.  So I'm not 
> sure we can get away with not reflecting the table schema per request; any 
> request may be referencing any new database or any newly created column. 
>  Possibly there's still something meaningful we can cache?  We will think 
> about this.
>
>
That actually sounds like a situation we had. I think that if you attach 
the connection/bind to the Session instead of to the Metadata, you can use 
the same Metadata for several different databases, as long as they have the 
same schema. Unfortunately, if you didn't set it up that way in the 
beginning it can be painful to rework...
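
A minimal sketch of the pattern (DATABASE_URLS is a placeholder for however
you look up each database):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engines = {name: create_engine(url) for name, url in DATABASE_URLS.items()}
SessionMaker = sessionmaker()

def make_session(db_name):
    # Same declarative Base/MetaData everywhere; only the engine varies.
    return SessionMaker(bind=engines[db_name])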

-- 
Martijn van Oosterhout



[sqlalchemy] Re: Tracking query performance per Session

2016-06-27 Thread Martijn van Oosterhout

On Tuesday, June 21, 2016 at 11:25:36 PM UTC+2, Martijn van Oosterhout 
wrote:
>
> Hi, 
>
> In an effort to find the bottlenecks in our system we'd like to collect 
> statistics about which queries take a long time, or return a lot of rows, 
> or paths which are triggering lots of queries.
>
> Now, the system consists of lots of parts working in parallel, each with 
> their own Session object, so we want the stats per Session.
>
>
In answer to my own question, this is what I came up with: 
https://gist.github.com/kleptog/1a866b67faa2c917758496a4fbbf92b8

It allows you to do things like:

s1 = SessionMaker()
s1.execute("SELECT 1")
s1.commit()
print s1._query_stats
print s1._query_stats.query_log

Output:


[(1467010565.738685, 'SELECT 1', {}, [('__main__', '<module>', 127), 
('__main__', 'test', 115)], 0.0002570152282714844)]

Using the log you have access to all the executed queries, which you can 
then aggregate, save to disk, etc. It tracks start time, parameters, call 
stack and duration. The adding of extra attributes to the Session and 
Connection objects is not pretty, but it works.

Have a nice day,
-- 
Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/



[sqlalchemy] Tracking query performance per Session

2016-06-21 Thread Martijn van Oosterhout
Hi, 

In an effort to find the bottlenecks in our system we'd like to collect 
statistics about which queries take a long time, or return a lot of rows, 
or paths which are triggering lots of queries.

Now, the system consists of lots of parts working in parallel, each with 
their own Session object, so we want the stats per Session.

The obvious way to do this seems to be by using the events subsystem, 
hooking into the before/after_execute events on a Connection. But the 
problem here is that you only have the Connection object, and no idea which 
Session it belongs to.

Is there any way to extract this information reliably? There is the 
after_begin() event on a Session which has both the Session and the 
Connection, but there doesn't appear to be a reliable moment when the 
Session is no longer related to a connection. Or am I missing something?

Thanks in advance,

-- 
Martijn van Oosterhout <klep...@gmail.com> http://svana.org/kleptog/



[sqlalchemy] EXTRACT() not working as expected in PostgreSQL with TIMESTAMP WITH TIMEZONE

2013-06-04 Thread Martijn van Oosterhout
I have just upgraded sqlalchemy and am running some tests on some old code 
and getting some very strange results.

I have a table with a column defined as TIMESTAMP WITH TIMEZONE:

test_table = Table('test', metadata,
    Column('id', Integer, primary_key=True),
    Column('data', UnicodeText, primary_key=True),
    Column('ts', DateTime(timezone=True), nullable=False),
)

And I then have a query which uses it like so (TestTable is the ORM mapping 
of test_table):

session.query(extract('epoch', database.TestTable.ts))

and this is producing output like:

SELECT EXTRACT(epoch FROM test_table.ts :: timestamp);

That cast is a problem, since casting a timestamptz to a timestamp removes 
the timezone info, which means you get a different answer: the value is now 
interpreted as UTC, which makes the answer (in my case) two hours off:

# select extract(epoch from '2013-06-02 11:23:45+02'::timestamptz);
 date_part  
------------
 1370165025
(1 row)

# select extract(epoch from '2013-06-02 11:23:45+02'::timestamptz::timestamp);
 date_part  
------------
 1370172225
(1 row)

This cast seems to have been introduced in response to 
http://www.sqlalchemy.org/trac/ticket/1647, which seems to completely ignore 
the WITH TIMEZONE case. I tested it on 0.5.4, but on 0.7.6 it's definitely 
broken. What I don't understand is why the cast is there at all. Unless the 
argument is text, PostgreSQL can work out the type by itself just fine.

I think it's a bug and if people agree I can file one. But is there a 
workaround to suppress the cast?

Thanks in advance,
-- 
Martijn van Oosterhout klep...@gmail.com





[sqlalchemy] Re: EXTRACT() not working as expected in PostgreSQL with TIMESTAMP WITH TIMEZONE

2013-06-04 Thread Martijn van Oosterhout
For the record and for other people running into the same problem, here's a 
workaround that kills the cast by wrapping the column in a function that 
does nothing:

session.query(extract('epoch', func.timestamptz(database.TestTable.ts)))

This of course will only work until sqlalchemy learns about all the builtin 
casts, but for now it's fine.

Have a nice day,

On Tuesday, June 4, 2013 10:53:22 AM UTC+2, Martijn van Oosterhout wrote:
snip 

 I think it's a bug and if people agree I can file one. But is there a 
 workaround to suppress the cast?

 Thanks in advance,
 -- 
 Martijn van Oosterhout klep...@gmail.com


