Re: [sqlalchemy] subqueryload for a ColumnProperty?

2011-09-11 Thread Sumeet Agarwal
Mapped selects, nice! I wish I'd seen that part in the documentation
earlier. Do you think it might be worth a reference near Alternatives to
column_property()http://www.sqlalchemy.org/docs/orm/mapper_config.html#alternatives-to-column-propertywhere
you also mention object_session()?

So there's one more issue. Once there's a lot of customers and orders, this
query performs pretty poorly in MySQL (tried 5.5.14 Source and 5.0.92-50-log
Percona). To simplify, I'll use joinedload() in my example instead of
subqueryload().

 f = 
 s.query(Customer).filter_by(id=46).options(joinedload(_num_orders)).all()2011-09-11
  03:04:33,598 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)2011-09-11 
 03:04:33,599 INFO sqlalchemy.engine.base.Engine SELECT customer.id AS 
 customer_id, anon_1.customer_id AS anon_1_customer_id, anon_1.count AS 
 anon_1_count
FROM customer LEFT OUTER JOIN (SELECT `order`.customer_id AS
customer_id, count(`order`.id) AS count
FROM `order` GROUP BY `order`.customer_id) AS anon_1 ON customer.id =
anon_1.customer_id
WHERE customer.id = %(id_1)s2011-09-11 03:04:33,599 INFO
sqlalchemy.engine.base.Engine {'id_1': 46}


The EXPLAIN plan:

mysql explain SELECT customer.id AS customer_id, anon_1.customer_id
AS anon_1_customer_id, anon_1.count AS anon_1_count  FROM customer
LEFT OUTER JOIN (SELECT `order`.customer_id AS customer_id,
count(`order`.id) AS count  FROM `order` GROUP BY `order`.customer_id)
AS anon_1 ON customer.id = anon_1.customer_id  WHERE customer.id = 46
\G
*** 1. row ***
   id: 1
  select_type: PRIMARY
table: customer
 type: const
possible_keys: PRIMARY
  key: PRIMARY
  key_len: 4
  ref: const
 rows: 1
Extra: Using index
*** 2. row ***
   id: 1
  select_type: PRIMARY
table: derived2
 type: ALL
possible_keys: NULL
  key: NULL
  key_len: NULL
  ref: NULL
 rows: 11200
Extra:
*** 3. row ***
   id: 2
  select_type: DERIVED
table: order
 type: index
possible_keys: NULL
  key: customer_id
  key_len: 5
  ref: NULL
 rows: 28725
Extra: Using index3 rows in set (0.04 sec)


It results in a huge derived table and a table scan over the order table.
Out of curiosity, I ran the same statement in PostgresSQL (9.0.4) and
noticed it works much better:

sumeet=# explain analyze SELECT customer.id AS customer_id,
anon_1.customer_id AS anon_1_customer_id, anon_1.count AS anon_1_count
FROM customer LEFT OUTER JOIN (SELECT order.customer_id AS
customer_id, count(order.id) AS count FROM order GROUP BY
order.customer_id) AS anon_1 ON customer.id = anon_1.customer_id
WHERE customer.id = 46;
  QUERY PLAN

--
 Nested Loop Left Join  (cost=0.00..508.06 rows=1 width=16) (actual
time=4.069..4.070 rows=1 loops=1)
   Join Filter: (customer.id = order.customer_id)
   -  Index Scan using customer_pkey on customer  (cost=0.00..8.27
rows=1 width=4) (actual time=0.017..0.018 rows=1 loops=1)
 Index Cond: (id = 46)
   -  GroupAggregate  (cost=0.00..499.77 rows=1 width=8) (actual
time=4.046..4.046 rows=1 loops=1)
 -  Seq Scan on order  (cost=0.00..499.75 rows=2 width=8)
(actual time=0.021..4.042 rows=3 loops=1)
   Filter: (customer_id = 46)
 Total runtime: 4.127 ms
(8 rows)


Not that it helps me any :) I'm on MySQL. I can dramatically improve the
query by adding a WHERE clause inside the subquery. I bolded where the
change is:

mysql explain SELECT customer.id AS customer_id, anon_1.customer_id
AS anon_1_customer_id, anon_1.count AS anon_1_count  FROM customer
LEFT OUTER JOIN (SELECT `order`.customer_id AS customer_id,
count(`order`.id) AS count  FROM `order` *WHERE customer_id = 46*
GROUP BY `order`.customer_id) AS anon_1 ON customer.id =
anon_1.customer_id  WHERE customer.id = 46 \G
*** 1. row ***
   id: 1
  select_type: PRIMARY
table: customer
 type: const
possible_keys: PRIMARY
  key: PRIMARY
  key_len: 4
  ref: const
 rows: 1
Extra: Using index
*** 2. row ***
   id: 1
  select_type: PRIMARY
table: derived2
 type: system
possible_keys: NULL
  key: NULL
  key_len: NULL
  ref: NULL
 rows: 1
Extra:
*** 3. row ***
   id: 2
  select_type: DERIVED
table: order
 type: ref
possible_keys: customer_id
  key: customer_id
  key_len: 5
  ref:
 rows: 3
Extra: Using where; Using index3 rows in 

Re: [sqlalchemy] subqueryload for a ColumnProperty?

2011-09-11 Thread Michael Bayer

On Sep 11, 2011, at 7:05 AM, Sumeet Agarwal wrote:

 It results in a huge derived table and a table scan over the order table. Out 
 of curiosity, I ran the same statement in PostgresSQL (9.0.4) and noticed it 
 works much better:

Yeah one thing I do when I use MySQL, is that I assume at the gate that joins 
are out of the question.   MySQL is not a system that is reasonably capable of 
basic relational tasks.   If I'm using MySQL, it's usually because I'm 
integrating with some existing PHP application or some manager forced some bad 
decision on me.

I know this is not helping and you want this:

 
 mysql explain SELECT customer.id AS customer_id, anon_1.customer_id AS 
 anon_1_customer_id, anon_1.count AS anon_1_count  FROM customer LEFT OUTER 
 JOIN (SELECT `order`.customer_id AS customer_id, count(`order`.id) AS count  
 FROM `order` WHERE customer_id = 46 GROUP BY `order`.custome r_id) AS anon_1 
 ON customer.id = anon_1.customer_id  WHERE customer.id = 46 
So to get exactly the behavior of, copy the WHERE clause into the joined load 
or subq load, that's not directly possible with the JoinedLoader or 
SubqueryLoader.It's also the kind of thing that's not very easy to do 
generically, if your lead query were much more complex than a simple WHERE 
clause for example.

By far the easiest way is to just write it out as two queries and assign the 
value.   You don't even need the whole relationship() thing, just emit a second 
query for the counts, then set the attribute.   That's doityourself.py 
attached, where I purposely did it with no joins whatsoever since if you have 
the customer_id here, you don't need the JOIN at all - MySQL hates joins so 
this approach will definitely be the fastest (plus no SQLAlchemy attribute 
overhead).You can make it a little more generic by creating a function that 
you give the WHERE clause to, it then applies the WHERE to both the customer 
and order tables in some way.  

If you still wanted it with the JOIN to the parent,  and optionally deal with 
the relationship() to OrderCount again, that approach is actually a 
specialization of the subqueryload recipe at 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/DisjointEagerLoading .The 
recipe then uses q.from_self(Child).join(Parent.children) to create the join 
from the parent to child.   But here the level of hardcoding is already so 
great I don't see what it buys you except a slower MySQL query.

If you were doing joined loading, you can keep using OrderCount and use 
contains_eager() to write it out explicitly:

subq = s.query(OrderCount).filter(OrderCount.customer_id==1).subquery()
for c in s.query(Customer).\
filter(Customer.id==1).\
outerjoin(subq, Customer._num_orders).\
options(contains_eager(_num_orders, alias=subq)):
print c.id, c.num_orders


The query there still has more nesting than MySQL likes.

Next approach, hardcode a bind in there, very brittle but would produce this 
result:

order_counts = select([Order.customer_id, 
func.count(Order.id).label('count')]).\
where(Order.customer_id==bindparam(cust_id)).\
group_by(Order.customer_id).alias()

mapper(
OrderCount, 
order_counts,
primary_key=order_counts.c.customer_id
)

the joinedload scenario would work like this:

s.query(Customer).filter(Customer.id==48).options(joinedload(_num_orders)).params(cust_id=48)

subqueryload(), not as easy - we need to use the technique at 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/GlobalFilter (which also works 
with the joinedload version):

from sqlalchemy.orm.interfaces import MapperOption

class SetABindOption(MapperOption):
propagate_to_loaders = True

def __init__(self, **kw):
self.kw = kw

def process_query_conditionally(self, query):
process query during a lazyload
query._params = query._params.union(self.kw)

def process_query(self, query):
process query during a primary user query

self.process_query_conditionally(query)

s.query(Customer).filter(Customer.id==1).options(subqueryload(_num_orders), 
SetABindOption(cust_id=1))

But I'd stick with, just do an extra SELECT statement unless more genericism is 
needed.  Since it appears this case is so special purpose already and MySQL is 
going to chug to a halt if you fall back on other methods.






-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base= declarative_base()

class Customer(Base):
__tablename__ = 'customer'
id = Column(Integer, primary_key=True)
orders = relationship(Order)

class Order(Base):

Re: [sqlalchemy] subqueryload for a ColumnProperty?

2011-09-07 Thread Michael Bayer
For a deferred() itself, we don't have an option that does this.Though this 
is an unusual request.   If you definitely want the deferreds to load, what's 
the issue having them render inline into the original query ?The advantage 
to subqueryload is primarily in that it loads multiple rows per parent object 
efficiently, without re-fetching the full parent row many times,  or needing to 
use a less efficient OUTER JOIN.   A deferred is always one row per parent - 
and no join is needed anyway.




On Sep 6, 2011, at 6:49 PM, Sumeet Agarwal wrote:

 I have a collection of deferred `ColumnProperty`s that I'd like to
 start loading in subsequent queries. I know you can get this to happen
 on the first access of a `ColumnProperty` that is deferred, but I
 wonder if there's any way to specify this in the query.
 
 For relationships, `joinedload[_all()]` has a counterpart
 `subqueryload[_all()]`. Is there nothing similar we can do for
 `ColumnProperty`?
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 To unsubscribe from this group, send email to 
 sqlalchemy+unsubscr...@googlegroups.com.
 For more options, visit this group at 
 http://groups.google.com/group/sqlalchemy?hl=en.
 

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] subqueryload for a ColumnProperty?

2011-09-07 Thread Sumeet Agarwal
Still getting the hang of SQLAlchemy terminology.

I didn't mean a simple deferred() column, but a deferred column_property()
which defines a scalar select(). Like in the example here
http://www.sqlalchemy.org/docs/orm/mapper_config.html#sql-expressions-as-mapped-attributes
.

My goal is to, instead of including a correlated subquery in the SELECT. The
column_property looks like

Customer.num_orders = orm.column_property(
select([func.count(Order.id)])\
.where(Order.customer_id == Customer.id)\
.correlate(Customer.__table__),
deferred=True
)


I'd rather do a separate query to load num_orders, rather than getting it
lazily or by using undefer(). It almost feels like I might want to define
num_orders as a relationship somehow? I dunno if what I would like to do
maps cleanly to a SQLAlchemy pattern.

On Wed, Sep 7, 2011 at 7:51 AM, Michael Bayer mike...@zzzcomputing.comwrote:

 For a deferred() itself, we don't have an option that does this.Though
 this is an unusual request.   If you definitely want the deferreds to load,
 what's the issue having them render inline into the original query ?The
 advantage to subqueryload is primarily in that it loads multiple rows per
 parent object efficiently, without re-fetching the full parent row many
 times,  or needing to use a less efficient OUTER JOIN.   A deferred is
 always one row per parent - and no join is needed anyway.




 On Sep 6, 2011, at 6:49 PM, Sumeet Agarwal wrote:

  I have a collection of deferred `ColumnProperty`s that I'd like to
  start loading in subsequent queries. I know you can get this to happen
  on the first access of a `ColumnProperty` that is deferred, but I
  wonder if there's any way to specify this in the query.
 
  For relationships, `joinedload[_all()]` has a counterpart
  `subqueryload[_all()]`. Is there nothing similar we can do for
  `ColumnProperty`?
 
  --
  You received this message because you are subscribed to the Google Groups
 sqlalchemy group.
  To post to this group, send email to sqlalchemy@googlegroups.com.
  To unsubscribe from this group, send email to
 sqlalchemy+unsubscr...@googlegroups.com.
  For more options, visit this group at
 http://groups.google.com/group/sqlalchemy?hl=en.
 

 --
 You received this message because you are subscribed to the Google Groups
 sqlalchemy group.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 To unsubscribe from this group, send email to
 sqlalchemy+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/sqlalchemy?hl=en.



-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] subqueryload for a ColumnProperty?

2011-09-07 Thread Michael Bayer

On Sep 7, 2011, at 5:09 PM, Sumeet Agarwal wrote:

 Still getting the hang of SQLAlchemy terminology.
 
 I didn't mean a simple deferred() column, but a deferred column_property() 
 which defines a scalar select(). Like in the example here 
 http://www.sqlalchemy.org/docs/orm/mapper_config.html#sql-expressions-as-mapped-attributes.
 
 My goal is to, instead of including a correlated subquery in the SELECT. The 
 column_property looks like

My goal is to..., missing a verb there :). If it were me, I'd think 
you're asking to unwrap the correlated subquery into a plain JOIN.   Since that 
works more efficiently.   

 
 I'd rather do a separate query to load num_orders, rather than getting it 
 lazily or by using undefer(). It almost feels like I might want to define 
 num_orders as a relationship somehow? I dunno if what I would like to do maps 
 cleanly to a SQLAlchemy pattern.

Using relationship() is kind of the old school approach, but yup I think that 
actually allows you to get exactly subqueryload() for an attribute, so good 
idea !  a rare moment that SQLAlchemy amazes even me.   here's that since its 
faster for me to just illustrate

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base= declarative_base()

class Customer(Base):
__tablename__ = 'customer'
id = Column(Integer, primary_key=True)
orders = relationship(Order)

@property
def num_orders(self):
return self._num_orders.count if self._num_orders else 0

class Order(Base):
__tablename__ = 'order'
id = Column(Integer, primary_key=True)
customer_id = Column(Integer, ForeignKey('customer.id'))

class OrderCount(object):
pass
order_counts = select([Order.customer_id, 
func.count(Order.id).label('count')]).\
group_by(Order.customer_id).alias()

mapper(
OrderCount, 
order_counts,
primary_key=order_counts.c.customer_id
)

Customer._num_orders = relationship(OrderCount, uselist=False)

e = create_engine('sqlite://', echo=True)

Base.metadata.create_all(e)

s = Session(e)
s.add_all([
Customer(orders=[Order(), Order(), Order()]),
Customer(orders=[Order(), Order()]),
Customer(orders=[]),
])
s.commit()

for c in s.query(Customer).options(subqueryload(_num_orders)):
print c.id, c.num_orders










 
 On Wed, Sep 7, 2011 at 7:51 AM, Michael Bayer mike...@zzzcomputing.com 
 wrote:
 For a deferred() itself, we don't have an option that does this.Though 
 this is an unusual request.   If you definitely want the deferreds to load, 
 what's the issue having them render inline into the original query ?The 
 advantage to subqueryload is primarily in that it loads multiple rows per 
 parent object efficiently, without re-fetching the full parent row many 
 times,  or needing to use a less efficient OUTER JOIN.   A deferred is always 
 one row per parent - and no join is needed anyway.
 
 
 
 
 On Sep 6, 2011, at 6:49 PM, Sumeet Agarwal wrote:
 
  I have a collection of deferred `ColumnProperty`s that I'd like to
  start loading in subsequent queries. I know you can get this to happen
  on the first access of a `ColumnProperty` that is deferred, but I
  wonder if there's any way to specify this in the query.
 
  For relationships, `joinedload[_all()]` has a counterpart
  `subqueryload[_all()]`. Is there nothing similar we can do for
  `ColumnProperty`?
 
  --
  You received this message because you are subscribed to the Google Groups 
  sqlalchemy group.
  To post to this group, send email to sqlalchemy@googlegroups.com.
  To unsubscribe from this group, send email to 
  sqlalchemy+unsubscr...@googlegroups.com.
  For more options, visit this group at 
  http://groups.google.com/group/sqlalchemy?hl=en.
 
 
 --
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 To unsubscribe from this group, send email to 
 sqlalchemy+unsubscr...@googlegroups.com.
 For more options, visit this group at 
 http://groups.google.com/group/sqlalchemy?hl=en.
 
 
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 To unsubscribe from this group, send email to 
 sqlalchemy+unsubscr...@googlegroups.com.
 For more options, visit this group at 
 http://groups.google.com/group/sqlalchemy?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



[sqlalchemy] subqueryload for a ColumnProperty?

2011-09-06 Thread Sumeet Agarwal
I have a collection of deferred `ColumnProperty`s that I'd like to
start loading in subsequent queries. I know you can get this to happen
on the first access of a `ColumnProperty` that is deferred, but I
wonder if there's any way to specify this in the query.

For relationships, `joinedload[_all()]` has a counterpart
`subqueryload[_all()]`. Is there nothing similar we can do for
`ColumnProperty`?

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.