[sqlalchemy] Re: Eager loading with SELECT instead of JOIN

2009-10-15 Thread Michael Bayer

Emil Ivanov wrote:
>
> As far as I understand eager loading it works by adding a join clause

and...the rationale for the Hibernate feature is to optimize *lazy*
loading, not eager loading - its a query that you only want if needed.  
Also hibernate does a poor job with join-based eager loading - they make
no adjustment for things like LIMIT/OFFSET, for example (which is
disastrous IMHO).

The performance gains in the "eager" version are only that of fetching
columns.  In Hibernate's case, some JDBC implementations don't pull
columns over the wire unless requested on the result.   Sadly this is not
the case for most Python DBAPIs.



--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Eager loading with SELECT instead of JOIN

2009-10-15 Thread Michael Bayer

Michael Bayer wrote:
>
> Emil Ivanov wrote:
>>
>> As far as I understand eager loading it works by adding a join clause
>> to the select. That has the drawback of increased number or rows
>> returned. Imagine having 100 objects and each with 200 related
>> objects, most of which overlap so the total number of related objects
>> is about 300. That will generate 20 000 rows, where in reality it
>> would be much better to fetch the 100 objects with a single query and
>> then fetch the 300 related objects with another query.

Also something to clarify here.   If you have 100 parents, each with 200
children, and there is "overlap", that implies there is a many-to-many
relation between parent/child, and that there are essentially 20K rows in
the association table.   But you still need to query all 20K of those rows
in order to determine the correct associations to the parents.  So no rows
are saved.

Simlarly, if the relation is one-to-many or many-to-one, there would be no
"overlap" and you still ultimately have to fetch everything.

The approach only saves on columns being fetched, not rows.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: Eager loading with SELECT instead of JOIN

2009-10-15 Thread Michael Bayer

Emil Ivanov wrote:
>
> As far as I understand eager loading it works by adding a join clause
> to the select. That has the drawback of increased number or rows
> returned. Imagine having 100 objects and each with 200 related
> objects, most of which overlap so the total number of related objects
> is about 300. That will generate 20 000 rows, where in reality it
> would be much better to fetch the 100 objects with a single query and
> then fetch the 300 related objects with another query.
>
> It that possible with the current version of sqlalchemy.
>
> Hibernate has something like this in the form of Batch fetching:
> http://docs.jboss.org/hibernate/stable/core/reference/en/html_single/#performance-fetching-batch

its not built in but you can run the second query manually, using the
first query as a subquery to formulate the WHERE criterion for the second.
  I do this often, in fact.

there is a helper method sqlalchemy.orm.attributes.set_committed_value()
which you can use to populate the related scalar or collections of each
parent object, without any history events firing off, so that the related
object(s) will have the state of objects that were loaded from the
database.

the general idea is:

q= sess.query(Parent).filter(...)
parents = q.all()

children =
sess.query(Child).filter(Child.id.in_(sess.query(Parent.id).filter(...))

parent_dict = dict((p.id, []) for p in parents)

for child in children:
parent_dict[child.parent_id].append(child)

for p in parents:
set_committed_value(p, "children", parent_dict[p.id])

there are several reasons this is not built in.  one is that the subquery
step is very difficult to formulate for relations that deal with complex
join conditions and composite primary keys - by leaving this to the user
SQLA avoids wading into what is probably kind of a bottomless hole.   The
other is that the ORM doesn't include an "immediate second query" in its
architecture, nor does it include a "deferred load among many objects from
a previous query in response to a single attribute access" - which is
essentially the "lazy" version of this and is often what people prefer.

there are two features though which would make the above operation simpler
- a query.select_entities(*cols_or_entities) method, which is essentially
what query.values() does without the execution (I always preferred it that
way, in fact), and perhaps a "knit_collections_together" type of method
that would do the second step.



--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---