Re: [sqlalchemy] Eager loading hates single table inheritance

2012-01-09 Thread Michael Bayer

On Jan 9, 2012, at 8:56 PM, Jimmy Yuen Ho Wong wrote:

>> In the subqueryload_all()/subqueryload() example, you'll note the second one 
>> with two separate subqueryload() calls does in fact correctly do the 
>> "subqueryload" twice.   There's no way SQLA would ever "figure out" 
>> automatically that they're against the same table and then join the two 
>> queries together, decisionmaking like would be enormously complicated as 
>> well as the mechanics of how to get a single loader to coordinate among two 
>> relationships.   If you use just one relationship() to SearchOption then 
>> provide filtered accessors, then you get exactly the optimization you're 
>> looking for.
> Is it possible to include a special flag to the subqueryload() call to
> tell SA to optimize for this case?

it's not even the decision itself that is most challenging here, it's the 
implementation, as well as API clutter. Routing the loader into two 
collections within the internals would be a giant knot of almost never used 
code, right in the core of the ORM, for a flag that nobody ever uses or knows 
about.   The event system is how things like this should be done - the event I 
gave you is a starter.There's likely ways to get some other event hooks in 
there as well, there's a hook called "append_result" that actually should be 
working here but it wasn't doing it when I tried.

> SA already knows about the class
> hierarchy to be able to distinguish between the classes and how they map
> to the rows returned, changing the model mapping is not always feasible.
> In fact, if SA could do this, subqueryload() will be the optimal
> solution for this use case. joinedload(), even when it works, it's still
> too wasteful. Using events will work for now, but it's not as obvious.

I disagree - a flag would be unused, opaque, arbitrary.   This is not in any 
way an obvious or common use case - I've never seen it before. A ten line 
event handler OTOH illustrates exactly what's going on and keeps the core 
simple.   We've had many weird "flag" based features in the past and the work 
they bring is to take them *out*, and replace them with an open ended and 
user-centric approach.  Examples include:  "entity name", "mutable=True", 
implicit order by, "polymorphic_fetch", Session.load(), all kinds of crap.  
removing bad API is 10x harder than adding it.   We don't do it except for 
features that are of widespread use or are very simple and don't complicate the 
core, like event hooks.

>> Ticket 2120 calls for at least an option "nested_joins=True", specifying 
>> that the more correct/efficient system of nesting joins should be used.   
>> This is all 0.8 stuff, as the joined eager loading code would be 
>> destabilized by this - if it turns out to be an isolated option it could 
>> move to 0.7.   Your set of tests here makes me more interested in the issue 
>> though so perhaps we'll see if I have time to try some things out.
> Does it mean that the only way to eager load reliably in this
> many-to-many single table inheritance use case, depending on the data,
> is joinedload(innerjoin=True) for uselist=False relationships for now?
> In my case, I can just use innerjoin=True all the way, but there may be
> cases where the code has to switch between innerjoin and left join
> depending on the relationships.

joinedload() writes out flattened joins so whatever limitations are apparent 
there, yes.


-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Eager loading hates single table inheritance

2012-01-09 Thread Jimmy Yuen Ho Wong
Ah Thanks! A bit of a hack but certainly works for now. Thanks for
helping out. Really appreciate it!

Jimmy Yuen Ho Wong

On 10/1/12 3:31 AM, Michael Bayer wrote:
> On Jan 9, 2012, at 11:19 AM, Michael Bayer wrote:
>
>> The first thing I note here is, if I were doing a model like this, I'd 
>> either use two different association tables between Product->Origin and 
>> Product->Food, or I'd make one relationship(), and handle the filtering in 
>> Python (which is essentially what you're wishing SQLAlchemy did here).   The 
>> ORM wants to know about your table relationships which here is just 
>> A->assoc->B.All of the issues here, some of which I consider to be SQLA 
>> bugs anyway, would go away if you did it in that manner, subqueryloading 
>> would be efficient, etc.
>
> Here's an event that does what you need:
>
> from sqlalchemy.orm import attributes
> from sqlalchemy import event
>
> class Product(Base):
>
> __tablename__ = "product"
>
> id = Column(Integer, autoincrement=True, primary_key=True)
>
> options = relationship(SearchOption, secondary=product_searchoption_table)
> origin = relationship(OriginOption, uselist=False,
>   secondary=product_searchoption_table)
> foods = relationship(FoodOption,
>  secondary=product_searchoption_table)
>
> @event.listens_for(Product, "load")
> def mything(target, context):
> if 'options' in target.__dict__:
> attributes.set_committed_value(
> target, 'foods',
> [o for o in target.options if o.discriminator=='food']
> )
> origin = [o for o in target.options if o.discriminator=='origin']
> attributes.set_committed_value(
> target, 'origin', 
> origin[0] if origin else None
> )
>
> # only 2 queries
> for row in session.query(Product).options(subqueryload(Product.options)):
> print row, row.origin, row.foods
>
> if I added an onload event for individual relationship attributes that would 
> make this event a little more targeted.
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Eager loading hates single table inheritance

2012-01-09 Thread Jimmy Yuen Ho Wong
On 10/1/12 12:19 AM, Michael Bayer wrote:
> A common theme in SQLAlchemy is that, despite all the flak we get for being 
> "complicated", SQLAlchemy is actually very simple.   It has a handful of 
> constructs which seek to do exactly the same thing in exactly the same way, 
> as consistently as possible.   So we sometimes get requests for "SQLAlchemy 
> should figure out XYZ" and it's like, well not really, that would be really 
> complicated.
Ah ha don't get me wrong but I didn't say SA was complicated, I said my
actual queries used in production is complicated. What's I've shown you
was just a simplified version of what I've narrowed down. SA sure had a
large surface area to get started because it does so much more, but I've
grown to love it. This is really the first time I tried to do something
complicated like this with SA and I was testing if SA would be smart
about this. I can't really live without SA now.
> you'd use two joinedload() twice for that. So all of the examples where 
> you're distinguishing xyz_all() from xyz(), all the same thing.   
>
> It's silently ignored now so http://www.sqlalchemy.org/trac/ticket/2370 is 
> added.  If I can get to it in 0.7 it will emit a warning, will raise an error 
> in 0.8.
Ah, thanks for telling me this. Didn't notice this from the docs.
> In the subqueryload_all()/subqueryload() example, you'll note the second one 
> with two separate subqueryload() calls does in fact correctly do the 
> "subqueryload" twice.   There's no way SQLA would ever "figure out" 
> automatically that they're against the same table and then join the two 
> queries together, decisionmaking like would be enormously complicated as well 
> as the mechanics of how to get a single loader to coordinate among two 
> relationships.   If you use just one relationship() to SearchOption then 
> provide filtered accessors, then you get exactly the optimization you're 
> looking for.
Is it possible to include a special flag to the subqueryload() call to
tell SA to optimize for this case? SA already knows about the class
hierarchy to be able to distinguish between the classes and how they map
to the rows returned, changing the model mapping is not always feasible.
In fact, if SA could do this, subqueryload() will be the optimal
solution for this use case. joinedload(), even when it works, it's still
too wasteful. Using events will work for now, but it's not as obvious.
SA already is so smart about things, surely it can be smarter no? :P
> Ticket 2120 calls for at least an option "nested_joins=True", specifying that 
> the more correct/efficient system of nesting joins should be used.   This is 
> all 0.8 stuff, as the joined eager loading code would be destabilized by this 
> - if it turns out to be an isolated option it could move to 0.7.   Your set 
> of tests here makes me more interested in the issue though so perhaps we'll 
> see if I have time to try some things out.
Does it mean that the only way to eager load reliably in this
many-to-many single table inheritance use case, depending on the data,
is joinedload(innerjoin=True) for uselist=False relationships for now?
In my case, I can just use innerjoin=True all the way, but there may be
cases where the code has to switch between innerjoin and left join
depending on the relationships.
>
> Finally the contains_eager version.   SQLAlchemy again expects two distinct 
> sets of columns for each relationship, so you must join to the table twice:
>
> sa1 = aliased(OriginOption)
> sa2 = aliased(FoodOption)
>
> p = session.query(Product)\
> .join(sa1, Product.origin)\
> .join(sa2, Product.foods)\
> .options(contains_eager(Product.origin, alias=sa1),
>  contains_eager(Product.foods, alias=sa2))\
> .filter(Product.id == 2).one()
>
>
> Same theme here, you're hoping SQLAlchemy can "figure out" something in 
> Python, i.e. that only "certain" rows being routed to 
> Product.origin/Product.foods should be used for each, but it's not that 
> complicated. It relies upon SQL to present it with the correct data 
> geometry and assumes it is correct.  Second-guessing what it gets back from 
> SQL to check, "oh is this some special 1% edge case where I might have to 
> filter extra types that I'm not supposed to receive here?" wouldn't be 
> efficient or consistent with how everything else works.
> consistently.
Ah thanks for this solution, but again, self joining for each
relationship is too inefficient. The point is, as a user, I think the
query with a single join already contains all the data needed to
reconstruct collections. Plus, I've already told SA that the results
will contain the rows that it can populate the collections with, why
can't it do that? That's the thought that went through my head.

Again, I think if contains_eager() accepts a flag that tells it to
optimize for this case, the code to handle this will be isolated.

-- 
You received this message because

Re: [sqlalchemy] Eager loading hates single table inheritance

2012-01-09 Thread Michael Bayer

On Jan 9, 2012, at 11:19 AM, Michael Bayer wrote:

> The first thing I note here is, if I were doing a model like this, I'd either 
> use two different association tables between Product->Origin and 
> Product->Food, or I'd make one relationship(), and handle the filtering in 
> Python (which is essentially what you're wishing SQLAlchemy did here).   The 
> ORM wants to know about your table relationships which here is just 
> A->assoc->B.All of the issues here, some of which I consider to be SQLA 
> bugs anyway, would go away if you did it in that manner, subqueryloading 
> would be efficient, etc.


Here's an event that does what you need:

from sqlalchemy.orm import attributes
from sqlalchemy import event

class Product(Base):

__tablename__ = "product"

id = Column(Integer, autoincrement=True, primary_key=True)

options = relationship(SearchOption, secondary=product_searchoption_table)
origin = relationship(OriginOption, uselist=False,
  secondary=product_searchoption_table)
foods = relationship(FoodOption,
 secondary=product_searchoption_table)

@event.listens_for(Product, "load")
def mything(target, context):
if 'options' in target.__dict__:
attributes.set_committed_value(
target, 'foods',
[o for o in target.options if o.discriminator=='food']
)
origin = [o for o in target.options if o.discriminator=='origin']
attributes.set_committed_value(
target, 'origin', 
origin[0] if origin else None
)

# only 2 queries
for row in session.query(Product).options(subqueryload(Product.options)):
print row, row.origin, row.foods

if I added an onload event for individual relationship attributes that would 
make this event a little more targeted.




-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



Re: [sqlalchemy] Eager loading hates single table inheritance

2012-01-09 Thread Michael Bayer

On Jan 9, 2012, at 7:34 AM, Yuen Ho Wong wrote:

> Here's a list of pathological test cases that confuses the hell out of SA 
> while trying to eager load more than 1 collections which are mapped to the 
> same table using single table inheritance. In short, only joinedload*() 
> appears to work out of all the eager loading methods. This pretty much means 
> that supplying eager load options to a Query object doesn't mean you will 
> always get back the same result.

A common theme in SQLAlchemy is that, despite all the flak we get for being 
"complicated", SQLAlchemy is actually very simple.   It has a handful of 
constructs which seek to do exactly the same thing in exactly the same way, as 
consistently as possible.   So we sometimes get requests for "SQLAlchemy should 
figure out XYZ" and it's like, well not really, that would be really 
complicated.

Here we have a mixture of that, as well as some missing features that are 
planned, as well as a few API mis-usages, as well as I wouldn't really do 
things the way you're doing them at least for now.

The first thing I note here is, if I were doing a model like this, I'd either 
use two different association tables between Product->Origin and Product->Food, 
or I'd make one relationship(), and handle the filtering in Python (which is 
essentially what you're wishing SQLAlchemy did here).   The ORM wants to know 
about your table relationships which here is just A->assoc->B.All of the 
issues here, some of which I consider to be SQLA bugs anyway, would go away if 
you did it in that manner, subqueryloading would be efficient, etc.

Another nitpick, joinedload_all(), subqueryload_all() is meant to load along 
chains: A->B->C->D, not siblings, A->B, A->C, i.e.:

p = session.query(Product)\
.options(joinedload_all(Product.origin, Product.foods))\  ---> 
incorrect
.filter(Product.id == 2).one()

you'd use two joinedload() twice for that. So all of the examples where 
you're distinguishing xyz_all() from xyz(), all the same thing.   

It's silently ignored now so http://www.sqlalchemy.org/trac/ticket/2370 is 
added.  If I can get to it in 0.7 it will emit a warning, will raise an error 
in 0.8.

In the subqueryload_all()/subqueryload() example, you'll note the second one 
with two separate subqueryload() calls does in fact correctly do the 
"subqueryload" twice.   There's no way SQLA would ever "figure out" 
automatically that they're against the same table and then join the two queries 
together, decisionmaking like would be enormously complicated as well as the 
mechanics of how to get a single loader to coordinate among two relationships.  
 If you use just one relationship() to SearchOption then provide filtered 
accessors, then you get exactly the optimization you're looking for.

In the joinedload() example, that's sort of a bug or sort of a missing feature. 
   I can't fix that immediately, because we still have a policy whereby 
eagerloading doesn't parenthesize the joins - it always flattens them out.   A 
discussion of this issue is at http://www.sqlalchemy.org/trac/ticket/2120 and 
at this point it's basically SQLite preventing us from doing it, as it chokes 
on a statement like : select * from a left outer join (b join c on b.id=c.bid) 
on a.id=b.aid; .The example is added in 
http://www.sqlalchemy.org/trac/ticket/2369 .

Ticket 2120 calls for at least an option "nested_joins=True", specifying that 
the more correct/efficient system of nesting joins should be used.   This is 
all 0.8 stuff, as the joined eager loading code would be destabilized by this - 
if it turns out to be an isolated option it could move to 0.7.   Your set of 
tests here makes me more interested in the issue though so perhaps we'll see if 
I have time to try some things out.

Finally the contains_eager version.   SQLAlchemy again expects two distinct 
sets of columns for each relationship, so you must join to the table twice:

sa1 = aliased(OriginOption)
sa2 = aliased(FoodOption)

p = session.query(Product)\
.join(sa1, Product.origin)\
.join(sa2, Product.foods)\
.options(contains_eager(Product.origin, alias=sa1),
 contains_eager(Product.foods, alias=sa2))\
.filter(Product.id == 2).one()


Same theme here, you're hoping SQLAlchemy can "figure out" something in Python, 
i.e. that only "certain" rows being routed to Product.origin/Product.foods 
should be used for each, but it's not that complicated. It relies upon SQL 
to present it with the correct data geometry and assumes it is correct.  
Second-guessing what it gets back from SQL to check, "oh is this some special 
1% edge case where I might have to filter extra types that I'm not supposed to 
receive here?" wouldn't be efficient or consistent with how everything else 
works.

So overall, a few bugs we'll fix at some point, but if you want filtering in 
Python, build that onto your Product object.   I 

[sqlalchemy] Eager loading hates single table inheritance

2012-01-09 Thread Yuen Ho Wong
Here's a list of pathological test cases that confuses the hell out of SA 
while trying to eager load more than 1 collections which are mapped to the 
same table using single table inheritance. In short, only joinedload*() 
appears to work out of all the eager loading methods. This pretty much 
means that supplying eager load options to a Query object doesn't mean you 
will always get back the same result.

Ideally, I'd like subqueryload*() and contains_eager() to work just like 
joinedload*() to prevent a situation where I have to waste bandwidth 
loading the same data over and over again or doing MxN queries.

Is there anyway that I can do what I want without rewriting my complicated 
query in full SQL expression?

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/sqlalchemy/-/u-PW089d3McJ.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

#! /usr/bin/env python

from sqlalchemy import Column, Integer, Unicode, ForeignKey, Enum, Table, \
create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, backref, sessionmaker, joinedload, \
joinedload_all, contains_eager, subqueryload, subqueryload_all

Base = declarative_base()
metadata = Base.metadata


class SearchOption(Base):
__tablename__ = "searchoption"

id = Column(Integer, autoincrement=True, primary_key=True)
parent_id = Column(Integer, ForeignKey(id,
   onupdate="CASCADE",
   ondelete="CASCADE"))
parent = relationship("SearchOption", uselist=False, remote_side=[id],
  backref=backref("children"))
discriminator = Column("type", Enum("origin", "food",
name="searchoptiontype"))
__mapper_args__ = {"polymorphic_on": discriminator}
displayname = Column(Unicode(64), nullable=False)


class OriginOption(SearchOption):
__mapper_args__ = {"polymorphic_identity": "origin"}


class FoodOption(SearchOption):
__mapper_args__ = {"polymorphic_identity": "food"}


product_searchoption_table = Table("product_searchoption", metadata,
   Column("product_id",
  Integer,
  ForeignKey("product.id",
 onupdate="CASCADE",
 ondelete="CASCADE"),
  primary_key=True),
   Column("searchoption_id",
  Integer,
  ForeignKey("searchoption.id",
 onupdate="CASCADE",
 ondelete="CASCADE"),
  primary_key=True))


class Product(Base):

__tablename__ = "product"

id = Column(Integer, autoincrement=True, primary_key=True)

origin = relationship(OriginOption, uselist=False,
  secondary=product_searchoption_table)
foods = relationship(FoodOption,
 secondary=product_searchoption_table)


if __name__ == "__main__":
engine = create_engine("postgresql+psycopg2://tester:tester@localhost/test_eagerload",
   echo=True)
Session = sessionmaker(engine)
session = Session()

metadata.create_all(engine)

usa = OriginOption(displayname=u"usa")
canada = OriginOption(displayname=u"canada")

apple = FoodOption(displayname=u"apple")
orange = FoodOption(displayname=u"orange")
banana = FoodOption(displayname=u"banana")

product1 = Product(origin=usa, foods=[apple])
product2 = Product(origin=canada, foods=[orange, banana])
product3 = Product()

session.add(product1)
session.add(product2)
session.add(product3)

session.commit()

session.expunge_all()

# If all the collections to eager load belong to a single table inheritance
# mapping, there's no way to let SA know to optimize this it seems.
p = session.query(Product)\
.options(subqueryload_all(Product.origin, Product.foods))\
.filter(Product.id == 2).one()

assert p.id == 2
assert p.origin.displayname == u"canada" # This is only eager loaded by the previous query
assert [f.displayname for f in p.foods] == [u'orange', u'banana'] # This is still lazy loaded

session.expunge_all()

# Now all the collections are eagerly loaded, but extremely inefficient
# because of all the MxN queries
p = session