Re: [sqlalchemy] Large number of polymorphic subclasses

2017-05-01 Thread mike bayer
The selectinload feature revealed the need to reorganize most of the 
"loading_relationships" documentation, both to provide adequate context for the 
new feature and to clean up a lot of crufty organization overall.   This 
reorg of the docs somewhat awkwardly needed to be decoupled from the 
feature itself, so that it sits in a separate commit and can also be 
merged up to the 1.1 documentation, i.e. the new organization but without 
the actual feature.


The situation is similar with joined eager "in" load - the inheritance 
docs are disorganized from years of updates without a full pass over the 
whole thing to organize in terms of how inheritance config/loading works 
now.  So I am working on reorganizing the inheritance documentation 
before I continue with writing the tests for the joined eager "in" load 
feature so that the docs for the new feature can be easily integrated 
into the existing docs.


As always, everything I do is fully visible just by subscribing to the 
gerrits:


https://gerrit.sqlalchemy.org/#/c/392/

https://gerrit.sqlalchemy.org/#/c/359/



On 05/01/2017 06:12 PM, da...@benchling.com wrote:

Hey Mike,

I just saw the new selectinload feature, and am super excited about it!

As far as I can tell, this is only for relationships, so I wanted to 
check what the plans are for an inheritance loader using selectinload as 
well, given that you initially mentioned them together and I can't find 
an issue in the tracker for it.


I've created an issue for both a new relationship loader and an
inheritance loader at the same time, since they will use very similar
paths, at https://bitbucket.org/zzzeek/sqlalchemy/issues/3944.


Thanks!
Damon

On Tuesday, April 4, 2017 at 8:24:07 PM UTC-7, Mike Bayer wrote:

Reusing the original query is hard, and the case you have is due to
mismatched entities that would have to be handled, probably by
wrapping the original query in a subquery just like subquery eager
loading does.  The subquery eager loading feature took a few years
to work out an enormous number of issues with this approach.

The new feature which I'm now working on, several hours a day, in
response to this thread solves the whole problem in a much better
way, using a simple IN expression against the primary keys of all
states loaded in the query.   Betas of 1.2 will hopefully be
available in a month or two.The feature can be replicated in
older versions by just grabbing the list of states coming out of
__iter__ and using their primary keys in an IN clause.   Parallel
development of this other approach as an interim throwaway is
probably not worth it if it continues to become more complicated.

On Apr 4, 2017 7:02 PM, > wrote:

Hey Mike,

Thanks for those -- seems to have helped those cases, though
we're running into some pretty weird behavior with joins. Here's
a simplified case that shows one of the issues we're running
into (use _loader_for_cls from above):


class W(Base):
    __tablename__ = 'w'
    id = Column(Integer, primary_key=True)
    type = Column(String)

    x_id = Column(Integer, ForeignKey('x.id'))
    x = relationship('X')

    __mapper_args__ = {'polymorphic_on': type}


class W2(W):
    __tablename__ = 'w2'
    id = Column(Integer, ForeignKey('w.id'), primary_key=True)
    w2 = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'w2'}


class X(Base):
    __tablename__ = 'x'
    id = Column(Integer, primary_key=True)


@event.listens_for(W, "load", propagate=True)
def load_extra(target, context):
    key = ('loader_by_cls', type(target))

    if key not in context.attributes:
        context.attributes[key] = _loader_for_cls(target, context)


e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)
s = Session(e)

s.add(W2(x=X()))
s.commit()

s.query(W).join(W.x).first()


One of the statements emitted by this is

SELECT w2.id AS w2_id, w.id AS w_id, w.type AS w_type, w.x_id AS w_x_id,
w2.w2 AS w2_w2
FROM w2, w JOIN x ON x.id = w.x_id
 LIMIT ? OFFSET ?

Specifically, the "FROM w2, w JOIN x" is not what we want here
-- we'd just want "FROM w2"

Replacing

 q = orig_query.with_entities(target_cls)

with

 q = context.session.query(target_cls).join(orig_query.subquery())

fixes the issue, though this feels less than ideal. Subclassing
Query also seems less than ideal. Do you have any ideas here?

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-05-01 Thread damon
Hey Mike,

I just saw the new selectinload feature, and am super excited about it!

As far as I can tell, this is only for relationships, so I wanted to check 
what the plans are for an inheritance loader using selectinload as well, 
given that you initially mentioned them together and I can't find an issue 
in the tracker for it.
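
For reference, relationship-level usage of the new loader is just another loader option; a minimal sketch, borrowing the s Session and the B2.cs relationship from the recipe in this thread (requires a 1.2 development build where selectinload exists):

from sqlalchemy.orm import selectinload

# one SELECT for the B2 rows, then one additional SELECT using IN against
# the loaded B2 primary keys to fetch all of their "cs" collections
result = s.query(B2).options(selectinload(B2.cs)).all()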
 

> I've created an issue for both a new relationship loader and an 
> inheritance loader at the same time, since they will use very similar 
> paths, at https://bitbucket.org/zzzeek/sqlalchemy/issues/3944.


Thanks!
Damon

On Tuesday, April 4, 2017 at 8:24:07 PM UTC-7, Mike Bayer wrote:
>
> Reusing the original query is hard, and the case you have is due to 
> mismatched entities that would have to be handled, probably by wrapping the 
> original query in a subquery just like subquery eager loading does.  The 
> subquery eager loading feature took a few years to work out an enormous 
> number of issues with this approach.   
>
> The new feature which I'm now working on, several hours a day, in response 
> to this thread solves the whole problem in a much better way, using a 
> simple IN expression against the primary keys of all states loaded in the 
> query.   Betas of 1.2 will hopefully be available in a month or two.The 
> feature can be replicated in older versions by just grabbing the list of 
> states coming out of __iter__ and using their primary keys in an IN clause. 
>   Parallel development of this other approach as an interim throwaway is 
> probably not worth it if it continues to become more complicated.
>
> On Apr 4, 2017 7:02 PM, > wrote:
>
> Hey Mike,
>
> Thanks for those -- seems to have helped those cases, though we're running 
> into some pretty weird behavior with joins. Here's a simplified case that 
> shows one of the issues we're running into (use _loader_from_cls from 
> above):
>
>
> class W(Base):
> __tablename__ = 'w'
> id = Column(Integer, primary_key=True)
> type = Column(String)
>
> x_id = Column(Integer, ForeignKey('x.id'))
> x = relationship('X')
>
> __mapper_args__ = {'polymorphic_on': type}
>
>
> class W2(W):
> __tablename__ = 'w2'
> id = Column(Integer, ForeignKey('w.id'), primary_key=True)
> w2 = Column(String)
> __mapper_args__ = {'polymorphic_identity': 'w2'}
>
>
> class X(Base):
> __tablename__ = 'x'
> id = Column(Integer, primary_key=True)
>
>
> @event.listens_for(W, "load", propagate=True)
> def load_extra(target, context):
>  key = ('loader_by_cls', type(target))
>
>  if key not in context.attributes:
>  context.attributes[key] = _loader_for_cls(target, context)
>
>
> e = create_engine("sqlite://", echo=True)
> Base.metadata.create_all(e)
> s = Session(e)
>
> s.add(W2(x=X()))
> s.commit()
>
> s.query(W).join(W.x).first()
>
>
> One of the statements emitted by this is
>
> SELECT w2.id AS w2_id, w.id AS w_id, w.type AS w_type, w.x_id AS w_x_id, 
> w2.w2 AS w2_w2
> FROM w2, w JOIN x ON x.id = w.x_id
>  LIMIT ? OFFSET ?
>
> Specifically, the "FROM w2, w JOIN x" is not what we want here -- we'd 
> just want "FROM w2"
>
> Replacing 
>
> q = orig_query.with_entities(target_cls)
>
> with
>
> q = context.session.query(target_cls).join(orig_query.subquery())
>
> fixes the issue, though this feels less than ideal. Subclassing Query also 
> seems less than ideal. Do you have any ideas here?
>
>
> Thanks,
> Damon
>
> On Tuesday, April 4, 2017 at 6:42:32 AM UTC-7, Mike Bayer wrote:
>
>> This repaste of the example contains two refinements to address each of 
>> these issues distinctly.  We want to avoid re-entrant invocation of 
>> _loader_for_cls, so putting a flag into attributes handles that.   Then 
>> there's the issue of same class coming in that we are already querying, 
>> we can look at the type being loaded in column_descriptions, and this 
>> also solves the re-entrant problem though I think the flag is more 
>> direct.  The column_descriptions check likely needs more refinements, 
>> but the general idea is that if the entity being loaded is already a B2, 
>> then you wouldn't do any extra B2 queries (unless you have multiple 
>> levels of inheritance / polymorphic loading in which case that probably 
>> needs adjustment). 
>>
>> The "strong reference" comment, you can try commenting that out in the 
>> test and see what happens.  Because this loader trick loads all the 
>> subclasses up front upon seeing the first member of a particular 
>> subtype, it is loading for subsequent instances that aren't seen yet as 
>> well.  If we don't make a strong reference to them, they get lost and 
>> the extra attribute loading fails. 
>>
>> The feature I have in development works a lot better because all the 
>> additional loads are done *after* all the entities are loaded, and it 
>> does it using an IN query that only includes those entities that definitely need the load.

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-04-04 Thread Mike Bayer
Reusing the original query is hard, and the case you have is due to
mismatched entities that would have to be handled, probably by wrapping the
original query in a subquery just like subquery eager loading does.  The
subquery eager loading feature took a few years to work out an enormous
number of issues with this approach.

The new feature which I'm now working on, several hours a day, in response
to this thread solves the whole problem in a much better way, using a
simple IN expression against the primary keys of all states loaded in the
query.   Betas of 1.2 will hopefully be available in a month or two.   The
feature can be replicated in older versions by just grabbing the list of
states coming out of __iter__ and using their primary keys in an IN clause.
  Parallel development of this other approach as an interim throwaway is
probably not worth it if it continues to become more complicated.
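
A rough sketch of that replication for older versions (this is not from the patches; the class name is made up, it assumes single-entity queries whose mapped classes all have an integer "id" primary key, and it naively re-selects every class present, including ones that need no extra load):

import itertools

from sqlalchemy.orm import Query


class SubclassInQuery(Query):
    def __iter__(self):
        # materialize the primary result first, then emit one
        # IN-on-primary-key SELECT per class present; the identity map
        # fills in the missing subclass columns on the same instances
        objs = list(super(SubclassInQuery, self).__iter__())
        keyed = sorted(((type(o), o.id) for o in objs),
                       key=lambda pair: pair[0].__name__)
        for cls, group in itertools.groupby(keyed, key=lambda pair: pair[0]):
            ids = [id_ for _, id_ in group]
            self.session.query(cls).filter(cls.id.in_(ids)).all()
        return iter(objs)

It would be wired in with Session(query_cls=SubclassInQuery), or the sessionmaker equivalent.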

On Apr 4, 2017 7:02 PM,  wrote:

Hey Mike,

Thanks for those -- seems to have helped those cases, though we're running
into some pretty weird behavior with joins. Here's a simplified case that
shows one of the issues we're running into (use _loader_for_cls from
above):


class W(Base):
    __tablename__ = 'w'
    id = Column(Integer, primary_key=True)
    type = Column(String)

    x_id = Column(Integer, ForeignKey('x.id'))
    x = relationship('X')

    __mapper_args__ = {'polymorphic_on': type}


class W2(W):
    __tablename__ = 'w2'
    id = Column(Integer, ForeignKey('w.id'), primary_key=True)
    w2 = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'w2'}


class X(Base):
    __tablename__ = 'x'
    id = Column(Integer, primary_key=True)


@event.listens_for(W, "load", propagate=True)
def load_extra(target, context):
    key = ('loader_by_cls', type(target))

    if key not in context.attributes:
        context.attributes[key] = _loader_for_cls(target, context)


e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)
s = Session(e)

s.add(W2(x=X()))
s.commit()

s.query(W).join(W.x).first()


One of the statements emitted by this is

SELECT w2.id AS w2_id, w.id AS w_id, w.type AS w_type, w.x_id AS w_x_id,
w2.w2 AS w2_w2
FROM w2, w JOIN x ON x.id = w.x_id
 LIMIT ? OFFSET ?

Specifically, the "FROM w2, w JOIN x" is not what we want here -- we'd just
want "FROM w2"

Replacing

q = orig_query.with_entities(target_cls)

with

q = context.session.query(target_cls).join(orig_query.subquery())

fixes the issue, though this feels less than ideal. Subclassing Query also
seems less than ideal. Do you have any ideas here?


Thanks,
Damon

On Tuesday, April 4, 2017 at 6:42:32 AM UTC-7, Mike Bayer wrote:

> This repaste of the example contains two refinements to address each of
> these issues distinctly.  We want to avoid re-entrant invocation of
> _loader_for_cls, so putting a flag into attributes handles that.   Then
> there's the issue of same class coming in that we are already querying,
> we can look at the type being loaded in column_descriptions, and this
> also solves the re-entrant problem though I think the flag is more
> direct.  The column_descriptions check likely needs more refinements,
> but the general idea is that if the entity being loaded is already a B2,
> then you wouldn't do any extra B2 queries (unless you have multiple
> levels of inheritance / polymorphic loading in which case that probably
> needs adjustment).
>
> The "strong reference" comment, you can try commenting that out in the
> test and see what happens.  Because this loader trick loads all the
> subclasses up front upon seeing the first member of a particular
> subtype, it is loading for subsequent instances that aren't seen yet as
> well.  If we don't make a strong reference to them, they get lost and
> the extra attribute loading fails.
>
> The feature I have in development works a lot better because all the
> additional loads are done *after* all the entities are loaded, and it
> does it using an IN query that only includes those entities that
> definitely need the load.  I'm not sure if there are other negative side
> effects from our loading of the "subclass" entity in some cases ahead of
> where the primary query gets at the entity.The recipe here could be
> made to do all the extra loads after the primary query but theres no
> "after query" hook, you'd have to subclass Query and override __iter__
> to add this step.
>
> from sqlalchemy import *
> from sqlalchemy.orm import *
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy import event
>
> Base = declarative_base()
>
>
> class A(Base):
>  __tablename__ = 'a'
>  id = Column(Integer, primary_key=True)
>  a1 = Column(String)
>  type = Column(String)
>
>  __mapper_args__ = {'polymorphic_on': type}
>
>
> class B1(A):
>  __tablename__ = 'b1'
>  id = Column(Integer, ForeignKey('a.id'), primary_key=True)
>  b1 = Column(String)
>  b_data = Column(String)
>  __mapper_args__ = {'pol

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-04-04 Thread damon
Hey Mike,

Thanks for those -- seems to have helped those cases, though we're running 
into some pretty weird behavior with joins. Here's a simplified case that 
shows one of the issues we're running into (use _loader_for_cls from 
above):


class W(Base):
    __tablename__ = 'w'
    id = Column(Integer, primary_key=True)
    type = Column(String)

    x_id = Column(Integer, ForeignKey('x.id'))
    x = relationship('X')

    __mapper_args__ = {'polymorphic_on': type}


class W2(W):
    __tablename__ = 'w2'
    id = Column(Integer, ForeignKey('w.id'), primary_key=True)
    w2 = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'w2'}


class X(Base):
    __tablename__ = 'x'
    id = Column(Integer, primary_key=True)


@event.listens_for(W, "load", propagate=True)
def load_extra(target, context):
    key = ('loader_by_cls', type(target))

    if key not in context.attributes:
        context.attributes[key] = _loader_for_cls(target, context)


e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)
s = Session(e)

s.add(W2(x=X()))
s.commit()

s.query(W).join(W.x).first()


One of the statements emitted by this is

SELECT w2.id AS w2_id, w.id AS w_id, w.type AS w_type, w.x_id AS w_x_id, 
w2.w2 AS w2_w2
FROM w2, w JOIN x ON x.id = w.x_id
 LIMIT ? OFFSET ?

Specifically, the "FROM w2, w JOIN x" is not what we want here -- we'd just 
want "FROM w2"

Replacing 

q = orig_query.with_entities(target_cls)

with

q = context.session.query(target_cls).join(orig_query.subquery())

fixes the issue, though this feels less than ideal. Subclassing Query also 
seems less than ideal. Do you have any ideas here?
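
Spelled out inside the recipe's loader, the substitution being discussed looks roughly like this (a sketch only; the join condition is left implicit, exactly as in the one-liner above, and the column_descriptions guard is the one from the repaste recipe in this thread):

def _loader_for_cls(target, context):
    orig_query = context.query
    target_cls = type(target)

    # skip the extra load if the query is already against this subclass
    if orig_query.column_descriptions[0]['type'] is target_cls:
        return None

    # wrap the original query, criteria and all, in an anonymous subquery
    # and select the subclass against it, instead of rewriting the
    # original query's entities in place
    q = context.session.query(target_cls).join(orig_query.subquery())
    return q.all()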


Thanks,
Damon

On Tuesday, April 4, 2017 at 6:42:32 AM UTC-7, Mike Bayer wrote:
>
> This repaste of the example contains two refinements to address each of 
> these issues distinctly.  We want to avoid re-entrant invocation of 
> _loader_for_cls, so putting a flag into attributes handles that.   Then 
> there's the issue of same class coming in that we are already querying, 
> we can look at the type being loaded in column_descriptions, and this 
> also solves the re-entrant problem though I think the flag is more 
> direct.  The column_descriptions check likely needs more refinements, 
> but the general idea is that if the entity being loaded is already a B2, 
> then you wouldn't do any extra B2 queries (unless you have multiple 
> levels of inheritance / polymorphic loading in which case that probably 
> needs adjustment). 
>
> The "strong reference" comment, you can try commenting that out in the 
> test and see what happens.  Because this loader trick loads all the 
> subclasses up front upon seeing the first member of a particular 
> subtype, it is loading for subsequent instances that aren't seen yet as 
> well.  If we don't make a strong reference to them, they get lost and 
> the extra attribute loading fails. 
>
> The feature I have in development works a lot better because all the 
> additional loads are done *after* all the entities are loaded, and it 
> does it using an IN query that only includes those entities that 
> definitely need the load.  I'm not sure if there are other negative side 
> effects from our loading of the "subclass" entity in some cases ahead of 
> where the primary query gets at the entity.The recipe here could be 
> made to do all the extra loads after the primary query but theres no 
> "after query" hook, you'd have to subclass Query and override __iter__ 
> to add this step. 
>
> from sqlalchemy import * 
> from sqlalchemy.orm import * 
> from sqlalchemy.ext.declarative import declarative_base 
> from sqlalchemy import event 
>
> Base = declarative_base() 
>
>
> class A(Base): 
>  __tablename__ = 'a' 
>  id = Column(Integer, primary_key=True) 
>  a1 = Column(String) 
>  type = Column(String) 
>
>  __mapper_args__ = {'polymorphic_on': type} 
>
>
> class B1(A): 
>  __tablename__ = 'b1' 
>  id = Column(Integer, ForeignKey('a.id'), primary_key=True) 
>  b1 = Column(String) 
>  b_data = Column(String) 
>  __mapper_args__ = {'polymorphic_identity': 'b1'} 
>
>
> class B2(A): 
>  __tablename__ = 'b2' 
>  id = Column(Integer, ForeignKey('a.id'), primary_key=True) 
>  b2 = Column(String) 
>  b_data = Column(String) 
>  cs = relationship("C", lazy='subquery') 
>
>  __mapper_args__ = {'polymorphic_identity': 'b2'} 
>
>
> class C(Base): 
>  __tablename__ = 'c' 
>  id = Column(Integer, primary_key=True) 
>  b2_id = Column(ForeignKey('b2.id')) 
>
>
> class B3(A): 
>  __tablename__ = 'b3' 
>  id = Column(Integer, ForeignKey('a.id'), primary_key=True) 
>  b3 = Column(String) 
>  b_data = Column(String) 
>  __mapper_args__ = {'polymorphic_identity': 'b3'} 
>
>
> def _loader_for_cls(target, context): 
>  orig_query = context.query 
>
>  target_cls = type(target) 
>
>  if orig_query.column_descriptions[0]['type'] is target_cls: 
>  return None 

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-04-04 Thread mike bayer
This repaste of the example contains two refinements to address each of 
these issues distinctly.  We want to avoid re-entrant invocation of 
_loader_for_cls, so putting a flag into attributes handles that.   Then 
there's the issue of the same class coming in that we are already querying: 
we can look at the type being loaded in column_descriptions, and this 
also solves the re-entrant problem though I think the flag is more 
direct.  The column_descriptions check likely needs more refinements, 
but the general idea is that if the entity being loaded is already a B2, 
then you wouldn't do any extra B2 queries (unless you have multiple 
levels of inheritance / polymorphic loading in which case that probably 
needs adjustment).


The "strong reference" comment, you can try commenting that out in the 
test and see what happens.  Because this loader trick loads all the 
subclasses up front upon seeing the first member of a particular 
subtype, it is loading for subsequent instances that aren't seen yet as 
well.  If we don't make a strong reference to them, they get lost and 
the extra attribute loading fails.


The feature I have in development works a lot better because all the 
additional loads are done *after* all the entities are loaded, and it 
does it using an IN query that only includes those entities that 
definitely need the load.  I'm not sure if there are other negative side 
effects from our loading of the "subclass" entity in some cases ahead of 
where the primary query gets at the entity.The recipe here could be 
made to do all the extra loads after the primary query but there's no 
"after query" hook, you'd have to subclass Query and override __iter__ 
to add this step.


from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import event

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    a1 = Column(String)
    type = Column(String)

    __mapper_args__ = {'polymorphic_on': type}


class B1(A):
    __tablename__ = 'b1'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b1 = Column(String)
    b_data = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'b1'}


class B2(A):
    __tablename__ = 'b2'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b2 = Column(String)
    b_data = Column(String)
    cs = relationship("C", lazy='subquery')

    __mapper_args__ = {'polymorphic_identity': 'b2'}


class C(Base):
    __tablename__ = 'c'
    id = Column(Integer, primary_key=True)
    b2_id = Column(ForeignKey('b2.id'))


class B3(A):
    __tablename__ = 'b3'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b3 = Column(String)
    b_data = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'b3'}


def _loader_for_cls(target, context):
    orig_query = context.query

    target_cls = type(target)

    if orig_query.column_descriptions[0]['type'] is target_cls:
        return None

    # take the original query and change the entity to the subclass
    q = orig_query.with_entities(target_cls)

    # defer everything that's not PK / polymorphic_on from A.  this whole
    # bit is just to avoid all those extra columns
    to_defer = []
    mapper = inspect(target).mapper
    inherited = mapper.inherits
    while inherited is not None:
        for attr in inherited.column_attrs:
            if not attr.expression.primary_key and \
                    attr.expression is not inherited.polymorphic_on:
                to_defer.append(attr.key)
        for attr in inherited.relationships:
            to_defer.append(attr.key)
        inherited = inherited.inherits
    q = q.options(*[defer(k) for k in to_defer])

    q._attributes["load_extra_called"] = True

    # store this strong reference so recs don't get lost while
    # iterating
    return q.all()


@event.listens_for(A, "load", propagate=True)
def load_extra(target, context):
    if "load_extra_called" in context.attributes:
        return

    key = ('loader_by_cls', type(target))

    if key not in context.attributes:
        context.attributes[key] = _loader_for_cls(target, context)


e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)


s = Session(e)
s.add_all([
    B1(b_data='b11', b1='b1', a1='b11'),
    B2(b_data='b21', b2='b2', a1='b21', cs=[C(), C(), C()]),
    B3(b_data='b31', b3='b3', a1='b31'),
    B1(b_data='b12', b1='b1', a1='b12'),
    B1(b_data='b13', b1='b1', a1='b13'),
    B2(b_data='b22', b2='b2', a1='b22', cs=[C(), C()]),
    B3(b_data='b32', b3='b3', a1='b12'),
    B3(b_data='b33', b3='b3', a1='b33')
])
s.commit()


q = s.query(A).filter(A.a1.like('%2%'))

result = q.all()

print "- no more SQL "
for b in result:
    if isinstance(b, B1):
        print b.b1
    elif isinstance(b, B2):
        print b.cs
        print b.b2
    elif isinstance(b, B3):
        print b.b3




On 04/04/2017 12:16 AM, da...@benchling.com wrote:

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-04-03 Thread damon
Hey Mike,

Looks like I spoke too soon -- a few more questions:

Using the example code you posted, we're actually seeing 4 additional 
queries (one per result model), rather than the expected 3 (one per result 
model type). If you print context.query inside load_extra, I think it's 
clear why:

- the loader sequentially processes loaded models 
(https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L75)
- on processing the first B2, it executes a query to load all B2's data
- when processing the results of this new query, it now processes the 
second B2 record (under a new context)
- load_extra() gets called for the new B2 with a different context than the 
first B2
- this now executes _loader_for_cls

Note that if you don't call _loader_for_cls in load_extra, load_extra is 
called for each record using the correct context.

A related issue is that if you query directly for B2, it'll redundantly 
execute _loader_for_cls.

I think we could solve both of these issues by doing this instead:

q = orig_query.with_entities(target_cls)
if q == orig_query:
    return

Unfortunately, the == operator doesn't work for comparing queries. Do you 
have a way to compare query equality, or alternatively have a solution to 
both of these issues?
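
One crude stand-in for query equality (purely illustrative; it ignores bound parameter values and loader options) is to compare the SQL the two queries would render:

def _same_select(q1, q2):
    # good enough to detect "with_entities() handed back the query we
    # already have", nothing more
    return str(q1.statement) == str(q2.statement)

Mike's refined recipe in this thread sidesteps the comparison entirely by checking orig_query.column_descriptions[0]['type'] against the target class instead.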

Could you also elaborate on what you meant by this comment?

# store this strong reference so recs don't get lost while
# iterating


Thanks again for all your help,
Damon

On Monday, April 3, 2017 at 6:42:17 PM UTC-7, da...@benchling.com wrote:
>
> Thanks a ton for your help, Mike!
>
> We played around with it and are pretty happy with your solution using the 
> load() event, so we'll be using that moving forward.
>
>
> Damon
>
> On Wednesday, March 29, 2017 at 2:40:39 PM UTC-7, Mike Bayer wrote:
>>
>> I have a working version of both loading relationships via IN as well as 
>> loading joined inheritance subclasses via IN, including your great idea 
>> that extra eager loaders should continue to work for the subclass 
>> loaders. 
>>
>> I've only tested it with one scenario so far and both patches have a 
>> long way to go re: tests and documentation. 
>>
>> the scratch test case is currently in the commit message for the second 
>> patch, which illustrates a base class + subclass load where both classes 
>> have an additional relationship. This is at 
>> https://gerrit.sqlalchemy.org/#/c/359/, note this builds upon the 
>> previous gerrit at https://gerrit.sqlalchemy.org/#/c/352/.The work 
>> at the moment will likely fall down with any kind of surprises but if 
>> you wanted to start running it and finding those cases, that is always 
>> helpful. 
>>
>> So far this looks promising as something that can be in 1.2 with perhaps 
>> some "EXPERIMENTAL" warnings surrounding it, but overall 1.2 was lacking 
>> a "killer ORM feature" so these two would be it. 
>>
>>
>>
>> On 03/23/2017 06:02 PM, mike bayer wrote: 
>> > 
>> > 
>> > On 03/23/2017 02:40 PM, mike bayer wrote: 
>> >> 
>> >> 
>> >> On 03/23/2017 12:53 PM, da...@benchling.com wrote: 
>> >>> Hey Mike, 
>> >>> 
>> >>> Thanks for the quick response! 
>> >>> 
>> >>> For developers that are pretty familiar with the SQLAlchemy API, but 
>> not 
>> >>> so much the internals, would implementing the subqueryloads to 
>> >>> contribute to SA be a reasonable endeavor? Could you ballpark how 
>> much 
>> >>> time how long it might take for us to do it? 
>> >> 
>> >> I haven't looked at what this would take, but it would be intricate 
>> and 
>> >> also need a lot of tests that are not easy to write.Like, if you 
>> >> worked on it, you could probably get something working, but then that 
>> >> probably wouldn't be how it really needs to be because all kinds of 
>> >> things that are simple for simple cases don't work with the vast 
>> amount 
>> >> of edge cases which we have. 
>> > 
>> > I've created an issue for both a new relationship loader and an 
>> > inheritance loader at the same time, since they will use very similar 
>> > paths, at https://bitbucket.org/zzzeek/sqlalchemy/issues/3944.A 
>> POC 
>> > for the relationship loader so far looks to be very simple (but then 
>> > again all the loaders start out very simple...) and is at 
>> > https://gerrit.sqlalchemy.org/352.   The same infrastructure and 
>> > approach would also be used for the mapper inheritance loader, which 
>> > would be enabled via a mapper()-level flag, as well as per-query using 
>> a 
>> > new query option. 
>> > 
>> > I can't yet guarantee this will be a 1.2 thing, I'd have to get some 
>> > more momentum going w/ test cases and all that.  In a 1.2 release it 
>> > would also be a little bit experimental as new loading styles usually 
>> > have small issues coming up for months or years, as people try more use 
>> > cases. 
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>> >> 
>> >> The routine that's loading the additional columns just for one object 
>> at 
>> >> a time is here: 
>> >> 
>> >> 
>> https:/

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-04-03 Thread damon
Thanks a ton for your help, Mike!

We played around with it and are pretty happy with your solution using the 
load() event, so we'll be using that moving forward.


Damon

On Wednesday, March 29, 2017 at 2:40:39 PM UTC-7, Mike Bayer wrote:
>
> I have a working version of both loading relationships via IN as well as 
> loading joined inheritance subclasses via IN, including your great idea 
> that extra eager loaders should continue to work for the subclass 
> loaders. 
>
> I've only tested it with one scenario so far and both patches have a 
> long way to go re: tests and documentation. 
>
> the scratch test case is currently in the commit message for the second 
> patch, which illustrates a base class + subclass load where both classes 
> have an additional relationship. This is at 
> https://gerrit.sqlalchemy.org/#/c/359/, note this builds upon the 
> previous gerrit at https://gerrit.sqlalchemy.org/#/c/352/.The work 
> at the moment will likely fall down with any kind of surprises but if 
> you wanted to start running it and finding those cases, that is always 
> helpful. 
>
> So far this looks promising as something that can be in 1.2 with perhaps 
> some "EXPERIMENTAL" warnings surrounding it, but overall 1.2 was lacking 
> a "killer ORM feature" so these two would be it. 
>
>
>
> On 03/23/2017 06:02 PM, mike bayer wrote: 
> > 
> > 
> > On 03/23/2017 02:40 PM, mike bayer wrote: 
> >> 
> >> 
> >> On 03/23/2017 12:53 PM, da...@benchling.com  wrote: 
> >>> Hey Mike, 
> >>> 
> >>> Thanks for the quick response! 
> >>> 
> >>> For developers that are pretty familiar with the SQLAlchemy API, but 
> not 
> >>> so much the internals, would implementing the subqueryloads to 
> >>> contribute to SA be a reasonable endeavor? Could you ballpark how much 
> >>> time how long it might take for us to do it? 
> >> 
> >> I haven't looked at what this would take, but it would be intricate and 
> >> also need a lot of tests that are not easy to write.Like, if you 
> >> worked on it, you could probably get something working, but then that 
> >> probably wouldn't be how it really needs to be because all kinds of 
> >> things that are simple for simple cases don't work with the vast amount 
> >> of edge cases which we have. 
> > 
> > I've created an issue for both a new relationship loader and an 
> > inheritance loader at the same time, since they will use very similar 
> > paths, at https://bitbucket.org/zzzeek/sqlalchemy/issues/3944.A POC 
> > for the relationship loader so far looks to be very simple (but then 
> > again all the loaders start out very simple...) and is at 
> > https://gerrit.sqlalchemy.org/352.   The same infrastructure and 
> > approach would also be used for the mapper inheritance loader, which 
> > would be enabled via a mapper()-level flag, as well as per-query using a 
> > new query option. 
> > 
> > I can't yet guarantee this will be a 1.2 thing, I'd have to get some 
> > more momentum going w/ test cases and all that.  In a 1.2 release it 
> > would also be a little bit experimental as new loading styles usually 
> > have small issues coming up for months or years, as people try more use 
> > cases. 
> > 
> > 
> > 
> > 
> > 
> > 
> >> 
> >> The routine that's loading the additional columns just for one object 
> at 
> >> a time is here: 
> >> 
> >> 
> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L635
>  
> >> 
> >> and then here for the bulk of it: 
> >> 
> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L2588
>  
> >> 
> >> 
> >> 
> >> But the much harder part would be how to work this step into the 
> loading 
> >> infrastructure, which would be somewhere in 
> >> 
> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L273,
>  
>
> >> 
> >> which is a very intricate function with over a decade of constant 
> >> refactorings behind it, and I'd have to think pretty deeply for awhile 
> >> how best to do this. 
> >> 
> >> Not to mention that there's more than one way to do this query, there's 
> >> either re-using the criteria from the original query, or there's 
> >> injecting the primary key ids of the whole list of objects into an IN 
> >> clause after the fact.  The latter approach is probably more efficient 
> >> but wouldn't work for composite primary keys outside of Postgresql.   
> As 
> >> a built in feature I'd want "IN" loading to be an option at least. 
> >> 
> >> 
> >> 
> >>> 
> >>> Regarding STI and relationships, is there any way to do that but still 
> >>> get the benefits of JTI? e.g. is there an easy way to resolve 
> >>> my_base_class_inst.subclass_prop as a proxy to the subclass? We could 
> >>> roll our own using __getitem__ but it seems a little hairy. 
> >> 
> >> So proxying to a related item wouldn't be terrible, sure you could use 
> >> __getitem__ or you could also add descriptors to the primary class, 
> >> adding the descriptors to match the "info" could be automated as well 

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-03-29 Thread mike bayer
I have a working version of both loading relationships via IN as well as 
loading joined inheritance subclasses via IN, including your great idea 
that extra eager loaders should continue to work for the subclass 
loaders.


I've only tested it with one scenario so far and both patches have a 
long way to go re: tests and documentation.


the scratch test case is currently in the commit message for the second 
patch, which illustrates a base class + subclass load where both classes 
have an additional relationship. This is at 
https://gerrit.sqlalchemy.org/#/c/359/, note this builds upon the 
previous gerrit at https://gerrit.sqlalchemy.org/#/c/352/.The work 
at the moment will likely fall down with any kind of surprises but if 
you wanted to start running it and finding those cases, that is always 
helpful.


So far this looks promising as something that can be in 1.2 with perhaps 
some "EXPERIMENTAL" warnings surrounding it, but overall 1.2 was lacking 
a "killer ORM feature" so these two would be it.




On 03/23/2017 06:02 PM, mike bayer wrote:



On 03/23/2017 02:40 PM, mike bayer wrote:



On 03/23/2017 12:53 PM, da...@benchling.com wrote:

Hey Mike,

Thanks for the quick response!

For developers that are pretty familiar with the SQLAlchemy API, but not
so much the internals, would implementing the subqueryloads to
contribute to SA be a reasonable endeavor? Could you ballpark how much
time how long it might take for us to do it?


I haven't looked at what this would take, but it would be intricate and
also need a lot of tests that are not easy to write.Like, if you
worked on it, you could probably get something working, but then that
probably wouldn't be how it really needs to be because all kinds of
things that are simple for simple cases don't work with the vast amount
of edge cases which we have.


I've created an issue for both a new relationship loader and an
inheritance loader at the same time, since they will use very similar
paths, at https://bitbucket.org/zzzeek/sqlalchemy/issues/3944.A POC
for the relationship loader so far looks to be very simple (but then
again all the loaders start out very simple...) and is at
https://gerrit.sqlalchemy.org/352.   The same infrastructure and
approach would also be used for the mapper inheritance loader, which
would be enabled via a mapper()-level flag, as well as per-query using a
new query option.

I can't yet guarantee this will be a 1.2 thing, I'd have to get some
more momentum going w/ test cases and all that.  In a 1.2 release it
would also be a little bit experimental as new loading styles usually
have small issues coming up for months or years, as people try more use
cases.








The routine that's loading the additional columns just for one object at
a time is here:

https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L635

and then here for the bulk of it:
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L2588



But the much harder part would be how to work this step into the loading
infrastructure, which would be somewhere in
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L273,

which is a very intricate function with over a decade of constant
refactorings behind it, and I'd have to think pretty deeply for awhile
how best to do this.

Not to mention that there's more than one way to do this query, there's
either re-using the criteria from the original query, or there's
injecting the primary key ids of the whole list of objects into an IN
clause after the fact.  The latter approach is probably more efficient
but wouldn't work for composite primary keys outside of Postgresql.   As
a built in feature I'd want "IN" loading to be an option at least.





Regarding STI and relationships, is there any way to do that but still
get the benefits of JTI? e.g. is there an easy way to resolve
my_base_class_inst.subclass_prop as a proxy to the subclass? We could
roll our own using __getitem__ but it seems a little hairy.


So proxying to a related item wouldn't be terrible, sure you could use
__getitem__ or you could also add descriptors to the primary class,
adding the descriptors to match the "info" could be automated as well
(or even do it in the other direction, add special descriptor to main
class == a column gets added to related class).


Doing the thing where you query() for all the related classes after the
fact yourself might not be that terrible.   you can use the load() event
which receives the query context that has the query you need already.  I
guess it's time for proof of concept. Here's that.  You can see at
the end we load all the "bs" without any per-object load.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import event

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    a1 = Column(String)

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-03-23 Thread mike bayer



On 03/23/2017 02:40 PM, mike bayer wrote:



On 03/23/2017 12:53 PM, da...@benchling.com wrote:

Hey Mike,

Thanks for the quick response!

For developers that are pretty familiar with the SQLAlchemy API, but not
so much the internals, would implementing the subqueryloads to
contribute to SA be a reasonable endeavor? Could you ballpark how much
time how long it might take for us to do it?


I haven't looked at what this would take, but it would be intricate and
also need a lot of tests that are not easy to write.Like, if you
worked on it, you could probably get something working, but then that
probably wouldn't be how it really needs to be because all kinds of
things that are simple for simple cases don't work with the vast amount
of edge cases which we have.


I've created an issue for both a new relationship loader and an 
inheritance loader at the same time, since they will use very similar 
paths, at https://bitbucket.org/zzzeek/sqlalchemy/issues/3944.A POC 
for the relationship loader so far looks to be very simple (but then 
again all the loaders start out very simple...) and is at 
https://gerrit.sqlalchemy.org/352.   The same infrastructure and 
approach would also be used for the mapper inheritance loader, which 
would be enabled via a mapper()-level flag, as well as per-query using a 
new query option.
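
For readers finding this thread later: the shape this eventually took in the released 1.2 API looks roughly like the following (names are from the shipped release, not necessarily from the patch as it stood at this point in the thread):

from sqlalchemy.orm import selectin_polymorphic

# per-query: load the A rows, then one IN-based SELECT per subclass present
result = session.query(A).options(selectin_polymorphic(A, [B1, B2, B3])).all()

# or mapper-level, on each subclass:
#     __mapper_args__ = {'polymorphic_identity': 'b1', 'polymorphic_load': 'selectin'}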


I can't yet guarantee this will be a 1.2 thing, I'd have to get some 
more momentum going w/ test cases and all that.  In a 1.2 release it 
would also be a little bit experimental as new loading styles usually 
have small issues coming up for months or years, as people try more use 
cases.









The routine that's loading the additional columns just for one object at
a time is here:

https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L635
and then here for the bulk of it:
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L2588


But the much harder part would be how to work this step into the loading
infrastructure, which would be somewhere in
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L273,
which is a very intricate function with over a decade of constant
refactorings behind it, and I'd have to think pretty deeply for awhile
how best to do this.

Not to mention that there's more than one way to do this query, there's
either re-using the criteria from the original query, or there's
injecting the primary key ids of the whole list of objects into an IN
clause after the fact.  The latter approach is probably more efficient
but wouldn't work for composite primary keys outside of Postgresql.   As
a built in feature I'd want "IN" loading to be an option at least.





Regarding STI and relationships, is there any way to do that but still
get the benefits of JTI? e.g. is there an easy way to resolve
my_base_class_inst.subclass_prop as a proxy to the subclass? We could
roll our own using __getitem__ but it seems a little hairy.


So proxying to a related item wouldn't be terrible, sure you could use
__getitem__ or you could also add descriptors to the primary class,
adding the descriptors to match the "info" could be automated as well
(or even do it in the other direction, add special descriptor to main
class == a column gets added to related class).


Doing the thing where you query() for all the related classes after the
fact yourself might not be that terrible.   you can use the load() event
which receives the query context that has the query you need already.  I
guess it's time for proof of concept. Here's that.  You can see at
the end we load all the "bs" without any per-object load.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import event

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    a1 = Column(String)
    type = Column(String)

    __mapper_args__ = {'polymorphic_on': type}


class B1(A):
    __tablename__ = 'b1'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b1 = Column(String)
    b_data = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'b1'}


class B2(A):
    __tablename__ = 'b2'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b2 = Column(String)
    b_data = Column(String)
    cs = relationship("C", lazy='subquery')

    __mapper_args__ = {'polymorphic_identity': 'b2'}


class C(Base):
    __tablename__ = 'c'
    id = Column(Integer, primary_key=True)
    b2_id = Column(ForeignKey('b2.id'))


class B3(A):
    __tablename__ = 'b3'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b3 = Column(String)
    b_data = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'b3'}


def _loader_for_cls(target, context):
    orig_query = context.query

    target_cls = type(target)

    # take the original query and change the entity to the subclass
    q = orig_query.with_entities(target_cls)

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-03-23 Thread mike bayer



On 03/23/2017 12:53 PM, da...@benchling.com wrote:

Hey Mike,

Thanks for the quick response!

For developers that are pretty familiar with the SQLAlchemy API, but not
so much the internals, would implementing the subqueryloads to
contribute to SA be a reasonable endeavor? Could you ballpark how much
time how long it might take for us to do it?


I haven't looked at what this would take, but it would be intricate and 
also need a lot of tests that are not easy to write.Like, if you 
worked on it, you could probably get something working, but then that 
probably wouldn't be how it really needs to be because all kinds of 
things that are simple for simple cases don't work with the vast amount 
of edge cases which we have.


The routine that's loading the additional columns just for one object at 
a time is here:


https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L635 
and then here for the bulk of it: 
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L2588


But the much harder part would be how to work this step into the loading 
infrastructure, which would be somewhere in 
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L273, 
which is a very intricate function with over a decade of constant 
refactorings behind it, and I'd have to think pretty deeply for awhile

how best to do this.

Not to mention that there's more than one way to do this query, there's 
either re-using the criteria from the original query, or there's 
injecting the primary key ids of the whole list of objects into an IN 
clause after the fact.  The latter approach is probably more efficient 
but wouldn't work for composite primary keys outside of Postgresql.   As 
a built in feature I'd want "IN" loading to be an option at least.






Regarding STI and relationships, is there any way to do that but still
get the benefits of JTI? e.g. is there an easy way to resolve
my_base_class_inst.subclass_prop as a proxy to the subclass? We could
roll our own using __getitem__ but it seems a little hairy.


So proxying to a related item wouldn't be terrible, sure you could use 
__getitem__ or you could also add descriptors to the primary class, 
adding the descriptors to match the "info" could be automated as well 
(or even do it in the other direction, add special descriptor to main 
class == a column gets added to related class).



Doing the thing where you query() for all the related classes after the 
fact yourself might not be that terrible.   you can use the load() event 
which receives the query context that has the query you need already.  I 
guess it's time for proof of concept. Here's that.  You can see at 
the end we load all the "bs" without any per-object load.


from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import event

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    a1 = Column(String)
    type = Column(String)

    __mapper_args__ = {'polymorphic_on': type}


class B1(A):
    __tablename__ = 'b1'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b1 = Column(String)
    b_data = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'b1'}


class B2(A):
    __tablename__ = 'b2'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b2 = Column(String)
    b_data = Column(String)
    cs = relationship("C", lazy='subquery')

    __mapper_args__ = {'polymorphic_identity': 'b2'}


class C(Base):
    __tablename__ = 'c'
    id = Column(Integer, primary_key=True)
    b2_id = Column(ForeignKey('b2.id'))


class B3(A):
    __tablename__ = 'b3'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b3 = Column(String)
    b_data = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'b3'}


def _loader_for_cls(target, context):
    orig_query = context.query

    target_cls = type(target)

    # take the original query and change the entity to the subclass
    q = orig_query.with_entities(target_cls)

    # defer everything that's not PK / polymorphic_on from A.  this whole
    # bit is just to avoid all those extra columns
    to_defer = []
    mapper = inspect(target).mapper
    inherited = mapper.inherits
    while inherited is not None:
        for attr in inherited.column_attrs:
            if not attr.expression.primary_key and \
                    attr.expression is not inherited.polymorphic_on:
                to_defer.append(attr.key)
        for attr in inherited.relationships:
            to_defer.append(attr.key)
        inherited = inherited.inherits
    q = q.options(*[defer(k) for k in to_defer])

    # store this strong reference so recs don't get lost while
    # iterating
    return q.all()


@event.listens_for(A, "load", propagate=True)
def load_extra(target, context):
    key = ('loader_by_cls', type(target))

    if key not in context.attributes:
        context.attributes[key] = _loader_for_cls(target, context)

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-03-23 Thread damon
Hey Mike,

Thanks for the quick response!

For developers that are pretty familiar with the SQLAlchemy API, but not so 
much the internals, would implementing the subquery loads as a contribution to 
SA be a reasonable endeavor? Could you ballpark how long it might take for us 
to do it?

Regarding STI and relationships, is there any way to do that but still get 
the benefits of JTI? e.g. is there an easy way to resolve 
my_base_class_inst.subclass_prop as a proxy to the subclass? We could roll 
our own using __getitem__ but it seems a little hairy.


Thanks again,
Damon

On Wednesday, March 22, 2017 at 3:59:45 PM UTC-7, Mike Bayer wrote:
>
>
>
> On 03/22/2017 02:17 PM, da...@benchling.com  wrote: 
> > Hey all, 
> > 
> > We were wondering if you had any advice on having a large (~10) number 
> > of polymorphic subclasses for a single base class. Using 
> > with_polymorphic: '*' causes SQLAlchemy to joinedload all subclasses 
> > like this: 
> > 
> > SELECT ... 
> > FROM base_table 
> > LEFT OUTER JOIN sub_table_1 ON base_table.id = sub_table_1.id 
> > LEFT OUTER JOIN sub_table_2 ON base_table.id = sub_table_2.id 
> > ... 
> > 
> > Postgres buckles under too many joins, and these queries start taking a 
> > really long time. 
> > 
> > One other note is that for most of our queries, only a few of these 
> > sub-tables are actually needed, so most of the joins are wasted. 
> > Unfortunately, ahead of time, we don't know which tables will be needed 
> > -- we're relying on the discriminator. 
> > 
> > Ideally, we'd be able to specify that the ORM should subqueryload the 
> > subclasses (and only execute subqueries on the types that are present). 
> > This would have to happen both when querying the base table, but also 
> > when accessing relationships. We'd want it to execute a query on the 
> > base table, then execute one query for each present subclass. 
> > 
> > Another solution might be to use some kind of hook that 
> > 
> > - is executed after a query returns with results (or after a list of 
> > models are added to the session?) 
> > - groups the models by type and runs its own subqueries to load the data 
> > 
> > Any help here is greatly appreciated! 
>
>
> The purpose of with_polymorphic is more about being able to filter on 
> multiple classes at the same time, which is why it uses joins, but these 
> don't scale to many subclasses.Adding a subquery load for the 
> related tables would be something that the ORM can someday have as a 
> feature, but it would need a lot of tests to ensure it's working as 
> advertised. 
>
> There's a lot of ways to get those other tables loaded but none of them 
> look that great.   Turning off with_polymorphic(), one approach is to 
> collect all the distinct types and identifiers from your query result; 
> then do a separate query for each subtype: 
>
> result = session.query(BaseClass).filter(...).all() 
>
>  types = sorted([(type(obj), obj.id) for obj in result], 
> key=lambda t: t[0]) 
>
>  for type, ids in itertools.groupby(types, key=lambda t: t[0]): 
>  session.query(type).filter(type.id.in_(ids)).all() 
>
> That will emit a query with an INNER JOIN for each class and will 
> populate the remaining records in the identity map.  The columns that 
> are already loaded are not re-accessed, though the DBAPI will still send 
> them over the network to the cursor.   You can try limiting the columns 
> you query for in each statement as well by using the defer() option. 
>
> Another way is to use with_polymorphic() but to provide a different kind 
> of SQL statement, like a polymorphic_union().   This would be a UNION of 
> statements that each have an inner join.   the resulting SQL is a beast 
> but it at least isn't using those left outer joins.   I think you can 
> probably use sqlalchemy.orm.util.polymorphic_union() directly to get 
> this UNION statement built up automatically. 
>
> Still another way is to reorganize the mappings to use single-table 
> inheritance and relationship() to link out to the related table, then 
> the normal "subqueryload" feature to load them as relationships.   Even 
> though this way is ugly, I might use this (short of implementing the 
> related table subqueryload feature) just to make things simple. 
>
>
> Definitely a feature that should be added but that's not an immediate 
> solution. 
>
> > 
> > 
> > Thanks, 
> > Damon 
> > 

Re: [sqlalchemy] Large number of polymorphic subclasses

2017-03-22 Thread mike bayer



On 03/22/2017 02:17 PM, da...@benchling.com wrote:

Hey all,

We were wondering if you had any advice on having a large (~10) number
of polymorphic subclasses for a single base class. Using
with_polymorphic: '*' causes SQLAlchemy to joinedload all subclasses
like this:

SELECT ...
FROM base_table
LEFT OUTER JOIN sub_table_1 ON base_table.id = sub_table_1.id
LEFT OUTER JOIN sub_table_2 ON base_table.id = sub_table_2.id
...

Postgres buckles under too many joins, and these queries start taking a
really long time.

One other note is that for most of our queries, only a few of these
sub-tables are actually needed, so most of the joins are wasted.
Unfortunately, ahead of time, we don't know which tables will be needed
-- we're relying on the discriminator.

Ideally, we'd be able to specify that the ORM should subqueryload the
subclasses (and only execute subqueries on the types that are present).
This would have to happen both when querying the base table and
when accessing relationships. We'd want it to execute a query on the
base table, then execute one query for each present subclass.

Another solution might be to use some kind of hook that

- is executed after a query returns with results (or after a list of
models are added to the session?)
- groups the models by type and runs its own subqueries to load the data

Any help here is greatly appreciated!



The purpose of with_polymorphic is more about being able to filter on 
multiple classes at the same time, which is why it uses joins, but these 
don't scale to many subclasses.Adding a subquery load for the 
related tables would be something that the ORM can someday have as a 
feature, but it would need a lot of tests to ensure it's working as 
advertised.


There's a lot of ways to get those other tables loaded but none of them 
look that great.   Turning off with_polymorphic(), one approach is to 
collect all the distinct types and identifiers from your query result; 
then do a separate query for each subtype:


result = session.query(BaseClass).filter(...).all()

types = sorted([(type(obj), obj.id) for obj in result],
               key=lambda t: t[0])

for cls, members in itertools.groupby(types, key=lambda t: t[0]):
    ids = [id_ for _, id_ in members]
    session.query(cls).filter(cls.id.in_(ids)).all()

That will emit a query with an INNER JOIN for each class and will 
populate the remaining records in the identity map.  The columns that 
are already loaded are not re-accessed, though the DBAPI will still send 
them over the network to the cursor.   You can try limiting the columns 
you query for in each statement as well by using the defer() option.
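
Per-statement that looks something like the following sketch, where SubType and ids stand in for the per-type class and primary keys from the loop above, and wide_column is a placeholder for whichever already-loaded column you want to avoid re-fetching:

from sqlalchemy.orm import defer

session.query(SubType).options(defer(SubType.wide_column)).filter(
    SubType.id.in_(ids)).all()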


Another way is to use with_polymorphic() but to provide a different kind 
of SQL statement, like a polymorphic_union().   This would be a UNION of 
statements that each have an inner join.   the resulting SQL is a beast 
but it at least isn't using those left outer joins.   I think you can 
probably use sqlalchemy.orm.util.polymorphic_union() directly to get 
this UNION statement built up automatically.


Still another way is to reorganize the mappings to use single-table 
inheritance and relationship() to link out to the related table, then 
the normal "subqueryload" feature to load them as relationships.   Even 
though this way is ugly, I might use this (short of implementing the 
related table subqueryload feature) just to make things simple.
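
A minimal sketch of that reorganization, with all names hypothetical: the subclass stays single-table, and its formerly-joined columns move into a plain related table that the ordinary relationship loaders (subqueryload and friends) already know how to batch:

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()


class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    type = Column(String)
    __mapper_args__ = {'polymorphic_on': type}


class SpecialItem(Item):
    # single-table inheritance: no __tablename__, rows stay in "item"
    __mapper_args__ = {'polymorphic_identity': 'special'}

    # what used to be the joined subclass table is now a plain
    # relationship to a side table
    extra = relationship('SpecialItemExtra', uselist=False)


class SpecialItemExtra(Base):
    __tablename__ = 'special_item_extra'
    id = Column(Integer, primary_key=True)
    item_id = Column(ForeignKey('item.id'))
    payload = Column(String)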



Definitely a feature that should be added but that's not an immediate 
solution.





Thanks,
Damon



--
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper


http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.