Re: [sqlalchemy] SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

2013-06-17 Thread Ladislav Lenart
Hello.

Here is the fully self-contained regression of the issue, including the
workaround for SA 0.7.9. Thank you again, because I wouldn't figure it out
without your help (the select_from part). I haven't tried it on SA 0.9.

If you have any questions, please ask.

HTH,

Ladislav Lenart


On 14.6.2013 11:18, Ladislav Lenart wrote:
 Hello.
 
 
 On 13.6.2013 18:44, Michael Bayer wrote:

 On Jun 13, 2013, at 11:03 AM, Ladislav Lenart lenart...@volny.cz wrote:

 Unfortunately migrating to SA 0.9 is not an option for me at the moment due 
 to
 severe time constraints.

 I was pretty sure you'd say that, though I'm really looking to verify that 
 my fixes are going to hold up under real world usage.   The issues you're 
 having are real issues, and they've been fixed.
 
 I will make a 1:1 self-containted test case for this issue during the next 
 week,
 I promise!
 
 
 Could you please help me write SA query for 0.7.9 that uses index scan and 
 also
 loads all the necessary relations? It must be possible with a proper use of
 from_statement(), contains_eager() and/or other SA features. It is just 
 that toy
 examples in the documentation don't help me much with this complex beast.

 you use the SQL expression language in conjunction with .join()/outerjoin(), 
 pass to query.select_from(), then use contains_eager():

 j = 
 Foo.__table__.outerjoin(Bar.__table__.outerjoin(BarA.__table__).outerjoin(Data.__table__))

 q = s.query(Foo).\
 select_from(j).\
 filter(Foo.id.in_([1, 2, 3])).\
 options(
 contains_eager(Foo.bar),
 contains_eager(Foo.bar.of_type(BarA), BarA.data)
 )
 
 Thank you very much! I am going to try it :-)
 
 
 Ladislav Lenart

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.


# coding=utf-8

Standalone regression of the full table-scan problem in combination with joined 
table inheritance. Applies to SA 0.7.9 at least.

See: Issue, Workaround, main_issue, main_workaround and main_fill.

The rest is support code and/or ORM definitions.

from sqlalchemy import Column, ForeignKey, UniqueConstraint, CheckConstraint,\
Integer, Unicode, String, Date
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine import create_engine
from sqlalchemy.sql.expression import distinct
from sqlalchemy.orm.session import sessionmaker
from sqlalchemy.orm import relationship, subqueryload_all, subqueryload,\
joinedload_all, contains_eager
from sqlalchemy.orm.util import aliased


Base = declarative_base()
session = None


class Contact(Base):
__tablename__ = 'Contact'
id = Column(Integer(), primary_key=True)
type = Column(Integer(), nullable=False) # 1 - personal, 2 - corporate
parent_id = Column(Integer(), ForeignKey('Partner.id', ondelete='CASCADE'), nullable=False)
parent = relationship('Partner', back_populates='contacts', primaryjoin='Contact.parent_id == Partner.id', remote_side='Partner.id', uselist=False)
contact_tags = relationship('ContactTag', back_populates='contact', primaryjoin='Contact.id == ContactTag.contact_id', cascade='all, delete-orphan', passive_updates=True, passive_deletes=True)
phones = relationship('Phone', back_populates='contact', primaryjoin='Contact.id == Phone.contact_id', cascade='all, delete-orphan', passive_updates=True, passive_deletes=True)
emails = relationship('Email', back_populates='contact', primaryjoin='Contact.id == Email.contact_id', cascade='all, delete-orphan', passive_updates=True, passive_deletes=True)
__mapper_args__ = {'polymorphic_on': type}
__table_args__ = (CheckConstraint(u'type in (1, 2)'),)

class ContactTag(Base):
__tablename__ = 'ContactTag'
id = Column(Integer(), primary_key=True)
tag_id = Column(Integer(), ForeignKey('Tag.id', ondelete='CASCADE'), nullable=False)
tag = relationship('Tag', primaryjoin='ContactTag.tag_id == Tag.id', remote_side='Tag.id', uselist=False)
contact_id = Column(Integer(), ForeignKey('Contact.id', ondelete='CASCADE'), nullable=False)
contact = relationship('Contact', back_populates='contact_tags', primaryjoin='ContactTag.contact_id == Contact.id', remote_side='Contact.id', uselist=False)
__table_args__ = (UniqueConstraint('contact_id', 'tag_id'),)

class Tag(Base):
__tablename__ = 'Tag'
id = Column(Integer(), primary_key=True)
partner_id = Column(Integer(), ForeignKey('Partner.id', ondelete='CASCADE'), nullable=False, index=True)
partner = relationship('Partner', back_populates='tags', primaryjoin='Tag.partner_id == Partner.id', remote_side='Partner.id', uselist=False)
label = 

Re: [sqlalchemy] SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

2013-06-17 Thread Ladislav Lenart
Hello.

I ended up with the following query:

@classmethod
def _find_contacts_fetch_window(cls, contact_cls, win):
Special data-fetching query for contacts and all their related info
including tags, partner, client,...

NOTE: We build the FROM part entirely by hand, because SA generates bad
SQL for postgres. It does a FULL SCAN of client and personal_client /
corporate_client even though it reads at most window_size rows from
them. All this because SA inheritance creates a subselect which leads
to the full scan.

# win .. list of ids
# contact_cls .. PersonalContact / CorporateContact
client_cls = contact_cls.client_cls() # PersonalClient / CorporateClient
data_cls = client_cls.data_cls() # PersonalData / CorporateData

# We need TABLEs to build the FROM part by hand.
# We reference PersonalData/CorporateData and Address twice, hence we
# need to alias them.
# We also need their aliased ORM classes for contains_eager() to work.
contact_table = Contact.__table__
contact_subtable = contact_cls.__table__
client_table = Client.__table__
personal_client_table = client_cls.__table__
partner_table = Partner.__table__
partner_data_table = PersonalData.__table__.alias(name='partner_data')
partner_address_table = Address.__table__.alias(name='partner_address')
client_data_table = data_cls.__table__.alias(name='client_data')
client_address_table = Address.__table__.alias(name='client_address')
partner_data = aliased(PersonalData, partner_data_table)
partner_address = aliased(Address, partner_address_table)
client_data = aliased(data_cls, client_data_table)
client_address = aliased(Address, client_address_table)
select_from = contact_table.join(
contact_subtable,
contact_table.c.id == contact_subtable.c.id
).outerjoin(
client_table,
contact_subtable.c.client_id == client_table.c.id
).outerjoin(
personal_client_table,
client_table.c.id == personal_client_table.c.id
).outerjoin(
client_data_table,
personal_client_table.c.data_id == client_data_table.c.id
).outerjoin(
client_address_table,
client_data_table.c.address_id == client_address_table.c.id
)
options = [
subqueryload_all(contact_cls.contact_tags, ContactTag.tag),
subqueryload(contact_cls.phones),
subqueryload(contact_cls.emails),
contains_eager(contact_cls.client),
contains_eager(contact_cls.client, client_cls.data, 
alias=client_data),
contains_eager(contact_cls.client, client_cls.data,
client_data.address, alias=client_address),
]
if contact_cls is PersonalContact:
select_from = select_from.outerjoin(
partner_table,
contact_subtable.c.partner_id == partner_table.c.id
).outerjoin(
partner_data_table,
partner_table.c.personal_data_id == partner_data_table.c.id
).outerjoin(
partner_address_table,
partner_data_table.c.address_id == partner_address_table.c.id
)
options.extend([
contains_eager(contact_cls.partner),
contains_eager(contact_cls.partner, Partner.personal_data,
alias=partner_data),
contains_eager(contact_cls.partner, Partner.personal_data,
partner_data.address, alias=partner_address),
])
q = session.query(contact_cls).select_from(select_from)
q = q.filter(contact_cls.id.in_(win))
q = q.options(*options)
return q


It works great though it took me quite a while to get right, especially the
aliasing between tables and classes. Also, I wouldn't figure it out without your
help (I mean the part with select_from).

Thank you again,

Ladislav Lenart


On 13.6.2013 18:44, Michael Bayer wrote:
 
 On Jun 13, 2013, at 11:03 AM, Ladislav Lenart lenart...@volny.cz wrote:
 
 Unfortunately migrating to SA 0.9 is not an option for me at the moment due 
 to
 severe time constraints.
 
 I was pretty sure you'd say that, though I'm really looking to verify that my 
 fixes are going to hold up under real world usage.   The issues you're having 
 are real issues, and they've been fixed.
 

 Could you please help me write SA query for 0.7.9 that uses index scan and 
 also
 loads all the necessary relations? It must be possible with a proper use of
 from_statement(), contains_eager() and/or other SA features. It is just that 
 toy
 examples in the documentation don't help me much with this complex beast.
 
 you use the SQL expression language in conjunction with .join()/outerjoin(), 
 pass to query.select_from(), then use contains_eager():

Re: [sqlalchemy] SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

2013-06-17 Thread Ladislav Lenart
Hello.

I ended up with the following query:

@classmethod
def _find_contacts_fetch_window(cls, contact_cls, win):
Special data-fetching query for contacts and all their related info
including tags, partner, client,...

NOTE: We build the FROM part entirely by hand, because SA generates bad
SQL for postgres. It does a FULL SCAN of client and personal_client /
corporate_client even though it reads at most window_size rows from
them. All this because SA inheritance creates a subselect which leads
to the full scan.

# win .. list of ids
# contact_cls .. PersonalContact / CorporateContact
client_cls = contact_cls.client_cls() # PersonalClient / CorporateClient
data_cls = client_cls.data_cls() # PersonalData / CorporateData

# We need TABLEs to build the FROM part by hand.
# We reference PersonalData/CorporateData and Address twice, hence we
# need to alias them.
# We also need their aliased ORM classes for contains_eager() to work.
contact_table = Contact.__table__
contact_subtable = contact_cls.__table__
client_table = Client.__table__
personal_client_table = client_cls.__table__
partner_table = Partner.__table__
partner_data_table = PersonalData.__table__.alias(name='partner_data')
partner_address_table = Address.__table__.alias(name='partner_address')
client_data_table = data_cls.__table__.alias(name='client_data')
client_address_table = Address.__table__.alias(name='client_address')
partner_data = aliased(PersonalData, partner_data_table)
partner_address = aliased(Address, partner_address_table)
client_data = aliased(data_cls, client_data_table)
client_address = aliased(Address, client_address_table)
select_from = contact_table.join(
contact_subtable,
contact_table.c.id == contact_subtable.c.id
).outerjoin(
client_table,
contact_subtable.c.client_id == client_table.c.id
).outerjoin(
personal_client_table,
client_table.c.id == personal_client_table.c.id
).outerjoin(
client_data_table,
personal_client_table.c.data_id == client_data_table.c.id
).outerjoin(
client_address_table,
client_data_table.c.address_id == client_address_table.c.id
)
options = [
subqueryload_all(contact_cls.contact_tags, ContactTag.tag),
subqueryload(contact_cls.phones),
subqueryload(contact_cls.emails),
contains_eager(contact_cls.client),
contains_eager(contact_cls.client, client_cls.data, 
alias=client_data),
contains_eager(contact_cls.client, client_cls.data,
client_data.address, alias=client_address),
]
if contact_cls is PersonalContact:
select_from = select_from.outerjoin(
partner_table,
contact_subtable.c.partner_id == partner_table.c.id
).outerjoin(
partner_data_table,
partner_table.c.personal_data_id == partner_data_table.c.id
).outerjoin(
partner_address_table,
partner_data_table.c.address_id == partner_address_table.c.id
)
options.extend([
contains_eager(contact_cls.partner),
contains_eager(contact_cls.partner, Partner.personal_data,
alias=partner_data),
contains_eager(contact_cls.partner, Partner.personal_data,
partner_data.address, alias=partner_address),
])
q = session.query(contact_cls).select_from(select_from)
q = q.filter(contact_cls.id.in_(win))
q = q.options(*options)
return q


It works great though it took me quite a while to get right, especially the
aliasing between tables and classes. Also, I wouldn't figure it out without your
help (I mean the part with select_from).

Thank you again,

Ladislav Lenart


On 13.6.2013 18:44, Michael Bayer wrote:
 
 On Jun 13, 2013, at 11:03 AM, Ladislav Lenart lenart...@volny.cz wrote:
 
 Unfortunately migrating to SA 0.9 is not an option for me at the moment due 
 to
 severe time constraints.
 
 I was pretty sure you'd say that, though I'm really looking to verify that my 
 fixes are going to hold up under real world usage.   The issues you're having 
 are real issues, and they've been fixed.
 

 Could you please help me write SA query for 0.7.9 that uses index scan and 
 also
 loads all the necessary relations? It must be possible with a proper use of
 from_statement(), contains_eager() and/or other SA features. It is just that 
 toy
 examples in the documentation don't help me much with this complex beast.
 
 you use the SQL expression language in conjunction with .join()/outerjoin(), 
 pass to query.select_from(), then use contains_eager():

Re: [sqlalchemy] SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

2013-06-14 Thread Ladislav Lenart
Hello.


On 13.6.2013 18:44, Michael Bayer wrote:
 
 On Jun 13, 2013, at 11:03 AM, Ladislav Lenart lenart...@volny.cz wrote:
 
 Unfortunately migrating to SA 0.9 is not an option for me at the moment due 
 to
 severe time constraints.
 
 I was pretty sure you'd say that, though I'm really looking to verify that my 
 fixes are going to hold up under real world usage.   The issues you're having 
 are real issues, and they've been fixed.

I will make a 1:1 self-containted test case for this issue during the next week,
I promise!


 Could you please help me write SA query for 0.7.9 that uses index scan and 
 also
 loads all the necessary relations? It must be possible with a proper use of
 from_statement(), contains_eager() and/or other SA features. It is just that 
 toy
 examples in the documentation don't help me much with this complex beast.
 
 you use the SQL expression language in conjunction with .join()/outerjoin(), 
 pass to query.select_from(), then use contains_eager():
 
 j = 
 Foo.__table__.outerjoin(Bar.__table__.outerjoin(BarA.__table__).outerjoin(Data.__table__))
 
 q = s.query(Foo).\
 select_from(j).\
 filter(Foo.id.in_([1, 2, 3])).\
 options(
 contains_eager(Foo.bar),
 contains_eager(Foo.bar.of_type(BarA), BarA.data)
 )

Thank you very much! I am going to try it :-)


Ladislav Lenart

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.




[sqlalchemy] SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

2013-06-13 Thread Ladislav Lenart
Hello.

I have a query that does a full scan of an inherited table with more than
million rows even though I need only 100 of them (on postgres 9.1). This is a
real bummer! Please help me rewrite the SA query or instruct postgres to not do
this stupidity.


I have the following setup (only the interesting relations):

Contact
contact_tags - ContactTag (collection)
phones - Phone (collection)
emails - Email (collection)

ContactTag
tag - Tag (cannot be NULL)

PersonalContact (Contact subclass)
partner - Partner (can be NULL)
client - PersonalClient (can be NULL)

CorporateContact (Contact subclass)
client - CorporateClient (can be NULL)

Client

PersonalClient (Client subclass)
data - PersonalData (cannot be NULL)

CorporateClient (Client subclass)
data - CorporateData (cannot be NULL)


I have the following query that loads data of one window:

# window (input argument) is a list of id values.
q = session.query(PersonalContact).filter(PersonalContact.id.in_(window)
q = q.options(
subqueryload_all(PersonalContact.contact_tags, ContactTag.tag),
subqueryload(PersonalContact.phones),
subqueryload(PersonalContact.emails),
joinedload_all(
PersonalContact.partner,
Partner.personal_data,
PersonalData.address,
   ),
joinedload_all(
PersonalContact.client,
PersonalClient.data,
PersonalData.address
),
)


(Note that I have similar query for CorporateContact. Infact, I generate them
both in the same method.)

It produces SQL like this (the problematic part is emphasized):

SELECT *
FROM
contact
JOIN personal_contact ON contact.id = personal_contact.id
-- ***
LEFT OUTER JOIN (
SELECT *
FROM
client
JOIN personal_client ON client.id = personal_client.id
) AS anon_1 ON personal_contact.client_id = anon_1.personal_client_id
-- ***
LEFT OUTER JOIN partner AS partner_1
ON personal_contact.partner_id = partner_1.id
LEFT OUTER JOIN personal_data AS personal_data_1
ON partner_1.personal_data_id = personal_data_1.id
LEFT OUTER JOIN address AS address_1
ON personal_data_1.address_id = address_1.id
LEFT OUTER JOIN personal_data AS personal_data_2
ON anon_1.personal_client_data_id = personal_data_2.id
LEFT OUTER JOIN address AS address_2
ON personal_data_2.address_id = address_2.id
WHERE personal_contact.id IN (...)


The inner select directly corresponds to joinedload of PersonalContact.client, a
PersonalClient instance (and a Client subclass).

The postgres does a full scan of tables Client and PersonalClient even though I
will need at most 100 rows from each one.

However, if I rewrite the problematic part by hand like this:

LEFT OUTER JOIN client
ON personal_contact.client_id = client.id
 LEFT OUTER JOIN personal_client
ON client.id = personal_client.id

it works like a charm.

Unfortunately I don't know how to write such a query in SA. I am really stuck so
any help is much appreciated.


Thank you,

Ladislav Lenart


-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

2013-06-13 Thread Michael Bayer
Please try out 0.9 from the git master which fixes the issue of the nested 
SELECT on the right side of a join. 

Sent from my iPhone

On Jun 13, 2013, at 9:18 AM, Ladislav Lenart lenart...@volny.cz wrote:

 Hello.
 
 I have a query that does a full scan of an inherited table with more than
 million rows even though I need only 100 of them (on postgres 9.1). This is a
 real bummer! Please help me rewrite the SA query or instruct postgres to not 
 do
 this stupidity.
 
 
 I have the following setup (only the interesting relations):
 
Contact
contact_tags - ContactTag (collection)
phones - Phone (collection)
emails - Email (collection)
 
ContactTag
tag - Tag (cannot be NULL)
 
PersonalContact (Contact subclass)
partner - Partner (can be NULL)
client - PersonalClient (can be NULL)
 
CorporateContact (Contact subclass)
client - CorporateClient (can be NULL)
 
Client
 
PersonalClient (Client subclass)
data - PersonalData (cannot be NULL)
 
CorporateClient (Client subclass)
data - CorporateData (cannot be NULL)
 
 
 I have the following query that loads data of one window:
 
# window (input argument) is a list of id values.
q = session.query(PersonalContact).filter(PersonalContact.id.in_(window)
q = q.options(
subqueryload_all(PersonalContact.contact_tags, ContactTag.tag),
subqueryload(PersonalContact.phones),
subqueryload(PersonalContact.emails),
joinedload_all(
PersonalContact.partner,
Partner.personal_data,
PersonalData.address,
   ),
joinedload_all(
PersonalContact.client,
PersonalClient.data,
PersonalData.address
),
)
 
 
 (Note that I have similar query for CorporateContact. Infact, I generate them
 both in the same method.)
 
 It produces SQL like this (the problematic part is emphasized):
 
SELECT *
FROM
contact
JOIN personal_contact ON contact.id = personal_contact.id
-- ***
LEFT OUTER JOIN (
SELECT *
FROM
client
JOIN personal_client ON client.id = personal_client.id
) AS anon_1 ON personal_contact.client_id = anon_1.personal_client_id
-- ***
LEFT OUTER JOIN partner AS partner_1
ON personal_contact.partner_id = partner_1.id
LEFT OUTER JOIN personal_data AS personal_data_1
ON partner_1.personal_data_id = personal_data_1.id
LEFT OUTER JOIN address AS address_1
ON personal_data_1.address_id = address_1.id
LEFT OUTER JOIN personal_data AS personal_data_2
ON anon_1.personal_client_data_id = personal_data_2.id
LEFT OUTER JOIN address AS address_2
ON personal_data_2.address_id = address_2.id
WHERE personal_contact.id IN (...)
 
 
 The inner select directly corresponds to joinedload of 
 PersonalContact.client, a
 PersonalClient instance (and a Client subclass).
 
 The postgres does a full scan of tables Client and PersonalClient even though 
 I
 will need at most 100 rows from each one.
 
 However, if I rewrite the problematic part by hand like this:
 
LEFT OUTER JOIN client
ON personal_contact.client_id = client.id
 LEFT OUTER JOIN personal_client
ON client.id = personal_client.id
 
 it works like a charm.
 
 Unfortunately I don't know how to write such a query in SA. I am really stuck 
 so
 any help is much appreciated.
 
 
 Thank you,
 
 Ladislav Lenart
 
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to sqlalchemy+unsubscr...@googlegroups.com.
 To post to this group, send email to sqlalchemy@googlegroups.com.
 Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
 For more options, visit https://groups.google.com/groups/opt_out.
 
 

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [sqlalchemy] SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

2013-06-13 Thread Ladislav Lenart
Unfortunately migrating to SA 0.9 is not an option for me at the moment due to
severe time constraints.

Could you please help me write SA query for 0.7.9 that uses index scan and also
loads all the necessary relations? It must be possible with a proper use of
from_statement(), contains_eager() and/or other SA features. It is just that toy
examples in the documentation don't help me much with this complex beast.

Here is a simplified version of my problem:

Foo
bar (can be NULL)

Bar

BarA (subclass of Bar)
data - Data (cannot be NULL)

Data
value (string)

I need to rewrite this query:

q = session.query(Foo).filter(Foo.id.in_(...))
q = q.options(
joinedload_all(Foo.bar, BarA.data)
)

in such a way that it does NOT perform any full scan and also populates
Foo.bar.data.value of each returned Foo. The new query can return whatever it
pleases as long as it returns Foo instances with properly populated relations.

Please help me write it (or tell me that it is not possible in SA 0.7.9, though
I highly doubt that).


Thank you,

Ladislav Lenart


On 13.6.2013 15:51, Michael Bayer wrote:
 Please try out 0.9 from the git master which fixes the issue of the nested 
 SELECT on the right side of a join. 
 
 Sent from my iPhone
 
 On Jun 13, 2013, at 9:18 AM, Ladislav Lenart lenart...@volny.cz wrote:
 
 Hello.

 I have a query that does a full scan of an inherited table with more than
 million rows even though I need only 100 of them (on postgres 9.1). This is a
 real bummer! Please help me rewrite the SA query or instruct postgres to not 
 do
 this stupidity.


 I have the following setup (only the interesting relations):

Contact
contact_tags - ContactTag (collection)
phones - Phone (collection)
emails - Email (collection)

ContactTag
tag - Tag (cannot be NULL)

PersonalContact (Contact subclass)
partner - Partner (can be NULL)
client - PersonalClient (can be NULL)

CorporateContact (Contact subclass)
client - CorporateClient (can be NULL)

Client

PersonalClient (Client subclass)
data - PersonalData (cannot be NULL)

CorporateClient (Client subclass)
data - CorporateData (cannot be NULL)


 I have the following query that loads data of one window:

# window (input argument) is a list of id values.
q = session.query(PersonalContact).filter(PersonalContact.id.in_(window)
q = q.options(
subqueryload_all(PersonalContact.contact_tags, ContactTag.tag),
subqueryload(PersonalContact.phones),
subqueryload(PersonalContact.emails),
joinedload_all(
PersonalContact.partner,
Partner.personal_data,
PersonalData.address,
   ),
joinedload_all(
PersonalContact.client,
PersonalClient.data,
PersonalData.address
),
)


 (Note that I have similar query for CorporateContact. Infact, I generate them
 both in the same method.)

 It produces SQL like this (the problematic part is emphasized):

SELECT *
FROM
contact
JOIN personal_contact ON contact.id = personal_contact.id
-- ***
LEFT OUTER JOIN (
SELECT *
FROM
client
JOIN personal_client ON client.id = personal_client.id
) AS anon_1 ON personal_contact.client_id = anon_1.personal_client_id
-- ***
LEFT OUTER JOIN partner AS partner_1
ON personal_contact.partner_id = partner_1.id
LEFT OUTER JOIN personal_data AS personal_data_1
ON partner_1.personal_data_id = personal_data_1.id
LEFT OUTER JOIN address AS address_1
ON personal_data_1.address_id = address_1.id
LEFT OUTER JOIN personal_data AS personal_data_2
ON anon_1.personal_client_data_id = personal_data_2.id
LEFT OUTER JOIN address AS address_2
ON personal_data_2.address_id = address_2.id
WHERE personal_contact.id IN (...)


 The inner select directly corresponds to joinedload of 
 PersonalContact.client, a
 PersonalClient instance (and a Client subclass).

 The postgres does a full scan of tables Client and PersonalClient even 
 though I
 will need at most 100 rows from each one.

 However, if I rewrite the problematic part by hand like this:

LEFT OUTER JOIN client
ON personal_contact.client_id = client.id
 LEFT OUTER JOIN personal_client
ON client.id = personal_client.id

 it works like a charm.

 Unfortunately I don't know how to write such a query in SA. I am really 
 stuck so
 any help is much appreciated.


 Thank you,

 Ladislav Lenart


 -- 
 You received this message because you are subscribed to the Google Groups 
 sqlalchemy group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to sqlalchemy+unsubscr...@googlegroups.com.
 

Re: [sqlalchemy] SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

2013-06-13 Thread Ladislav Lenart
Hello.

Just a minor correction:

The new query can return whatever it pleases as long as it *also*
returns Foo instances with properly populated relations.

The rest of the e-mail is the same.


Please help me,

Ladislav Lenart


On 13.6.2013 17:03, Ladislav Lenart wrote:
 Unfortunately migrating to SA 0.9 is not an option for me at the moment due to
 severe time constraints.
 
 Could you please help me write SA query for 0.7.9 that uses index scan and 
 also
 loads all the necessary relations? It must be possible with a proper use of
 from_statement(), contains_eager() and/or other SA features. It is just that 
 toy
 examples in the documentation don't help me much with this complex beast.
 
 Here is a simplified version of my problem:
 
 Foo
 bar (can be NULL)
 
 Bar
 
 BarA (subclass of Bar)
 data - Data (cannot be NULL)
 
 Data
 value (string)
 
 I need to rewrite this query:
 
 q = session.query(Foo).filter(Foo.id.in_(...))
 q = q.options(
 joinedload_all(Foo.bar, BarA.data)
 )
 
 in such a way that it does NOT perform any full scan and also populates
 Foo.bar.data.value of each returned Foo. The new query can return whatever it
 pleases as long as it returns Foo instances with properly populated relations.
 
 Please help me write it (or tell me that it is not possible in SA 0.7.9, 
 though
 I highly doubt that).
 
 
 Thank you,
 
 Ladislav Lenart
 
 
 On 13.6.2013 15:51, Michael Bayer wrote:
 Please try out 0.9 from the git master which fixes the issue of the nested 
 SELECT on the right side of a join. 

 Sent from my iPhone

 On Jun 13, 2013, at 9:18 AM, Ladislav Lenart lenart...@volny.cz wrote:

 Hello.

 I have a query that does a full scan of an inherited table with more than
 million rows even though I need only 100 of them (on postgres 9.1). This is 
 a
 real bummer! Please help me rewrite the SA query or instruct postgres to 
 not do
 this stupidity.


 I have the following setup (only the interesting relations):

Contact
contact_tags - ContactTag (collection)
phones - Phone (collection)
emails - Email (collection)

ContactTag
tag - Tag (cannot be NULL)

PersonalContact (Contact subclass)
partner - Partner (can be NULL)
client - PersonalClient (can be NULL)

CorporateContact (Contact subclass)
client - CorporateClient (can be NULL)

Client

PersonalClient (Client subclass)
data - PersonalData (cannot be NULL)

CorporateClient (Client subclass)
data - CorporateData (cannot be NULL)


 I have the following query that loads data of one window:

# window (input argument) is a list of id values.
q = session.query(PersonalContact).filter(PersonalContact.id.in_(window)
q = q.options(
subqueryload_all(PersonalContact.contact_tags, ContactTag.tag),
subqueryload(PersonalContact.phones),
subqueryload(PersonalContact.emails),
joinedload_all(
PersonalContact.partner,
Partner.personal_data,
PersonalData.address,
   ),
joinedload_all(
PersonalContact.client,
PersonalClient.data,
PersonalData.address
),
)


 (Note that I have similar query for CorporateContact. Infact, I generate 
 them
 both in the same method.)

 It produces SQL like this (the problematic part is emphasized):

SELECT *
FROM
contact
JOIN personal_contact ON contact.id = personal_contact.id
-- ***
LEFT OUTER JOIN (
SELECT *
FROM
client
JOIN personal_client ON client.id = personal_client.id
) AS anon_1 ON personal_contact.client_id = anon_1.personal_client_id
-- ***
LEFT OUTER JOIN partner AS partner_1
ON personal_contact.partner_id = partner_1.id
LEFT OUTER JOIN personal_data AS personal_data_1
ON partner_1.personal_data_id = personal_data_1.id
LEFT OUTER JOIN address AS address_1
ON personal_data_1.address_id = address_1.id
LEFT OUTER JOIN personal_data AS personal_data_2
ON anon_1.personal_client_data_id = personal_data_2.id
LEFT OUTER JOIN address AS address_2
ON personal_data_2.address_id = address_2.id
WHERE personal_contact.id IN (...)


 The inner select directly corresponds to joinedload of 
 PersonalContact.client, a
 PersonalClient instance (and a Client subclass).

 The postgres does a full scan of tables Client and PersonalClient even 
 though I
 will need at most 100 rows from each one.

 However, if I rewrite the problematic part by hand like this:

LEFT OUTER JOIN client
ON personal_contact.client_id = client.id
 LEFT OUTER JOIN personal_client
ON client.id = personal_client.id

 it works like a charm.

 Unfortunately I don't know how to write such 

Re: [sqlalchemy] SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

2013-06-13 Thread Michael Bayer

On Jun 13, 2013, at 11:03 AM, Ladislav Lenart lenart...@volny.cz wrote:

 Unfortunately migrating to SA 0.9 is not an option for me at the moment due to
 severe time constraints.

I was pretty sure you'd say that, though I'm really looking to verify that my 
fixes are going to hold up under real world usage.   The issues you're having 
are real issues, and they've been fixed.

 
 Could you please help me write SA query for 0.7.9 that uses index scan and 
 also
 loads all the necessary relations? It must be possible with a proper use of
 from_statement(), contains_eager() and/or other SA features. It is just that 
 toy
 examples in the documentation don't help me much with this complex beast.

you use the SQL expression language in conjunction with .join()/outerjoin(), 
pass to query.select_from(), then use contains_eager():

j = 
Foo.__table__.outerjoin(Bar.__table__.outerjoin(BarA.__table__).outerjoin(Data.__table__))

q = s.query(Foo).\
select_from(j).\
filter(Foo.id.in_([1, 2, 3])).\
options(
contains_eager(Foo.bar),
contains_eager(Foo.bar.of_type(BarA), BarA.data)
)



-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.