[sqlalchemy] column_property for correlated subquery

2013-03-18 Thread millerdev
Hi,

Using declarative here, and I'm trying to create a column_property with a 
correlated subquery that returns a count of records with a matching value 
in another column. Here's what I've tried. Option 1 would be the best; 
option 2 is ugly but second best; option 3 is not really an option, since 
many other classes are involved and the place where I'd need to put that 
code is far from where it logically belongs.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import *
from sqlalchemy.ext.declarative import declared_attr

Base = declarative_base()
option = 1

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    bar_id = Column(Integer, ForeignKey('bar.id'))
    name = Column(String)

    if option == 1:
        # does not work (see first traceback below)
        @declared_attr
        def name_count(cls):
            clx = aliased(cls)
            return column_property(
                select([func.count(clx.id)])
                .where(clx.name == cls.name)
                .correlate(cls.__table__))

if option == 2:
    # does not work (see second traceback below)
    _foo = aliased(Foo)
    Foo.name_count = column_property(
        select([func.count(_foo.id)])
        .where(_foo.name == Foo.name)
        .correlate(Foo.__table__))


class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)
    name = Column(String)


if option == 3:
    # works, but really not where I want to put this code
    _foo = aliased(Foo)
    Foo.name_count = column_property(
        select([func.count(_foo.id)])
        .where(_foo.name == Foo.name)
        .correlate(Foo.__table__))
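A variant that sidesteps the mapping-order problem entirely is to alias the Table rather than the class: `Foo.__table__.alias()` needs no mapper, so it can run right after the class definition. Here's a minimal sketch against the modern (1.4+) API — on 0.7 the `select([...])` list form and `.as_scalar()` would be used instead, and the mapping here is a pared-down stand-in for the original:

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, column_property, declarative_base

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String)

# Alias the Table, not the class: no mapper is required yet.
_foo = Foo.__table__.alias()
Foo.name_count = column_property(
    select(func.count(_foo.c.id))
    .where(_foo.c.name == Foo.__table__.c.name)
    .correlate(Foo.__table__)
    .scalar_subquery()
)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([Foo(name='a'), Foo(name='a'), Foo(name='b')])
session.commit()

# Each Foo row carries the count of rows sharing its name.
counts = {(f.name, f.name_count) for f in session.query(Foo)}
```

Since declarative intercepts attribute assignment on mapped classes, the post-hoc `Foo.name_count = column_property(...)` is applied to the mapper just as if it had been declared inline.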


Option 1 traceback:

Traceback (most recent call last):
  File "temp/example.py", line 8, in <module>
    class Foo(Base):
  File ".../python2.7/site-packages/sqlalchemy/ext/declarative.py", line 1348, in __init__
    _as_declarative(cls, classname, cls.__dict__)
  File ".../python2.7/site-packages/sqlalchemy/ext/declarative.py", line 1181, in _as_declarative
    value = getattr(cls, k)
  File ".../python2.7/site-packages/sqlalchemy/ext/declarative.py", line 1554, in __get__
    return desc.fget(cls)
  File "temp/example.py", line 15, in name_count
    clx = aliased(cls)
  File ".../python2.7/site-packages/sqlalchemy/orm/util.py", line 385, in aliased
    return AliasedClass(element, alias=alias, name=name, adapt_on_names=adapt_on_names)
  File ".../python2.7/site-packages/sqlalchemy/orm/util.py", line 298, in __init__
    self.__mapper = _class_to_mapper(cls)
  File ".../python2.7/site-packages/sqlalchemy/orm/util.py", line 673, in _class_to_mapper
    raise exc.UnmappedClassError(class_or_mapper)
sqlalchemy.orm.exc.UnmappedClassError: Class '__main__.Foo' is not mapped


Option 2 traceback:

Traceback (most recent call last):
  File "temp/example.py", line 16, in <module>
    select([func.count(_foo.id)])
  File ".../python2.7/site-packages/sqlalchemy/sql/expression.py", line 1229, in __call__
    return func(*c, **o)
  File ".../python2.7/site-packages/sqlalchemy/sql/functions.py", line 16, in __call__
    args = [_literal_as_binds(c) for c in args]
  File ".../python2.7/site-packages/sqlalchemy/sql/expression.py", line 1440, in _literal_as_binds
    return element.__clause_element__()
  File ".../python2.7/site-packages/sqlalchemy/orm/attributes.py", line 117, in __clause_element__
    return self.comparator.__clause_element__()
  File ".../python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 506, in oneshot
    result = self.fget(obj, *args, **kw)
  File ".../python2.7/site-packages/sqlalchemy/orm/properties.py", line 156, in __clause_element__
    return self.adapter(self.prop.columns[0])
  File ".../python2.7/site-packages/sqlalchemy/orm/util.py", line 334, in __adapt_element
    return self.__adapter.traverse(elem).\
  File ".../python2.7/site-packages/sqlalchemy/sql/visitors.py", line 185, in traverse
    return replacement_traverse(obj, self.__traverse_options__, replace)
  File ".../python2.7/site-packages/sqlalchemy/sql/visitors.py", line 281, in replacement_traverse
    obj = clone(obj, **opts)
  File ".../python2.7/site-packages/sqlalchemy/sql/visitors.py", line 270, in clone
    newelem = replace(elem)
  File ".../python2.7/site-packages/sqlalchemy/sql/visitors.py", line 182, in replace
    e = v.replace(elem)
  File ".../python2.7/site-packages/sqlalchemy/sql/util.py", line 720, in replace
    return self._corresponding_column(col, True)
  File ".../python2.7/site-packages/sqlalchemy/sql/util.py", line 695, in _corresponding_column
    require_embedded=require_embedded)
  File ".../python2.7/site-packages/sqlalchemy/sql/expression.py", line 2492, in corresponding_column
    if self.c.contains_column(column):
  File ".../python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 485, in __get__
    obj.__dict__[self.__name__] = result = self.fget(obj)
  File 

[sqlalchemy] Re: column_property for correlated subquery

2013-03-18 Thread millerdev

Forgot to add, I'm on SA 0.7.8

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




[sqlalchemy] Re: Optimizing joined entity loads

2009-06-19 Thread millerdev

> > Thanks a lot for the tips on how to approach this problem. That's
> > exactly what I needed.
>
> in 0.4 you'd get it off the impl (0.5 too, this is just uglier API):

Excellent! Here's what I came up with as an initial solution:

def poly_load(parent, collection, path):
    def itersiblings(parent, path):
        def iteritems(items, attr):
            for item in items:
                for child in getattr(item, attr):
                    yield child
        items = [parent]
        while path:
            items = iteritems(items, path.pop(0))
        return items
    path = path.split('.')
    assert len(path) % 2 == 0, 'path must contain an even number of elements'
    mid = len(path) / 2
    gparent = parent
    for attr in path[:mid]:
        gparent = getattr(gparent, attr)
    session = sqlalchemy.orm.session.object_session(parent)
    backref = getattr(type(parent), collection).property.backref.key
    itemclass = getattr(type(parent), collection).property.mapper.class_
    qry = session.query(itemclass) \
        .join([backref] + path[:mid]) \
        .filter(type(gparent).table.c.id == gparent.id)
    groups = defaultdict(list)
    for item in qry:
        groups[getattr(item, backref).id].append(item)
    impl = getattr(type(parent), collection).impl
    for sibling in itersiblings(gparent, path[mid:]):
        if sibling.id in groups:
            impl.set_committed_value(sibling._state, groups.get(sibling.id))

Example usage:

# prepare for takeoff
order = session.get(Order, 123)
item = order.items[0]  # triggers lazy load

# sit back and watch the fireworks!

poly_load(item, 'attributes', 'order.items')
# BOOM loaded all attributes of all items

poly_load(item, 'tags', 'order.items')
# BOOM loaded all tags of all items

poly_load(item.tags[0], 'bars', 'item.order.items.tags')
# BOOOM loaded all bars of all tags of all items

Some assumptions I was able to make that kept it simple:
- All mapped classes in my model have a 'table' attribute
- All entities in my model have an 'id' attribute, which is the
primary key.
- Relationships traversed by this loader are configured with the
necessary backrefs to make it work.
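The core of the trick — one grouped query per relation, installed with set_committed_value() — can be sketched against the modern (1.4+) API, where set_committed_value() is a public function taking the instance itself rather than its _state. The Order/Item mapping here is an illustrative stand-in, not the original model:

```python
from collections import defaultdict

from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship
from sqlalchemy.orm.attributes import set_committed_value

Base = declarative_base()

class Order(Base):
    __tablename__ = 'orders'
    id = Column(Integer, primary_key=True)
    items = relationship('Item', lazy='noload')  # populated by hand below

class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey('orders.id'))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Order(id=1), Order(id=2),
    Item(id=1, order_id=1), Item(id=2, order_id=1), Item(id=3, order_id=2),
])
session.commit()

orders = session.query(Order).order_by(Order.id).all()

# One query for every child, grouped by parent id:
groups = defaultdict(list)
for item in session.query(Item).order_by(Item.id):
    groups[item.order_id].append(item)

# Install each group as the already-loaded collection, bypassing lazy loading:
for order in orders:
    set_committed_value(order, 'items', groups.get(order.id, []))
```

After the loop, `order.items` reads straight from the instance dict: no further SQL is emitted for any of the collections.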

Initial tests seem to show a DRAMATIC performance improvement. Thanks
a lot for your help Mike.

Next up, roll this into a loader strategy so I can configure it on the
mapper and have it all happen automatically.

~ Daniel



[sqlalchemy] Re: Optimizing joined entity loads

2009-06-18 Thread millerdev

> I don't really understand the case here.

My first example wasn't very good. In an attempt to keep it simple I
actually made it too simple. Here's another example:

Order (has items)
Item (has attributes, has tags)
Attribute
Tag

If I set both Item.attributes and Item.tags to eager-load, then my
result set size is the product of len(attributes) * len(tags), which
is where the result set becomes HUGE. This is a description of the
queries before the optimization:

select orders (1 query)
select items (1 query)
select attributes (1 query per item)
select tags (1 query per item)

I'd like to combine all attribute queries into a single query.
Likewise for tags. So instead of having 2 + len(items) * 2 queries
(assuming 10 items, that's 22 queries), I'd have exactly 4 queries.
Like this:

select orders ... where order_id = ?   (1 query)
select items ... where order_id = ?   (1 query)
select attributes ... join items where order_id = ?   (1 query)
select tags ... join items where order_id = ?   (1 query)

This would be done by the loader strategy (probably a variant of
LazyLoader), which would issue a single query. The result of that
query would be used to populate the attributes collection of each item
on the order.
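For what it's worth, this strategy later landed in SQLAlchemy itself, as subqueryload in 0.6 and selectinload in 1.2. A sketch of the four-query plan using the modern API — the mappings here are illustrative stand-ins for the model above:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()

class Order(Base):
    __tablename__ = 'orders'
    id = Column(Integer, primary_key=True)
    items = relationship('Item')

class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey('orders.id'))
    attributes = relationship('Attribute')
    tags = relationship('Tag')

class Attribute(Base):
    __tablename__ = 'attributes'
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey('items.id'))

class Tag(Base):
    __tablename__ = 'tags'
    id = Column(Integer, primary_key=True)
    item_id = Column(Integer, ForeignKey('items.id'))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = Session(engine)
session.add(Order(id=1, items=[
    Item(id=1, attributes=[Attribute(id=1), Attribute(id=2)], tags=[Tag(id=1)]),
]))
session.commit()
session.expunge_all()

# 1 query for orders, 1 for items, 1 for attributes, 1 for tags --
# no duplicated item data and no attributes x tags row blowup:
orders = (
    session.query(Order)
    .options(
        selectinload(Order.items).selectinload(Item.attributes),
        selectinload(Order.items).selectinload(Item.tags),
    )
    .all()
)
```

Each selectinload() path emits one additional SELECT ... WHERE parent_id IN (...) per relation, which is exactly the query plan described above.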

> ... So I don't
> see how the result set is HUGE in one case and not the other
> (assuming HUGE means number of rows; if number of columns, SQLA
> ignores columns for entities which it already has during a load).

I think my new example above should clear up the confusion. However,
the old example (using eager loading) would return duplicate copies of
the item data for each attribute. If there are a lot of columns in the
items table, the size of the result set can get quite large using this
type of eager load, and it's pretty inefficient since it's returning a
duplicate copy of the item with each attribute. The strategy I'm
looking for eliminates all that duplicate data at the expense of a
single extra query.

In the case of multiple relations (e.g. attributes and tags), the
eager-load result set grows multiplicatively -- the row count is the
product of the collection sizes -- while the strategy I'm looking for
only requires a single extra query per relation but loads no
duplicate data. Theoretically this is the most efficient solution
possible, assuming that all data must be loaded (i.e. every item,
attribute and tag).
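To put numbers on the blowup, here is a back-of-the-envelope comparison for one order, assuming (purely for illustration) 10 items with 20 attributes and 15 tags each:

```python
items, attrs_per_item, tags_per_item = 10, 20, 15

# Joined eager load of both collections: each item row is repeated once
# per (attribute, tag) pair, so the row count is the product.
joined_rows = items * attrs_per_item * tags_per_item

# One extra query per relation: every row is fetched exactly once.
per_relation_rows = items + items * attrs_per_item + items * tags_per_item

print(joined_rows, per_relation_rows)  # 3000 rows vs 360 rows
```

Adding a third eager-loaded collection multiplies the joined row count again, while the per-relation plan just adds one more single query.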

> Normally, if you wanted the attributes to eagerload off the related
> items, but not from the order, you would specify eagerloading on only
> those attributes which you want eagerloaded.

Yes, I understand that. It's not what I'm asking for though.

Thanks.

~ Daniel



[sqlalchemy] Re: Optimizing joined entity loads

2009-06-18 Thread millerdev

Clarification:

> If I set both Item.attributes and Item.tags to eager-load, then my
> result set size is the product of len(attributes) * len(tags), which
> is where the result set becomes HUGE.

I jumped right from the eager-load to the completely non-optimized (no
eager loading) scenario:

> This is a description of the
> queries before the optimization:
>
> select orders (1 query)
> select items (1 query)
> select attributes (1 query per item)
> select tags (1 query per item)


Here's the query list for the eager-load-attributes-and-tags scenario:

select orders (1 query)
select items eager-loading attributes and tags (1 query)

So 2 queries in that scenario, but the second query takes FOREVER to
execute and returns a HUGE result set.

~ Daniel



[sqlalchemy] Re: Optimizing joined entity loads

2009-06-18 Thread millerdev

> So this is some variant of, i have a bunch of objects and I'd like to
> ...
[snip lots of reasons why this should not be a standard feature]

Yeah, I understand what I'm asking for here, and I would never expect
this kind of optimization to kick in by default. Instead, it would
only be used in those cases where profiling has shown that there is a
bottleneck. I have profiled my code to find out where the bottlenecks
are, so I think I fall into the category of individuals who actually
know how to use a feature like this.

> I think Hibernate might have this kind of loading available...

I wouldn't be surprised. From my use of Hibernate (which was quite a
while ago now), my impression was that it had lots of options for
complex, highly customized optimizations like the one I'm trying
to do.

> You'd start using plain session.query() to get the rows you want, hash
> together the collections as desired, and then apply them to the
> already loaded objects using the attributes.set_committed_value()
> function which is used for this purpose (pseudocode follows) (also
> use trunk rev 6066 since I had to fix set_committed_value()):

I suppose r6066 is a 0.5 revision number? How well does
set_committed_value() work in SA 0.4.7? I haven't upgraded to SA 0.5
yet, and I'd rather not do that right now if I can avoid it. However,
if this can't be done with 0.4...

Thanks a lot for the tips on how to approach this problem. That's
exactly what I needed.

~ Daniel
