[sqlalchemy] column_property for correlated subquery
Hi,

Using declarative here, and I'm trying to create a column_property with a correlated subquery that returns a count of records with a matching value in some other column. Here's what I've tried. Option 1 is the best; option 2 is ugly but second best; option 3 is not a good option since there are many other classes involved, and the place where I'd need to put that code is far away from where it logically belongs.

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import *
from sqlalchemy.ext.declarative import declared_attr

Base = declarative_base()

option = 1

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    bar_id = Column(Integer, ForeignKey('bar.id'))
    name = Column(String)

    if option == 1:
        # does not work (see first traceback below)
        @declared_attr
        def name_count(cls):
            clx = aliased(cls)
            return column_property(
                select([func.count(clx.id)])
                .where(clx.name == cls.name)
                .correlate(cls.__table__))

if option == 2:
    # does not work (see second traceback below)
    _foo = aliased(Foo)
    Foo.name_count = column_property(
        select([func.count(_foo.id)])
        .where(_foo.name == Foo.name)
        .correlate(Foo.__table__))

class Bar(Base):
    __tablename__ = 'bar'
    id = Column(Integer, primary_key=True)
    name = Column(String)

if option == 3:
    # works, but really not where I want to put this code
    _foo = aliased(Foo)
    Foo.name_count = column_property(
        select([func.count(_foo.id)])
        .where(_foo.name == Foo.name)
        .correlate(Foo.__table__))

Option 1 traceback:

Traceback (most recent call last):
  File "temp/example.py", line 8, in <module>
    class Foo(Base):
  File ".../python2.7/site-packages/sqlalchemy/ext/declarative.py", line 1348, in __init__
    _as_declarative(cls, classname, cls.__dict__)
  File ".../python2.7/site-packages/sqlalchemy/ext/declarative.py", line 1181, in _as_declarative
    value = getattr(cls, k)
  File ".../python2.7/site-packages/sqlalchemy/ext/declarative.py", line 1554, in __get__
    return desc.fget(cls)
  File "temp/example.py", line 15, in name_count
    clx = aliased(cls)
  File ".../python2.7/site-packages/sqlalchemy/orm/util.py", line 385, in aliased
    return AliasedClass(element, alias=alias, name=name, adapt_on_names=adapt_on_names)
  File ".../python2.7/site-packages/sqlalchemy/orm/util.py", line 298, in __init__
    self.__mapper = _class_to_mapper(cls)
  File ".../python2.7/site-packages/sqlalchemy/orm/util.py", line 673, in _class_to_mapper
    raise exc.UnmappedClassError(class_or_mapper)
sqlalchemy.orm.exc.UnmappedClassError: Class '__main__.Foo' is not mapped

Option 2 traceback:

Traceback (most recent call last):
  File "temp/example.py", line 16, in <module>
    select([func.count(_foo.id)])
  File ".../python2.7/site-packages/sqlalchemy/sql/expression.py", line 1229, in __call__
    return func(*c, **o)
  File ".../python2.7/site-packages/sqlalchemy/sql/functions.py", line 16, in __call__
    args = [_literal_as_binds(c) for c in args]
  File ".../python2.7/site-packages/sqlalchemy/sql/expression.py", line 1440, in _literal_as_binds
    return element.__clause_element__()
  File ".../python2.7/site-packages/sqlalchemy/orm/attributes.py", line 117, in __clause_element__
    return self.comparator.__clause_element__()
  File ".../python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 506, in oneshot
    result = self.fget(obj, *args, **kw)
  File ".../python2.7/site-packages/sqlalchemy/orm/properties.py", line 156, in __clause_element__
    return self.adapter(self.prop.columns[0])
  File ".../python2.7/site-packages/sqlalchemy/orm/util.py", line 334, in __adapt_element
    return self.__adapter.traverse(elem).\
  File ".../python2.7/site-packages/sqlalchemy/sql/visitors.py", line 185, in traverse
    return replacement_traverse(obj, self.__traverse_options__, replace)
  File ".../python2.7/site-packages/sqlalchemy/sql/visitors.py", line 281, in replacement_traverse
    obj = clone(obj, **opts)
  File ".../python2.7/site-packages/sqlalchemy/sql/visitors.py", line 270, in clone
    newelem = replace(elem)
  File ".../python2.7/site-packages/sqlalchemy/sql/visitors.py", line 182, in replace
    e = v.replace(elem)
  File ".../python2.7/site-packages/sqlalchemy/sql/util.py", line 720, in replace
    return self._corresponding_column(col, True)
  File ".../python2.7/site-packages/sqlalchemy/sql/util.py", line 695, in _corresponding_column
    require_embedded=require_embedded)
  File ".../python2.7/site-packages/sqlalchemy/sql/expression.py", line 2492, in corresponding_column
    if self.c.contains_column(column):
  File ".../python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 485, in __get__
    obj.__dict__[self.__name__] = result = self.fget(obj)
  File ...
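For anyone finding this thread later: the usual workaround is option 3's placement, but the code can stay right next to the class if you alias the underlying Table instead of the class. `Foo.__table__` exists as soon as the class body completes, so it avoids the "Class ... is not mapped" error that `aliased(Foo)` raises inside `declared_attr`. A minimal sketch, written against modern SQLAlchemy (1.4+, where `select()` takes positional expressions and `.scalar_subquery()` is required); the trimmed-down model and all names here are illustrative, not the poster's:

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, column_property, declarative_base

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String)

# Alias the Table rather than the class: the Table is available immediately
# after the class body, before the mapper is fully configured.
foo_alias = Foo.__table__.alias()
Foo.name_count = column_property(
    select(func.count(foo_alias.c.id))
    .where(foo_alias.c.name == Foo.__table__.c.name)
    .correlate(Foo.__table__)
    .scalar_subquery()
)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([Foo(name='a'), Foo(name='a'), Foo(name='b')])
    session.commit()
    # each row carries the count of rows sharing its name
    counts = {foo.name: foo.name_count for foo in session.query(Foo)}
```

Assigning a `column_property` onto a declarative class after the class body is supported by declarative's class instrumentation, which is why this works where the `declared_attr` version does not.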
[sqlalchemy] Re: column_property for correlated subquery
Forgot to add: I'm on SA 0.7.8.

--
You received this message because you are subscribed to the Google Groups sqlalchemy group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
[sqlalchemy] Re: Optimizing joined entity loads
Thanks a lot for the tips on how to approach this problem. That's exactly what I needed.

> in 0.4 you'd get it off the impl (0.5 too, this is just uglier API):

Excellent! Here's what I came up with as an initial solution:

def poly_load(parent, collection, path):
    def itersiblings(parent, path):
        def iteritems(items, attr):
            for item in items:
                for child in getattr(item, attr):
                    yield child
        items = [parent]
        while path:
            items = iteritems(items, path.pop(0))
        return items

    path = path.split('.')
    assert len(path) % 2 == 0, 'path must contain an even number of elements'
    mid = len(path) / 2
    gparent = parent
    for attr in path[:mid]:
        gparent = getattr(gparent, attr)
    session = sqlalchemy.orm.session.object_session(parent)
    backref = getattr(type(parent), collection).property.backref.key
    itemclass = getattr(type(parent), collection).property.mapper.class_
    qry = session.query(itemclass) \
        .join([backref] + path[:mid]) \
        .filter(type(gparent).table.c.id == gparent.id)
    groups = defaultdict(list)
    for item in qry:
        groups[getattr(item, backref).id].append(item)
    impl = getattr(type(parent), collection).impl
    for sibling in itersiblings(gparent, path[mid:]):
        if sibling.id in groups:
            impl.set_committed_value(sibling._state, groups.get(sibling.id))

Example usage:

# prepare for takeoff
order = session.get(Order, 123)
item = order.items[0]  # triggers lazy load

# sit back and watch the fireworks!
poly_load(item, 'attributes', 'order.items')  # BOOM loaded all attributes of all items
poly_load(item, 'tags', 'order.items')  # BOOM loaded all tags of all items
poly_load(item.tags[0], 'bars', 'item.order.items.tags')  # BOOOM loaded all bars of all tags of all items

Some assumptions I was able to make that kept it simple:

- All mapped classes in my model have a 'table' attribute.
- All entities in my model have an 'id' attribute, which is the primary key.
- Relationships traversed by this loader are configured with the necessary backrefs to make it work.

Initial tests seem to show a DRAMATIC performance improvement.
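The dotted-path traversal that itersiblings performs can be shown standalone. Below is a stdlib-only sketch: the `Node` class and the sample order/item/tag objects are illustrative stand-ins for mapped entities, not the poster's model, and the traversal is written eagerly rather than as chained generators, but the walk is the same idea.

```python
# Stdlib-only sketch of the path traversal used by poly_load.
class Node:
    def __init__(self, name, **collections):
        self.name = name
        for attr, children in collections.items():
            setattr(self, attr, children)

def itersiblings(parent, path):
    # Breadth-first walk: at each step, replace the working set with the
    # concatenation of every item's `attr` collection.
    items = [parent]
    for attr in path:
        items = [child for item in items for child in getattr(item, attr)]
    return items

tag_a, tag_b = Node('a'), Node('b')
order = Node('order', items=[Node('item1', tags=[tag_a]),
                             Node('item2', tags=[tag_b])])

# walk order -> items -> tags, collecting every tag of every item
siblings = itersiblings(order, ['items', 'tags'])
```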
Thanks a lot for your help, Mike. Next up, roll this into a loader strategy so I can configure it on the mapper and have it all happen automatically.

~ Daniel
[sqlalchemy] Re: Optimizing joined entity loads
> I dont really understand the case here.

My first example wasn't very good. In an attempt to keep it simple, I actually made it too simple. Here's another example:

Order (has items)
Item (has attributes, has tags)
Attribute
Tag

If I set both Item.attributes and Item.tags to eager-load, then my result set size is the product of len(attributes) * len(tags), which is where the result set becomes HUGE.

This is a description of the queries before the optimization:

select orders (1 query)
select items (1 query)
select attributes (1 query per item)
select tags (1 query per item)

I'd like to combine all attribute queries into a single query, and likewise for tags. So instead of having 2 + len(items) * 2 queries (assuming 10 items, that's 22 queries), I'd have exactly 4 queries. Like this:

select orders ... where order_id = ? (1 query)
select items ... where order_id = ? (1 query)
select attributes ... join items where order_id = ? (1 query)
select tags ... join items where order_id = ? (1 query)

This would be done by the loader strategy (probably a variant of LazyLoader), which would issue a single query. The result of that query would be used to populate the attributes collection of each item on the order.

> ... So i dont see how the result set is HUGE in one case and not the
> other (assuming HUGE means, number of rows. if number of columns, SQLA
> ignores columns for entities which it already has during a load).

I think my new example above should clear up the confusion. However, the old example (using eager loading) would return duplicate copies of the item data for each attribute. If there are a lot of columns in the items table, the result set can get quite large with that type of eager load, and it's pretty inefficient since it returns a duplicate copy of the item with each attribute. The strategy I'm looking for eliminates all that duplicate data at the expense of a single extra query. In the case of multiple relations (e.g. attributes and tags), the eager-load result set grows multiplicatively, while the strategy I'm looking for requires only a single query per relation and loads no duplicate data. Theoretically this is the most efficient solution possible, given that all the data must be loaded (i.e. every item, attribute, and tag).

> Normally, if you wanted the attributes to eagerload off the related
> items, but not from the order, you would specify eagerloading on only
> those attributes which you want eagerloaded.

Yes, I understand that. It's not what I'm asking for though. Thanks.

~ Daniel
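The "combine all attribute queries into a single query" step reduces to a one-pass bucket-by-parent-id over the joined result. A stdlib sketch, with made-up rows standing in for the output of the single "select attributes ... join items" query:

```python
from collections import defaultdict

# Illustrative (item_id, attribute) rows, as a single joined query
# might return them; the values are invented for the example.
attribute_rows = [
    (1, 'color=red'), (1, 'size=L'),
    (2, 'color=blue'),
    (3, 'size=S'), (3, 'color=red'),
]

def group_by_parent(rows):
    """Bucket child rows under their parent id in a single pass."""
    groups = defaultdict(list)
    for parent_id, child in rows:
        groups[parent_id].append(child)
    return dict(groups)

groups = group_by_parent(attribute_rows)
# Each bucket can then be stamped onto its item's collection in one shot
# (e.g. via set_committed_value) instead of one SELECT per item.
```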
[sqlalchemy] Re: Optimizing joined entity loads
Clarification:

> If I set both Item.attributes and Item.tags to eager-load, then my
> result set size is the product of len(attributes) * len(tags), which
> is where the result set becomes HUGE.

I jumped right from the eager-load scenario to the completely non-optimized (no eager loading) scenario:

> This is a description of the queries before the optimization:
>
> select orders (1 query)
> select items (1 query)
> select attributes (1 query per item)
> select tags (1 query per item)

Here's the query list for the eager-load-attributes-and-tags scenario:

select orders (1 query)
select items eager-loading attributes and tags (1 query)

So 2 queries in that scenario, but the second query takes FOREVER to execute and returns a HUGE result set.

~ Daniel
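The size difference is easy to put numbers on. With assumed sizes of 10 items, 5 attributes, and 5 tags per item (my numbers, not from the thread), the single joined eager load repeats each item row for every attribute/tag pairing, while the one-query-per-collection plan returns every row exactly once:

```python
items, attrs_per_item, tags_per_item = 10, 5, 5  # assumed sizes

# Joined eager load of both collections at once: a cartesian product
# per item, since every (attribute, tag) pairing produces a row.
eager_rows = items * attrs_per_item * tags_per_item

# One query per collection: each entity appears in exactly one row.
batched_rows = items + items * attrs_per_item + items * tags_per_item
batched_queries = 3  # items, attributes, tags
```

Under these assumptions that's 250 rows in one query versus 110 rows across three, and the gap widens as the per-item collections grow.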
[sqlalchemy] Re: Optimizing joined entity loads
> So this is some variant of, i have a bunch of objects and I'd like to
> ... [snip lots of reasons why this should not be a standard feature]

Yeah, I understand what I'm asking for here, and I would never expect this kind of optimization to kick in by default. Instead, it would only be used in those cases where profiling has shown that there is a bottleneck. I have profiled my code to find out where the bottlenecks are, so I think I fall into the category of individuals who actually know how to use a feature like this.

> I think Hibernate might have this kind of loading available...

I wouldn't be surprised. From my use of Hibernate (which was quite a while ago now), my impression was that it had lots of options for complex and highly customized optimizations, like the one I'm trying to do.

> You'd start using plain session.query() to get the rows you want, hash
> together the collections as desired, and then apply them to the already
> loaded objects using the attributes.set_committed_value() function which
> is used for this purpose (pseudocode follows) (also use trunk rev 6066
> since I had to fix set_committed_value()):

I suppose r6066 is a 0.5 revision number? How well does set_committed_value() work in SA 0.4.7? I haven't upgraded to SA 0.5 yet, and I'd rather not do that right now if I can avoid it. However, if this can't be done with 0.4...

Thanks a lot for the tips on how to approach this problem. That's exactly what I needed.

~ Daniel
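For the record, the set_committed_value() technique Mike describes survives into modern SQLAlchemy. A sketch against SQLAlchemy 1.4+; the Order/Item model and all data below are illustrative, not the thread's actual schema:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship
from sqlalchemy.orm.attributes import set_committed_value

Base = declarative_base()

class Order(Base):
    __tablename__ = 'orders'
    id = Column(Integer, primary_key=True)
    items = relationship('Item')

class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey('orders.id'))
    name = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

session = Session(engine)
session.add(Order(id=1, items=[Item(name='x'), Item(name='y')]))
session.commit()
session.expunge_all()

order = session.get(Order, 1)
# Fetch the children ourselves in one query, then stamp the collection as
# "already loaded" so accessing order.items emits no lazy-load SELECT.
items = session.query(Item).filter(Item.order_id == order.id).all()
set_committed_value(order, 'items', items)
names = [item.name for item in order.items]
```

The public `set_committed_value(instance, key, value)` helper replaces the private `impl.set_committed_value(state, ...)` call used in the 0.4-era solution earlier in the thread.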