[sqlalchemy] Re: Sharding with _BindParamClause
Michael Bayer wrote: check out r0ddd638f1d90 in mercurial. I've added the function from the example below, plus support for in_op(), to the attribute_shard example. The old ClauseVisitor method is removed and replaced with this more robust method. Very nice! Thanks, Michael. /George -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
[sqlalchemy] Re: Sharding with _BindParamClause
On Apr 2, 4:43 pm, George V. Reilly george.v.rei...@gmail.com wrote: Michael Bayer wrote: check out r0ddd638f1d90 in mercurial. I've added the function from the example below, plus support for in_op(), to the attribute_shard example. The old ClauseVisitor method is removed and replaced with this more robust method. Very nice! Thanks, Michael. I blogged about this at http://blogs.cozi.com/tech/2010/04/sqlalchemy-sharding.html /George -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
Re: [sqlalchemy] Re: Sharding with _BindParamClause
George V. Reilly wrote: I really don't like the isinstance(other_side, sqlalchemy.sql.expression._BindParamClause)) in the middle of _check_side. Is there a cleaner way to do this? I found that a combination of * the above _check_side and two-sided visit_binary * doing a better job of declaring ForeignKey relationships in Columns * some explicit primaryjoins in calls to relation() cleaned up all the cases where SA wasn't providing the ids in queries Perhaps the sharding sample in SA 0.6 could be expanded? I think what's needed here is a little more savvy to what the visitors package provides. ClauseVisitor provides support for some of the hefty transforming objects SQLA has internally in the sql util package, but for finding things you're much better off using the functional API, which is a lot more flexible. Anytime you find yourself using isinstance(), which is in fact sometimes necessary, you can instead try to get the visitor dispatch system to give you that type instead. Below I've adapted a recipe that is in the beaker example to also include columns and operators from binary expressions. We can certainly add this function to the sharding example. Ideally the sharding API would come with some helper functions, if we could identify some that are of very general use. from sqlalchemy.sql import visitors def get_comparisons(query): binds = {} columns = set() result = [] def visit_bindparam(bind): value = query._params.get(bind.key, bind.value) # lazyloader may dig a callable in here, intended # to late-evaluate params after autoflush is called. # convert to a scalar value. if callable(value): value = value() binds[bind] = value def visit_column(column): columns.add(column) def visit_binary(binary): if binary.left in columns and binary.right in binds: result.append((binary.left, binary.operator, binds[binary.right])) elif binary.left in binds and binary.right in columns: result.append((binary.right, binary.operator, binds[binary.left])) if query._criterion is not None: visitors.traverse_depthfirst(query._criterion, {}, {'bindparam':visit_bindparam, 'binary':visit_binary, 'column':visit_column } ) return result if __name__ == '__main__': from sqlalchemy import * from sqlalchemy.orm import * metadata = MetaData() users = Table('users', metadata, Column('id', Integer, primary_key=True, ), Column('name', String(30), nullable=False), ) addresses = Table('addresses', metadata, Column('id', Integer, primary_key=True, ), Column('user_id', None, ForeignKey('users.id')), Column('email_address', String(50), nullable=False) ) class User(object): pass class Address(object): pass mapper(Address, addresses, properties={ 'user':relationship(User) }) mapper(User, users) sess = create_session() q = sess.query(User).\ filter(User.id==Address.user_id).\ filter(User.name=='foo').\ filter(Address.email_address.like(bindparam('foo'))).\ params(foo='edward') for col, op, value in get_comparisons(q): print col.name, op, value -- /George V. Reilly, Seattle -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
Re: [sqlalchemy] Re: Sharding with _BindParamClause
Michael Bayer wrote: George V. Reilly wrote: I really don't like the isinstance(other_side, sqlalchemy.sql.expression._BindParamClause)) in the middle of _check_side. Is there a cleaner way to do this? I found that a combination of * the above _check_side and two-sided visit_binary * doing a better job of declaring ForeignKey relationships in Columns * some explicit primaryjoins in calls to relation() cleaned up all the cases where SA wasn't providing the ids in queries Perhaps the sharding sample in SA 0.6 could be expanded? check out r0ddd638f1d90 in mercurial. I've added the function from the example below, plus support for in_op(), to the attribute_shard example. The old ClauseVisitor method is removed and replaced with this more robust method. I think what's needed here is a little more savvy to what the visitors package provides. ClauseVisitor provides support for some of the hefty transforming objects SQLA has internally in the sql util package, but for finding things you're much better off using the functional API, which is a lot more flexible. Anytime you find yourself using isinstance(), which is in fact sometimes necessary, you can instead try to get the visitor dispatch system to give you that type instead. Below I've adapted a recipe that is in the beaker example to also include columns and operators from binary expressions. We can certainly add this function to the sharding example. Ideally the sharding API would come with some helper functions, if we could identify some that are of very general use. from sqlalchemy.sql import visitors def get_comparisons(query): binds = {} columns = set() result = [] def visit_bindparam(bind): value = query._params.get(bind.key, bind.value) # lazyloader may dig a callable in here, intended # to late-evaluate params after autoflush is called. # convert to a scalar value. if callable(value): value = value() binds[bind] = value def visit_column(column): columns.add(column) def visit_binary(binary): if binary.left in columns and binary.right in binds: result.append((binary.left, binary.operator, binds[binary.right])) elif binary.left in binds and binary.right in columns: result.append((binary.right, binary.operator, binds[binary.left])) if query._criterion is not None: visitors.traverse_depthfirst(query._criterion, {}, {'bindparam':visit_bindparam, 'binary':visit_binary, 'column':visit_column } ) return result if __name__ == '__main__': from sqlalchemy import * from sqlalchemy.orm import * metadata = MetaData() users = Table('users', metadata, Column('id', Integer, primary_key=True, ), Column('name', String(30), nullable=False), ) addresses = Table('addresses', metadata, Column('id', Integer, primary_key=True, ), Column('user_id', None, ForeignKey('users.id')), Column('email_address', String(50), nullable=False) ) class User(object): pass class Address(object): pass mapper(Address, addresses, properties={ 'user':relationship(User) }) mapper(User, users) sess = create_session() q = sess.query(User).\ filter(User.id==Address.user_id).\ filter(User.name=='foo').\ filter(Address.email_address.like(bindparam('foo'))).\ params(foo='edward') for col, op, value in get_comparisons(q): print col.name, op, value -- /George V. Reilly, Seattle -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en. -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
[sqlalchemy] Re: Sharding with _BindParamClause
On Mar 30, 4:42 pm, Michael Bayer mike...@zzzcomputing.com wrote: George V. Reilly wrote: We're using SQLAlchemy sharding to partition accounts across a couple of databases. We want to add more partitions, but first we need to eliminate some unnecessary cross-partition queries. This works well most of the time, but we're finding that some queries do not have a value. These are all of the form SELECT shopping_list_items.version AS shopping_list_items_version FROM shopping_list_items WHERE shopping_list_items.account_id = :param_1 AND shopping_list_items.shopping_list_item_id = :param_2 and :param1 is of the form _BindParamClause(u'%(63636624 param)s', None, type_=UUID()) Typically, I'm seeing this come out of the innards of SQLAlchemy, as one of several queries triggered by, say, a session.merge(). The only Query() I can see getting generated that would have non-valued bindparams would be during a _get(). The values should be present in query._params. If you need more info I can dig in to recall how the keys of that dictionary are formatted in this case. Thanks, Michael. Here's what I came up with: class FindShardableId(sqlalchemy.sql.ClauseVisitor): def __init__(self, ids, key_fields, get_shard, params): self.ids = ids self.key_fields = key_fields self.get_shard = get_shard self.params = params def visit_binary(self, binary): if not self._check_side(binary, binary.left, binary.right): # Lazy load properties tend to be reversed, with the constant on the left self._check_side(binary, binary.right, binary.left) def _check_side(self, binary, side, other_side): if isinstance(side, sqlalchemy.Column) and side.name in self.key_fields: if binary.operator == sqlalchemy.sql.operators.eq: value = getattr(other_side, value, None) if (value is None and isinstance(other_side, sqlalchemy.sql.expression._BindParamClause)): value = self.params.get(other_side.key) if value is not None: self.ids.append(self.get_shard(value)) return True elif binary.operator == sqlalchemy.sql.operators.in_op: for bind in other_side.clauses: self.ids.append(self.get_shard(bind.value)) return True class QuerySharder(object): def sessionmaker(self, **sessionmaker_args): Session = sqlalchemy.orm.sessionmaker( class_ = sqlalchemy.orm.shard.ShardedSession, **sessionmaker_args) Session.configure( shards=self.shards, shard_chooser=self._shard_chooser, id_chooser=self._id_chooser, query_chooser=self._query_chooser) return Session def _query_chooser(self, query): ids = [] if query._criterion is not None: FindShardableId( ids, set([account_id, account_guid]), lambda account_id: self.shard_manager.shard_id_from_guid(account_id), query._params ).traverse(query._criterion) if len(ids) == 0: logging.warn(\n\n! Executing query against all shards; this may not be optimal:\n\t{0}\n \tParams: {1}\n.format( str(query), str(query._params))) return self.shards.keys() else: return ids I really don't like the isinstance(other_side, sqlalchemy.sql.expression._BindParamClause)) in the middle of _check_side. Is there a cleaner way to do this? I found that a combination of * the above _check_side and two-sided visit_binary * doing a better job of declaring ForeignKey relationships in Columns * some explicit primaryjoins in calls to relation() cleaned up all the cases where SA wasn't providing the ids in queries Perhaps the sharding sample in SA 0.6 could be expanded? -- /George V. Reilly, Seattle -- You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.