[sqlalchemy] Re: sharding id_chooser query_chooser
For example, you have user, post, comment table. Sharding is done by user_id and shard_lookup is done via lookup table. If one is going to create a post, you would lookup the user_id in the lookup table and insert the post entry into the shard where the user_id belongs to. How would you translate this type of behaviour in sqlalchemy. eg. shardN - table1, ..., tableM #each tables have field called username, and username field from user table is not a foreign key to post or comment table. There are no foreign keys. shard1 - user, post, comment shard2 - user, post, comment lookup_table has the following field: username shardname Now, if you want to create and save a post entry, you would look it up in lookup_table with the username to find the shardname and save it on that shard. Since shard lookup is done based on username for all three tables (user, post, comment), you can have one common function, def shard_chooser(mapper, instance, clause=None), for all three tables. What is the case for the other ShardSession related functions for the user, post, comment tables, would each table would have their own verison of def id_chooser(query, ident) def query_chooser(query) ? And also not sure where and how you would associate lookup table, username and shardname mapping, into these functions. On Jun 27, 6:06 am, King Simon-NFHD78 [EMAIL PROTECTED] wrote: Lilo wrote: My understanding of this query_chooser is that it's used when you want to execute orm's sql rather than raw sql. I don't quite understand what is visit_binary function do from attribute_shard.py example. What does it mean binary.operator, binary.left, binary.right.clause and query._criterion? The sharding design behind our application is that we have a master lookup table and shards. What shard to execute sql is based on querying master lookup table. taken from sqlalchemy attribute_shard.py example: def query_chooser(query): ids = [] # here we will traverse through the query's criterion, searching # for SQL constructs. we'll grab continent names as we find them # and convert to shard ids class FindContinent(sql.ClauseVisitor): def visit_binary(self, binary): if binary.left is weather_locations.c.continent: if binary.operator == operators.eq: ids.append(shard_lookup[binary.right.value]) elif binary.operator == operators.in_op: for bind in binary.right.clauses: ids.append(shard_lookup[bind.value]) FindContinent().traverse(query._criterion) if len(ids) == 0: return ['north_america', 'asia', 'europe', 'south_america'] else: return ids thank you. Hi, (I'm probably going to get the details wrong here, but hopefully the general idea will be right) SQLAlchemy represents SQL expressions as objects, a bit like a parse tree. For example, there are classes that represent tables, joins, functions and so on. It uses a Visitor pattern (http://en.wikipedia.org/wiki/Visitor_pattern) to traverse these structures. A binary clause is an SQL expression with an operator, a left half and a right half. For example, in the clause 'id = 5', binary.left is 'id', binary.right is '5', and binary.operator is '=' (or rather, operators.eq, which is the object that represents '='). The query_chooser function above uses a Visitor to look through all the SQL expressions that make up the query that is about to be executed. Because the only overridden method is 'visit_binary', anything other than binary clauses are ignored. The method body could be written in long-hand as: If the left part of the expression is 'weather_locations.continent': If the expression is 'continent = XXX': add the shard for continent XXX Else if the expression is 'continent IN (XXX, YYY)': add the shards for XXX and YYY (operators.in_op corresponds to the 'IN' operator, and binary.right.clauses contains the right-hand-side of that expression) The fallback case (if len(ids) == 0) happens if the visitor failed to find any expressions that it could handle, in which case all the shards will be queried. I don't understand your situation well enough to know how to adapt the example. If your master lookup table is basically doing the same job as the shard_lookup dictionary in the example, then you could replace shard_lookup above with a function call that does the query. I hope that helps, Simon --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: sharding id_chooser query_chooser
Lilo wrote: My understanding of this query_chooser is that it's used when you want to execute orm's sql rather than raw sql. I don't quite understand what is visit_binary function do from attribute_shard.py example. What does it mean binary.operator, binary.left, binary.right.clause and query._criterion? The sharding design behind our application is that we have a master lookup table and shards. What shard to execute sql is based on querying master lookup table. taken from sqlalchemy attribute_shard.py example: def query_chooser(query): ids = [] # here we will traverse through the query's criterion, searching # for SQL constructs. we'll grab continent names as we find them # and convert to shard ids class FindContinent(sql.ClauseVisitor): def visit_binary(self, binary): if binary.left is weather_locations.c.continent: if binary.operator == operators.eq: ids.append(shard_lookup[binary.right.value]) elif binary.operator == operators.in_op: for bind in binary.right.clauses: ids.append(shard_lookup[bind.value]) FindContinent().traverse(query._criterion) if len(ids) == 0: return ['north_america', 'asia', 'europe', 'south_america'] else: return ids thank you. Hi, (I'm probably going to get the details wrong here, but hopefully the general idea will be right) SQLAlchemy represents SQL expressions as objects, a bit like a parse tree. For example, there are classes that represent tables, joins, functions and so on. It uses a Visitor pattern (http://en.wikipedia.org/wiki/Visitor_pattern) to traverse these structures. A binary clause is an SQL expression with an operator, a left half and a right half. For example, in the clause 'id = 5', binary.left is 'id', binary.right is '5', and binary.operator is '=' (or rather, operators.eq, which is the object that represents '='). The query_chooser function above uses a Visitor to look through all the SQL expressions that make up the query that is about to be executed. Because the only overridden method is 'visit_binary', anything other than binary clauses are ignored. The method body could be written in long-hand as: If the left part of the expression is 'weather_locations.continent': If the expression is 'continent = XXX': add the shard for continent XXX Else if the expression is 'continent IN (XXX, YYY)': add the shards for XXX and YYY (operators.in_op corresponds to the 'IN' operator, and binary.right.clauses contains the right-hand-side of that expression) The fallback case (if len(ids) == 0) happens if the visitor failed to find any expressions that it could handle, in which case all the shards will be queried. I don't understand your situation well enough to know how to adapt the example. If your master lookup table is basically doing the same job as the shard_lookup dictionary in the example, then you could replace shard_lookup above with a function call that does the query. I hope that helps, Simon --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: sharding id_chooser query_chooser
My understanding of this query_chooser is that it's used when you want to execute orm's sql rather than raw sql. I don't quite understand what is visit_binary function do from attribute_shard.py example. What does it mean binary.operator, binary.left, binary.right.clause and query._criterion? The sharding design behind our application is that we have a master lookup table and shards. What shard to execute sql is based on querying master lookup table. taken from sqlalchemy attribute_shard.py example: def query_chooser(query): ids = [] # here we will traverse through the query's criterion, searching # for SQL constructs. we'll grab continent names as we find them # and convert to shard ids class FindContinent(sql.ClauseVisitor): def visit_binary(self, binary): if binary.left is weather_locations.c.continent: if binary.operator == operators.eq: ids.append(shard_lookup[binary.right.value]) elif binary.operator == operators.in_op: for bind in binary.right.clauses: ids.append(shard_lookup[bind.value]) FindContinent().traverse(query._criterion) if len(ids) == 0: return ['north_america', 'asia', 'europe', 'south_america'] else: return ids thank you. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---
[sqlalchemy] Re: sharding id_chooser query_chooser
On Jun 25, 2008, at 6:30 PM, lilo wrote: I am trying to understand what id_chooser and query_chooser do. Id_chooser basically uses a instance primary key to determine what shard the intance should be saved to? id_chooser receives a primary key identifier, and then returns a list of shards in which to search for that primary key. If you dont know which shard to look on, it can return all the shards and they'll all be checked in the order they were given. Obviously its better if it returns a single shard identifier. If your app got hit a lot with get- by-primary-keys and you couldn't determine the best shard, you might want to return the shards in a different order each time to balance load. My primary keys(globally unique) are made of up more than one fields. Would that be a problem with Id_chooser? the argument to id_chooser is a list, forming a composite identiifer. How/when would do you query_chooser? the Query calls upon query_chooser when it is ready to return instances, passing in itself as an argument. the function returns the list of shards in which to execute the query, and the total results of all those shards are aggregated together. Can you set id_chooser and query_chooser to be None in: create_session.configure(shards = {blah}, shard_chooser = shard_chooser, id_chooser = None, query_chooser = None) ? nope. sharding requires these in order to do its work. If you want a default chooser function, have them return a list of all shard names. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups sqlalchemy group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~--~~~~--~~--~--~---