[sqlalchemy] Re: sharding id_chooser query_chooser

2008-07-03 Thread lilo

For example, you have user, post, comment table.  Sharding is done by
user_id and shard_lookup is done via lookup table.  If one is going to
create a post, you would lookup the user_id in the lookup table and
insert the post entry into the shard where the user_id belongs to.
How would you translate this type of behaviour in sqlalchemy.

eg.
shardN - table1, ..., tableM
#each tables have field called username, and username field from user
table is not a foreign key to post or comment table.  There are no
foreign keys.
shard1 - user, post, comment
shard2 - user, post, comment

lookup_table has the following field:

username
shardname

Now, if you want to create and save a post entry, you would look it up
in lookup_table with the username to find the shardname and save it on
that shard.

Since shard lookup is done based on username for all three tables
(user, post, comment), you can have one common function,
def shard_chooser(mapper, instance, clause=None), for all three
tables.

What is the case for the other ShardSession related functions for the
user, post, comment tables, would each table would have their own
verison of

def id_chooser(query, ident)
def query_chooser(query)

?  And also not sure where and how you would associate lookup table,
username and shardname mapping, into these functions.

On Jun 27, 6:06 am, King Simon-NFHD78 [EMAIL PROTECTED]
wrote:
 Lilo wrote:

  My understanding of this query_chooser is that it's used when you want
  to execute orm's sql rather than raw sql.

  I don't quite understand what is visit_binary function do from
  attribute_shard.py example.  What does it mean binary.operator,
  binary.left, binary.right.clause and query._criterion?

  The sharding design behind our application is that we have a master
  lookup table and shards.  What shard to execute sql is based on
  querying master lookup table.

  taken from sqlalchemy attribute_shard.py example:

  def query_chooser(query):
  ids = []

  # here we will traverse through the query's criterion, searching
  # for SQL constructs.  we'll grab continent names as we find them
  # and convert to shard ids
  class FindContinent(sql.ClauseVisitor):
  def visit_binary(self, binary):
  if binary.left is weather_locations.c.continent:
  if binary.operator == operators.eq:
  ids.append(shard_lookup[binary.right.value])
  elif binary.operator == operators.in_op:
  for bind in binary.right.clauses:
  ids.append(shard_lookup[bind.value])

  FindContinent().traverse(query._criterion)
  if len(ids) == 0:
  return ['north_america', 'asia', 'europe', 'south_america']
  else:
  return ids

  thank you.

 Hi,

 (I'm probably going to get the details wrong here, but hopefully the
 general idea will be right)

 SQLAlchemy represents SQL expressions as objects, a bit like a parse
 tree. For example, there are classes that represent tables, joins,
 functions and so on. It uses a Visitor pattern
 (http://en.wikipedia.org/wiki/Visitor_pattern) to traverse these
 structures.

 A binary clause is an SQL expression with an operator, a left half and a
 right half. For example, in the clause 'id = 5', binary.left  is 'id',
 binary.right is '5', and binary.operator is '=' (or rather,
 operators.eq, which is the object that represents '=').

 The query_chooser function above uses a Visitor to look through all the
 SQL expressions that make up the query that is about to be executed.
 Because the only overridden method is 'visit_binary', anything other
 than binary clauses are ignored.

 The method body could be written in long-hand as:

 If the left part of the expression is 'weather_locations.continent':
If the expression is 'continent = XXX':
   add the shard for continent XXX
Else if the expression is 'continent IN (XXX, YYY)':
   add the shards for XXX and YYY

 (operators.in_op corresponds to the 'IN' operator, and
 binary.right.clauses contains the right-hand-side of that expression)

 The fallback case (if len(ids) == 0) happens if the visitor failed to
 find any expressions that it could handle, in which case all the shards
 will be queried.

 I don't understand your situation well enough to know how to adapt the
 example. If your master lookup table is basically doing the same job as
 the shard_lookup dictionary in the example, then you could replace
 shard_lookup above with a function call that does the query.

 I hope that helps,

 Simon
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: sharding id_chooser query_chooser

2008-06-27 Thread King Simon-NFHD78

Lilo wrote:
 
 My understanding of this query_chooser is that it's used when you want
 to execute orm's sql rather than raw sql.
 
 I don't quite understand what is visit_binary function do from
 attribute_shard.py example.  What does it mean binary.operator,
 binary.left, binary.right.clause and query._criterion?
 
 The sharding design behind our application is that we have a master
 lookup table and shards.  What shard to execute sql is based on
 querying master lookup table.
 
 taken from sqlalchemy attribute_shard.py example:
 
 def query_chooser(query):
 ids = []
 
 # here we will traverse through the query's criterion, searching
 # for SQL constructs.  we'll grab continent names as we find them
 # and convert to shard ids
 class FindContinent(sql.ClauseVisitor):
 def visit_binary(self, binary):
 if binary.left is weather_locations.c.continent:
 if binary.operator == operators.eq:
 ids.append(shard_lookup[binary.right.value])
 elif binary.operator == operators.in_op:
 for bind in binary.right.clauses:
 ids.append(shard_lookup[bind.value])
 
 FindContinent().traverse(query._criterion)
 if len(ids) == 0:
 return ['north_america', 'asia', 'europe', 'south_america']
 else:
 return ids
 
 thank you.
 

Hi,

(I'm probably going to get the details wrong here, but hopefully the
general idea will be right)

SQLAlchemy represents SQL expressions as objects, a bit like a parse
tree. For example, there are classes that represent tables, joins,
functions and so on. It uses a Visitor pattern
(http://en.wikipedia.org/wiki/Visitor_pattern) to traverse these
structures.

A binary clause is an SQL expression with an operator, a left half and a
right half. For example, in the clause 'id = 5', binary.left  is 'id',
binary.right is '5', and binary.operator is '=' (or rather,
operators.eq, which is the object that represents '=').

The query_chooser function above uses a Visitor to look through all the
SQL expressions that make up the query that is about to be executed.
Because the only overridden method is 'visit_binary', anything other
than binary clauses are ignored.

The method body could be written in long-hand as:

If the left part of the expression is 'weather_locations.continent':
   If the expression is 'continent = XXX':
  add the shard for continent XXX
   Else if the expression is 'continent IN (XXX, YYY)':
  add the shards for XXX and YYY


(operators.in_op corresponds to the 'IN' operator, and
binary.right.clauses contains the right-hand-side of that expression)

The fallback case (if len(ids) == 0) happens if the visitor failed to
find any expressions that it could handle, in which case all the shards
will be queried.

I don't understand your situation well enough to know how to adapt the
example. If your master lookup table is basically doing the same job as
the shard_lookup dictionary in the example, then you could replace
shard_lookup above with a function call that does the query.

I hope that helps,

Simon

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: sharding id_chooser query_chooser

2008-06-26 Thread lilo

My understanding of this query_chooser is that it's used when you want
to execute orm's sql rather than raw sql.

I don't quite understand what is visit_binary function do from
attribute_shard.py example.  What does it mean binary.operator,
binary.left, binary.right.clause and query._criterion?

The sharding design behind our application is that we have a master
lookup table and shards.  What shard to execute sql is based on
querying master lookup table.

taken from sqlalchemy attribute_shard.py example:

def query_chooser(query):
ids = []

# here we will traverse through the query's criterion, searching
# for SQL constructs.  we'll grab continent names as we find them
# and convert to shard ids
class FindContinent(sql.ClauseVisitor):
def visit_binary(self, binary):
if binary.left is weather_locations.c.continent:
if binary.operator == operators.eq:
ids.append(shard_lookup[binary.right.value])
elif binary.operator == operators.in_op:
for bind in binary.right.clauses:
ids.append(shard_lookup[bind.value])

FindContinent().traverse(query._criterion)
if len(ids) == 0:
return ['north_america', 'asia', 'europe', 'south_america']
else:
return ids

thank you.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Re: sharding id_chooser query_chooser

2008-06-25 Thread Michael Bayer


On Jun 25, 2008, at 6:30 PM, lilo wrote:


 I am trying to understand what id_chooser and query_chooser do.
 Id_chooser basically uses a instance primary key to determine what
 shard the intance should be saved to?

id_chooser receives a primary key identifier, and then returns a list  
of shards in which to search for that primary key.  If you dont know  
which shard to look on, it can return all the shards and they'll all  
be checked in the order they were given.   Obviously its better if it  
returns a single shard identifier.  If your app got hit a lot with get- 
by-primary-keys and you couldn't determine the best shard, you might  
want to return the shards in a different order each time to balance  
load.


 My primary keys(globally
 unique) are made of up more than one fields.  Would that be a problem
 with Id_chooser?

the argument to id_chooser is a list, forming a composite identiifer.

 How/when would do you query_chooser?

the Query calls upon query_chooser when it is ready to return  
instances, passing in itself as an argument.  the function returns the  
list of shards in which to execute the query, and the total results of  
all those shards are aggregated together.

 Can you set id_chooser and query_chooser to be None in:

 create_session.configure(shards = {blah}, shard_chooser =
 shard_chooser, id_chooser = None, query_chooser = None) ?

nope.  sharding requires these in order to do its work.  If you want a  
default chooser function, have them return a list of all shard names.


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---