Hi

I noticed there are new blueprints for libdrizzle sharding functions
on launchpad
https://blueprints.launchpad.net/drizzle/+spec/libdrizzle-sharding-phase1

It mentions the ketama consistent sharding technique commonly used
with memcached. I don't think (but I don't claim to know, so please
comment) that this is at all applicable to Drizzle.

As explained in
http://www.mikeperham.com/2009/01/14/consistent-hashing-in-memcache-client/
...the point in using a consistent hashing technique is to minimize
cache misses after re-shard. But Drizzle is not a cache. Let me
explain:

In memcached, and using naive sharding [key % num_servers], when you
have 2 servers and add a 3rd, it means that the sharding key changes
for all your records/objects, and all data is lost. Well, in fact 1/3
of the keys will still map to the old server, and as you add more
servers this number decreases to 1/4, 1/5 and so on... The point with
using consistent hashing is to minimize cache misses, so that after
adding servers, approximately 2/3 (and then 3/4, 4/5, and so on...) of
the keys will still map to the old server and find the old record. But
for the remaining 1/n fraction, the records are still lost. It is a
cache, this is acceptable behavior., and it is very simple,
no-maintenance solution.

Drizzle is a database. So we cannot just discard 1/n of the data after
adding a server. The data has to actually be moved from old location
to the new ones. Consistent hashing doesn't make it any easier to know
which data should go where, in fact it makes it harder.

Imho the solutions available for Drizzle are:

1) back to naive approach (which I believe the stock memcached client
uses). When scaling out, you just move the data record, by record into
their correct shards. I think MySQL Cluster does something similar.
The main drawback here is that if the sharding key doesn't result in
an even spread of data, some shards are full and some empty and you
cannot do much about it.

2) virtual buckets.
 - You start with 1 server. There can already be data in the database.
 - You divide your data into N virtual shards. N >> max number of
shards you'll ever have. Say N=10k or more. (Ask Facebook :-)
 - At this point all data is still physically in one and the same InnoDB table.
 - When scaling out, you add more servers and move virtual buckets one
by one to new servers.
 - You need to maintain a mapping from virtual bucket to actual
Drizzle instances. All clients and servers need to agree on this. This
could be a system table on one or all Drizzle instances.
 - you can move virtual buckets back and forth to re-balance your data.

In both 1 and 2 the act of moving data is complicated, it has to
happen somehow "atomically":
 - clients use old shard layout
 - copy data to new shards
 - keep in sync old shards and new shards
 - start using new shard layout
 - discard moved data from old shards

henrik



-- 
[email protected]
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp

Reply via email to