Hi Abhishek,

On Wed, 2011-04-06 at 12:38 +0530, Abhishek Singh wrote:
> First blueprint (Phase 1 of the project) [1] is basically concerned
> with choosing the hashing algorithm for mapping clients to shards.
> Here you have proposed to use a consistent hashing algorithm similar
> to libketama for memcached. I would like to make some modifications
> in this regard. I would propose to use vbucket-style partitioning for
> that, with hashing (I could re-use ketama inside libhashkit in that
> case) and a static mapping of vbuckets to shards, since this scheme
> is much more robust than the probabilistically good ketama algorithm.
> Some of the advantages are:
>       * Service requests are never sent to the wrong server.
>       * Servers refuse to service commands for which they aren't
>         responsible.
>       * Servers do not know about each other.
>       * Data sets can be handed from one server to another atomically.
>       * No temporal constraints involved.
>       * Absolute consistency guaranteed.
>       * Network overhead in the normal case is nil.

Sounds great to me :)

> Second blueprint (Phase 2 of the project) [2] asks for "re-shuffling
> sharded data between servers, when servers are either added or
> removed".
> In my opinion this phase is handled inside the 1st phase when we use
> vbuckets. In the case of vbuckets we have a replication phase where
> vbuckets are re-mapped to newly added servers.

Yes, I think you are correct there, as long as this is documented in a
general use kind of way.

> Third blueprint (Phase 3 of the project) [3] asks for "support groups
> of servers so that each shard could contain a master and multiple
> slaves for round-robin read performance and redundancy".
> So this phase's work has to be done around failure handling. There
> are basically 3 ways to have master-slave replication in the case of
> Drizzle:
>       * 1:n replication: the first strategy (1:n) refers to a master
>         servicing multiple slaves concurrently [4].
>       * Chained replication: the second strategy (chained) refers to
>         a single master servicing only a single slave, but having
>         that slave have a further downstream slave of its own. This
>         offers the advantage of having a single stream of mutation
>         events coming out of a server, while still maintaining two
>         copies of all records. It has the disadvantage of compounding
>         replication latency as you traverse the chain [5].
>       * Multi-master replication: this is out-of-the-box
>         functionality available with Drizzle and it is still in the
>         beta phase [6].
> Andrew, I would like to ask one question in this regard. How exactly
> would we get the key related to each client? Clients aren't going to
> manually supply a key every time they try to execute a sharded query.
> This key has to be generated automatically on the basis of user data,
> and the client then redirected to the appropriate shard based upon
> that key using vbuckets.

I'm pretty sure the only way this can work is if the client supplies
the key manually. I don't think this is a big ask. If it is something
the client doesn't know the key for, it will have to do a lookup on
another table to find out what it is instead of using secondary
indexes.
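To put the vbucket idea and the client-supplied key together, roughly
what I have in mind looks something like the sketch below (Python,
purely illustrative; the 1024 vbuckets, 4 shards and the key format
are made-up numbers, and the real code would live in the proxy /
libhashkit layer):

import hashlib

NUM_VBUCKETS = 1024  # fixed for the lifetime of the cluster

# Static mapping of vbuckets to shards; moving data to a new server
# means re-assigning entries here, the hash function never changes.
vbucket_to_shard = {v: "shard-%d" % (v % 4) for v in range(NUM_VBUCKETS)}

def vbucket_for_key(key):
    """Hash the client-supplied key into a vbucket number."""
    # md5-based here; the real thing could re-use the ketama hash in
    # libhashkit as you suggest
    digest = hashlib.md5(key.encode("utf-8")).digest()
    point = int.from_bytes(digest[:4], "big")  # first 4 bytes as an int
    return point % NUM_VBUCKETS

def shard_for_key(key):
    return vbucket_to_shard[vbucket_for_key(key)]

# e.g. the client supplies "customer:42" along with its sharded query
print(shard_for_key("customer:42"))

Re-sharding (Phase 2) then just becomes re-assigning some vbucket
entries to the new server and copying those vbuckets across; clients
only ever see the map change, never a different hash.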
> With some googling around I got to know some of the basic techniques
> used in this regard, they are:
>       * Shard by the primary key value of a table: this is a
>         straightforward approach and the easiest to implement. Here
>         we have to calculate the MD5 hash of the primary key and
>         thereafter allocate a shard to the client based upon that.
>         This is only effective when data is reasonably well
>         distributed.
>       * Shard by the modulus of a key value: the modulus function
>         effectively distributes across your shards on a “round-robin”
>         basis, creating a very even distribution of new key values.
>       * Maintain a master shard index: this technique involves using
>         a single master table that maps various values to specific
>         shards. It is very flexible, and meets a large variety of
>         application situations.
> I would appreciate suggestions in this regard.

I'm not a fan of the third solution for this version (maybe after GSoC
we can look into that). For now, keep it simple so it can be built
upon later.

I was thinking of a combination of the first two approaches when I
dreamt it up... so a modulus of the MD5. A modulus of the raw key
value cannot always give a great distribution if, for example, your
key always jumps by some value X. I'll explain why:

Imagine a situation where you have multiple clients writing to one
sharded table. We can't use auto-increment because we need the key
values to be predictable. So for an auto-number you would likely have
either:

1. Clients pre-fetch a pre-defined block of X keys from somewhere to
   use, and no other client can use this set of keys (this is what
   MySQL Cluster does)
2. A predictable increment jump (so with 32 clients, client 1 would
   increment by 32, client 2 by 32 with a +1 offset, etc...)

For either of these I suspect a simple modulus of the value will not
give a perfectly even distribution, whereas a modulus of the MD5 will
give a much better distribution (and work better on strange data
types).
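To make the "modulus of the MD5" point concrete, here is a rough
Python sketch (the 4 shards and the step-of-32 key scheme are just
example numbers, nothing decided for the project):

import hashlib
from collections import Counter

NUM_SHARDS = 4

def shard_raw(key):
    # modulus of the raw key value
    return key % NUM_SHARDS

def shard_md5(key):
    # modulus of the MD5 of the key
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Keys generated by one client under the "increment by 32" scheme
# (offset 1, step 32): 1, 33, 65, ...
keys = range(1, 32 * 1000, 32)

print(Counter(shard_raw(k) for k in keys))   # Counter({1: 1000})
print(Counter(shard_md5(k) for k in keys))   # roughly 250 per shard

Because the step (32) is a multiple of the shard count here, every key
that client generates lands on the same shard under the raw modulus,
so one busy client hot-spots a single shard. Hashing first spreads
that client's rows across all shards, and it also works on keys that
aren't plain integers.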
Kind Regards

-- 
Andrew Hutchings - LinuxJedi - http://www.linuxjedi.co.uk/

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
