On Thu, May 5, 2011 at 12:20 AM, Anurag Priyam <[email protected]> wrote:
> Hey Henrik,
>
>> I noticed there are new blueprints for libdrizzle sharding functions
>> on launchpad
>> https://blueprints.launchpad.net/drizzle/+spec/libdrizzle-sharding-phase1
>
> That new blueprint is slightly dated :|. You will find a more recent
> discussion on libdrizzle sharding support in the mailing list
> archives.

Andrew pointed me to that on IRC. I joined the mailing list a few days
after that. The past discussion looks good.

>> It mentions the ketama consistent sharding technique commonly used
>> with memcached. I don't think (but I don't claim to know, so please
>> comment) that this is at all applicable to Drizzle.
>
> I will slightly disagree with the 'at all' point here.
>
> [...]
>
>> ...the point in using a consistent hashing technique is to minimize
>> cache misses after re-shard. But Drizzle is not a cache. Let me
>
> In the case of Drizzle, minimizing cache misses roughly translates to
> minimizing the amount of data that needs to be re-sharded.

This is a good point! My counterclaim will be that the virtual bucket
approach can reach the same minimum, but yes, it is an argument in
favor of ketama.
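To make the comparison a bit more concrete, here is a minimal Python
sketch of a generic virtual bucket scheme (purely illustrative, not
Drizzle or libdrizzle code; the bucket count and helper names are made
up). Keys hash into a fixed number of buckets, only the small
bucket -> server map changes when a server is added, and only the rows
in the reassigned buckets have to be copied, which is roughly 1/n of
the data and the same minimum that consistent hashing gives you:

import hashlib

NUM_BUCKETS = 1024  # fixed for the lifetime of the cluster

def bucket_for(key):
    # hash every key into one of the fixed virtual buckets
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_BUCKETS

def assign_buckets(num_servers):
    # trivial round-robin bucket -> server map
    return [b % num_servers for b in range(NUM_BUCKETS)]

def add_server(assignment, num_servers):
    # grow to num_servers + 1 servers by reassigning roughly
    # NUM_BUCKETS / (num_servers + 1) buckets, taken evenly from the
    # old servers; only those buckets' rows ever have to be copied
    new_total = num_servers + 1
    result = list(assignment)
    for b in range(0, NUM_BUCKETS, new_total):
        result[b] = num_servers  # id of the new server
    return result

keys = ["user:%d" % i for i in range(100000)]
before = assign_buckets(3)
after = add_server(before, 3)  # grow from 3 to 4 servers
moved = sum(1 for k in keys if before[bucket_for(k)] != after[bucket_for(k)])
print("fraction of keys moved: %.3f" % (moved / float(len(keys))))  # ~0.25

The price compared to ketama is that the bucket -> server map has to be
stored and kept up to date somewhere, so the "simple, no-maintenance"
point quoted below still favors consistent hashing.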
>> servers this number decreases to 1/4, 1/5 and so on... The point with
>> using consistent hashing is to minimize cache misses, so that after
>> adding servers, approximately 2/3 (and then 3/4, 4/5, and so on...) of
>> the keys will still map to the old server and find the old record. But
>> for the remaining 1/n fraction, the records are still lost. It is a
>> cache, so this is acceptable behavior, and it is a very simple,
>> no-maintenance solution.
>
> A figure of 1/3 (that does not map to the same server) is a little
> unsettling; 2/1000, or 3/1000 is more comforting. The more nodes there
> are, the smaller the percentage of data you need to redistribute.
>
> So I feel that sharding with ketama is still relevant to Drizzle. IMO,
> the probabilistic distribution of data in a hash based partitioning
> scheme is not very attractive for a DBMS.

It can be ok for many use cases. For many key-value use cases it is a
good way to distribute both data and load evenly. But it is not a
one-size-fits-all solution - it is good if the user can also define how
to shard the data (in the same way and for the same reasons you can
with partitions).
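To quantify Anurag's point about 1/3 versus 2/1000 above, here is a
similarly rough sketch of a ketama-style ring (again just illustrative;
real ketama uses far more points per server and its own point
placement). Growing from 2 to 3 servers remaps roughly a third of the
keys, while growing from 100 to 101 remaps only about one percent:

import bisect
import hashlib

POINTS_PER_SERVER = 100  # virtual points per server smooth the distribution

def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(servers):
    # place POINTS_PER_SERVER points on the ring for every server
    ring = []
    for s in servers:
        for i in range(POINTS_PER_SERVER):
            ring.append((_hash("%s-%d" % (s, i)), s))
    ring.sort()
    return ring

def server_for(ring, key):
    # walk clockwise to the first point at or after the key's hash
    idx = bisect.bisect(ring, (_hash(key), "")) % len(ring)
    return ring[idx][1]

keys = ["row:%d" % i for i in range(50000)]
for n in (2, 10, 100):
    old = build_ring(["server%d" % i for i in range(n)])
    new = build_ring(["server%d" % i for i in range(n + 1)])
    moved = sum(1 for k in keys if server_for(old, k) != server_for(new, k))
    print("%d -> %d servers: %.3f of keys move (about 1/%d)"
          % (n, n + 1, moved / float(len(keys)), n + 1))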
henrik

--
[email protected]
+358-40-8211286
skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559
