On Thu, May 5, 2011 at 12:20 AM, Anurag Priyam <[email protected]> wrote:
> Hey Henrik,
>
>> I noticed there are new blueprints for libdrizzle sharding functions
>> on launchpad
>> https://blueprints.launchpad.net/drizzle/+spec/libdrizzle-sharding-phase1
>
> That blueprint is slightly dated :|. You will find a more recent
> discussion of libdrizzle sharding support in the mailing list
> archives.

Andrew pointed me to that on IRC. I joined the mailing list a few days
after that. The past discussion looks good.

>> It mentions the ketama consistent sharding technique commonly used
>> with memcached. I don't think (but I don't claim to know, so please
>> comment) that this is at all applicable to Drizzle.
>
> I will slightly disagree with the 'at all' point here.
>
> [...]
>
>> ...the point in using a consistent hashing technique is to minimize
>> cache misses after re-shard. But Drizzle is not a cache. Let me
>
> In case of Drizzle, minimizing cache miss roughly translates to
> minimizing the amount of data that needs to be re-sharded.

This is a good point! My counterclaim is that the virtual bucket
approach can reach the same minimum, but yes, it is an argument in
favor of Ketama.
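To make that counterclaim concrete, here is a minimal sketch of the virtual bucket idea (Python, with a made-up bucket count of 1024 and hypothetical helper names): keys hash into a fixed set of buckets, buckets map to servers, and adding a server only moves the buckets handed to it, so the redistributed fraction reaches the same ~1/n minimum.

```python
import hashlib

N_BUCKETS = 1024  # fixed up front; hypothetical value for illustration

def bucket_for(key):
    # A key always hashes to the same virtual bucket.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % N_BUCKETS

def make_bucket_map(n_servers):
    # Initial round-robin assignment of buckets to servers.
    return [b % n_servers for b in range(N_BUCKETS)]

def add_server(bucket_map, new_server_id):
    # Hand roughly 1/(n+1) of the buckets to the new server; every
    # other bucket (and all its rows) stays exactly where it was.
    n_servers = new_server_id + 1
    target = N_BUCKETS // n_servers
    counts = {}
    for s in bucket_map:
        counts[s] = counts.get(s, 0) + 1
    new_map = list(bucket_map)
    moved = 0
    for b in range(N_BUCKETS):
        owner = new_map[b]
        if moved < target and counts[owner] > target:
            counts[owner] -= 1
            new_map[b] = new_server_id
            moved += 1
    return new_map

old = make_bucket_map(3)
new = add_server(old, 3)        # grow from 3 to 4 servers
moved = sum(1 for a, b in zip(old, new) if a != b)
print(moved / N_BUCKETS)        # prints 0.25, i.e. exactly the 1/n minimum
```

The bucket-to-server map is explicit state you have to store and maintain, which is the trade-off against Ketama's stateless ring.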

>> servers this number decreases to 1/4, 1/5 and so on... The point with
>> using consistent hashing is to minimize cache misses, so that after
>> adding servers, approximately 2/3 (and then 3/4, 4/5, and so on...) of
>> the keys will still map to the old server and find the old record. But
>> for the remaining 1/n fraction, the records are still lost. Since it
>> is a cache, this is acceptable behavior, and it is a very simple,
>> no-maintenance solution.
>
> A figure of 1/3 (keys that no longer map to the same server) is a
> little unsettling; 2/1000 or 3/1000 is more comforting. The greater
> the number of nodes, the smaller the percentage of data you need to
> redistribute.
>
> So I feel that sharding with ketama is still relevant to Drizzle. IMO,
> the probabilistic distribution of data in a hash-based partitioning
> scheme is not very attractive for a DBMS.

It can be OK for many use cases. For key-value workloads it is a good
way to distribute both data and load evenly. But it is not a
one-size-fits-all solution: it is best if the user can also define how
to shard the data (in the same way, and for the same reasons, that you
can with partitions).
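For what it's worth, the ~1/n figure on the Ketama side of the argument is easy to demonstrate with a toy consistent-hash ring (a sketch only, not the real ketama code; the node names and virtual-point count are made up):

```python
import bisect
import hashlib

def h(s):
    # Stable 128-bit hash of a string.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, points_per_node=100):
        # Each node contributes many points on the ring, which keeps
        # each node's share of the key space roughly even.
        self.points = sorted(
            (h("%s#%d" % (n, i)), n)
            for n in nodes for i in range(points_per_node))
        self.hashes = [p for p, _ in self.points]

    def node_for(self, key):
        # A key belongs to the first ring point clockwise of its hash.
        i = bisect.bisect(self.hashes, h(key)) % len(self.hashes)
        return self.points[i][1]

before = Ring(["db1", "db2", "db3"])
after = Ring(["db1", "db2", "db3", "db4"])
keys = ["row-%d" % i for i in range(10000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
# moved / 10000 comes out close to 1/4, and every moved key lands on
# db4; with naive hash(key) % n, roughly 3/4 of the keys would move.
```

The nice property is that a key ever only moves *to* the new node, never between two old ones, which is exactly what bounds the re-shard at ~1/n.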

henrik


-- 
[email protected]
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp