Hi Abhishek,

On Fri, 2011-04-01 at 22:01 +0000, Abhishek Kumar Singh wrote:
> Earlier, Andrew Hutchings (irc-nick: LinuxJedi) was listed on the wiki
> page as mentor for this project. But since he is no longer available
> for that project, I would like to request that somebody from the
> Drizzle community mentor the "libdrizzle native sharding" project.

So, the problem that was blocking my being a GSoC mentor appears to be
resolved now.  I will update the wiki shortly to reflect this.

> I have already kicked off my work for this project. I have my development
> environment ready and am working on fixing some of the bugs.

Excellent.

> For this project I was thinking of implementing a plugin. This plugin
> would basically serve two purposes: (a) Shard Selection and (b) Shard
> Resolution. Typically when talking about database sharding there are a
> couple of decisions an implementation needs to make:
> 
> How do we assign a shard to a new user?  (Shard Selection)
> How do we resolve the shard that a current object lives in? (Shard
> Resolution)
> 
> So the plugin will use what we call an Index database, which holds a
> minimum of two tables. The first, which the plugin creates, contains a
> record for every shard in the system along with that shard's capacity
> and usage fields. This table is queried every time a new user is
> created within the system, and we assign the new user to the shard
> with the lowest usage-to-capacity ratio. This allows shards to be
> located on different types of hardware, each able to take a smaller or
> larger number of users. The second table is supplied by the
> application using the plugin and maps each user to the shard to be
> used. Whenever a request for an object begins, the application
> queries this table to retrieve the shard to use and then passes it to
> the plugin, which switches to that database.
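The selection and resolution steps described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the plugin's actual code: the `Shard`, `select_shard` and `ShardIndex` names are invented here, and both Index database tables are replaced by in-memory structures so the logic can run on its own.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """One row of the (hypothetical) shard table in the Index database."""
    name: str
    capacity: int  # how many users this shard's hardware should hold
    usage: int = 0  # how many users are currently assigned to it


def select_shard(shards: list[Shard]) -> Shard:
    """Shard Selection: pick the shard with the lowest usage/capacity ratio."""
    best = min(shards, key=lambda s: s.usage / s.capacity)  # ties: first in list
    best.usage += 1  # record the new user against the chosen shard
    return best


class ShardIndex:
    """In-memory stand-in for both Index database tables."""

    def __init__(self, shards: list[Shard]):
        self.shards = shards
        # Stand-in for the application-supplied user -> shard mapping table.
        self.user_to_shard: dict[str, str] = {}

    def create_user(self, user_id: str) -> str:
        shard = select_shard(self.shards)
        self.user_to_shard[user_id] = shard.name
        return shard.name

    def resolve(self, user_id: str) -> str:
        """Shard Resolution: look up which shard an existing user lives on."""
        return self.user_to_shard[user_id]
```

With one shard of capacity 200 and one of capacity 100, assigning 300 new users this way places 200 on the first and 100 on the second, so the larger machine takes proportionally more users.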

So, my only concern with this would be that every query would turn into
two queries, likely to two different servers, which (depending on the
network type) could mean a bit of overhead.  One thing many of us
discovered whilst working on MySQL Cluster is that round-trips aren't
cheap on traditional gigabit Ethernet networks.  I suppose this could
be partly resolved by having a cache inside libdrizzle.
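A cache of this kind could look roughly like the following: a small LRU map in front of the user-to-shard lookup, so that only a cache miss pays the extra round-trip. This is a hypothetical Python sketch, not a libdrizzle API; `ShardResolutionCache` and the `lookup` callback are assumptions standing in for the query against the Index database.

```python
from collections import OrderedDict
from typing import Callable


class ShardResolutionCache:
    """Client-side LRU cache of user -> shard lookups (hypothetical).

    `lookup` performs the round-trip to the Index database; it is only
    called on a cache miss.
    """

    def __init__(self, lookup: Callable[[str], str], max_entries: int = 1024):
        self.lookup = lookup
        self.max_entries = max_entries
        self.entries: OrderedDict[str, str] = OrderedDict()

    def resolve(self, user_id: str) -> str:
        if user_id in self.entries:
            self.entries.move_to_end(user_id)  # mark as most recently used
            return self.entries[user_id]
        shard = self.lookup(user_id)  # cache miss: one round-trip
        self.entries[user_id] = shard
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return shard

    def invalidate(self, user_id: str) -> None:
        """Drop a cached entry, e.g. after the user has moved to another shard."""
        self.entries.pop(user_id, None)
```

The `invalidate` hook matters once users can be moved between shards (for example during re-sharding), since a stale cached entry would otherwise keep sending queries to the old shard.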

If this were one plugin option, alongside just a basic hash lookup (very
much like libmemcached), then I think that would be cool.
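For comparison, a basic hash lookup needs no Index database at all. The sketch below is a from-scratch consistent-hash ring in Python, similar in spirit to libmemcached's consistent-hashing distribution but neither using nor reproducing libmemcached itself; `HashRing`, `shard_for`, and the choice of MD5 with 100 virtual points per shard are assumptions made for illustration.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring: finding a shard needs no extra round-trip."""

    def __init__(self, shards: list[str], points_per_shard: int = 100):
        self.points_per_shard = points_per_shard
        self.ring: list[tuple[int, str]] = []  # sorted (position, shard) pairs
        for shard in shards:
            self.add(shard)

    def add(self, shard: str) -> None:
        # Several virtual points per shard smooth out the key distribution.
        for i in range(self.points_per_shard):
            bisect.insort(self.ring, (_point(f"{shard}#{i}"), shard))

    def remove(self, shard: str) -> None:
        self.ring = [entry for entry in self.ring if entry[1] != shard]

    def shard_for(self, key: str) -> str:
        # First point at or after the key's position, wrapping past the end.
        idx = bisect.bisect_left(self.ring, (_point(key), ""))
        return self.ring[idx % len(self.ring)][1]
```

A plain modulo scheme (hash of the key modulo the number of shards) would also work, but adding or removing a shard would then remap most keys; with the ring, removing a shard only moves the keys that lived on it, which may also be relevant to the re-sharding question below.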

> Yesterday I had a discussion with Stewart Smith (irc-nick: stewart)
> regarding this project, and he gave me some ideas regarding libdrizzle
> re-sharding, i.e. redistribution of the data across the shards (either
> to achieve proper load balancing or to satisfy application invariants).
> Should I also discuss how I am thinking of implementing libdrizzle
> re-sharding in my GSoC application?

Absolutely :)

Kind Regards
-- 
Andrew Hutchings - LinuxJedi - http://www.linuxjedi.co.uk/


_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
