Hi Stewart,

On Mon, Apr 4, 2011 at 6:16 PM, Stewart Smith <[email protected]> wrote:

[...]

> NDB chose 240 as the number to map to as it's quite factorable. e.g. if
> you had 2 machines, 120 partitions each. This makes going up to 240
> machines rather easy, you just relocate a partition.
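If I follow, the nice property is simply that 240 divides evenly for many
cluster sizes, so growing the cluster only means relocating whole
partitions, never splitting them. A rough illustration (purely mine, not
NDB's actual code):

    # Purely illustrative sketch of the factorability argument; not NDB code.
    NUM_PARTITIONS = 240

    def partitions_for(machine, num_machines):
        """Partitions owned by one machine under simple modulo placement."""
        return [p for p in range(NUM_PARTITIONS) if p % num_machines == machine]

    for n in (2, 3, 4, 6, 240):
        print("%3d machines -> %3d partitions each" % (n, len(partitions_for(0, n))))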
You mean the data is mapped to 240 partitions? What happens when you have
more than 240 nodes? I could not find anything relevant to read on it.
Any links?

On the surface, this looks quite similar to vbuckets. Data maps to a
vbucket (a partition), and any server can host that partition. To add
machines, you just relocate the vbucket. I think the big idea here is a
two-stage mapping: data does not map directly to a node, but rather to a
virtual bucket/node/server, which in turn maps to the actual server/node.

> The big benefit is the reduction of round-trips.

Right. Same for vbuckets.

> considering that machines running clients connect regularly, caching the
> sharding information on them is certainly not out of the question (it
> wouldn't be large).

Just to be sure, by sharding information do you mean which shard a key
maps to?

> The mapping can also change, and could quite easily be implemented for
> moving a shard to a new machine. We'd just need a way in the server to
> return that a database is read only (attempt r/w op on a shard, get back
> "currently read only") while doing the migration. After migration,
> ideally we could use an error saying "shard has relocated" at which
> point the client could update its mapping and connect to the correct
> server.

That seems a bit inconsistent with your liking for round-trip reduction :).
Why do we need to query a server to know whether the shard has moved or
not? The client side can (or rather, should) hold that information; the
vbucket scheme would. And why mark the entire database read-only? Is there
anything wrong with associating a read/write state with each data
partition (here, a vbucket)? I have sketched what I mean at the bottom of
this mail.

> I say "database" but in future this could be CATALOG (and this would
> enforce the no-cross-shard queries rule).

A catalog can have multiple tables, and I can have half of it on one
machine and half on another. How does that avoid cross-shard queries?

> I also don't like a mechanism that would require another round trip to
> find out which server to connect to in order to run the actual query (it
> also pretty much just moves your scaling and availability problem around
> rather than solving it at all).

vbucket does that :) — the client already holds the mapping, so there is
no extra lookup round trip.

> I also think that you shouldn't try to solve every problem in the scope
> of this project (e.g. migrating shards/vbuckets, dealing with r/o
> replicas). Getting the first steps solid and efficient can be enough
> work.

Agreed. It's also easy to go awry if one tries to deal with too many
things. However, I have a couple of points here. First, I started by
painting the bigger picture just to get an idea. I felt the topics I
touched on were important enough to be considered in devising a 'working'
sharding solution; the point was only to show that the scheme I came up
with is flexible enough to deal with all that. Second, I am on the project
to scratch my own itch, so I am actually targeting a longer commitment.
The idea was to lay down the things I felt were in the scope of the
project, and then see how I can go about it step by step, including what
is deliverable within GSoC, and what comes before and after. I was
planning to touch on this point a little later.

> Setting good steps along the way is useful not only for setting goals,
> but that as each step is completed (including the first), something
> useful enters the tree.

I will follow up on this ASAP with how I plan to go about it step by step.

> hope this helps,

Definitely :). I hope that is not a part of your signature, so that I have
to say yes/no every time :P.
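To make the per-vbucket state idea concrete, here is a rough sketch of
what I have in mind. The names, states, and numbers are made up for
illustration only; this is neither the actual vbucket implementation nor
anything that exists in Drizzle today:

    import zlib

    NUM_VBUCKETS = 1024   # assumed bucket count, analogous to NDB's 240

    # Stage 2: vbucket -> server. Cached on every client, so routing a
    # query needs no extra round trip to a lookup service.
    vbucket_map = {vb: "server-%d" % (vb % 4) for vb in range(NUM_VBUCKETS)}

    # Read/write state per partition, not per database.
    ACTIVE, MIGRATING = "active", "read-only"
    vbucket_state = {vb: ACTIVE for vb in range(NUM_VBUCKETS)}

    def vbucket_for(key):
        """Stage 1: key -> vbucket. This mapping is fixed and never changes."""
        return zlib.crc32(key.encode()) % NUM_VBUCKETS

    def route(key):
        """Client-side routing using the cached map."""
        vb = vbucket_for(key)
        return vb, vbucket_map[vb]

    def begin_migration(vb):
        """Only the partition being moved blocks writes; the rest stay r/w."""
        vbucket_state[vb] = MIGRATING

    def finish_migration(vb, new_server):
        """On a 'vbucket relocated' error the client just refreshes its map."""
        vbucket_map[vb] = new_server
        vbucket_state[vb] = ACTIVE

    vb, server = route("user:42")
    print("key user:42 -> vbucket %d on %s (%s)" % (vb, server, vbucket_state[vb]))

The point being: only the vbucket under migration ever goes read-only, and
a client with a stale map recovers with a single retry after updating it.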
--
Anurag Priyam
http://about.me/yeban/

