Hi Stewart,

On Mon, Apr 4, 2011 at 6:16 PM, Stewart Smith <[email protected]> wrote:

[...]

> NDB chose 240 as the number to map to as it's quite factorable. e.g. if
> you had 2 machines, 120 partitions each. This makes going up to 240
> machines rather easy, you just relocate a partition.
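If I follow, the nice property is simply that 240 divides evenly for many
cluster sizes, so growing the cluster only means relocating whole
partitions, never splitting them. A rough illustration (purely mine, not
NDB's actual code):

    # Purely illustrative sketch of the factorability argument; not NDB code.
    NUM_PARTITIONS = 240

    def partitions_for(machine, num_machines):
        """Partitions owned by one machine under simple modulo placement."""
        return [p for p in range(NUM_PARTITIONS) if p % num_machines == machine]

    for n in (2, 3, 4, 6, 240):
        print("%3d machines -> %3d partitions each" % (n, len(partitions_for(0, n))))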
You mean the data is mapped to 240 partitions? What happens when you have
more than 240 nodes? I could not find anything relevant to read on it.
Any links?

On the surface, this looks quite similar to vbuckets. Data maps to a
vbucket (a partition), and any server can host that partition. To add
machines, you just relocate the vbucket. I think the big idea here is a
two-stage mapping: data does not map directly to a node, but rather to a
virtual bucket/node/server, which in turn maps to the actual server/node.

> The big benefit is the reduction of round-trips.

Right. Same for vbuckets.

> considering that machines running clients connect regularly, caching the
> sharding information on them is certainly not out of the question (it
> wouldn't be large).

Just to be sure, by sharding information do you mean which shard a key
maps to?

> The mapping can also change, and could quite easily be implemented for
> moving a shard to a new machine. We'd just need a way in the server to
> return that a database is read only (attempt r/w op on a shard, get back
> "currently read only") while doing the migration. After migration,
> ideally we could use an error saying "shard has relocated" at which
> point the client could update its mapping and connect to the correct
> server.

That seems a bit inconsistent with your liking for round-trip reduction :).
Why do we need to query a server to know whether the shard has moved or
not? The client side can (or rather, should) hold that information; the
vbucket scheme would. And why mark the entire database read-only? Is there
anything wrong with associating a read/write state with each data
partition (here, a vbucket)? I have sketched what I mean at the bottom of
this mail.

> I say "database" but in future this could be CATALOG (and this would
> enforce the no-cross-shard queries rule).

A catalog can have multiple tables, and I can have half of it on one
machine and half on another. How does that avoid cross-shard queries?

> I also don't like a mechanism that would require another round trip to
> find out which server to connect to in order to run the actual query (it
> also pretty much just moves your scaling and availability problem around
> rather than solving it at all).

vbucket does that :) — the client already holds the mapping, so there is
no extra lookup round trip.

> I also think that you shouldn't try to solve every problem in the scope
> of this project (e.g. migrating shards/vbuckets, dealing with r/o
> replicas). Getting the first steps solid and efficient can be enough
> work.

Agreed. It's also easy to go awry if one tries to deal with too many
things. However, I have a couple of points here. First, I started by
painting the bigger picture just to get an idea. I felt the topics I
touched on were important enough to be considered in devising a 'working'
sharding solution; the point was only to show that the scheme I came up
with is flexible enough to deal with all that. Second, I am on the project
to scratch my own itch, so I am actually targeting a longer commitment.
The idea was to lay down the things I felt were in the scope of the
project, and then see how I can go about it step by step, including what
is deliverable within GSoC, and what comes before and after. I was
planning to touch on this point a little later.

> Setting good steps along the way is useful not only for setting goals,
> but that as each step is completed (including the first), something
> useful enters the tree.

I will follow up on this ASAP with how I plan to go about it step by step.

> hope this helps,

Definitely :). I hope that is not a part of your signature, so that I have
to say yes/no every time :P.
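To make the per-vbucket state idea concrete, here is a rough sketch of
what I have in mind. The names, states, and numbers are made up for
illustration only; this is neither the actual vbucket implementation nor
anything that exists in Drizzle today:

    import zlib

    NUM_VBUCKETS = 1024   # assumed bucket count, analogous to NDB's 240

    # Stage 2: vbucket -> server. Cached on every client, so routing a
    # query needs no extra round trip to a lookup service.
    vbucket_map = {vb: "server-%d" % (vb % 4) for vb in range(NUM_VBUCKETS)}

    # Read/write state per partition, not per database.
    ACTIVE, MIGRATING = "active", "read-only"
    vbucket_state = {vb: ACTIVE for vb in range(NUM_VBUCKETS)}

    def vbucket_for(key):
        """Stage 1: key -> vbucket. This mapping is fixed and never changes."""
        return zlib.crc32(key.encode()) % NUM_VBUCKETS

    def route(key):
        """Client-side routing using the cached map."""
        vb = vbucket_for(key)
        return vb, vbucket_map[vb]

    def begin_migration(vb):
        """Only the partition being moved blocks writes; the rest stay r/w."""
        vbucket_state[vb] = MIGRATING

    def finish_migration(vb, new_server):
        """On a 'vbucket relocated' error the client just refreshes its map."""
        vbucket_map[vb] = new_server
        vbucket_state[vb] = ACTIVE

    vb, server = route("user:42")
    print("key user:42 -> vbucket %d on %s (%s)" % (vb, server, vbucket_state[vb]))

The point being: only the vbucket under migration ever goes read-only, and
a client with a stale map recovers with a single retry after updating it.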
--
Anurag Priyam
http://about.me/yeban/

