Re: NoSQL Movement?

mk Thu, 04 Mar 2010 06:50:14 -0800

Duncan Booth wrote:

If you look at some of the uses of bigtable you may begin to understand the tradeoffs that are made with sql. When you use bigtable you have records with fields, and you have indices, but there are limitations on the kinds of queries you can perform: in particular you cannot do joins, but more subtly there is no guarantee that the index is up to date (so you might miss recent updates or even get data back from a query when the data no longer matches the query).

Hmm, I do understand that bigtable is used outside of traditional 'enterprisey' contexts, but suppose you did want to do the equivalent of a join; is it at all practical or even possible?

I guess when you're forced to use denormalized data, you have to simultaneously update equivalent columns across many tables yourself, right? Or is there some machinery to assist in that?
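
To make the question concrete, here is a minimal sketch of what "updating equivalent columns yourself" could look like. Everything in it is an assumption for illustration: plain dicts stand in for the tables, and `rename_user` is a made-up helper, not part of any bigtable API.

```python
# Hypothetical in-memory stand-ins for two denormalized tables.
users = {"u1": {"name": "alice"}, "u2": {"name": "bob"}}
tweets = {
    "t1": {"user_id": "u1", "user_name": "alice", "text": "hello"},
    "t2": {"user_id": "u1", "user_name": "alice", "text": "again"},
    "t3": {"user_id": "u2", "user_name": "bob", "text": "hi"},
}


def rename_user(user_id, new_name):
    """Update the canonical record, then every denormalized copy of the name.

    Returns the number of tweet records that had to be rewritten.
    """
    users[user_id]["name"] = new_name
    touched = 0
    for tweet in tweets.values():
        if tweet["user_id"] == user_id:
            tweet["user_name"] = new_name
            touched += 1
    return touched


updated = rename_user("u1", "alice_b")
print(updated, tweets["t1"]["user_name"])
```

In a real distributed store the loop would not be atomic, so readers could see a mix of old and new names for a while.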

By sacrificing some of SQL's power, Google get big benefits: namely updating data is a much more localised option. Instead of an update having to lock the indices while they are updated, updates to different records can happen simultaneously, possibly on servers on the opposite sides of the world. You can have many, many servers all using the same data although they may not have identical or completely consistent views of that data.

And you still have the global view of the table spread across, say, 2 servers, one located in Australia, the second in the US?

Bigtable impacts on how you store the data: for example you need to avoid reducing data to normal form (no joins!), it's much better and cheaper just to store all the data you need directly in each record. Also aggregate values need to be at least partly pre-computed and stored in the database.
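
A rough sketch of the "store it in the record, pre-compute the aggregate" idea, assuming a plain dict as the record. The field names `tweet_count` and `recent_tweets` and the helper `post_tweet` are invented for the example.

```python
# Hypothetical user record that carries its own aggregates, so reading
# a profile never needs a COUNT(*) or a join against a tweets table.
user = {"id": "u1", "name": "alice", "tweet_count": 0, "recent_tweets": []}


def post_tweet(record, text, keep=3):
    """Maintain the pre-computed values at write time."""
    record["tweet_count"] += 1
    # Newest first, truncated to the last `keep` tweets.
    record["recent_tweets"] = ([text] + record["recent_tweets"])[:keep]
    return record


for text in ["a", "b", "c", "d"]:
    post_tweet(user, text)

print(user["tweet_count"], user["recent_tweets"])
```

The cost moves from read time to write time, which is the tradeoff described above.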


So you basically end up with a few big tables or just one big table really?

Suppose on top of the 'tweets' table you have a 'dweebs' table, and tweets and dweebs sometimes do interact. How would you find such interacting pairs? Would you say "give me some tweets" to the tweets table, extract all the dweeb_id keys from the tweets and then retrieve all those dweebs from the dweebs table?
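
The two-step lookup I have in mind would be something like the sketch below. It is only an illustration: lists and dicts stand in for the two tables, and `get_many` is a hypothetical batch-get-by-key helper, not a real bigtable call.

```python
# Hypothetical data: some tweets reference a dweeb, some do not.
tweets = [
    {"id": "t1", "text": "first", "dweeb_id": "d1"},
    {"id": "t2", "text": "second", "dweeb_id": None},
    {"id": "t3", "text": "third", "dweeb_id": "d2"},
    {"id": "t4", "text": "fourth", "dweeb_id": "d1"},
]
dweebs = {"d1": {"name": "zed"}, "d2": {"name": "yo"}}


def get_many(table, keys):
    """Batch lookup by key; keys that are missing are silently skipped."""
    return {k: table[k] for k in keys if k in table}


def interacting_pairs(tweet_rows, dweeb_table):
    """Application-side join: collect the keys, fetch once, then pair up."""
    wanted = {t["dweeb_id"] for t in tweet_rows if t["dweeb_id"] is not None}
    found = get_many(dweeb_table, wanted)
    return [
        (t["id"], found[t["dweeb_id"]]["name"])
        for t in tweet_rows
        if t["dweeb_id"] in found
    ]


pairs = interacting_pairs(tweets, dweebs)
print(pairs)
```

Collecting the keys into a set first means each dweeb is fetched once even when several tweets point at it.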

Boiling this down to a concrete example, imagine you wanted to implement a system like twitter. Think carefully about how you'd handle a sufficiently high rate of new tweets reliably with a sql database. Now think how you'd do the same thing with bigtable: most tweets don't interact, so it becomes much easier to see how the load is spread across the servers: each user has the data relevant to them stored near the server they are using and index changes propagate gradually to the rest of the system.

I guess that in a purely imaginary example, you could also combine two databases? Say, a tweet bigtable db contains the tweets, but with a column holding a classical customer_id key that is also a key in a traditional RDBMS, referencing a particular customer?
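
As a purely imaginary sketch of that combination: below, a dict plays the part of the tweet store and an in-memory SQLite database plays the part of the traditional RDBMS. The table name `customers`, the sample rows and the helper `customer_for_tweet` are all assumptions for the example.

```python
import sqlite3

# The "traditional RDBMS" side: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
)

# The "bigtable" side: a hypothetical key-value store of tweet records,
# each carrying the relational customer_id as an ordinary column.
tweet_store = {
    "t1": {"text": "hello", "customer_id": 1},
    "t2": {"text": "world", "customer_id": 2},
    "t3": {"text": "orphan", "customer_id": 99},
}


def customer_for_tweet(tweet_id):
    """Fetch the tweet by key, then resolve its customer in the RDBMS."""
    tweet = tweet_store[tweet_id]
    row = conn.execute(
        "SELECT name FROM customers WHERE customer_id = ?",
        (tweet["customer_id"],),
    ).fetchone()
    return row[0] if row else None


print(customer_for_tweet("t1"), customer_for_tweet("t3"))
```

Nothing enforces the reference across the two systems, so the application has to cope with a customer_id that no longer resolves, as with "t3" here.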


Regards,
mk


--
http://mail.python.org/mailman/listinfo/python-list
