On Fri, Apr 8, 2011 at 8:56 AM, Jason King <[email protected]> wrote:

> I would hope to have 1/100th of that level of traffic.
Sure, but realistically what are you expecting and how quickly do you expect to grow? You can spend a HUGE amount of time worrying about something that will never happen, and the reason I'm going down this path is that it makes a massive difference in how I might answer the question, or might even mean I'm not able to answer it at all because I don't have experience with massive scaling. Also, as I'm sure I'll prattle on about in the rest of this email, there are at least 1000 things involved with scaling that have absolutely nothing to do with your database design.

As an aside, if you're concerned with massive scale out of the gate I probably wouldn't start greenfield development with a relational database anyway, or I'd just build the whole thing on Google App Engine, but that's a whole 'nother tangent.

> And while it most likely won't, I'd sleep better at night knowing that what
> could be my busiest recordset is built in a way that could easily scale, and
> is efficient from the start.

But that's exactly why I'm asking the questions I'm asking. It's very ill-advised to start at ground zero and say "this might get huge at some point, better look into what Facebook does!" because you'll wind up doing all sorts of things that simply are not applicable to the vast majority of apps on the internet, and prematurely solving all sorts of problems that you may never have. Efficiency is completely relative, as is scalability, and certainly you'd have to be prepared to scale every part of your infrastructure. The database would probably be the hairiest piece (it always is with an RDBMS), but there's no magical database design that will future-proof you against scaling issues.

I'm a *little* hesitant to say "build the app and worry about scaling later" because that is sometimes taken as "don't worry about efficiency," which isn't at all what I mean.
By "worry about it when you need to" I mean just that--don't do things inefficiently now, but also don't burn too many cycles worrying about massive scale, since depending on where you're at with everything when that happens, you may make different decisions than you would now.

> If you had to architect the table structure for a site that would have that
> kind of traffic, how would you do it?

I have no earthly idea, because I've never had to fathom that kind of traffic on my apps. :-) And this is also why I'd strongly urge you to be careful about getting distracted (as Al said--just saw his email come in) by stuff that may never apply. Yes, of course, we all wish you the wildest of success with your endeavor, so please don't take my reality check as anything but just that.

What's some of what Facebook did to handle their traffic?

* They built their own database server (Cassandra)
* They wrote a PHP -> C++ translator to squeeze a bit more speed out of their code
* They wrote their own web server (at least I think Tornado is something they wrote)
* They use Hive/Hadoop for data warehousing
* They use Varnish for front-end caching

The list goes on ... again, they have problems most of us will never have to try and solve. Well, Alan and Andy deal with this stuff on a daily basis, so maybe they'll have some thoughts. :-)

All this being said, they *do* use MySQL as well, but it's a modified version of MySQL that interacts directly with memcached, because plain MySQL would never scale to the levels they need. Basically, from what I've read, data is *stored* in MySQL but OLTP is done directly from memcached (at least last I knew; I'm sure their stuff changes rapidly).

Here are some Facebook statistics to put things in perspective. These come from:
http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html

* Query response times: 4ms reads, 5ms writes
* Rows read per second: 450M peak
* Network bytes per second: 38GB peak
* Queries per second: 13M peak
* Rows changed per second: 3.5M peak
* InnoDB disk ops per second: 5.2M peak

There's a 90-minute tech talk from one of the guys on one of the FB database teams here:
http://www.livestream.com/facebookevents/video?clipId=flv_cc08bf93-7013-41e3-81c9-bfc906ef8442

> I'm leaning towards one principle table that contains all friends, and if
> that table grows to a point that I need multiple tables, I will simply
> duplicate the table, and call it 'table 2'. Then I could simply note in each
> member's core table which table has their friends so that when I need to
> lookup the friend's list of a certain member, I just query whichever table
> has their data.

I'd suggest looking into database scaling methodologies in general. If you start having issues where you're creating second tables, you'll be in the land of clustering, sharding, etc. at that point. Adding another table on the same box won't fix your problem, if that's your problem.

I know I didn't answer much of anything in here, but I hope we can continue the discussion and focus in on what you think you need now vs. all the rest of the stuff I blathered about above. ;-)

If it were me, and I was definitely going to use MySQL, I'd just do basic good RDBMS design and worry about scaling if/when I have to. By this I mean you have no reason yet not to follow good relational database practices as far as normalization, designing your one-to-many and many-to-many relationships, etc.

--
Matthew Woodward
[email protected]
http://blog.mattwoodward.com
identi.ca / Twitter: @mpwoodward

Please do not send me proprietary file formats such as Word, PowerPoint, etc. as attachments.
http://www.gnu.org/philosophy/no-word-attachments.html

--
official tag/function reference: http://openbd.org/manual/
mailing list - http://groups.google.com/group/openbd?hl=en
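P.S. To make "normalized many-to-many" concrete, here's a minimal sketch of the friends design I'm talking about: one member table plus one junction table, instead of duplicating a "table 2". It uses SQLite via Python purely for illustration (the same schema works in MySQL), and all table and column names are hypothetical, not from your app.

```python
# Sketch of a normalized many-to-many "friends" design (hypothetical names).
# SQLite stands in for MySQL here just so the example is self-contained.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE member (
    member_id INTEGER PRIMARY KEY,
    username  TEXT NOT NULL UNIQUE
);

-- Junction table for the many-to-many friend relationship.
-- The composite primary key prevents duplicate friendships, and the
-- extra index makes lookups fast in the other direction too.
CREATE TABLE friendship (
    member_id INTEGER NOT NULL REFERENCES member(member_id),
    friend_id INTEGER NOT NULL REFERENCES member(member_id),
    PRIMARY KEY (member_id, friend_id)
);
CREATE INDEX idx_friendship_friend ON friendship (friend_id);
""")

cur.executemany("INSERT INTO member (member_id, username) VALUES (?, ?)",
                [(1, "alice"), (2, "bob"), (3, "carol")])
# One row per friendship edge -- no second table, no per-member routing.
cur.executemany("INSERT INTO friendship VALUES (?, ?)",
                [(1, 2), (1, 3), (2, 1), (3, 1)])

# "Give me this member's friends" is a single indexed query regardless
# of how many rows the table grows to.
cur.execute("""
    SELECT m.username
    FROM friendship f
    JOIN member m ON m.member_id = f.friend_id
    WHERE f.member_id = ?
    ORDER BY m.username
""", (1,))
print([row[0] for row in cur.fetchall()])  # ['bob', 'carol']
```

If you ever do outgrow one box, a shape like this also shards cleanly by member_id, which is what the clustering/sharding approaches mentioned above expect--unlike the manual "table 2" routing scheme.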
