On Fri, Apr 8, 2011 at 8:56 AM, Jason King <[email protected]> wrote:
> I would hope to have 1/100th of that level of traffic.

Sure, but realistically what are you expecting and how quickly do you
expect to grow? You can spend a HUGE amount of time worrying about
something that will never happen, and the reason I'm going down this
path is it makes a massive difference in how I might answer the
question, or might even mean I'm not able to answer it at all because
I don't have experience with massive scaling.

Also, as I'm sure I'll prattle on about in the rest of this email,
there are at least 1000 things involved with scaling that have
absolutely nothing to do with your database design. As an aside, if
you're concerned with massive scale out of the gate, I probably
wouldn't start greenfield development with a relational database
anyway, or I'd just build the whole thing on Google App Engine, but
that's a whole 'nother tangent.

> And while it most likely won't, I'd sleep better at night knowing that what
> could be my busiest recordset is built in a way that could easily scale, and
> is efficient from the start.

But that's exactly why I'm asking the questions I'm asking. It's very
ill-advised to start at ground zero and say "this might get huge at
some point, better look into what Facebook does!" because you'll wind
up doing all sorts of things that simply are not applicable to the
vast majority of apps on the internet, and prematurely solving all
sorts of problems that you may never have.

Efficiency is completely relative, and so is scalability, and certainly
you'd have to be prepared to scale every part of your infrastructure.
The database would probably be the hairiest piece (always is with an
RDBMS), but there's no magical database design that will future-proof
you against scaling issues.

I'm a *little* hesitant to say "build the app and worry about scaling
later" because that is sometimes taken as "don't worry about
efficiency" which isn't at all what I mean. By "worry about it when
you need to" I mean just that--don't do things inefficiently now, but
also don't burn too many cycles worrying about massive scale since
depending on where you're at with everything when that happens you may
make different decisions than you would now.

> If you had to architect the table structure for a site that would have that
> kind of traffic, how would you do it?

I have no earthly idea because I've never had to fathom that kind of
traffic on my apps. :-) And this is also why I'd strongly urge you to
be careful about getting distracted (as Al said--just saw his email
come in) by stuff that may never apply. Yes, of course, we all wish
you the wildest of success with your endeavor, so please don't take my
reality check as anything but just that.

What's some of what Facebook did to handle their traffic?
* They built their own database server (Cassandra)
* They wrote a PHP -> C++ translator to squeeze a bit more speed out
of their code
* They open-sourced their own web server, Tornado (it originally came
out of FriendFeed, which Facebook acquired)
* They use Hive/Hadoop for data warehousing
* They use Varnish for front-end caching

The list goes on ... again they have problems most of us will never
have to try and solve. Well, Alan and Andy deal with this stuff on a
daily basis so maybe they'll have some thoughts. :-)

All this being said they *do* use MySQL as well but it's a modified
version of MySQL that interacts directly with memcached because plain
MySQL would never scale to the levels they need. Basically from what
I've read data is *stored* in MySQL but OLTP is done directly from
memcached (at least last I knew; I'm sure their stuff changes
rapidly).
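Just to make that concrete, the pattern as I understand it is basically cache-aside: reads hit memcached first and fall back to MySQL, writes go to MySQL and invalidate the cache. A toy sketch (not Facebook's actual code; plain dicts stand in for memcached and MySQL):

```python
db = {}      # stands in for MySQL storage
cache = {}   # stands in for memcached

def read_user(user_id):
    """Return user data, preferring the cache over the database."""
    if user_id in cache:
        return cache[user_id]
    row = db.get(user_id)     # cache miss: fall back to the database
    if row is not None:
        cache[user_id] = row  # populate the cache for next time
    return row

def write_user(user_id, data):
    """Write to the database and invalidate the stale cache entry."""
    db[user_id] = data
    cache.pop(user_id, None)

write_user(1, {"name": "Jason"})
assert read_user(1) == {"name": "Jason"}  # first read misses, fills cache
assert 1 in cache                         # later reads come from the cache
```

The point is that the hot read path never touches the database once the cache is warm, which is the only way numbers like 13M queries/second are even thinkable.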

Here's some Facebook statistics to put things in perspective. These
come from 
http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html
* Query response times: 4ms reads, 5ms writes.
* Rows read per second: 450M peak
* Network bytes per second: 38GB peak
* Queries per second: 13M peak
* Rows changed per second: 3.5M peak
* InnoDB disk ops per second: 5.2M peak

There's a 90-minute tech talk from one of the guys on one of the FB database teams here:
http://www.livestream.com/facebookevents/video?clipId=flv_cc08bf93-7013-41e3-81c9-bfc906ef8442

> I'm leaning towards one principle table that contains all friends, and if
> that table grows to a point that I need multiple tables, I will simply
> duplicate the table, and call it 'table 2'. Then I could simply note in each
> member's core table which table has their friends so that when I need to
> lookup the friend's list of a certain member, I just query whichever table
> has their data.

I'd suggest looking into database scaling methodologies in general. If
you get to the point where you're creating second tables, you're
really in the land of clustering, sharding, etc. Adding another table
on the same box won't fix your problem if the box itself is the
bottleneck.
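To give a flavor of how sharding usually works instead of a hand-rolled "table 2" scheme: the shard is computed from the member ID, so there's nothing to note in each member's core table and every app server agrees on where the data lives. A toy sketch (the shard count and function name are made up for illustration):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical; in practice each shard is a separate server

def shard_for(member_id):
    """Deterministically map a member to a shard; no per-member lookup needed."""
    digest = hashlib.md5(str(member_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Any server, any time, gets the same answer for the same member:
assert shard_for(42) == shard_for(42)
assert 0 <= shard_for(12345) < NUM_SHARDS
```

Real setups get fancier (consistent hashing so you can add shards without reshuffling everything), but the core idea is that routing is a pure function of the key, not a stored pointer.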

I know I didn't answer much of anything in here but I hope we can
continue the discussion and focus in on what you think you need now
vs. all the rest of the stuff I blathered about above. ;-)

If it were me, and I was definitely going to use MySQL, I'd just do
basic good RDBMS design and worry about scaling if/when I have to. By
this I mean you have no reason yet to not follow good relational
database practices as far as normalization, designing your one-to-many
and many-to-many relationships, etc.
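By "basic good RDBMS design" for the friends case I mean something like a single normalized many-to-many join table. A minimal sketch (SQLite standing in for MySQL; all table and column names are just illustrative):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema is the point.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- One friends table for everyone: a many-to-many join table.
    -- The composite primary key doubles as the index that makes
    -- "list this member's friends" a single indexed lookup.
    CREATE TABLE friendship (
        member_id INTEGER NOT NULL REFERENCES member(member_id),
        friend_id INTEGER NOT NULL REFERENCES member(member_id),
        PRIMARY KEY (member_id, friend_id)
    );
""")
conn.executemany("INSERT INTO member VALUES (?, ?)",
                 [(1, "Jason"), (2, "Matt"), (3, "Al")])
conn.executemany("INSERT INTO friendship VALUES (?, ?)",
                 [(1, 2), (1, 3)])
rows = conn.execute("""
    SELECT m.name FROM friendship f
    JOIN member m ON m.member_id = f.friend_id
    WHERE f.member_id = ?
    ORDER BY m.name
""", (1,)).fetchall()
assert rows == [("Al",), ("Matt",)]
```

Nothing exotic; it's exactly the normalization and many-to-many design you'd do anyway, and it's also the shape that clustering/sharding tools expect if you ever do need them.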

-- 
Matthew Woodward
[email protected]
http://blog.mattwoodward.com
identi.ca / Twitter: @mpwoodward

Please do not send me proprietary file formats such as Word,
PowerPoint, etc. as attachments.
http://www.gnu.org/philosophy/no-word-attachments.html

-- 
official tag/function reference: http://openbd.org/manual/
 mailing list - http://groups.google.com/group/openbd?hl=en
