Paul Rubin wrote:
> Andrew Dalke <[EMAIL PROTECTED]> writes:
> ...
>> I found more details at
>> http://jeremy.zawodny.com/blog/archives/001866.html
>>
>> It's a bunch of things - Perl, C, MySQL-InnoDB, MyISAM, Akamai,
>> memcached.  The linked slides say "lots of MySQL usage."  60 servers.
>
> LJ uses MySQL extensively but what I don't know is whether it serves
> up individual pages by the obvious bunch of queries like a smaller
> BBS might.  I have the impression that it's more carefully tuned
> than that.

The linked page links to a PDF describing the architecture.  The
careful tuning comes in part from a high-performance caching system -
memcached.
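For anyone who hasn't used memcached, the idea is the usual
cache-aside idiom: look in the cache first and only fall back to the
pile of MySQL queries on a miss.  A rough, untested sketch using the
python-memcached client - the key name and render_journal_page() are
invented for illustration, not LJ's actual code:

    # Cache-aside sketch using the python-memcached client.  The key
    # name and render_journal_page() are made up for illustration.
    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])

    def render_journal_page(user_id):
        # Stand-in for the expensive part: the pile of SQL queries
        # plus templating that builds the page.
        return "<html>journal for user %d</html>" % user_id

    def get_journal_page(user_id):
        key = "journal_page:%d" % user_id
        page = mc.get(key)                  # fast path: served from RAM
        if page is None:                    # miss: do the real work once
            page = render_journal_page(user_id)
            mc.set(key, page, time=300)     # keep it for five minutes
        return page

As I understand it, the hard part in LJ's setup isn't the lookup; it's
deciding what to cache, for how long, and how to invalidate it when
someone posts.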
>> I don't see that example as validating your statement that
>> LAMP doesn't scale for mega-numbers of hits any better than
>> whatever you might call "printing press" systems.
>
> What example?  Slashdot?

LiveJournal.  You gave it as a counterexample to the LAMP architecture
used by /.

] It seems to me that by using implementation methods that
] map more directly onto the hardware, a site with Slashdot's
] traffic levels could run on a single modest PC (maybe a laptop).
] I believe LiveJournal (which has something more like a million
] users) uses methods like that, as does ezboard.

Since LJ uses a (highly hand-tuned) LAMP architecture, it isn't an
effective counterexample.

> It uses way more hardware than it needs to,
> at least ten servers and I think a lot more.  If LJ is using 6x as
> many servers and taking 20x (?) as much traffic as Slashdot, then LJ
> is doing something more efficiently than Slashdot.

I don't know where the 20x comes from.  Registered users?  I read /.
but haven't logged into it in 5+ years.  I hit /. a lot more often
than I do LJ (there's only one diary I follow there).  The usage is
different as well: on /. everyone hits the same story / comments page,
and the comments are ranked based on reader-defined evaluations.  LJ
has no one journal that gets anywhere near as many hits, and there is
no ranking scheme.

>> I'd say that few sites have >100k users, much less
>> daily users with personalized information.  As a totally made-up
>> number, only a few dozen sites (maybe a couple hundred?) would
>> need to worry about those issues.
>
> Yes, but for those of us interested in how big sites are put together,
> those are the types of sites we have to think about ;-).

My apologies since I know this sounds snide, but then why didn't you
(re)read the LJ architecture overview I linked to above?  That sounds
like something you would have been interested in reading, and it
directly provides information that counters what you said in your
followup.

The "ibm-poop-heads" article by Ryan Tomayko gives pointers to several
other large-scale LAMP-based web sites.  You didn't like the Google
one.  I checked a couple of the others:

IMDB - http://www.findarticles.com/p/articles/mi_zdpcm/is_200408/ai_ziff130634

  As you might expect, the site is now co-located with other
  Amazon.com sites, served up from machines running Linux and Apache,
  but ironically, most of the IMDb does not use a traditional database
  back end.  Its message boards are built on PostgreSQL, and certain
  parts of IMDb Pro - including its advanced search - use MySQL, but
  most of the site is built with good old Perl script.

del.icio.us - took some digging, but I found
http://lists.del.icio.us/pipermail/discuss/2004-November/001421.html

  "The database gets corrupted because the machine gets power-cycled,
  not through any fault of MySQL's."

The point is that LAMP systems do scale, both down and up.  Tomayko's
article is a polemic against "architecture astronauts" who believe the
only way to handle large sites (and /., LJ, IMDB, and del.icio.us are
larger than all but a few sites) is some spiffy "enterprise"
architecture framework.

> I'd say there's more than a few hundred of them, but it's not like
> there's millions.  And some of them really can't afford to waste so
> much hardware--look at the constant Wikipedia fundraising pitches for
> more server iron because the Wikimedia software (PHP/MySQL, natch)
> can't handle the load.

Could they, for example, have bought EnterpriseWeb-O-Rama and done any
better or cheaper?  Could they even have started the project had they
gone that route?

> Yes, of course there is [experience in large-scale web apps].
> Look at the mainframe transaction systems of the 60's-70's-80's, for
> example.  Look at Google.

For the mainframe apps you'll have to toss out anything processed in
batch mode, like payroll.  What had a level of customization and scale
comparable to the 100K+ user sites of today?  ATMs?  Stock trading?

Google is a one-off system.  At present there's no other system I know
of - especially one with that many users - where a single user request
can trigger searches on hundreds of machines.  That's all custom
software.  Or should most servers implement what is in essence a new
distributed operating system just to run a web site?

> Then there's the tons of experience we all have with LAMP systems.  By
> putting some effort into seeing where the resources in those things go,
> I believe we can do a much better job.  In particular, those sites like
> Slashdot are really not update intensive in the normal database sense.
> They can be handled almost entirely with some serial log files plus some
> ram caching.  At that point almost all the SQL overhead and a lot of the
> context switching can go away.
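If I follow the proposal, it's roughly the pattern below: append each
update to a flat log file and serve all reads from an in-memory
structure rebuilt from that log at startup.  A toy, untested sketch -
the file name and record format are invented, and a real site would
also need locking, log rotation, and crash handling:

    # Toy sketch of "serial log files plus RAM caching": writes append
    # to a plain log file, reads come from a dict kept in memory.
    import os

    LOG_FILE = "comments.log"
    _comments = {}          # story_id (str) -> list of comment texts

    def _replay_log():
        # Rebuild the in-memory cache from the log at startup.
        if not os.path.exists(LOG_FILE):
            return
        for line in open(LOG_FILE):
            fields = line.rstrip("\n").split("\t", 1)
            if len(fields) == 2:
                _comments.setdefault(fields[0], []).append(fields[1])

    def add_comment(story_id, text):
        # One sequential append; no SQL, no seeks.
        text = text.replace("\n", " ")
        f = open(LOG_FILE, "a")
        f.write("%s\t%s\n" % (story_id, text))
        f.close()
        _comments.setdefault(str(story_id), []).append(text)

    def get_comments(story_id):
        # Page views are served entirely from RAM.
        return _comments.get(str(story_id), [])

    _replay_log()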
Is /. an appropriate comparison?  I get the impression that it hasn't
changed much in the last, say, 5 years and that the user base hasn't
grown much either.

What you propose requires programming effort.  If the system doesn't
need work, if money in > money out (even with the expensive hardware),
and if the extra work doesn't bring much benefit, then is it
worthwhile for them to rearchitect the system?  Perhaps in a couple of
years it will run on two machines (one as the backup), with no change
to the code, simply because the hardware has become good enough and
cheap enough.

				Andrew
				[EMAIL PROTECTED]
--
http://mail.python.org/mailman/listinfo/python-list