Paul Rubin wrote:
> Andrew Dalke <[EMAIL PROTECTED]> writes:
> ...
>> I found more details at
>> http://jeremy.zawodny.com/blog/archives/001866.html
>>
>> It's a bunch of things - Perl, C, MySQL-InnoDB, MyISAM, Akamai,
>> memcached.  The linked slides say "lots of MySQL usage."  60 servers.
>
> LJ uses MySQL extensively but what I don't know is whether it serves
> up individual pages by the obvious bunch of queries like a smaller
> BBS might.  I have the impression that it's more carefully tuned
> than that.

The linked page links to a PDF describing the architecture.  The
careful tuning comes in part from a high-performance caching system -
memcached.
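For anyone who hasn't used memcached, the idea is the usual
cache-aside idiom: look in the cache first and only fall back to the
pile of MySQL queries on a miss.  A rough, untested sketch using the
python-memcached client - the key name and render_journal_page() are
invented for illustration, not LJ's actual code:

    # Cache-aside sketch using the python-memcached client.  The key
    # name and render_journal_page() are made up for illustration.
    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])

    def render_journal_page(user_id):
        # Stand-in for the expensive part: the pile of SQL queries
        # plus templating that builds the page.
        return "<html>journal for user %d</html>" % user_id

    def get_journal_page(user_id):
        key = "journal_page:%d" % user_id
        page = mc.get(key)                  # fast path: served from RAM
        if page is None:                    # miss: do the real work once
            page = render_journal_page(user_id)
            mc.set(key, page, time=300)     # keep it for five minutes
        return page

As I understand it, the hard part in LJ's setup isn't the lookup; it's
deciding what to cache, for how long, and how to invalidate it when
someone posts.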
>> I don't see that example as validating your statement that
>> LAMP doesn't scale for mega-numbers of hits any better than
>> whatever you might call "printing press" systems.
>
> What example?  Slashdot?

LiveJournal.  You gave it as a counterexample to the LAMP architecture
used by /.

] It seems to me that by using implementation methods that
] map more directly onto the hardware, a site with Slashdot's
] traffic levels could run on a single modest PC (maybe a laptop).
] I believe LiveJournal (which has something more like a million
] users) uses methods like that, as does ezboard.

Since LJ uses a (highly hand-tuned) LAMP architecture, it isn't an
effective counterexample.

> It uses way more hardware than it needs to,
> at least ten servers and I think a lot more.  If LJ is using 6x as
> many servers and taking 20x (?) as much traffic as Slashdot, then LJ
> is doing something more efficiently than Slashdot.

I don't know where the 20x comes from.  Registered users?  I read /.
but haven't logged into it in 5+ years.  I hit /. a lot more often
than I do LJ (there's only one diary I follow there).  The usage is
different as well: on /. everyone hits the same story / comments page,
and the comments are ranked based on reader-defined evaluations.  LJ
has no one journal that gets anywhere near as many hits, and there is
no ranking scheme.

>> I'd say that few sites have >100k users, much less
>> daily users with personalized information.  As a totally made-up
>> number, only a few dozen sites (maybe a couple hundred?) would
>> need to worry about those issues.
>
> Yes, but for those of us interested in how big sites are put together,
> those are the types of sites we have to think about ;-).

My apologies since I know this sounds snide, but then why didn't you
(re)read the LJ architecture overview I linked to above?  That sounds
like something you would have been interested in reading, and it
directly provides information that counters what you said in your
followup.

The "ibm-poop-heads" article by Ryan Tomayko gives pointers to several
other large-scale LAMP-based web sites.  You didn't like the Google
one.  I checked a couple of the others:

IMDB - http://www.findarticles.com/p/articles/mi_zdpcm/is_200408/ai_ziff130634

  As you might expect, the site is now co-located with other
  Amazon.com sites, served up from machines running Linux and Apache,
  but ironically, most of the IMDb does not use a traditional database
  back end.  Its message boards are built on PostgreSQL, and certain
  parts of IMDb Pro - including its advanced search - use MySQL, but
  most of the site is built with good old Perl script.

del.icio.us - took some digging, but I found
http://lists.del.icio.us/pipermail/discuss/2004-November/001421.html

  "The database gets corrupted because the machine gets power-cycled,
  not through any fault of MySQL's."

The point is that LAMP systems do scale, both down and up.  Tomayko's
article is a polemic against "architecture astronauts" who believe the
only way to handle large sites (and /., LJ, IMDB, and del.icio.us are
larger than all but a few sites) is some spiffy "enterprise"
architecture framework.

> I'd say there's more than a few hundred of them, but it's not like
> there's millions.  And some of them really can't afford to waste so
> much hardware--look at the constant Wikipedia fundraising pitches for
> more server iron because the Wikimedia software (PHP/MySQL, natch)
> can't handle the load.

Could they, for example, have bought EnterpriseWeb-O-Rama and done any
better or cheaper?  Could they even have started the project had they
gone that route?

> Yes, of course there is [experience in large-scale web apps].
> Look at the mainframe transaction systems of the 60's-70's-80's, for
> example.  Look at Google.

For the mainframe apps you'll have to toss out anything processed in
batch mode, like payroll.  What had a level of customization and scale
comparable to the 100K+ user sites of today?  ATMs?  Stock trading?

Google is a one-off system.  At present there's no other system I know
of - especially one with that many users - where a single user request
can trigger searches on hundreds of machines.  That's all custom
software.  Or should most servers implement what is in essence a new
distributed operating system just to run a web site?

> Then there's the tons of experience we all have with LAMP systems.  By
> putting some effort into seeing where the resources in those things go,
> I believe we can do a much better job.  In particular, those sites like
> Slashdot are really not update intensive in the normal database sense.
> They can be handled almost entirely with some serial log files plus some
> ram caching.  At that point almost all the SQL overhead and a lot of the
> context switching can go away.
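If I follow the proposal, it's roughly the pattern below: append each
update to a flat log file and serve all reads from an in-memory
structure rebuilt from that log at startup.  A toy, untested sketch -
the file name and record format are invented, and a real site would
also need locking, log rotation, and crash handling:

    # Toy sketch of "serial log files plus RAM caching": writes append
    # to a plain log file, reads come from a dict kept in memory.
    import os

    LOG_FILE = "comments.log"
    _comments = {}          # story_id (str) -> list of comment texts

    def _replay_log():
        # Rebuild the in-memory cache from the log at startup.
        if not os.path.exists(LOG_FILE):
            return
        for line in open(LOG_FILE):
            fields = line.rstrip("\n").split("\t", 1)
            if len(fields) == 2:
                _comments.setdefault(fields[0], []).append(fields[1])

    def add_comment(story_id, text):
        # One sequential append; no SQL, no seeks.
        text = text.replace("\n", " ")
        f = open(LOG_FILE, "a")
        f.write("%s\t%s\n" % (story_id, text))
        f.close()
        _comments.setdefault(str(story_id), []).append(text)

    def get_comments(story_id):
        # Page views are served entirely from RAM.
        return _comments.get(str(story_id), [])

    _replay_log()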
Is /. an appropriate comparison?  I get the impression that it hasn't
changed much in the last, say, 5 years and that the user base hasn't
grown much either.

What you propose requires programming effort.  If the system doesn't
need work, if money in > money out (even with the expensive hardware),
and if the extra work doesn't bring much benefit, then is it
worthwhile for them to rearchitect the system?  Perhaps in a couple of
years it will run on two machines (one as the backup), with no change
to the code, simply because the hardware has become good enough and
cheap enough.

				Andrew
				[EMAIL PROTECTED]
--
http://mail.python.org/mailman/listinfo/python-list