Paul Rubin wrote:
It works by distributing content across end-nodes placed throughout the infrastructure. I don't think Akamai make any secret of their architecture, so Google (:-) can help you there.

Skip Montanaro <[EMAIL PROTECTED]> writes:
It's more than a bit unfair to compare Wikipedia with Ebay or Google. Even though Wikipedia may be running on high-performance hardware, it's unlikely that they have anything like the underlying network structure (replication, connection speed, etc), total number of cpus or monetary resources to throw at the problem that both Ebay and Google have. I suspect money trumps LAMP every time.
I certainly agree about the money and hardware resource comparison, which is why I thought the comparison with 1960's mainframes was possibly more interesting. You could not get anywhere near the performance of today's servers back then, no matter how much money you spent. Re connectivity, I wonder what kind of network speed is available to sites like Ebay that's not available to Jane Webmaster with a colo rack at some random big ISP. Also, you and Tim Danieliuk both mentioned caching in the network (e.g. Akamai). I'd be interested to know exactly how that works and how much difference it makes.
Of course it makes a huge difference; otherwise Google wouldn't have pointed their domain name, via a DNS CNAME record, at an Akamai node set.
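You can see that kind of aliasing for yourself. Here's a minimal sketch that walks a CNAME chain; it assumes the third-party dnspython package (version 2.x, which provides dns.resolver.resolve) is installed, and the hostname is just an example:

    # A minimal sketch, assuming dnspython >= 2.0 is installed
    # (pip install dnspython). It follows CNAME records from a hostname,
    # which is how a site's traffic can be handed off to a CDN like Akamai.
    import dns.resolver

    def cname_chain(hostname):
        """Return the chain of names reached by following CNAME records."""
        chain = [hostname]
        while True:
            try:
                answer = dns.resolver.resolve(chain[-1], "CNAME")
            except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                break  # no further alias: we've reached the final name
            chain.append(str(answer[0].target).rstrip("."))
        return chain

    print(cname_chain("www.google.com"))

If the first name in the printed chain resolves to a second name in someone else's domain, the site is being served through that party's network.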
[OB PyCon] Jeremy Hylton, a Google employee and formerly a PythonLabs employee, will be at PyCon. Why don't you come along and ask *him*?
Well, there's nothing wrong with caching dynamic content when the underlying database isn't terribly volatile and there is no critical requirement for the absolute latest data. Last I heard, Google weren't promising that their searches are up-to-the-microsecond in terms of information content.

But the problems I'm thinking of are really obviously with the server itself. This is clear when you try to load a page and your browser immediately gets the static text on the page, followed by a pause while the server waits for the dynamic stuff to come back from the database. Serving a Slashdotting-level load of pure static pages on a small box with Apache isn't too terrible ("Slashdotting" = the spike in hits that a web site gets when Slashdot's front page links to it). Doing that with dynamic pages seems to be much harder.

Something is just bugging me about this. SQL servers provide a lot of capability (ACID, complicated queries and transactions, etc.) that most web sites don't really need, yet they pay the price in performance anyway.
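For what it's worth, the caching that helps in the "not terribly volatile" case can be very simple. Here's a minimal sketch of a time-based fragment cache; render_page (passed in as `render`) is a hypothetical stand-in for whatever expensive database-backed rendering a site does:

    import time

    _cache = {}   # key -> (expiry_timestamp, rendered_html)

    def cached_fragment(key, render, ttl=60):
        """Serve a cached copy if it's fresh enough, else re-render.

        ttl is the staleness we're willing to tolerate, in seconds --
        acceptable exactly when there's no critical requirement for
        the absolute latest data.
        """
        now = time.time()
        hit = _cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        html = render(key)              # the expensive database round-trip
        _cache[key] = (now + ttl, html)
        return html

With something like this in front of the database, a Slashdotting mostly hits the dictionary lookup rather than the SQL server.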
On a network like Google's, talking about "the server" doesn't really make sense: as Sun Microsystems has been saying for about twenty years now, "the network is the computer". There isn't "a server"; there's a distributed service with multiple centers of functionality.
We also know Google has thousands of CPUs (I heard 5,000 at one point and that was a couple years ago).
It's at least 100,000 and probably several times that ;-). I've heard that every search query does billions of CPU operations and crunches through hundreds of megabytes of data (search on "apple banana" and there are hundreds of millions of pages containing each word, so two lists of that size must be intersected). 100,000 was the published number of servers several years ago, and there were reasons to believe they were deliberately understating the real number.
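That intersection step is worth seeing concretely. Here's a minimal sketch of the core operation, assuming each word's index is an ascending list of document IDs (the standard inverted-index layout, not anything Google has published):

    def intersect(postings_a, postings_b):
        """Merge-style intersection of two ascending lists of doc IDs.

        One linear pass over each list, O(len(a) + len(b)) -- which is
        what matters when both lists have hundreds of millions of entries.
        """
        result = []
        i = j = 0
        while i < len(postings_a) and j < len(postings_b):
            if postings_a[i] == postings_b[j]:
                result.append(postings_a[i])   # doc contains both words
                i += 1
                j += 1
            elif postings_a[i] < postings_b[j]:
                i += 1
            else:
                j += 1
        return result

    # e.g. intersect([2, 5, 9, 14], [5, 9, 30]) -> [5, 9]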
So what's all this about "the server", then? ;-)
regards
 Steve
--
Meet the Python developers and your c.l.py favorites March 23-25
Come to PyCon DC 2005                      http://www.pycon.org/
Steve Holden                               http://www.holdenweb.com/