Paul Rubin wrote:

Skip Montanaro <[EMAIL PROTECTED]> writes:

It's more than a bit unfair to compare Wikipedia with Ebay or
Google.  Even though Wikipedia may be running on high-performance
hardware, it's unlikely that they have anything like the underlying
network structure (replication, connection speed, etc), total number
of cpus or monetary resources to throw at the problem that both Ebay
and Google have.  I suspect money trumps LAMP every time.


I certainly agree about the money and hardware resource comparison,
which is why I thought the comparison with 1960's mainframes was
possibly more interesting.  You could not get anywhere near the
performance of today's servers back then, no matter how much money you
spent.  Re connectivity, I wonder what kind of network speed is
available to sites like Ebay that's not available to Jane Webmaster
with a colo rack at some random big ISP.  Also, you and Tim Danieliuk
both mentioned caching in the network (e.g. Akamai).  I'd be
interested to know exactly how that works and how much difference it
makes.

It works by replicating content onto edge nodes scattered throughout the network, close to the users who request it. I don't think Akamai make any secret of their architecture, so Google (:-) can help you there.
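As I understand it, the published research behind Akamai leans heavily on consistent hashing: each URL maps to a small set of edge caches, and the mapping barely shifts when nodes join or leave. A toy sketch of the idea in Python (the node names are invented, and the real thing also weights by load and client location):

import hashlib
from bisect import bisect

# Invented edge-cache names, purely for illustration.
EDGE_NODES = ["edge-nyc", "edge-lon", "edge-tok", "edge-sfo"]

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# Place each node on a hash "ring", sorted by its hash value.
_ring = sorted((_hash(node), node) for node in EDGE_NODES)
_keys = [h for h, _ in _ring]

def edge_for(url):
    # Pick the first node clockwise from the URL's position on the ring.
    i = bisect(_keys, _hash(url)) % len(_ring)
    return _ring[i][1]

print(edge_for("http://example.com/logo.png"))

Because only the keys between a departed node and its neighbour move, adding or removing a cache invalidates only a small slice of the content mapping.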

Of course it makes a huge difference, otherwise Google wouldn't have registered their domain name as a CNAME for an Akamai node set.
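You can watch the aliasing happen from the standard library, if you're curious (what any given hostname actually points at will of course vary over time and by where you ask from):

import socket

# Returns (canonical_name, alias_list, address_list).  If the name you
# asked about shows up in the alias list, it's a CNAME for the canonical name.
canonical, aliases, addresses = socket.gethostbyname_ex("www.google.com")
print(canonical)
print(aliases)
print(addresses)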

[OB PyCon] Jeremy Hylton, a Google employee and formerly a PythonLabs employee, will be at PyCon. Why don't you come along and ask *him*?

But the problems I'm thinking of are quite obviously with the server
itself.  This is clear when you try to load a page and your browser
immediately gets the static text on the page, followed by a pause while
the server waits for the dynamic stuff to come back from the database.
Serving a Slashdotting-level load of pure static pages on a small box
with Apache isn't too terrible ("Slashdotting" = the spike in hits
that a web site gets when Slashdot's front page links to it).  Doing
that with dynamic pages seems to be much harder.  Something is just
bugging me about this.  SQL servers provide a lot of capability (ACID
for complicated queries and transactions, etc.) that most web sites
don't really need.  They pay the price in performance anyway.

Well, there's nothing wrong with caching dynamic content when the underlying database isn't terribly volatile and there's no critical requirement for the absolute latest data. Last I heard, Google weren't promising that their searches are up-to-the-microsecond in terms of information content.
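A rough sketch of what I mean (run_query() and the 30-second lifetime are placeholders, not anybody's real API): keep the generated result around for a while instead of hitting the database on every request.

import time

_cache = {}          # sql -> (expires_at, result)
CACHE_LIFETIME = 30  # seconds; pick whatever staleness you can live with

def run_query(sql):
    # Stand-in for an expensive database round trip.
    return "<html>...result of %s...</html>" % sql

def cached_query(sql):
    now = time.time()
    entry = _cache.get(sql)
    if entry is not None and entry[0] > now:
        return entry[1]                       # still fresh enough, skip the db
    result = run_query(sql)
    _cache[sql] = (now + CACHE_LIFETIME, result)
    return result

Even a lifetime of a few seconds flattens a Slashdotting, because thousands of identical requests collapse into one database hit.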

In terms of a network like Google's, talking about "the server" doesn't really make sense: as Sun Microsystems have been saying for about twenty years now, "the network is the computer". There isn't "a server", there's "a distributed service with multiple centers of functionality".

We also know Google has thousands of CPUs (I heard 5,000 at one point and
that was a couple years ago).


It's at least 100,000 and probably several times that ;-).  I've heard
that every search query does billions of CPU operations and crunches
through hundreds of megabytes of data (search on "apple banana" and
there are hundreds of millions of pages containing each word, so two
lists of that size must be intersected).  100,000 was the published
number of servers several years ago, and there were reasons to believe
that they were purposely understating the real number.
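The intersection itself is easy enough to picture, at least in the textbook version: walk the two sorted posting lists in step. (The document ids below are invented, and whatever Google actually does is vastly more elaborate.)

def intersect_postings(a, b):
    """Merge-walk two sorted lists of document ids, keeping the common ones."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Pages containing "apple" and pages containing "banana", as made-up ids.
print(intersect_postings([2, 5, 9, 14, 30], [5, 8, 14, 20, 30]))   # -> [5, 14, 30]

Doing that over lists with hundreds of millions of entries, per query, is where the thousands of machines come in.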

So what's all this about "the server", then? ;-)

regards
 Steve
--
Meet the Python developers and your c.l.py favorites March 23-25
Come to PyCon DC 2005                      http://www.pycon.org/
Steve Holden                           http://www.holdenweb.com/
