Re: FW: MySQL patches from Google

David T. Ashley Wed, 25 Apr 2007 19:21:03 -0700

On 4/25/07, mos <[EMAIL PROTECTED]> wrote:

At 02:36 PM 4/25/2007, you wrote:
>On 4/25/07, Daevid Vincent <[EMAIL PROTECTED]> wrote:
>>
>>A co-worker sent this to me, thought I'd pass it along here. We do tons of
>>failover/replication and would be eager to see MySQL implement the Google
>>patches in the stock distribution. If anyone needs mission-critical,
>>scalable failover clusters, it's Google -- so I have every confidence
>>their patches are solid and worthy of inclusion...
>
>
>This isn't surprising for Google.  They've done the same thing to Linux.
>
>I don't know much about Google's infrastructure these days, but several
>years ago they had a server farm of about 2,000 identical x86 Linux machines
>serving out search requests.  Each machine had a local hard disk containing
>the most recent copy of the search database.

So you're saying they had a MySQL database on the same machine as the
webserver? Or maybe 1 webserver machine and one MySQL machine?
I would have thought a single MySQL database could handle the requests from
25-50 webservers easily. Trying to maintain 2000 copies of the same
database requires a lot of disk writes. I know Google today is rumored to
have over 100,000 web servers and it would be impossible to have that many
databases in sync at all times.



When I read the article some years ago, I got the impression that it was a
custom database solution (i.e. nothing to do with MySQL).

If you think about it, for a read-only database where the design was known
in advance, nearly anybody on this list could write a database solution in
'C' that would outperform MySQL (generality always has a cost).

Additionally, if you have some time to crunch on the data and the data set
doesn't change until the next data set is released, you can probably
optimize it in ways that are unavailable to MySQL because they would make
INSERTs too expensive.  There might even be enough time to tune a hash
function that won't collide much on the data set involved so that the query
cost becomes O(1) rather than O(log N).  You can't do that in real time on
an INSERT.  It may take days to crunch data in that way.
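
To make the hash-tuning idea concrete, here is a minimal sketch in Python
(chosen for brevity; it is not anything Google is known to have used).  It
assumes a small, fixed key set and simply tries seeds until every key lands
in its own slot.  The names slot, find_seed, build_table and lookup are
hypothetical, and using the blake2b salt as the seed is just a convenient
assumption.

```python
import hashlib

def slot(key, seed, size):
    """Map a key to a table slot; the seed is fed in as the blake2b salt."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % size

def find_seed(keys, size, max_tries=100_000):
    """Offline step: search for a seed that gives no collisions on this key set."""
    for seed in range(max_tries):
        if len({slot(k, seed, size) for k in keys}) == len(keys):
            return seed
    raise RuntimeError("no collision-free seed found; try a larger table")

def build_table(records):
    """Build a read-only table with twice as many slots as keys."""
    size = max(1, 2 * len(records))
    seed = find_seed(list(records), size)
    table = [None] * size
    for key, value in records.items():
        table[slot(key, seed, size)] = (key, value)
    return seed, table

def lookup(table, seed, key):
    """One hash and one probe: no collision chain to walk."""
    entry = table[slot(key, seed, len(table))]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None
```

A brute-force seed search like this only works for small key sets; the
chance of a collision-free seed drops quickly as the set grows, which is
why a real build step could take a long time or need a smarter scheme.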

My understanding was that Google's search servers had custom software
operating on a custom database format.  My understanding was also that
each search server had a full copy of the database (i.e. no additional
network traffic involved in providing search results).

As far as keeping 100,000 servers in sync, my guess would be that most of
the data is distilled for search by other machines and then rolled out
automatically in a way that keeps just a small fraction of the search
servers offline at any one time.
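
A rough sketch of that kind of rollout, again in Python and purely as a
guess at the general shape: the server list is split into batches so that
no more than a chosen fraction is being updated at once.  The names
rolling_batches and roll_out, the install callback, and the 5% default are
all assumptions for illustration.

```python
def rolling_batches(servers, max_offline_fraction=0.05):
    """Yield consecutive batches no larger than the allowed offline fraction."""
    batch_size = max(1, int(len(servers) * max_offline_fraction))
    for start in range(0, len(servers), batch_size):
        yield servers[start:start + batch_size]

def roll_out(servers, install, max_offline_fraction=0.05):
    """Update one batch at a time; only the current batch is out of service."""
    for batch in rolling_batches(servers, max_offline_fraction):
        for server in batch:
            install(server)  # hypothetical: copy the new data set, then restart
```

With 100 servers and the 5% default, at most 5 machines are being updated
at any moment while the rest keep answering queries from the old data set.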
