Sometimes partitioning is absolutely necessary. If you can't run a
cluster, how else can you really scale writes to the database? Some
companies can't use clustering because in 5.0.x (the "non-beta" release)
clustering is done entirely in memory: all tables have to fit in memory
(just like the old HEAP tables). It isn't until 5.1.x that clustering
allows your data to be stored on disk. Many companies still consider 5.1
not to be production ready. You might disagree, but that is their
thinking. So, if you don't use clustering, how else are you going to
scale an application?
I suppose you can set up master-master replication, but that doesn't
really scale very far, since each master still has to apply every
write. Some companies have huge applications with hundreds of gigabytes
or even terabytes of data. I think if you read carefully through the
presentations from the recent MySQL conference by companies such as
Digg and Flickr, you will find that they do partitioning as well as
caching and such. I recall specifically reading through a presentation
by LiveJournal about how they split up their load across multiple
machines using the very partitioning we are talking about.
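
To make "partitioning" concrete, here is a minimal sketch in Python of
routing each user's rows to one of several database servers. The shard
names and the shard_for() helper are made up for illustration, and I am
not claiming this is how Digg, Flickr or LiveJournal actually do it;
some sites keep a lookup table of user-to-server mappings instead of
hashing.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id, shards=SHARDS):
    """Pick a shard from a stable hash of the user id.

    The same user id always maps to the same shard, so all of that
    user's reads and writes go to a single server.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


if __name__ == "__main__":
    for uid in (1, 2, 3, 42):
        print(uid, "->", shard_for(uid))
```

The catch with plain modulo hashing is that adding a server changes
where most users land, which is one reason a lookup table is often
preferred.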
I might be missing something. I can understand why you wouldn't want to
work on such a system, as it certainly adds complexity to the entire
database. But that doesn't mean that it isn't necessary sometimes.
Just my two cents :)
Keith
Naz Gassiep wrote:
Data partitioning? Sorry, I disagree that partitioning a table across
more and more servers is the way to scale properly. Perhaps putting a
database's tables onto different servers with different hardware
designed to meet different usage patterns is a good idea, but data
partitioning was a very short-lived idea in the world of databases and
I'm glad that as an idea it is dying in practice.
- Naz
Evaldas Imbrasas wrote:
Since the question was about *really* big websites, the answer is both
yes and no.
Yes, they do turn off RI on the database side, simply because it's not
possible to enforce RI on a database system where data is partitioned
across server farms (or shards) both vertically and horizontally. And
really big websites can't survive without data partitioning.
No, they don't usually turn off RI just to improve performance,
because the gains would be minimal, and for big websites, scalability
is a much bigger issue than performance (although sometimes one
depends on the other), and data partitioning is the way to go to solve
the scalability problem.
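
As a rough illustration of why RI ends up in the application once data
is split up: a foreign key cannot span two servers, so the application
has to do the check itself. The sketch below is only an assumption
about what such a check might look like. It uses two in-memory SQLite
databases as stand-ins for two shards, and the table names and the
add_post() helper are hypothetical.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two separate shards.
users_shard = sqlite3.connect(":memory:")
posts_shard = sqlite3.connect(":memory:")
users_shard.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
posts_shard.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)


def add_post(user_id, body):
    """Insert a post only if the referenced user exists on the other shard."""
    row = users_shard.execute(
        "SELECT 1 FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no such user: {user_id}")
    # Not atomic: the user could be deleted on the other shard between
    # the check above and the insert below.
    cur = posts_shard.execute(
        "INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, body)
    )
    posts_shard.commit()
    return cur.lastrowid
```

Unlike a real foreign key, this check can race with a concurrent delete
on the users shard, so it gives weaker guarantees than database-level
RI.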
On 5/24/07, Naz Gassiep <[EMAIL PROTECTED]> wrote:
I'm working in a project at the moment that is using MySQL, and
people keep making assertions like this one:
"*Really* big sites don't ever have referential integrity. Or in the
few spots where they do (like with financial transactions) it's
implemented on the application level (via, say, optimistic locking),
never the database level."
A large DB working with no RI would give me nightmares. Is it really
true that large sites turn RI off to improve performance? Am I just
being naive in thinking that everyone runs their DBs with RI in
production?
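
For anyone unfamiliar with the term, the "optimistic locking" in the
quoted assertion usually means something like the following minimal
sketch: each row carries a version number, and an update only succeeds
if the version is still the one that was read. The accounts table and
both helper functions are hypothetical, and SQLite is used here only so
the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")
conn.commit()


def read_account(account_id):
    """Return (balance, version) for the account."""
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def write_balance(account_id, new_balance, expected_version):
    """Update the balance only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when the version no longer matches (a stale read).
    return cur.rowcount == 1
```

A caller whose write_balance() returns False would re-read the row and
retry, rather than holding a lock for the duration of the transaction.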
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql