Sometimes partitioning is absolutely necessary. If you can't run a cluster, how else can you really scale writes to the database? Some companies can't use clustering because in 5.0.x (the "non-beta" release) clustering is done entirely in memory: all tables have to be in memory (just like the old heap tables). It isn't until 5.1.x that clustering allows your data to be stored on disk. Many companies still consider 5.1 not to be production-ready. You might disagree, but that is their thinking.

So, if you don't use clustering, how else are you going to scale an application? I suppose you can set up master-master replication, but that doesn't really scale to a large extent. Some companies have huge applications with hundreds of gigabytes or even terabytes of data. I think if you read carefully through the presentations from the recent MySQL conference by companies such as Digg and Flickr, you will find that they do partitioning as well as caching and such. I specifically recall reading through a presentation by LiveJournal about how they split up their load across multiple machines by the very partitioning we are talking about.
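
To make the idea concrete, here is a minimal sketch of routing rows to servers by key. It is not taken from any of the presentations mentioned above; the shard names and the choice of a CRC32 hash are assumptions made purely for illustration, and a real site might well use a lookup directory instead of a hash.

```python
import zlib

# Hypothetical shard names; a real deployment would map these to database hosts.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id, shards=SHARDS):
    """Pick the shard that owns a user's rows using a stable hash of the key."""
    key = str(user_id).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]


def group_by_shard(user_ids, shards=SHARDS):
    """Group user ids by owning shard so each server only sees its own writes."""
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid, shards), []).append(uid)
    return groups
```

Every write for a given user goes to one server, so the write load spreads out as servers are added. The catch with a plain modulo scheme like this one is that changing the number of shards moves most keys, which is part of the complexity people complain about.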

I might be missing something. I can understand why you wouldn't want to work on such a system, as it certainly adds complexity to the entire database. But that doesn't mean it isn't necessary sometimes.

Just my two cents  :)

Keith

Naz Gassiep wrote:
Data partitioning? Sorry, I disagree that partitioning a table into more
and more servers is the way to scale properly. Perhaps putting
databases' tables onto different servers with different hardware
designed to meet different usage patterns is a good idea, but data
partitioning was a very short lived idea in the world of databases and
I'm glad that as an idea it is dying in practice.
- Naz

Evaldas Imbrasas wrote:
Since the question was about *really* big websites, the answer is both
yes and no.

Yes, they do turn off RI on the database side, simply because it's not
possible to enforce RI on a database system where data is partitioned
across server farms (or shards) both vertically and horizontally. And
really big websites can't survive without data partitioning.

No, they don't usually turn off RI just to improve performance,
because the gains would be minimal, and for big websites, scalability
is a much bigger issue than performance (although sometimes one
depends on the other), and data partitioning is the way to go to solve
the scalability problem.
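
As a rough illustration of why RI ends up in the application once data is partitioned, here is a small sketch. Two in-memory SQLite databases stand in for two shards; the table names and the add_post helper are hypothetical and not from any site discussed in this thread. A foreign key cannot span the two connections, so the application checks the parent row itself.

```python
import sqlite3

# Two independent in-memory databases stand in for two shards (hypothetical layout).
users_shard = sqlite3.connect(":memory:")
posts_shard = sqlite3.connect(":memory:")
users_shard.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
posts_shard.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)


def add_post(user_id, body):
    """Insert a post only if the referenced user exists on the other shard."""
    row = users_shard.execute(
        "SELECT 1 FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no such user: {user_id}")
    # The connection context manager commits on success and rolls back on error.
    with posts_shard:
        cur = posts_shard.execute(
            "INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, body)
        )
    return cur.lastrowid
```

Note that the check and the insert are not atomic across the two databases: the user could be deleted in between. That gap is exactly what database-level RI would close on a single server, and what the application has to tolerate or clean up when the data is partitioned.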


On 5/24/07, Naz Gassiep <[EMAIL PROTECTED]> wrote:
I'm working on a project at the moment that is using MySQL, and
people keep making assertions like this one:

"*Really* big sites don't ever have referential integrity. Or if the
few spots they do (like with financial transactions) it's implemented
on the application level (via, say, optimistic locking), never the
database level."

A large DB working with no RI would give me nightmares. Is it really
true that large sites turn RI off to improve performance? Am I just
being naive in thinking that everyone runs their DBs with RI in
production?
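
For readers unfamiliar with the optimistic locking mentioned in the quoted assertion, a minimal sketch follows. It uses SQLite only so that it runs as-is; the accounts table, its version column, and the helper names are assumptions for illustration, not anything a particular site is known to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")
conn.commit()


def read_account(conn, account_id):
    """Return (balance, version) as seen by this reader."""
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def write_balance(conn, account_id, new_balance, expected_version):
    """Write only if nobody changed the row since it was read."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_balance, account_id, expected_version),
        )
    # Zero affected rows means the version moved on: the caller must re-read and retry.
    return cur.rowcount == 1
```

No lock is held between the read and the write. A writer holding a stale version simply fails and has to retry, which is how consistency gets enforced at the application level rather than by the database.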




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
