RE: Integrity on large sites
In my experience this happens a lot if you put application programmers in charge of the database. I've upset quite a few in my time by introducing RI and then their horribly coded application falls over!

-Original Message-
From: Peter Brawley [mailto:[EMAIL PROTECTED]
Sent: 24 May 2007 17:31
To: Naz Gassiep
Cc: mysql@lists.mysql.com
Subject: Re: Integrity on large sites

Naz,

*Really* big sites don't ever have referential integrity. Or if the few spots they do (like with financial transactions) it's implemented on the application level (via, say, optimistic locking), never the database level.

Mebbe that view was common in the MySQL community in the time of version 3, when the emphasis was on one site managing one db. Agreed the concept is scary. Try that quote in an Oracle or MSSQL community :-)

PB

Naz Gassiep wrote:

I'm working in a project at the moment that is using MySQL, and people keep making assertions like this one: *Really* big sites don't ever have referential integrity. Or if the few spots they do (like with financial transactions) it's implemented on the application level (via, say, optimistic locking), never the database level.

A large DB working with no RI would give me nightmares. Is it really true that large sites turn RI off to improve performance? Am I just being naive in thinking that everyone runs their DBs with RI in production?
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
Re: Integrity on large sites
B. Keith Murphy wrote:

Here is the kicker. Each box was a top of the line Sun server that had 32 processors and 32 gigs of RAM. They could handle up to 64 procs and 64 gigs. And each cost well over a million dollars for the hardware alone. Running Oracle on it must have cost over 100,000 dollars for software licenses. Granted this was in 2001, but the licensing costs for Oracle haven't gone down any that I am aware of...and the hardware cost will still be quite steep to do this type of thing.

You youngsters may not realize that there were billing applications serving millions of customers long, long before there was any kind of database management system. They employed concepts called flat files and batch processing. And they ran on machines far weaker than anything any of you have on your desk today. Even under something like MS Windows, it would be absolutely possible to configure 3-5 high speed printers and knock out 100,000 bills per hour from a single-CPU Intel box. You really have no appreciation of how much power you actually have at your disposal.

Barry Newton
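For readers who have not met the flat-file, batch-processing style mentioned above, here is a minimal sketch of the idea. The record layout (an 8-character customer id followed by a 10-character amount in cents), the sample data, and the function name are all assumptions made up for illustration; nothing here comes from a real billing system.

```python
import io
from collections import defaultdict

# Hypothetical fixed-width layout: columns 0-7 customer id, columns 8-17 amount in cents.
USAGE_FILE = io.StringIO(
    "000000010000001250\n"
    "000000020000000400\n"
    "000000010000000750\n"
)


def run_billing_batch(lines):
    """Read every record once, in order, and total the charges per customer."""
    totals = defaultdict(int)
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue  # tolerate blank lines in the input
        customer_id = record[0:8]
        amount_cents = int(record[8:18])
        totals[customer_id] += amount_cents
    return dict(totals)


bills = run_billing_batch(USAGE_FILE)
```

A single sequential pass like this needs no index and no server, which is a large part of why such jobs ran acceptably on very modest hardware.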
Re: Integrity on large sites
You youngsters may not realize that there were billing applications serving millions of customers long, long before there was any kind of database management system. They employed concepts called flat files and batch processing. And they ran on machines far weaker than anything any of you have on your desk today. Even under something like MS Windows, it would be absolutely possible to configure 3-5 high speed printers and knock out 100,000 bills per hour from a single-CPU Intel box. You really have no appreciation of how much power you actually have at your disposal.

Perhaps you underestimate us, or me at least :-D. The precise reason I am arguing against sharding is that I know sound, performance-minded design, optimization, and other proper techniques make voodoo like sharding a clever solution to a problem that shouldn't exist, given the raw power available in modern hardware. As I said in a previous post, my old laptop could handle a DB that cost the equivalent of a house to manage in a previous age of IT history.

- Naz.
Re: Integrity on large sites
Hey there, thanks for your comments.

There are issues where sharding may be appropriate, but you are talking about the heaviest of heavy duty loads. Not only that, hardware is getting to the point where it is surpassing our needs. Remember the days when it cost $200k to run a library database? Nowadays I could run such a DB on my old laptop that I just threw out.

The issue is not *only* application complexity, but that is a *major* one, and ignoring it is not just a matter of budget allocation; it's the risk that the complexity hides system-collapsing bugs. OneTel, a multi billion dollar telco in Aus that I worked for in 2001 (as a lemming), died partly because the billing system just fell over and died one day, bringing their cash flow to a dead stop. The thing was so complex that debugging it took longer than the cash reserve they had on hand held out for, so the company went belly up and died a gruesome death.

I know that monolithic DBs are not manageable after a certain point, but sharding, in my book, is to be avoided wherever possible due to the availability of far better solutions. E.g., the use of table spaces to put each table on its own server. How many companies can say that one of their tables is so large that no single machine can hold it? This approach, database partitioning rather than data partitioning, allows you to design the hardware for each table's access patterns. The other advantage of this method is that an application that was coded with a single machine DB can be scaled to this solution without changing a single line in the app code.

Incidentally, I come from the PostgreSQL world where if you truly *must* do data sharding, it can be done at the DB level, transparently to the app code.

Regards,
- Naz.

B. Keith Murphy wrote:

OK. Going to try this again. After reading through these emails I think I have learned a little more about the way you are thinking. I DO NOT want to start some kind of flame war.
Re: Integrity on large sites
It is my contention that as the clustering capabilities of MySQL continue to grow and mature (think of when version 6.0 goes stable) companies will move to MySQL in droves. THEN you have the ability to build a single virtual database (at least from the point of view of your application) that will scale simply and elegantly. As I said in the previous email it is only that 5.1 is in beta that keeps this from being available now. And many companies, such as Kaneva, are doing this right now.

The only reason that companies like Digg and Flickr can exist and grow at such phenomenal rates is that they keep the cost of the development of the system to a minimum and the overhead of operating (licensing costs and hardware cost) down as low as possible. In addition, of course, they need the ability to scale out very quickly. Digg didn't get any significant funding until just recently. And yet they epitomize the web 2.0 companies. They did it by both keeping their cost down and having the ability to grow quickly. Couldn't have done it with Oracle or MS. Just my thoughts :)

Right, sure... No one cries when Digg loses an article. No one gives a rat's ass when they lose their comments on Flickr.

Real systems with real data NEED features that actually exist in Oracle or SQL Server or any other decent DBMS and that, until recently (and it's still not quite there yet), just didn't exist in MySQL. Transactions? Proper constraints? (When does MySQL come with Check Constraints?)

I'll say again: if you value your data, use constraints wherever possible and use transactions.

Martijn Tonies
Database Workbench - tool for InterBase, Firebird, MySQL, NexusDB, Oracle, MS SQL Server
Upscene Productions http://www.upscene.com
My thoughts: http://blog.upscene.com/martijn/
Database development questions? Check the forum! http://www.databasedevelopmentforum.com
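As an illustration of what "use constraints and use transactions" buys you, here is a minimal sketch. It assumes SQLite through Python's standard sqlite3 module as a stand-in, not MySQL; the account table, the amounts, and the transfer function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint makes a negative balance impossible to store.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 5000), (2, 1000)])
conn.commit()


def transfer(connection, src, dst, amount_cents):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE account SET balance_cents = balance_cents + ? WHERE id = ?",
                (amount_cents, dst),
            )
            connection.execute(
                "UPDATE account SET balance_cents = balance_cents - ? WHERE id = ?",
                (amount_cents, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    """Return {account id: balance in cents}."""
    return dict(connection.execute("SELECT id, balance_cents FROM account ORDER BY id"))


accepted = transfer(conn, 1, 2, 2000)   # within the source balance
rejected = transfer(conn, 1, 2, 9000)   # would overdraw account 1
final_balances = balances(conn)
```

In the rejected transfer the credit to account 2 is applied first and then undone by the rollback when the debit violates the CHECK, so no application code has to clean up a half-finished transfer.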
Re: Integrity on large sites
Hi Naz,

Just to throw out (plug) an ongoing project: http://www.hivedb.org/

From the site: HiveDB is an open source framework for horizontally partitioning MySQL systems. Building scalable and high performance MySQL-backed systems requires a good deal of expertise in designing the system and implementing the code. One of the main strategies for scaling MySQL is by partitioning your data across many servers. While it is not difficult to accomplish this, it is difficult to do it in such a way that the system is easily maintained and transparent to the developer.

We've been working on HiveDB precisely to avoid the large amount of (quite specialized) code in the application.

Regards,
Jeremy

Naz Gassiep wrote:

Wow. The problem I have with sharding is the large amount of code required in the app to make it work. IMHO the app should be agnostic to the underlying database system (by that I don't mean the DB in use such as MySQL or whatever or the schema, I mean the way the DB has been deployed) so that changes can be made to it without having to worry about impacting app code. This is one of my fundamental design imperatives. Then again, I'm not a regular MySQL user so I don't know what is and is not the norm in the MySQL world.

- Naz.

Evaldas Imbrasas wrote:

You certainly have a right to disagree, but pretty much every scalability talk at the MySQL conference a few weeks ago was focused on data partitioning and sharding. And those talks were given by folks working for some of the most popular (top 100) websites in the world. It certainly looks like data partitioning is the way to go in the MySQL world at this point, probably at least until a production-ready and feature-full MySQL Cluster is out. And even then a large percentage of dotcom companies would use data partitioning instead since it can be implemented on commodity hardware.

Once again, we're talking *really* big websites using MySQL (not Oracle or SQL Server or whatever) here. Most websites won't ever need to partition their production databases, and different RDBMSs might have different approaches for scalability.

On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:

Data partitioning? Sorry, I disagree that partitioning a table into more and more servers is the way to scale properly. Perhaps putting databases' tables onto different servers with different hardware designed to meet different usage patterns is a good idea, but data partitioning was a very short-lived idea in the world of databases and I'm glad that as an idea it is dying in practice.

-- high performance mysql consulting www.provenscaling.com
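The general idea of keeping partition routing out of application code can be sketched as below. This is not HiveDB's API; ShardDirectory, UserStore, the shard names, and the round-robin assignment policy are all hypothetical, and plain dicts stand in for the database servers.

```python
class ShardDirectory:
    """Remembers which shard owns each partition key."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)
        self._assignments = {}

    def shard_for(self, key):
        if key not in self._assignments:
            # Assumed policy: hand new keys out round-robin across the shards.
            index = len(self._assignments) % len(self.shard_names)
            self._assignments[key] = self.shard_names[index]
        return self._assignments[key]


class UserStore:
    """The only layer that knows the data is partitioned at all."""

    def __init__(self, directory):
        self._directory = directory
        # One dict per shard stands in for one database server per shard.
        self._servers = {name: {} for name in directory.shard_names}

    def save(self, user_id, profile):
        self._servers[self._directory.shard_for(user_id)][user_id] = profile

    def load(self, user_id):
        return self._servers[self._directory.shard_for(user_id)].get(user_id)


directory = ShardDirectory(["shard-a", "shard-b"])
store = UserStore(directory)
store.save(1, {"name": "naz"})
store.save(2, {"name": "jeremy"})
loaded = store.load(2)
```

Application code calls only save and load, so shards can be added or keys reassigned inside the directory without touching callers, which is the kind of agnosticism Naz asks for above.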
Re: Integrity on large sites
Naz, *Really* big sites don't ever have referential integrity. Or if the few spots they do (like with financial transactions) it's implemented on the application level (via, say, optimistic locking), never the database level. Mebbe that view was common in the MySQL community in the time of version 3, when the emphasis was on one site managing one db. Agreed the concept is scary. Try that quote in an Oracle or MSSQL community :-) PB - Naz Gassiep wrote: I'm working in a project at the moment that is using MySQL, and people keep making assertions like this one: *Really* big sites don't ever have referential integrity. Or if the few spots they do (like with financial transactions) it's implemented on the application level (via, say, optimistic locking), never the database level. A large DB working with no RI would give me nightmares. Is it really true that large sites turn RI off to improve performance? Am I just being naive in thinking that everyone runs their DBs with RI in production?
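For readers unfamiliar with the optimistic locking mentioned in the quoted assertion, here is a minimal sketch of the technique. It assumes SQLite through Python's sqlite3 module instead of MySQL, and the account table, its version column, and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance_cents INTEGER NOT NULL,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES (1, 10000, 1)")
conn.commit()


def read_account(connection, account_id):
    """Return (balance_cents, version) as seen right now."""
    return connection.execute(
        "SELECT balance_cents, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def write_balance(connection, account_id, new_balance, expected_version):
    """Write only if nobody else has changed the row since it was read."""
    cursor = connection.execute(
        "UPDATE account SET balance_cents = ?, version = version + 1"
        " WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    connection.commit()
    return cursor.rowcount == 1  # zero rows updated means the version was stale


# Two writers read the same version, then both try to write.
balance_a, version_a = read_account(conn, 1)
balance_b, version_b = read_account(conn, 1)
first_ok = write_balance(conn, 1, balance_a - 2500, version_a)
second_ok = write_balance(conn, 1, balance_b - 4000, version_b)
final_balance, final_version = read_account(conn, 1)
```

The second writer is refused and has to re-read and retry. Note that this protects a single row against lost updates; it does not by itself guarantee that a referenced row exists, which is what referential integrity is about.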
Re: Integrity on large sites
I'm working in a project at the moment that is using MySQL, and people keep making assertions like this one: *Really* big sites don't ever have referential integrity. Or if the few spots they do (like with financial transactions) it's implemented on the application level (via, say, optimistic locking), never the database level. A large DB working with no RI would give me nightmares. Is it really true that large sites turn RI off to improve performance? Am I just being naive in thinking that everyone runs their DBs with RI in production?

If you don't value your data, then choose not to use RI. If you DO value your data, run with as many valid constraints as you can. After all, that's the whole idea behind constraints :-)

Martijn Tonies
Database Workbench - development tool for MySQL, and more!
Upscene Productions http://www.upscene.com
My thoughts: http://blog.upscene.com/martijn/
Database development questions? Check the forum! http://www.databasedevelopmentforum.com
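To show what database-level RI looks like in practice, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module as a stand-in; in MySQL the equivalent foreign key is enforced by InnoDB tables, while MyISAM parses the clause and ignores it. The customer and invoice tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " amount_cents INTEGER NOT NULL)"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO invoice (customer_id, amount_cents) VALUES (1, 1250)")


def insert_orphan_invoice(connection):
    """Try to bill a customer that does not exist; report whether it was accepted."""
    try:
        connection.execute(
            "INSERT INTO invoice (customer_id, amount_cents) VALUES (999, 500)"
        )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = insert_orphan_invoice(conn)
invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```

The orphan row is refused by the database itself, so every application that touches the schema gets the same guarantee without implementing its own check.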
Re: Integrity on large sites
Naz,

Without going into detail about various projects I've seen, suffice it to say that I have witnessed some true horrors. In defence, however, the largest abomination I have ever witnessed was from an MS shop that had grown a database from an MS Access system upward and had then bluntly bolted MySQL into the mix so that they could expose it to the web (stop laughing ;P). It has, however, nothing to do with the specific database: just as you can write shoddy code in C++ or PHP, database abominations know no vendor boundaries.

I think a large number of people reading this may agree when I say that commercial pressures (you may read time and money as the obvious subtexts) to produce quick, cheap and working solutions are the real reason such things as documentation, proper requirements gathering and analysis, design and QA testing are the first against the wall when such pressures begin to bite or clients haggle on price.

So, I'm afraid, in conclusion: Yes, you are being naive in thinking that everyone runs their DBs with RI in production. No, they don't turn it off; they never build it in, and if they do turn it off it's not for performance gains.

The counter argument to that would be that it's fairly conceivable that if you implemented a solution in a development environment with RI constraints, tested it carefully and completely, put it into production and perhaps ran it for a month or two, then turned all the RI off, it would still hold water well enough to be a viable commercial solution. Not an argument I'd seriously back, but one you could make at any rate.

And finally: Yes, it's a nightmare in such situations.

Without whoring, I should perhaps state at this juncture that my current employer does not produce such solutions. We have design and analysis procedures, a QA department, people with common sense, etc. to ensure that we avoid such things.
Regards,
Phil

On 24/05/07, Naz Gassiep [EMAIL PROTECTED] wrote:

I'm working in a project at the moment that is using MySQL, and people keep making assertions like this one: *Really* big sites don't ever have referential integrity. Or if the few spots they do (like with financial transactions) it's implemented on the application level (via, say, optimistic locking), never the database level. A large DB working with no RI would give me nightmares. Is it really true that large sites turn RI off to improve performance? Am I just being naive in thinking that everyone runs their DBs with RI in production?
Re: Integrity on large sites
Since the question was about *really* big websites, the answer is both yes and no.

Yes, they do turn off RI on the database side, simply because it's not possible to enforce RI on a database system where data is partitioned across server farms (or shards) both vertically and horizontally. And really big websites can't survive without data partitioning.

No, they don't usually turn off RI just to improve performance, because the gains would be minimal, and for big websites, scalability is a much bigger issue than performance (although sometimes one depends on the other), and data partitioning is the way to go to solve the scalability problem.

On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:

I'm working in a project at the moment that is using MySQL, and people keep making assertions like this one: *Really* big sites don't ever have referential integrity. Or if the few spots they do (like with financial transactions) it's implemented on the application level (via, say, optimistic locking), never the database level. A large DB working with no RI would give me nightmares. Is it really true that large sites turn RI off to improve performance? Am I just being naive in thinking that everyone runs their DBs with RI in production?

-- Evaldas Imbrasas http://www.imbrasas.com
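The reason a foreign key cannot span shards can be seen in a small sketch. Everything here is an assumption for illustration: four shards modelled as in-memory dicts, CRC32 modulo the shard count as the routing rule, and hypothetical users and comments tables each partitioned by its own id.

```python
import zlib

NUM_SHARDS = 4
# Each dict pair stands in for one independent database server.
shards = [{"users": {}, "comments": {}} for _ in range(NUM_SHARDS)]


def shard_for(key):
    """Route a key to a shard using a stable hash (CRC32 modulo shard count)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SHARDS


def add_user(user_id, name):
    shards[shard_for(user_id)]["users"][user_id] = name


def add_comment(comment_id, user_id, text):
    # The comment and its author may live on different servers, so no single
    # server can declare a foreign key between them. The application has to
    # look the parent up itself before writing the child row.
    if user_id not in shards[shard_for(user_id)]["users"]:
        raise LookupError(f"user {user_id} does not exist")
    shards[shard_for(comment_id)]["comments"][comment_id] = (user_id, text)


add_user(42, "naz")
add_comment(1, 42, "first post")
try:
    add_comment(2, 999, "orphan")
    orphan_rejected = False
except LookupError:
    orphan_rejected = True

total_comments = sum(len(shard["comments"]) for shard in shards)
```

The lookup and the insert are two separate operations on possibly different servers, so a user deleted between them still leaves an orphan. That gap is what a database-level constraint closes on a single server and what application-level checks can only approximate across shards.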
Re: Integrity on large sites
Data partitioning? Sorry, I disagree that partitioning a table into more and more servers is the way to scale properly. Perhaps putting databases' tables onto different servers with different hardware designed to meet different usage patterns is a good idea, but data partitioning was a very short-lived idea in the world of databases and I'm glad that as an idea it is dying in practice.

- Naz

Evaldas Imbrasas wrote:

Since the question was about *really* big websites, the answer is both yes and no. Yes, they do turn off RI on the database side, simply because it's not possible to enforce RI on a database system where data is partitioned across server farms (or shards) both vertically and horizontally. And really big websites can't survive without data partitioning. No, they don't usually turn off RI just to improve performance, because the gains would be minimal, and for big websites, scalability is a much bigger issue than performance (although sometimes one depends on the other), and data partitioning is the way to go to solve the scalability problem.

On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:

I'm working in a project at the moment that is using MySQL, and people keep making assertions like this one: *Really* big sites don't ever have referential integrity. Or if the few spots they do (like with financial transactions) it's implemented on the application level (via, say, optimistic locking), never the database level. A large DB working with no RI would give me nightmares. Is it really true that large sites turn RI off to improve performance? Am I just being naive in thinking that everyone runs their DBs with RI in production?
Re: Integrity on large sites
Sometimes partitioning is absolutely necessary. If you can't run a cluster, how else can you really scale writes to the database? Some companies can't use clustering because in 5.0.x (the non-beta release) clustering is all done in memory: all tables have to be in memory (just like the old heap tables). It isn't until 5.1.x that clustering allows your data to be stored on disk. Many companies still consider 5.1 to not be production ready. You might disagree but that is their thinking.

So, if you don't use clustering, how else are you going to scale an application? I suppose you can set up master-master replication, but that doesn't really scale to a large extent. Some companies have huge applications with hundreds of gigabytes or even terabytes of data.

I think if you read carefully through the presentations from the recent MySQL conference by companies such as Digg and Flickr you will find that they do partitioning as well as caching and such. I recall specifically reading through a presentation by LiveJournal about how they split up their load across multiple machines by the very partitioning we are talking about.

I might be missing something. I can understand why you wouldn't want to work on such a system, as it certainly adds complexity to the entire database. But that doesn't mean that it isn't necessary sometimes.

Just my two cents :)

Keith

Naz Gassiep wrote:

Data partitioning? Sorry, I disagree that partitioning a table into more and more servers is the way to scale properly. Perhaps putting databases' tables onto different servers with different hardware designed to meet different usage patterns is a good idea, but data partitioning was a very short-lived idea in the world of databases and I'm glad that as an idea it is dying in practice.

- Naz

Evaldas Imbrasas wrote:

Since the question was about *really* big websites, the answer is both yes and no.
Re: Integrity on large sites
You certainly have a right to disagree, but pretty much every scalability talk at the MySQL conference a few weeks ago was focused on data partitioning and sharding. And those talks were given by folks working for some of the most popular (top 100) websites in the world. It certainly looks like data partitioning is the way to go in the MySQL world at this point, probably at least until a production-ready and feature-full MySQL Cluster is out. And even then a large percentage of dotcom companies would use data partitioning instead since it can be implemented on commodity hardware.

Once again, we're talking *really* big websites using MySQL (not Oracle or SQL Server or whatever) here. Most websites won't ever need to partition their production databases, and different RDBMSs might have different approaches for scalability.

On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:

Data partitioning? Sorry, I disagree that partitioning a table into more and more servers is the way to scale properly. Perhaps putting databases' tables onto different servers with different hardware designed to meet different usage patterns is a good idea, but data partitioning was a very short-lived idea in the world of databases and I'm glad that as an idea it is dying in practice.

-- Evaldas Imbrasas http://www.imbrasas.com
Re: Integrity on large sites
Wow. The problem I have with sharding is the large amount of code required in the app to make it work. IMHO the app should be agnostic to the underlying database system (by that I don't mean the DB in use such as MySQL or whatever or the schema, I mean the way the DB has been deployed) so that changes can be made to it without having to worry about impacting app code. This is one of my fundamental design imperatives. Then again, I'm not a regular MySQL user so I don't know what is and is not the norm in the MySQL world.

- Naz.

Evaldas Imbrasas wrote:

You certainly have a right to disagree, but pretty much every scalability talk at the MySQL conference a few weeks ago was focused on data partitioning and sharding. And those talks were given by folks working for some of the most popular (top 100) websites in the world. It certainly looks like data partitioning is the way to go in the MySQL world at this point, probably at least until a production-ready and feature-full MySQL Cluster is out. And even then a large percentage of dotcom companies would use data partitioning instead since it can be implemented on commodity hardware.

Once again, we're talking *really* big websites using MySQL (not Oracle or SQL Server or whatever) here. Most websites won't ever need to partition their production databases, and different RDBMSs might have different approaches for scalability.

On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:

Data partitioning? Sorry, I disagree that partitioning a table into more and more servers is the way to scale properly. Perhaps putting databases' tables onto different servers with different hardware designed to meet different usage patterns is a good idea, but data partitioning was a very short-lived idea in the world of databases and I'm glad that as an idea it is dying in practice.
Re: Integrity on large sites
OK. Going to try this again. After reading through these emails I think I have learned a little more about the way you are thinking. I DO NOT want to start some kind of flame war. However, I disagree very strongly with what you are saying.

Yes, you are right, sharding does require more complexity from the application layer. Sorry for all you developers out there (and I can safely say that I am NOT a developer!!). The fundamental issue for you, as I see it, is the increased complexity caused by sharding the application.

That being said, I will say this...if you develop on some other RDBMS such as MS or Oracle, is it possible to develop something like you are saying...an all-inclusive database that isn't sharded? Yep. When I worked at Netzero in 2001, for example, we had two database servers running Oracle, one on the east coast in Virginia and one on the west coast in California. The east coast server was a backup of the west coast server. So one database server did the billing for all of Netzero's customers. Millions of customers...absolutely. All in one nice tidy box that I am sure was easier to develop the billing applications around.

Here is the kicker. Each box was a top of the line Sun server that had 32 processors and 32 gigs of RAM. They could handle up to 64 procs and 64 gigs. And each cost well over a million dollars for the hardware alone. Running Oracle on it must have cost over 100,000 dollars for software licenses. Granted this was in 2001, but the licensing costs for Oracle haven't gone down any that I am aware of...and the hardware cost will still be quite steep to do this type of thing.

So I ask you this: would it be better to go with that scenario or something like this? Implement the billing application using MySQL. Shard it. Create complexity. Your hardware cost saving alone will pay for multiple developers to handle any complexity increases. Any decent DBA is going to be able to handle the multiple servers required to operate this setup. You will probably see a decrease in salary cost moving from Oracle to MySQL DBAs. So for the bottom line of the company it is an overall win by far. It is only the inherent difficulty in moving complex systems from one type of DB to another that keeps more companies from switching.

Why hasn't this happened previously? Because until version 4 of MySQL was stable there were many features not available in MySQL that were needed by these types of systems.

It is my contention that as the clustering capabilities of MySQL continue to grow and mature (think of when version 6.0 goes stable) companies will move to MySQL in droves. THEN you have the ability to build a single virtual database (at least from the point of view of your application) that will scale simply and elegantly. As I said in the previous email it is only that 5.1 is in beta that keeps this from being available now. And many companies, such as Kaneva, are doing this right now.

The only reason that companies like Digg and Flickr can exist and grow at such phenomenal rates is that they keep the cost of the development of the system to a minimum and the overhead of operating (licensing costs and hardware cost) down as low as possible. In addition, of course, they need the ability to scale out very quickly. Digg didn't get any significant funding until just recently. And yet they epitomize the web 2.0 companies. They did it by both keeping their cost down and having the ability to grow quickly. Couldn't have done it with Oracle or MS.

Just my thoughts :)

Keith

Naz Gassiep wrote:

Wow. The problem I have with sharding is the large amount of code required in the app to make it work. IMHO the app should be agnostic to the underlying database system (by that I don't mean the DB in use such as MySQL or whatever or the schema, I mean the way the DB has been deployed) so that changes can be made to it without having to worry about impacting app code. This is one of my fundamental design imperatives.