RE: Integrity on large sites

2007-05-25 Thread Rhys Campbell
In my experience this happens a lot if you put application programmers in
charge of the database. I've upset quite a few in my time by introducing RI
and then their horribly coded application falls over!

-Original Message-
From: Peter Brawley [mailto:[EMAIL PROTECTED]
Sent: 24 May 2007 17:31
To: Naz Gassiep
Cc: mysql@lists.mysql.com
Subject: Re: Integrity on large sites


Naz,

 *Really* big sites don't ever have referential integrity. Or if the few
 spots they do (like with financial transactions) it's implemented on the
 application level (via, say, optimistic locking), never the database level.

Mebbe that view was common in the MySQL community in the time of version 
3, when the emphasis was on one site managing one db. Agreed the concept 
is scary. Try that quote in an Oracle or MSSQL community :-)

PB

-


Naz Gassiep wrote:
 I'm working in a project at the moment that is using MySQL, and people
keep making assertions like this one:

 *Really* big sites don't ever have referential integrity. Or if the few
spots they do (like with financial transactions) it's implemented on the
application level (via, say, optimistic locking), never the database level.

 A large DB working with no RI would give me nightmares. Is it really true
that large sites turn RI off to improve performance? Am I just being naive
in thinking that everyone runs their DBs with RI in production?


   



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Re: Integrity on large sites

2007-05-25 Thread Barry Newton

B. Keith Murphy wrote:
Here is the kicker.  Each box was a top of the line Sun server that 
had 32 processors and 32 gigs of RAM.  They could handle up to 64 
procs and 64 gigs.  And each cost well over a million dollars for 
the hardware alone.  Running Oracle on it must have cost over 
100,000 dollars for software licenses.  Granted this was in 2001, 
but the licensing costs for Oracle haven't gone down any that I am
aware of...and the hardware cost will still be quite steep to do 
this type of thing.


You youngsters may not realize that there were billing applications 
serving millions of customers long, long before there were any kind 
of database management systems.  They employed concepts called flat 
files and batch processing.  And they ran on machines far weaker 
than anything any of you have on your desk today.  Even under 
something like MS Windows, it would be absolutely possible to 
configure 3-5 high speed printers and knock out 100,000 bills per 
hour from an Intel single CPU box.  You really have no appreciation 
of how much power you actually have at your disposal.



Barry Newton






Re: Integrity on large sites

2007-05-25 Thread Naz Gassiep

 You youngsters may not realize that there were billing applications
 serving millions of customers long, long before there were any kind of
 database management systems.  They employed concepts called flat
 files and batch processing.  And they ran on machines far weaker
 than anything any of you have on your desk today.  Even under
 something like MS Windows, it would be absolutely possible to
 configure 3-5 high speed printers and knock out 100,000 bills per hour
 from an Intel single CPU box.  You really have no appreciation of how
 much power you actually have at your disposal. 

Perhaps you underestimate us, or me at least  :-D . The precise reason I
am arguing against sharding is because I know that performant design
principles as well as optimization and other proper techniques make
voodoo like sharding a clever solution to a problem that shouldn't exist
with the raw power available in modern hardware. As I said in a previous
post, my old laptop could handle a DB that cost the equivalent of a
house to manage in a previous age of IT history.
- Naz.




Re: Integrity on large sites

2007-05-25 Thread Naz Gassiep
Hey there, thanks for your comments. There are issues where sharding may
be appropriate, but you are talking about the heaviest of heavy duty
loads. Not only that, hardware is getting to the point where it is
surpassing our needs. Remember the days when it cost $200k to run a
library database? Nowadays I could run such a DB on my old laptop that I
just threw out.

The issue is not *only* application complexity, but that is a *major*
one, and ignoring it is not just a matter of budget allocation, it's the
risk that the complexity hides system-collapsing bugs. OneTel, a
multi-billion-dollar telco in Aus that I worked for in 2001 (as a lemming)
died partly because the billing system just fell over and died one day,
bringing their cash flow to a dead stop. The thing was so complex that
debugging it took longer than the cash reserve they had on hand held out
for, so the company went belly up and died a gruesome death.

I know that monolithic DBs are not manageable after a certain point, but
sharding, in my books, is to be avoided wherever possible due to the
availability of far better solutions. E.g., the use of table spaces to
put each table on its own server. How many companies can say that one of
their tables is so large that no single machine can hold it? This
approach, database partitioning rather than data partitioning, allows
you to design the hardware for each table's access patterns. The other
advantage of this method is that an application that was coded with a
single machine DB can be scaled to this solution without changing a
single line in the app code.

Incidentally, I come from the PostgreSQL world where if you truly *must*
do data sharding, it can be done at the DB level, transparently to the
app code.

Regards,
- Naz.

B. Keith Murphy wrote:
 OK.  Going to try this again.  After reading through these emails I
 think I have learned a little more about the way you are thinking.
 I DO NOT want to start some kind of flame war.
 However, I disagree very strongly with what you are saying.  Yes, you
 are right, sharding does require more complexity from the application
 layer.  Sorry for all you developers out there (and I can safely say
 that I am NOT a developer!!).
 The fundamental issue for you, as I see it, is the increased
 complexity caused by sharding the application.

 That being said, I will say this...if you develop on some other RDBMS
 such as MS or Oracle is it possible to develop something like you are
 saying...an all-inclusive database that isn't sharded?  Yep, when I
 worked at Netzero in 2001 for example we had two database servers
 running Oracle, one on the east coast in Virginia and one on the west
 coast in California.  The east coast server was a backup of the west
 coast server.  So one database server did the billing for all of
 Netzero's customers.  Millions of customers..absolutely.  All in one
 nice tidy box that I am sure was easier to develop the billing
 applications around.

 Here is the kicker.  Each box was a top of the line Sun server that
 had 32 processors and 32 gigs of RAM.  They could handle up to 64
 procs and 64 gigs.  And each cost well over a million dollars for the
 hardware alone.  Running Oracle on it must have cost over 100,000
 dollars for software licenses.  Granted this was in 2001, but the
 licensing costs for Oracle haven't gone down any that I am aware
 of...and the hardware cost will still be quite steep to do this type
 of thing.
 So I ask you this..

 Would it be better to go with that scenario or something like this:

 Implement the billing application using MySQL.  Shard it.  Create
 complexity.  Your hardware cost saving alone will pay for multiple
 developers to handle any complexity increases.  Any decent DBA is
 going to be able to handle multiple servers required to operate this
 setup.  You will probably see a decrease in salary cost moving from
 Oracle to MySQL dbas.
 So for the bottom line of the company it is an overall win by far.  It
 is only the inherent difficulty in moving complex systems from one
 type of DB to another that keeps more companies from switching.  Why
 hasn't this happened previously??  Because until version 4 of MySQL was
 stable there were many features not available in MySQL that were
 needed by these types of systems.

 It is my contention that as the clustering capabilities of MySQL
 continue to grow and mature (think of when version 6.0 goes stable)
 companies will move to MySQL in droves.  THEN you have the ability to
 build a single virtual database (at least from the point of view of
 your application) that will scale simply and elegantly.  As I said in
 the previous email it is only that 5.1 is in beta that keeps this from
 being available now.  And many companies, such as Kaneva, are doing
 this right now.
 The only reason that companies like Digg and Flickr can exist and grow
 at such phenomenal rates is that they keep the cost of the development
 of the system to a minimum and the overhead of operating 

Re: Integrity on large sites

2007-05-25 Thread Martijn Tonies

 It is my contention that as the clustering capabilities of MySQL
 continue to grow and mature (think of when version 6.0 goes stable)
 companies will move to MySQL in droves.  THEN you have the ability to
 build a single virtual database (at least from the point of view of
 your application) that will scale simply and elegantly.  As I said in
 the previous email it is only that 5.1 is in beta that keeps this from
 being available now.  And many companies, such as Kaneva, are doing this
 right now.

 The only reason that companies like Digg and Flickr can exist and grow at
 such phenomenal rates is that they keep the cost of the development of
 the system to a minimum and the overhead of operating (licensing costs
 and hardware cost) down as low as possible.  In addition, of course,
 they need the ability to scale out very quickly.  Digg didn't get any
 significant funding until just recently.  And yet they epitomize the web
 2.0 companies.  They did it by both keeping their cost down and having
 the ability to grow quickly.  Couldn't have done it with Oracle or MS.

 Just my thoughts :)

Right, sure... No one cries when Digg loses an article. No one gives a rat's
ass when they lose their comments on Flickr.

Real systems with real data NEED features that actually exist in Oracle
or SQL Server or any other decent DBMS, that, until recently (and still
not quite there yet) just didn't exist in MySQL.

Transactions? Proper constraints? (when does MySQL come with Check
Constraints?)

I'll say again: if you value your data, use constraints wherever possible
and use transactions.
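
A minimal sketch of what constraints plus transactions buy you, assuming
Python's built-in sqlite3 module as a stand-in for MySQL. The accounts table
and the open_ledger, transfer and balances helpers are hypothetical names made
up for this illustration, not anything from the thread.

```python
import sqlite3


def open_ledger():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint makes the database itself refuse negative balances.
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)",
        [(1, 100), (2, 0)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint fired; the credit to dst was rolled back too.
        return False


def balances(conn):
    """Return {account_id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

With 100 in account 1, a first transfer of 60 succeeds and a second one is
refused by the CHECK constraint; the rollback also undoes the credit that had
already been applied to account 2, so no money appears from nowhere.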

Martijn Tonies
Database Workbench - tool for InterBase, Firebird, MySQL, NexusDB, Oracle &
MS SQL Server
Upscene Productions
http://www.upscene.com
My thoughts:
http://blog.upscene.com/martijn/
Database development questions? Check the forum!
http://www.databasedevelopmentforum.com





Re: Integrity on large sites

2007-05-25 Thread Jeremy Cole

Hi Naz,

Just to throw out (plug) an ongoing project:

  http://www.hivedb.org/

From the site:


HiveDB is an open source framework for horizontally partitioning MySQL 
systems. Building scalable and high performance MySQL-backed systems 
requires a good deal of expertise in designing the system and 
implementing the code. One of the main strategies for scaling MySQL is 
by partitioning your data across many servers. While it is not difficult 
to accomplish this, it is difficult to do it in such a way that the 
system is easily maintained and transparent to the developer.



We've been working on HiveDB precisely to avoid the large amount of 
(quite specialized) code in the application.


Regards,

Jeremy

Naz Gassiep wrote:

Wow.
The problem with sharding I have is the large amount of code
required in the app to make it work. IMHO the app should be agnostic to
the underlying database system (by that I don't mean the DB in use such
as MySQL or whatever or the schema, I mean the way the DB has been
deployed) so that changes can be made to it without having to worry
about impacting app code. This is one of my fundamental design imperatives.

Then again, I'm not a regular MySQL user so I don't know what is and
is not the norm in the MySQL world.

- Naz.

Evaldas Imbrasas wrote:

You certainly have a right to disagree, but pretty much every
scalability talk at the MySQL conference a few weeks ago was focused
on data partitioning and sharding. And those talks were given by folks
working for some of the most popular (top 100) websites in the world.
It certainly looks like data partitioning is the way to go in the
MySQL world at this point, probably at least until production-ready
and feature-full MySQL Cluster is out. And even then a large percentage
of dotcom companies would use data partitioning instead since it can
be implemented on commodity hardware.

Once again, we're talking *really* big websites using MySQL (not
Oracle or SQL Server or whatever) here. Most websites won't ever need
to partition their production databases, and different RDBMS might have
different approaches for scalability.


On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:

Data partitioning? Sorry, I disagree that partitioning a table into more
and more servers is the way to scale properly. Perhaps putting
databases' tables onto different servers with different hardware
designed to meet different usage patterns is a good idea, but data
partitioning was a very short-lived idea in the world of databases and
I'm glad that as an idea it is dying in practice.




--
high performance mysql consulting
www.provenscaling.com




Re: Integrity on large sites

2007-05-24 Thread Peter Brawley

Naz,

*Really* big sites don't ever have referential integrity. Or if the few
spots they do (like with financial transactions) it's implemented on the
application level (via, say, optimistic locking), never the database level.

Mebbe that view was common in the MySQL community in the time of version 
3, when the emphasis was on one site managing one db. Agreed the concept 
is scary. Try that quote in an Oracle or MSSQL community :-)


PB

-


Naz Gassiep wrote:

I'm working in a project at the moment that is using MySQL, and people keep 
making assertions like this one:

*Really* big sites don't ever have referential integrity. Or if the few spots they 
do (like with financial transactions) it's implemented on the application level (via, 
say, optimistic locking), never the database level.

A large DB working with no RI would give me nightmares. Is it really true that 
large sites turn RI off to improve performance? Am I just being naive in 
thinking that everyone runs their DBs with RI in production?
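
For readers unfamiliar with the term in the quoted assertion: application-level
optimistic locking is commonly done with a version column that every UPDATE
checks and bumps. The sketch below is one way to do that, assuming Python's
built-in sqlite3 module as a stand-in for MySQL; the accounts table, the
version column and both function names are hypothetical.

```python
import sqlite3


def update_balance(conn, account_id, new_balance, expected_version):
    """Write new_balance only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when the version no longer matches: the caller lost the
    # race and should re-read the row and retry (or give up).
    return cur.rowcount == 1


def demo_optimistic_locking():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL,"
        " version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts (id, balance, version) VALUES (1, 100, 0)")
    conn.commit()
    # Two writers both read the row at version 0, then try to write.
    first = update_balance(conn, 1, 150, expected_version=0)
    second = update_balance(conn, 1, 80, expected_version=0)
    row = conn.execute(
        "SELECT balance, version FROM accounts WHERE id = 1"
    ).fetchone()
    conn.close()
    return first, second, row
```

Note that this protects a single row against lost updates; it does nothing to
stop a row from pointing at a parent that does not exist, which is what
referential integrity is about.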


  


Re: Integrity on large sites

2007-05-24 Thread Martijn Tonies


 I'm working in a project at the moment that is using MySQL, and people
keep making assertions like this one:

 *Really* big sites don't ever have referential integrity. Or if the few
spots they do (like with financial transactions) it's implemented on the
application level (via, say, optimistic locking), never the database level.

 A large DB working with no RI would give me nightmares. Is it really true
that large sites turn RI off to improve performance? Am I just being naive
in thinking that everyone runs their DBs with RI in production?


If you don't value your data, then choose not to use RI.

If you DO value your data, run with as many valid constraints as you can.

After all, that's the whole idea behind constraints :-)
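
As a rough illustration of database-level RI, here is a sketch assuming
Python's built-in sqlite3 module as a stand-in for MySQL (where foreign keys
need a storage engine that enforces them, such as InnoDB). The customers and
invoices tables and the function name are hypothetical.

```python
import sqlite3


def demo_foreign_key():
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked; turn it on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE invoices ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " amount_cents INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
    conn.execute("INSERT INTO invoices (customer_id, amount_cents) VALUES (1, 1999)")

    rejected = []
    try:
        # Customer 42 does not exist, so the database refuses the orphan row
        # no matter how buggy the calling application is.
        conn.execute(
            "INSERT INTO invoices (customer_id, amount_cents) VALUES (42, 500)"
        )
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

    count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
    conn.close()
    return rejected, count
```

The point of the sketch is only that the rejection happens inside the
database, so every application that writes to it gets the same guarantee.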

Martijn Tonies
Database Workbench - development tool for MySQL, and more!
Upscene Productions
http://www.upscene.com
My thoughts:
http://blog.upscene.com/martijn/
Database development questions? Check the forum!
http://www.databasedevelopmentforum.com





Re: Integrity on large sites

2007-05-24 Thread Philip Mather

Naz,
  Without going into detail about various projects I've seen, suffice it to
say that I have witnessed some true horrors. In defence, however, the
largest abomination I have ever witnessed was from an MS shop that had grown
a database from an MS Access system upward and had then bluntly bolted MySQL
into the mix so that they could expose it to the web (stop laughing ;P).
  It has, however, nothing to do with the specific database; just as you can
write shoddy code in C++ or PHP, database abominations know no vendor
boundaries. I think a large number of people reading this may agree when I
say that commercial (you may read time & money as the obvious subtexts)
pressures to produce quick, cheap and working solutions are the real reason
such things as documentation, proper requirements gathering and analysis,
design and QA testing are the first against the wall when such pressures
begin to bite or clients haggle on price.
  So, I'm afraid, in conclusion: Yes, you are being naive in thinking that
everyone runs their DBs with RI in production. No, they don't turn it off;
they never build it in, and if they do turn it off it's not for performance
gains. The counter-argument to that would be that it's fairly conceivable
that if you implemented a solution in a development environment with RI
constraints, tested it carefully and completely, put it into production and
perhaps ran it for a month or two, then turned all the RI off, it would
still hold water well enough to be a viable commercial solution. Not an
argument I'd seriously back, but one you could make at any rate. And finally:
Yes, it's a nightmare in such situations.
  Without whoring, I should perhaps state at this juncture that my current
employer does not produce such solutions. We have design and analysis
procedures, a QA department, people with common sense, etc., to ensure that
we avoid such things.

Regards,
  Phil

On 24/05/07, Naz Gassiep [EMAIL PROTECTED] wrote:


I'm working in a project at the moment that is using MySQL, and people
keep making assertions like this one:

*Really* big sites don't ever have referential integrity. Or if the few
spots they do (like with financial transactions) it's implemented on the
application level (via, say, optimistic locking), never the database level.

A large DB working with no RI would give me nightmares. Is it really true
that large sites turn RI off to improve performance? Am I just being naive
in thinking that everyone runs their DBs with RI in production?







--
Regards,
  Phil


Re: Integrity on large sites

2007-05-24 Thread Evaldas Imbrasas

Since the question was about *really* big websites, the answer is both
yes and no.

Yes, they do turn off RI on the database side, simply because it's not
possible to enforce RI on a database system where data is partitioned
across server farms (or shards) both vertically and horizontally. And
really big websites can't survive without the data partitioning.

No, they don't usually turn off RI just to improve performance,
because the gains would be minimal, and for big websites, scalability
is a much bigger issue than performance (although sometimes one
depends on the other), and data partitioning is the way to go to solve
the scalability problem.
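
To make the first point concrete, here is a minimal sketch of horizontal
partitioning, assuming Python's built-in sqlite3 module with one in-memory
database standing in for each MySQL server. The users and comments tables, the
CRC32-modulo routing and all function names are hypothetical choices for
illustration, not how any particular site does it.

```python
import sqlite3
import zlib


def shard_for(key, shard_count):
    """Pick a shard with a stable hash (str hash() is salted per process)."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def make_shards(shard_count):
    """Each connection stands in for a separate database server."""
    shards = []
    for _ in range(shard_count):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        # No FOREIGN KEY on user_id: the parent row may live on another server,
        # and a constraint cannot reach across servers.
        conn.execute(
            "CREATE TABLE comments ("
            " id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
        )
        shards.append(conn)
    return shards


def add_user(shards, user_id, name):
    conn = shards[shard_for(user_id, len(shards))]
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))


def add_comment(shards, comment_id, user_id, body):
    """Application-level stand-in for the missing foreign key."""
    owner = shards[shard_for(user_id, len(shards))]
    found = owner.execute(
        "SELECT 1 FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if found is None:
        return False  # would be an orphan row
    target = shards[shard_for(comment_id, len(shards))]
    target.execute(
        "INSERT INTO comments (id, user_id, body) VALUES (?, ?, ?)",
        (comment_id, user_id, body),
    )
    return True
```

The check in add_comment is not atomic: the user could be deleted on one shard
between the SELECT and the INSERT on another, which is exactly the kind of gap
a database-enforced constraint closes on a single server.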


On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:

I'm working in a project at the moment that is using MySQL, and people keep 
making assertions like this one:

*Really* big sites don't ever have referential integrity. Or if the few spots they 
do (like with financial transactions) it's implemented on the application level (via, 
say, optimistic locking), never the database level.

A large DB working with no RI would give me nightmares. Is it really true that 
large sites turn RI off to improve performance? Am I just being naive in 
thinking that everyone runs their DBs with RI in production?




--
-
Evaldas Imbrasas
http://www.imbrasas.com




Re: Integrity on large sites

2007-05-24 Thread Naz Gassiep
Data partitioning? Sorry, I disagree that partitioning a table into more
and more servers is the way to scale properly. Perhaps putting
databases' tables onto different servers with different hardware
designed to meet different usage patterns is a good idea, but data
partitioning was a very short-lived idea in the world of databases and
I'm glad that as an idea it is dying in practice.
- Naz

Evaldas Imbrasas wrote:
 Since the question was about *really* big websites, the answer is both
 yes and no.

 Yes, they do turn off RI on the database side, simply because it's not
 possible to enforce RI on a database system where data is partitioned
 across server farms (or shards) both vertically and horizontally. And
 really big websites can't survive without the data partitioning.

 No, they don't usually turn off RI just to improve performance,
 because the gains would be minimal, and for big websites, scalability
 is a much bigger issue than performance (although sometimes one
 depends on the other), and data partitioning is the way to go to solve
 the scalability problem.


 On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:
 I'm working in a project at the moment that is using MySQL, and
 people keep making assertions like this one:

 *Really* big sites don't ever have referential integrity. Or if the
 few spots they do (like with financial transactions) it's implemented
 on the application level (via, say, optimistic locking), never the
 database level.

 A large DB working with no RI would give me nightmares. Is it really
 true that large sites turn RI off to improve performance? Am I just
 being naive in thinking that everyone runs their DBs with RI in
 production?







Re: Integrity on large sites

2007-05-24 Thread B. Keith Murphy
Sometimes partitioning is absolutely necessary.  If you can't run a 
cluster - how else can you really scale writes to the database?  Some 
companies can't use clustering because in 5.0.x (the non-beta release) 
clustering is all done in memory - all tables have to be in memory (just 
like the old heap tables).  It isn't until 5.1.x that clustering allows 
your data to be stored on disc.  Many companies still consider 5.1 to 
not be production ready.  You might disagree but that is their 
thinking.  So, if you don't use clustering, how else are you going to 
scale an application? 

I suppose you can set up master-master replication - but that doesn't 
really scale to a large extent.  Some companies have huge applications 
with hundreds of gigabytes or even terabytes of data.  I think if you 
read carefully through the presentations from the recent MySQL 
conference by companies such as Digg and Flickr you will find that they 
do partitioning as well as caching and such.  I recall specifically 
reading through a presentation by LiveJournal about how they split up
their load across multiple machines by the very partitioning we are 
talking about.


I might be missing something.  I can understand why you wouldn't want to 
work on such a system as it certainly adds complexity to the entire 
database.  But that doesn't mean that it isn't something that is
necessary sometimes.


Just my two cents  :)

Keith

Naz Gassiep wrote:

Data partitioning? Sorry, I disagree that partitioning a table into more
and more servers is the way to scale properly. Perhaps putting
databases' tables onto different servers with different hardware
designed to meet different usage patterns is a good idea, but data
partitioning was a very short-lived idea in the world of databases and
I'm glad that as an idea it is dying in practice.
- Naz

Evaldas Imbrasas wrote:
  

Since the question was about *really* big websites, the answer is both
yes and no.

Yes, they do turn off RI on the database side, simply because it's not
possible to enforce RI on a database system where data is partitioned
across server farms (or shards) both vertically and horizontally. And
really big websites can't survive without the data partitioning.

No, they don't usually turn off RI just to improve performance,
because the gains would be minimal, and for big websites, scalability
is a much bigger issue than performance (although sometimes one
depends on the other), and data partitioning is the way to go to solve
the scalability problem.


On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:


I'm working in a project at the moment that is using MySQL, and
people keep making assertions like this one:

*Really* big sites don't ever have referential integrity. Or if the
few spots they do (like with financial transactions) it's implemented
on the application level (via, say, optimistic locking), never the
database level.

A large DB working with no RI would give me nightmares. Is it really
true that large sites turn RI off to improve performance? Am I just
being naive in thinking that everyone runs their DBs with RI in
production?

  



  






Re: Integrity on large sites

2007-05-24 Thread Evaldas Imbrasas

You certainly have a right to disagree, but pretty much every
scalability talk at the MySQL conference a few weeks ago was focused
on data partitioning and sharding. And those talks were given by folks
working for some of the most popular (top 100) websites in the world.
It certainly looks like data partitioning is the way to go in the
MySQL world at this point, probably at least until production-ready
and feature-full MySQL Cluster is out. And even then a large percentage
of dotcom companies would use data partitioning instead since it can
be implemented on commodity hardware.

Once again, we're talking *really* big websites using MySQL (not
Oracle or SQL Server or whatever) here. Most websites won't ever need
to partition their production databases, and different RDBMS might have
different approaches for scalability.


On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:

Data partitioning? Sorry, I disagree that partitioning a table into more
and more servers is the way to scale properly. Perhaps putting
databases' tables onto different servers with different hardware
designed to meet different usage patterns is a good idea, but data
partitioning was a very short-lived idea in the world of databases and
I'm glad that as an idea it is dying in practice.


--
-
Evaldas Imbrasas
http://www.imbrasas.com




Re: Integrity on large sites

2007-05-24 Thread Naz Gassiep
Wow.
The problem with sharding I have is the large amount of code
required in the app to make it work. IMHO the app should be agnostic to
the underlying database system (by that I don't mean the DB in use such
as MySQL or whatever or the schema, I mean the way the DB has been
deployed) so that changes can be made to it without having to worry
about impacting app code. This is one of my fundamental design imperatives.

Then again, I'm not a regular MySQL user so I don't know what is and
is not the norm in the MySQL world.

- Naz.

Evaldas Imbrasas wrote:
 You certainly have a right to disagree, but pretty much every
 scalability talk at the MySQL conference a few weeks ago was focused
 on data partitioning and sharding. And those talks were given by folks
 working for some of the most popular (top 100) websites in the world.
 It certainly looks like data partitioning is the way to go in the
 MySQL world at this point, probably at least until production-ready
 and feature-full MySQL Cluster is out. And even then a large percentage
 of dotcom companies would use data partitioning instead since it can
 be implemented on commodity hardware.

 Once again, we're talking *really* big websites using MySQL (not
 Oracle or SQL Server or whatever) here. Most websites won't ever need
 to partition their production databases, and different RDBMS might have
 different approaches for scalability.


 On 5/24/07, Naz Gassiep [EMAIL PROTECTED] wrote:
 Data partitioning? Sorry, I disagree that partitioning a table into more
 and more servers is the way to scale properly. Perhaps putting
 databases' tables onto different servers with different hardware
 designed to meet different usage patterns is a good idea, but data
 partitioning was a very short-lived idea in the world of databases and
 I'm glad that as an idea it is dying in practice.





Re: Integrity on large sites

2007-05-24 Thread B. Keith Murphy
OK.  Going to try this again.  After reading through these emails I 
think I have learned a little more about the way you are thinking. 

I DO NOT want to start some kind of flame war. 

However, I disagree very strongly with what you are saying.  Yes, you 
are right, sharding does require more complexity from the application 
layer.  Sorry for all you developers out there (and I can safely say 
that I am NOT a developer!!). 

The fundamental issue for you, as I see it, is the increased complexity 
caused by sharding the application.


That being said, I will say this...if you develop on some other RDBMS 
such as MS or Oracle is it possible to develop something like you are
saying...an all-inclusive database that isn't sharded?  Yep, when I 
worked at Netzero in 2001 for example we had two database servers 
running Oracle, one on the east coast in Virginia and one on the west
coast in California.  The east coast server was a backup of the west 
coast server.  So one database server did the billing for all of 
Netzero's customers.  Millions of customers..absolutely.  All in one 
nice tidy box that I am sure was easier to develop the billing 
applications around.


Here is the kicker.  Each box was a top of the line Sun server that had 
32 processors and 32 gigs of RAM.  They could handle up to 64 procs and 
64 gigs.  And each cost well over a million dollars for the hardware 
alone.  Running Oracle on it must have cost over 100,000 dollars for 
software licenses.  Granted this was in 2001, but the licensing costs for
Oracle haven't gone down any that I am aware of...and the hardware cost 
will still be quite steep to do this type of thing. 


So I ask you this..

Would it be better to go with that scenario or something like this:

Implement the billing application using MySQL.  Shard it.  Create 
complexity.  Your hardware cost saving alone will pay for multiple 
developers to handle any complexity increases.  Any decent DBA is going 
to be able to handle multiple servers required to operate this setup.  
You will probably see a decrease in salary cost moving from Oracle to 
MySQL dbas. 

So for the bottom line of the company it is an overall win by far.  It is
only the inherent difficulty in moving complex systems from one type of
DB to another that keeps more companies from switching.  Why hasn't this
happened previously??  Because until version 4 of MySQL was stable there
were many features not available in MySQL that were needed by these 
types of systems.


It is my contention that as the clustering capabilities of MySQL 
continue to grow and mature (think of when version 6.0 goes stable) 
companies will move to MySQL in droves.  THEN you have the ability to 
build a single virtual database (at least from the point of view of 
your application) that will scale simply and elegantly.  As I said in 
the previous email it is only that 5.1 is in beta that keeps this from 
being available now.  And many companies, such as Kaneva, are doing this 
right now. 

The only reason that companies like Digg and Flickr can exist and grow at
such phenomenal rates is that they keep the cost of the development of 
the system to a minimum and the overhead of operating (licensing costs 
and hardware cost) down as low as possible.  In addition, of course, 
they need the ability to scale out very quickly.  Digg didn't get any 
significant funding until just recently.  And yet they epitomize the web 
2.0 companies.  They did it by both keeping their cost down and having 
the ability to grow quickly.  Couldn't have done it with Oracle or MS. 


Just my thoughts :)

Keith







Naz Gassiep wrote:

Wow.
The problem with sharding I have is the large amount of code
required in the app to make it work. IMHO the app should be agnostic to
the underlying database system (by that I don't mean the DB in use such
as MySQL or whatever or the schema, I mean the way the DB has been
deployed) so that changes can be made to it without having to worry
about impacting app code. This is one of my fundamental design imperatives.

Then again, I'm not a regular MySQL user so I don't know what is and
is not the norm in the MySQL world.

- Naz.

Evaldas Imbrasas wrote:
  

You certainly have a right to disagree, but pretty much every
scalability talk at the MySQL conference a few weeks ago was focused
on data partitioning and sharding. And those talks were given by folks
working for some of the most popular (top 100) websites in the world.
It certainly looks like data partitioning is the way to go in the
MySQL world at this point, probably at least until production-ready
and feature-full MySQL Cluster is out. And even then a large percentage
of dotcom companies would use data partitioning instead since it can
be implemented on commodity hardware.

Once again, we're talking *really* big websites using MySQL (not
Oracle or SQL Server or whatever) here. Most websites won't ever need
to partition their production databases, and different