DRBD, SAN, etc. Sure, they are highly redundant. Sure, they are reliable. But they do not handle the building being in a flood/earthquake/tornado/etc. If you want HA, you have to start with two (or more) copies of all the data sitting in geographically distinct locations (different flood plains, etc.). If you have 3 copies of the data, why bother with redundant anything within each of the 3 sites? HA within a single device costs a lot of extra money.

You cannot get more than a few 9s, regardless of how you architect it.

Let's look at the design from a different point of view...
* How much data?
* How much traffic?
* It sounds like much of the data is images or videos; correct? One could argue that those do not need to be in the database, other than a URL to them. Then you can scatter copies around the globe and use Akamai (or a similar load-balancing service) not only to find a viable copy, but to find the nearest copy, thereby improving the "user experience"!
* You must put effort into splitting reads from writes. Otherwise you are asking for extra vulnerability from hackers. Yes, hackers will be a big issue, and you have not mentioned them yet.

You may need to scale via sharding. (Without answers to some of the questions above, I cannot predict.) If you do need sharding, you must incorporate it into the software design _early_.
On 4/2/12 1:05 PM, Wes Modes wrote:
Thanks Shawn and Karen, for the suggestions, even given my vague
requirements.

To clarify some of my requirements.

*Application:  *We are using an open-source application called Omeka,
which is a "free, flexible, and open source web-publishing platform for
the display of library, museum, archives, and scholarly collections and
exhibitions."  Without getting into how free (or scalable) free
software really is, we can view it as one aspect we cannot change, as it
was written into the grant requirements we received for the project.
Experienced Omeka developers and our own developer have suggested
that /it is not feasible to separate database writes from reads in the
application/ (given time and resources).

*SAN: *The SAN is a Dell EqualLogic 6100 which has redundant everything,
including multiple NICs, controllers, and power.  So we are less
concerned about the SAN being a SPoF.  On the other hand, if we have a
single big MySQL server that fails, we could bring up another copy of it
via VMWare, but until the server came up, the application would be dead
in the water.  If the database is corrupted, service will be interrupted
for a considerable time.

*High Availability:*  It sounds like there is some debate over how to
provide HA best, but do people really disagree on the desired results?
Without getting into the many meanings of this buzz word, here's what we
mean: /We desire to maintain high availability of service, allowing a
small fraction of users to experience outage for only seconds at a
time.  We desire to provide this through horizontal scaling, redundancy,
failover planning, and external monitoring.  /

*Scalability:  *Again, it seems like there are lots of applications and
implementations, but people agree on the general concept.  Here's what we
mean for this project:  /We desire to scale our services so that a
usage surge does not cause unavailability of the services for some
users.  We prefer to horizontally increase scalability using
load-balancing strategies to treat clusters of servers as single logical
units./

The application may have not been designed with great scalability in
mind, but if multiple application instances are accessing multiple
database servers treated as one logical unit, that may not be too
relevant.

I am responsible for creating an architecture upon which this project
will run.  I am not responsible for redesigning the application.  So
far, no one has suggested anything that approaches meeting our
requirements, even our vague ones.  Perhaps I am asking the wrong list?

Does anyone have any experience with MySQL Multi-Master Replication?
Perhaps that should be a separate post.

Wes

On 3/30/2012 3:56 PM, shawn green wrote:
Hello Wes,

On 3/29/2012 9:23 PM, Wes Modes wrote:
First, thank you in advance for good solid suggestions you can offer. I
suppose someone has already asked this, but perhaps you will view it as
a fun challenge to meet my many criteria with your suggested MySQL
architecture.

I am working at a University on a high-profile database driven project
that we expect to be slammed within the first few months. Since this is
a new project and one that we expect to be popular, we don't know what
kind of usage to expect, but we want to be prepared. Therefore, we are
building in extra capacity.

Our top goals are scalability and high availability, provided we hope
through multiple MySQL nodes and VMWare functionality. I've been
surprised that there are not more MySQL architects trying to meet these
high-level goals using virtualization and shared storage (or at least
they do not seem to be writing about it).

I've looked at replication, multi-mastering, DRBD, clustering,
partitioning, and sharding.

Here's what we got, and some of our constraints:

* We are concerned that One Big Database instance won't be enough to
handle all of the queries, plus it is a single point of failure.
Therefore, multiple nodes are desirable.

* With the primary application that will be using the database, writes
and reads cannot be split off from each other. This limitation alone
rules out replication, MMM, and a few other solutions.

* We do not expect to be especially write-heavy.

* We have shared storage in the form of an iSCSI SAN. We'd like to
leverage the shared storage, if possible.

* We have VMWare HA which already monitors hosts and brings them up
within minutes elsewhere if we lose a host. So some of the suggested HA
solutions are redundant.

* We expect to have another instance of our system running in the Amazon
cloud for the first few months while the traffic is high, so we may take
advantage of RDS, though an exact duplicate of our local system will
save us development work.

Thanks for any advice you can give.

Wes Modes

As the others have already pointed out, your specifications are rather
vague and appear to be only hardware-focused. I can state with some
certainty, through my six years of direct MySQL support experience,
that the majority of the problems related to performance are NOT
hardware related.  The majority of the problems I need to resolve are
related to bad queries, bad table design, and bad application design.

For example, you state:
* With the primary application that will be using the database, writes
and reads cannot be split off from each other.
This is already a red flag that your application is not designed for
scale.

Also, you say:
* We are concerned that One Big Database instance won't be enough to
handle all of the queries, plus it is a single point of failure.
then you say:
* We have shared storage in the form of an iSCSI SAN. We'd like to
leverage the shared storage, if possible.
This is another red flag in that your 'shared storage' once again
becomes a single point of failure.  Also, MySQL instances do not share
files. Period. Each MySQL instance needs its own, completely
independent, set of files and folders. Please take the hint that
Replication should be in your very near future.

Here are some solid suggestions:

a) modularize your code so that WRITES can go to a MASTER server and
READS can be distributed over a large number of SLAVE servers.
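A rough sketch of what (a) can look like in application code. (The class, the DSN strings, and the routing rule here are illustrative placeholders, not Omeka code; a real implementation would hold actual connections via a MySQL client library.)

```python
import random

class RoutedDB:
    """Route writes to the master and reads to a random slave.

    The DSN strings are placeholders standing in for real
    database connections.
    """

    def __init__(self, master_dsn, slave_dsns):
        self.master = master_dsn
        self.slaves = list(slave_dsns)

    def pick(self, sql):
        # Anything that is not a plain SELECT goes to the master.
        if sql.lstrip().upper().startswith("SELECT"):
            return random.choice(self.slaves)
        return self.master

db = RoutedDB("master:3306", ["slave1:3306", "slave2:3306"])
```

The point is that every query passes through one chokepoint that decides master vs. slave; if that decision is scattered through the application, retrofitting it later is the hard part.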

b) modularize your data so that you can easily subdivide subsets of it
to different master-slave trees. For example, students whose last name
starts with A-D are assigned to machines in group1, E-K in group2, ...
etc.
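As a minimal sketch of that routing function (the group names and the boundaries beyond A-D / E-K are made up for illustration):

```python
def shard_group(last_name):
    """Map a last name to a master-slave tree, per the A-D / E-K scheme.

    Groups beyond the two named in the text are assumptions.
    """
    c = last_name[:1].upper()
    if "A" <= c <= "D":
        return "group1"
    if "E" <= c <= "K":
        return "group2"
    if "L" <= c <= "R":
        return "group3"
    return "group4"
```

Every data access then asks `shard_group()` which tree to talk to, which is exactly the kind of hook that is cheap to add early and painful to bolt on later.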

c) Optimize your queries to avoid subqueries and to take full
advantage of indexes.

d) Build indexes wisely - A well-crafted multi-column index can take
up less space and will be more functional than multiple single-column
indexes.
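To illustrate (d), a small demo using SQLite (only because it ships with Python; the same leftmost-prefix behavior applies to MySQL's composite indexes, and the table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (collection TEXT, added DATE, title TEXT)")

# One composite index serves queries that filter on collection alone
# AND queries that filter on (collection, added) together.
cur.execute("CREATE INDEX idx_coll_added ON items (collection, added)")

plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT title FROM items "
    "WHERE collection = 'maps' AND added > '2012-01-01'"
).fetchall()
print(plan)  # the plan detail names idx_coll_added
```

Two separate single-column indexes would cost more space, and the optimizer can typically use only one of them per table access anyway.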

e) Test, test, and retest - Until you can push your system to the
choking point in the lab, you will have no idea of how much traffic it
will be able to handle in the field.

Good luck!


--
Rick James - MySQL Geek


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql