Thanks, Shawn and Karen, for the suggestions, even given my vague
requirements.

To clarify some of my requirements:

*Application:* We are using an open-source application called Omeka,
which is a "free, flexible, and open source web-publishing platform for
the display of library, museum, archives, and scholarly collections and
exhibitions."  Without getting into how free (or scalable) free software
really is, we can view it as one aspect we cannot change, since it was
written into the grant requirements we received for the project.
Experienced Omeka developers and our own developer have suggested that
/it is not feasible to separate database writes from reads in the
application/ (given our time and resources).

*SAN:* The SAN is a Dell EqualLogic 6100, which has redundant
everything, including multiple NICs, controllers, and power supplies.
So we are less concerned about the SAN being a SPoF.  On the other hand,
if a single big MySQL server fails, we could bring up another copy of it
via VMWare, but until that server came up, the application would be dead
in the water.  And if the database is corrupted, service will be
interrupted for a considerable time.

*High Availability:* It sounds like there is some debate over how best
to provide HA, but do people really disagree on the desired results?
Without getting into the many meanings of this buzzword, here's what we
mean: /We desire to maintain high availability of service, allowing a
small fraction of users to experience outages of only seconds at a
time.  We desire to provide this through horizontal scaling, redundancy,
failover planning, and external monitoring./

*Scalability:* Again, it seems like there are lots of applications and
implementations, but people agree on the general concept.  Here's what
we mean for this project: /We desire to scale our services so that a
usage surge does not make them unavailable to some users.  We prefer to
scale horizontally, using load-balancing strategies to treat clusters of
servers as single logical units./

The application may not have been designed with great scalability in
mind, but if multiple application instances are accessing multiple
database servers treated as one logical unit, that may not matter much.
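To illustrate what I mean by a single logical unit: a connection-level
load balancer round-robins the application's connections over a pool of
equivalent database nodes, so the application only ever sees "the
database."  Here's a rough Python sketch (the server names are made up):

```python
import itertools

# Sketch of treating a cluster of database servers as one logical unit:
# the application asks the pool for a connection target, and a simple
# round-robin balancer picks the actual node.  Node names are made up.

class RoundRobinPool:
    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def next_node(self):
        """Return the node the next connection should go to."""
        return next(self._cycle)

pool = RoundRobinPool(["db1.example.edu", "db2.example.edu", "db3.example.edu"])
picks = [pool.next_node() for _ in range(6)]
print(picks)  # each node chosen twice, in rotation
```

In practice this job would be done by something like HAProxy or a MySQL
proxy rather than application code, but the routing idea is the same.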

I am responsible for creating the architecture this project will run on;
I am not responsible for redesigning the application.  So far, no one
has suggested anything that approaches meeting our requirements, even
the vague ones.  Perhaps I am asking the wrong list?

Does anyone have any experience with MySQL Multi-Master Replication? 
Perhaps that should be a separate post.
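From what I've read so far, the first wrinkle with multi-master is
colliding auto-increment keys.  Here's a rough Python sketch of how
MySQL's auto_increment_increment and auto_increment_offset settings are
supposed to interleave IDs so two masters never collide (illustrative
only; the real settings would live in each server's my.cnf):

```python
# Sketch: how MySQL's auto_increment_increment / auto_increment_offset
# settings keep multi-master nodes from generating colliding IDs.
# Each server starts at its own offset and steps by the master count.

def auto_increment_ids(offset, increment, count):
    """IDs a server would hand out with the given settings."""
    return [offset + i * increment for i in range(count)]

# Two masters: increment = number of masters, offsets 1 and 2.
master1 = auto_increment_ids(offset=1, increment=2, count=5)  # 1, 3, 5, 7, 9
master2 = auto_increment_ids(offset=2, increment=2, count=5)  # 2, 4, 6, 8, 10

assert not set(master1) & set(master2)  # the ID streams never overlap
```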

Wes

On 3/30/2012 3:56 PM, shawn green wrote:
> Hello Wes,
>
> On 3/29/2012 9:23 PM, Wes Modes wrote:
>> First, thank you in advance for good solid suggestions you can offer. I
>> suppose someone has already asked this, but perhaps you will view it as
>> a fun challenge to meet my many criteria with your suggested MySQL
>> architecture.
>>
>> I am working at a University on a high-profile database driven project
>> that we expect to be slammed within the first few months. Since this is
>> a new project and one that we expect to be popular, we don't know what
>> kind of usage to expect, but we want to be prepared. Therefore, we are
>> building in extra capacity.
>>
>> Our top goals are scalability and high availability, provided we hope
>> through multiple MySQL nodes and VMWare functionality. I've been
>> surprised that there are not more MySQL architects trying to meet these
>> high-level goals using virtualization and shared storage (or at least
>> they do not seem to be writing about it).
>>
>> I've looked at replication, multi-mastering, DRBD, clustering,
>> partitioning, and sharding.
>>
>> Here's what we got, and some of our constraints:
>>
>> * We are concerned that One Big Database instance won't be enough to
>> handle all of the queries, plus it is a single point of failure.
>> Therefore, multiple nodes are desirable.
>>
>> * With the primary application that will be using the database, writes
>> and reads cannot be split off from each other. This limitation alone
>> rules out replication, MMM, and a few other solutions.
>>
>> * We do not expect to be especially write-heavy.
>>
>> * We have shared storage in the form of an iSCSI SAN. We'd like to
>> leverage the shared storage, if possible.
>>
>> * We have VMWare HA which already monitors hosts and brings them up
>> within minutes elsewhere if we lose a host. So some of the suggested HA
>> solutions are redundant.
>>
>> * We expect to have another instance of our system running in the Amazon
>> cloud for the first few months while the traffic is high, so we may take
>> advantage of RDS, though an exact duplicate of our local system will
>> save us development work.
>>
>> Thanks for any advice you can give.
>>
>> Wes Modes
>>
>
> As the others have already pointed out, your specifications are rather
> vague and appear to be only hardware-focused. I can state with some
> certainty, through my six years of direct MySQL support experience,
> that the majority of the problems related to performance are NOT
> hardware related.  The majority of the problems I need to resolve are
> related to bad queries, bad table design, and bad application design.
>
> For example, you state:
> > * With the primary application that will be using the database, writes
> > and reads cannot be split off from each other.
>
> This is already a red flag that your application is not designed for
> scale.
>
> Also, you say:
> > * We are concerned that One Big Database instance won't be enough to
> > handle all of the queries, plus it is a single point of failure.
> then you say:
> > * We have shared storage in the form of an iSCSI SAN. We'd like to
> > leverage the shared storage, if possible.
>
> This is another red flag in that your 'shared storage' once again
> becomes a single point of failure.  Also, MySQL instances do not share
> files. Period. Each MySQL instance needs its own, completely
> independent, set of files and folders. Please take the hint that
> Replication should be in your very near future.
>
> Here are some solid suggestions:
>
> a) modularize your code so that WRITES can go to a MASTER server and
> READS can be distributed over a large number of SLAVE servers.
>
> b) modularize your data so that you can easily subdivide subsets of it
> to different master-slave trees. For example, students whose last name
> starts with A-D are assigned to machines in group1, E-K in group2, ...
> etc.
>
> c) Optimize your queries to avoid subqueries and to take full
> advantage of indexes.
>
> d) Build indexes wisely -  A well-crafted multi-column index can take
> up less space and will be more functional than multiple single-column
> indexes.
>
> e) Test, test, and retest - Until you can push your system to the
> choking point in the lab, you will have no idea of how much traffic it
> will be able to handle in the field.
>
> Good luck!
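
Shawn's suggestion (b) makes sense to me.  A rough sketch of that kind
of last-name partitioning (hypothetical Python; the group names and
letter ranges are made up, extending his A-D / E-K example):

```python
# Sketch of suggestion (b): route each user to a master-slave server
# group by the first letter of their last name.  Ranges and group
# names are made up for illustration.

SHARD_RANGES = [
    ("A", "D", "group1"),
    ("E", "K", "group2"),
    ("L", "R", "group3"),
    ("S", "Z", "group4"),
]

def shard_for(last_name):
    """Return the server group responsible for this last name."""
    first = last_name[:1].upper()
    for lo, hi, group in SHARD_RANGES:
        if lo <= first <= hi:
            return group
    raise ValueError("no shard for %r" % last_name)

print(shard_for("Baker"))  # group1
print(shard_for("Modes"))  # group3
```

Of course the hard part is not the routing function but moving users
between groups later, which is why the ranges need to be chosen with
growth in mind.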


-- 
Wes Modes
Systems Designer, Developer, and Administrator
University Library ITS
University of California, Santa Cruz
