Hello, 

Just looking for a little feedback before deciding on a data engine path
and configuration.

We're working on an application that is expected to have a high growth
rate over the coming years.  We're at the stage of designing the backend
databases to maximize performance while keeping costs at a minimum.  (I
know, everyone's heard this one before.)  The database already exists,
but only in a small, controlled environment.

Here is the overall gist of the database.  We have a couple of tables
that will be updated infrequently but read from 100-200 times per
second.  We also have a couple of tables that will be updated up to a
couple hundred times per second.  The tables that are read may only have
50K rows in them; the tables that are written to will continue to grow
(we expect roughly 50M rows per year).

Here is my basic idea for the overall design of the system (based on
how the app will work).

The app has 3 distinct parts to it.  The first part (A) will primarily
do lookups (50K rows - Client table).  The second part (B) will take
transactional information and insert it into a database (50M rows -
Transaction table).  The third part (C) of the app will be used to query
the transactional data.

I was thinking of creating a master database that will hold the basic
information needed across the three processes.  Since parts A, B, and C
will all need to access the Client tables, I thought I should create a
master database for the Client tables.  We will call this ServerA.
From there I figure we can create replicated slave ServerA-Nodes that
all of the processes can read from in a load-balanced scenario (using
IPVSADM).  This is the easy part.
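
For reference, here's roughly the replication setup I'm picturing.
Server IDs, the database name, host, and credentials are all
placeholders, and I'm assuming stock MySQL master/slave replication:

        # my.cnf on ServerA (the master)
        [mysqld]
        server-id    = 1
        log-bin      = mysql-bin
        binlog-do-db = clientdb      # only replicate the Client database

        # my.cnf on each ServerA-Node (slave)
        [mysqld]
        server-id = 2                # must be unique per slave
        relay-log = relay-bin
        read-only = 1                # app users can't write to slaves

        -- on each slave, using the file/position reported by
        -- SHOW MASTER STATUS on ServerA:
        CHANGE MASTER TO
            MASTER_HOST     = 'servera.example.com',
            MASTER_USER     = 'repl',
            MASTER_PASSWORD = 'secret',
            MASTER_LOG_FILE = 'mysql-bin.000001',
            MASTER_LOG_POS  = 4;
        START SLAVE;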

Since part B is heavy on writes and part C is heavy on reads, I figure
we can use a similar scenario: have a single large cluster for part B's
writes (including the use of table partitions) on ServerB, and then
create replicated slave ServerB-Nodes that all of the part C processes
would read from.  Replication has to be near real time -- we have a
contractual agreement to report the incoming processed data within 30
minutes.
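
To keep an eye on that 30-minute window, I figure a cron job on each
ServerB-Node can watch the slave lag.  A rough sketch (the 1500-second
threshold is just a placeholder comfortably inside the 1800-second
contractual window):

        # alert if this slave falls more than 25 minutes behind;
        # Seconds_Behind_Master reads NULL if replication has stopped
        mysql -e 'SHOW SLAVE STATUS\G' |
            awk '/Seconds_Behind_Master/ { if ($2 == "NULL" || $2 > 1500) print "replication lag: " $2 }'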


General table information
Client tables (multiple tables, 1:many)
        Table A, 500 bytes, ~200 reads/s (50K records)
        Table B, text, < 3000 bytes, rarely read (mostly contact info)
        Table C, 250 bytes, medium read rate, ~1 read/s (150K records)
Transaction tables (multiple tables, each 1:1)
        Table A, 400 bytes, ~200 inserts/s (peak 500/s, low 10/s), 50M records
        Table B, 200 bytes, ~50 inserts/s (peak 200/s, low 10/s), 15M records
        Table C, text, < 2000 bytes, ~200 inserts/s (peak 500/s, low 10/s), 50M records
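
For concreteness, Transaction Table A in my head looks roughly like the
sketch below.  Column names and types are made up just to match the
~400-byte row size; the date column is in the primary key so the table
can be range-partitioned by date later on:

        CREATE TABLE transaction_a (
            txn_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
            client_id INT UNSIGNED    NOT NULL,
            txn_time  DATETIME        NOT NULL,
            payload   VARCHAR(360)    NOT NULL,  -- filler to ~400 bytes/row
            PRIMARY KEY (txn_id, txn_time),
            KEY idx_client_time (client_id, txn_time)
        ) ENGINE=InnoDB;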

So, in my end design I'm thinking two master clusters (one for the
Client tables, one for the Transaction tables), with slaves for
everything else split into two distinct groups.

Anyone see any downside to this?  Any better suggestions?

Also, I've been looking at some data partitioning schemes.  Would
partitioning add any performance benefit in the long run (by allowing me
to put the different partition files on different drive arrays in the
future)?
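
What I had in mind is something like the following (assuming MySQL 5.1
range partitioning; paths and year ranges are placeholders).  I believe
per-partition DATA DIRECTORY works with MyISAM, but I'm not sure InnoDB
honors it, so that part needs checking:

        -- one partition per year; future partitions can land on a
        -- different drive array by pointing their directories elsewhere
        ALTER TABLE transaction_a
        PARTITION BY RANGE (YEAR(txn_time)) (
            PARTITION p2006 VALUES LESS THAN (2007)
                DATA DIRECTORY = '/array1/data' INDEX DIRECTORY = '/array1/idx',
            PARTITION p2007 VALUES LESS THAN (2008)
                DATA DIRECTORY = '/array2/data' INDEX DIRECTORY = '/array2/idx',
            PARTITION pmax VALUES LESS THAN MAXVALUE
        );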

For the hardware, the master clusters would be dual dual-core AMD boxes
with 8 GB RAM and RAID 5; the slaves would be single dual-core AMD boxes
with 4 GB RAM and RAID 5.  All with gigabit NICs.

Any feedback would be greatly appreciated.

Gary
