On Thu, Sep 26, 2002 at 10:31:06AM +0100, Andrew Bryant wrote:
> We have 125 million smallish images (each one is tens of KB, produced by
> gene sequencing machines).  Currently we store these in tarfiles and
> index the images (the offset in the tar file) using an Oracle
> database.  Each tarfile is around 1.5GB, containing tens of thousands of
> images.  Thus, by using tarfiles we limit the number of files kept by the 
> operating system (Tru64 Unix), which would be unmanageable
> otherwise.  This method has various drawbacks, however, associated with
> keeping the database and tarfiles in sync.

Since you asked..

I don't have a full understanding of your situation, but to me
this seems to cry out for a cluster. The temptation to want to
stuff everything into database servers is strong, but the
average freenix box is far better at plain file serving without an
RDBMS in the way. All you need in the database is a reference to
the image.

I'd make a smallish cluster of commodity x86 boxes. Pack 200GB-400GB
of RAID-5 storage in each box, and make each box responsible for a
portion of the image space. You'd make a massive filesystem on
each box (or maybe several smaller ones) for storing the images,
and access to the cluster would be provided from gateways that
store image:node mappings.
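
A rough sketch of how the image space could be carved up, in Python.
Everything named here is hypothetical: the node names, the
hash-based placement policy, and the /images root are assumptions
for illustration, not something prescribed above. The idea is only
that a new image gets assigned to one node, and that on the node
the files are fanned out over subdirectories so no single directory
holds millions of entries.

```python
import hashlib

# Hypothetical node names; the real cluster layout is up to you.
NODES = [f"node{i:02d}" for i in range(20)]


def assign_node(image_id, nodes=NODES):
    """Pick a node for a new image by hashing its ID (an assumed policy)."""
    digest = hashlib.sha256(image_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def local_path(image_id, root="/images"):
    """Build the on-node path, using two levels of hash-derived
    subdirectories to keep each directory small."""
    h = hashlib.sha256(image_id.encode("utf-8")).hexdigest()
    return f"{root}/{h[:2]}/{h[2:4]}/{image_id}"
```

Whatever node assign_node() picks would then be recorded in the
gateway's map, so the policy can change later without moving
images that are already stored.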

MySQL's role would be in maintaining this map on the gateway.  The
trickiest problem if you use multiple gateways (and why the
hell not?) might be keeping each MySQL box in sync, although that's
not a big deal if you use a master/slave setup where only one
gateway accepts updates.
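
To make the map concrete, here is a minimal sketch of the
image:node table and the two operations a gateway needs. The
table and column names are made up, and sqlite3 from the Python
standard library stands in for MySQL purely so the example runs on
its own; on a real gateway the same statements would go through a
MySQL client, with writes sent only to the master.

```python
import sqlite3


def open_map(path=":memory:"):
    """Open (and create if needed) the hypothetical image_map table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS image_map ("
        " image_id TEXT PRIMARY KEY,"
        " node TEXT NOT NULL)"
    )
    return db


def record_image(db, image_id, node):
    """Record which node holds an image. Meant for the master gateway
    only; slaves would pick the row up through replication."""
    with db:  # commits on success, rolls back on error
        # SQLite syntax; the MySQL equivalent would be REPLACE INTO.
        db.execute(
            "INSERT OR REPLACE INTO image_map (image_id, node) VALUES (?, ?)",
            (image_id, node),
        )


def lookup_node(db, image_id):
    """Return the node holding an image, or None if it is unknown."""
    row = db.execute(
        "SELECT node FROM image_map WHERE image_id = ?", (image_id,)
    ).fetchone()
    return row[0] if row else None
```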

The gateway will either simply redirect you to an internal server
(say, you access them via HTTP) or the gateway NFS-mounts each
node's imagespace and acts as a liaison for the client.
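
The redirect flavor of the gateway boils down to something like the
sketch below. The URL layout (http://<node>/images/<id>) and the
function name are assumptions for illustration; lookup is any
callable that returns the node name for an image ID, or None.

```python
from urllib.parse import quote


def redirect_for(image_id, lookup):
    """Return (status, headers) for a client asking for image_id.

    A known image gets a 302 pointing at the node that stores it;
    an unknown one gets a 404.
    """
    node = lookup(image_id)
    if node is None:
        return 404, {}
    # quote() escapes characters that are not safe in a URL path.
    location = f"http://{node}/images/{quote(image_id)}"
    return 302, {"Location": location}
```

For a quick try, a plain dict works as the map:
redirect_for("run1/a.png", {"run1/a.png": "node03"}.get)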

Such a recipe can also be repeated for each cluster node. There's
no reason they can't also be gateways to smaller sub-clusters.

The best part of all, on top of better redundancy, simple,
proven software to maintain at every step, and ease of growing
the imagespace, is the cost. If each node were $5000 USD and 20
nodes solved the problem, you'd be looking at $100,000 USD in
hardware costs. Isn't that about what a typical Oracle support
contract costs in a year?
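
A back-of-the-envelope check of those numbers. The 30 KB average
image size is my assumption (the original post only says "tens of
KB"), and RAID overhead and free-space headroom are ignored, so
treat the result as a sanity check rather than a sizing plan.

```python
IMAGES = 125_000_000
AVG_IMAGE_KB = 30                      # assumption: "tens of KB" per image
NODES = 20
NODE_PRICE_USD = 5000
NODE_GB_LOW, NODE_GB_HIGH = 200, 400   # usable storage per node

# Decimal units: 1 TB = 1e9 KB = 1000 GB.
total_tb = IMAGES * AVG_IMAGE_KB / 1e9
capacity_low_tb = NODES * NODE_GB_LOW / 1000
capacity_high_tb = NODES * NODE_GB_HIGH / 1000
hardware_usd = NODES * NODE_PRICE_USD

print(f"images:   {total_tb:.2f} TB")
print(f"cluster:  {capacity_low_tb:.1f} to {capacity_high_tb:.1f} TB")
print(f"hardware: ${hardware_usd:,} USD")
```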

-- 
Michael Bacarella  | Netgraft Corp
                   | 545 Eighth Ave #401
 Systems Analysis  | New York, NY 10018
Technical Support  | 212 946-1038 | 917 670-6982
 Managed Services  | http://netgraft.com/

