On Thu, Sep 26, 2002 at 10:31:06AM +0100, Andrew Bryant wrote:
> We have 125 million smallish images (each one is tens of KB, produced by
> gene sequencing machines). Currently we store these in tarfiles and
> index the images (the offset in the tar file) using an Oracle
> database. Each tarfile is around 1.5GB, containing tens of thousands of
> images. Thus, by using tarfiles we limit the number of files kept by the
> operating system (Tru64 Unix), which would be unmanageable
> otherwise. This method has various drawbacks, however, associated with
> keeping the database and tarfiles in sync.

Since you asked...

I don't have a full understanding of your situation, but to me this seems to cry out for a cluster.

The temptation to stuff everything into database servers is strong, but the average freenix box is far better at plain file serving without an RDBMS in the way. All you need in the database is a reference to the image.

I'd make a smallish cluster of commodity x86 boxes. Pack 200GB-400GB of RAID-5 storage in each box, and make each box responsible for a portion of the image space. You'd make a massive filesystem on each box (or maybe several smaller ones) for storing the images, and access to the cluster would be provided by gateways that store image:node mappings. MySQL's role would be to maintain this map on the gateway.

The trickiest problem if you use multiple gateways (and why the hell not?) might be keeping the MySQL boxes in sync, although that's not a big deal if you use a master/slave setup where only one gateway accepts updates.

The gateway will either simply redirect you to an internal server (say, you access them via HTTP), or it NFS mounts each node's imagespace and acts as a liaison for the client.

The same recipe can be repeated for each cluster node. There's no reason the nodes can't also be gateways to smaller sub-clusters.

The best part of all, on top of better redundancy, simple, proven software to maintain at every step, and ease of growing the imagespace, is the cost. If each node were $5000 USD, and 20 nodes solved the problem, you're looking at $100,000 USD in hardware costs. Isn't that like what a typical Oracle support contract costs in a year?

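To make the gateway's image:node map a bit more concrete, here is a minimal sketch of the redirect variant. It is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL so the example runs on its own, and the node hostnames, the table name image_map, the modulo placement rule, and the /images/ URL layout are all made up for the example.

```python
import sqlite3

# Hypothetical node hostnames; a real cluster would have its own list.
NODES = ["node01.internal", "node02.internal", "node03.internal"]


def open_map(path=":memory:"):
    """Open the gateway's image -> node table (sqlite3 stands in for MySQL)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS image_map ("
        "image_id INTEGER PRIMARY KEY, node TEXT NOT NULL)"
    )
    return db


def assign_node(image_id):
    """Choose a node for a new image.

    Simple modulo placement is an assumption; any rule works as long as
    the result is recorded in the map.
    """
    return NODES[image_id % len(NODES)]


def register_image(db, image_id):
    """Record which node holds an image and return that node."""
    node = assign_node(image_id)
    db.execute(
        "INSERT OR REPLACE INTO image_map (image_id, node) VALUES (?, ?)",
        (image_id, node),
    )
    db.commit()
    return node


def redirect_url(db, image_id):
    """Return the internal URL the gateway would redirect to, or None."""
    row = db.execute(
        "SELECT node FROM image_map WHERE image_id = ?", (image_id,)
    ).fetchone()
    if row is None:
        return None
    # The /images/<id> path layout is hypothetical.
    return f"http://{row[0]}/images/{image_id}"
```

Because the placement is stored in the table rather than recomputed on each lookup, adding nodes later only affects where new images go; existing rows keep pointing at the node that already holds the file.
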
-- 
Michael Bacarella  | Netgraft Corp
                   | 545 Eighth Ave #401
 Systems Analysis  | New York, NY 10018
Technical Support  | 212 946-1038 | 917 670-6982
 Managed Services  | http://netgraft.com/