Hi Steve,
Some very good points.
We are working on a solution for BLOBs in the a MySQL database at
www.blobstreaming.org.
Any good solution will have to take your comments into account...
On Jun 22, 2007, at 9:24 AM, Steve Edberg wrote:
At 12:11 PM +0530 6/22/07, Ratheesh K J wrote:
Hello All,
I want a clarification. Whe run a forum wherein people send
messages with/without attachments. Attachments may contain images,
documents etc.. We are actually storing the attachment in a blob
column. Sometimes the attachments are big. And today the table
size has grown to 40 GB. This has created a headache for any
maintanance task, backup, restoration. etc.
I want to know whether this is the right approach. Or should we
actually store the attachments in directories and just stiore the
attachment path in the database.
Kindly suggest the best approach so that I can reduce the database
size.
Thanks in advance
Yes, storing files - especially non-textual files - in the file
system instead of the database is generally considered the best
practice.
At one point I had created a document management system that stored
everything in the database as you are doing; my rationale was that
it allowed me to manage permissions using the existing database
permissions, and to back up the whole database using mysqldump, vs
mysqldump + doing a tar of the files. However, I abandoned this
approach for the following reasons:
(1) Storing non-plaintext items (eg; pictures) in the database
makes it bigger and slower without added value - you can't (at
least not yet, or in the foreseeable future) do a meaningful search
on a blob.
(2) It becomes more difficult to split storage out onto multiple
filesystems; eg, leaving the database files in /var/database,
putting the documents themselves into /home/docmanager, etc.
(3) It makes queries on the commandline unwieldy; if you have a
blob field, doing a select * to check a record's contents can dump
a lot of garbage on the screen.
(4) It can make doing incremental backups more difficult; if the
documents themselves are relatively static, but the document
metadata stored in the database is very dynamic, it becomes simple
to do a compact daily database dump + a weekly document directory
backup (for example) if the files are not in the database.
What I do is create a unique SHA1 hash when a file is uploaded (eg;
sha1(rand()). The original filename and the 40-character hash are
stored in the database, and the document is stored in the
filesystem using the hash as the filename. I can optionally
compress and encrypt the document as well, storing the encryption
key in the database. This gives (for me) adequate document
security. An additional advantage is that you can take advantage of
the filesystem tree if you have a large number of documents. For
example, if a document hash is 'f0118e9bd2c4fb29c64ee03abce698b8',
you can store the file in the directory tree f0/11/8e/
f0118e9bd2c4fb29c64ee03abce698b8 (extending to as many levels as
you feel necessary). By keeping the number of files per directory
fairly small, file retrieval becomes relatively fast. As the hashes
approximate a random distribution, you should always have a close-
to-balanced tree.
Lastly, I store a hash of the document itself in the database as
well. This allows me to detect if duplicate files are uploaded, and
to determine if a previously-uploaded file has been corrupted in
some way.
steve
--
+--------------- my people are the people of the dessert,
---------------+
| Steve Edberg http://
pgfsun.ucdavis.edu/ |
| UC Davis Genome Center
[EMAIL PROTECTED] |
| Bioinformatics programming/database/sysadmin (530)
754-9127 |
+---------------- said t e lawrence, picking up his fork
----------------+
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?
[EMAIL PROTECTED]
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]