At 12:11 PM +0530 6/22/07, Ratheesh K J wrote:
Hello All,

I want a clarification. Whe run a forum wherein people send messages with/without attachments. Attachments may contain images, documents etc.. We are actually storing the attachment in a blob column. Sometimes the attachments are big. And today the table size has grown to 40 GB. This has created a headache for any maintanance task, backup, restoration. etc.

I want to know whether this is the right approach. Or should we actually store the attachments in directories and just stiore the attachment path in the database.

Kindly suggest the best approach so that I can reduce the database size.

Thanks in advance


Yes, storing files - especially non-textual files - in the file system instead of the database is generally considered the best practice.

At one point I had created a document management system that stored everything in the database as you are doing; my rationale was that it allowed me to manage permissions using the existing database permissions, and to back up the whole database using mysqldump, vs mysqldump + doing a tar of the files. However, I abandoned this approach for the following reasons:

(1) Storing non-plaintext items (eg; pictures) in the database makes it bigger and slower without added value - you can't (at least not yet, or in the foreseeable future) do a meaningful search on a blob.

(2) It becomes more difficult to split storage out onto multiple filesystems; eg, leaving the database files in /var/database, putting the documents themselves into /home/docmanager, etc.

(3) It makes queries on the commandline unwieldy; if you have a blob field, doing a select * to check a record's contents can dump a lot of garbage on the screen.

(4) It can make doing incremental backups more difficult; if the documents themselves are relatively static, but the document metadata stored in the database is very dynamic, it becomes simple to do a compact daily database dump + a weekly document directory backup (for example) if the files are not in the database.

What I do is create a unique SHA1 hash when a file is uploaded (eg; sha1(rand()). The original filename and the 40-character hash are stored in the database, and the document is stored in the filesystem using the hash as the filename. I can optionally compress and encrypt the document as well, storing the encryption key in the database. This gives (for me) adequate document security. An additional advantage is that you can take advantage of the filesystem tree if you have a large number of documents. For example, if a document hash is 'f0118e9bd2c4fb29c64ee03abce698b8', you can store the file in the directory tree f0/11/8e/f0118e9bd2c4fb29c64ee03abce698b8 (extending to as many levels as you feel necessary). By keeping the number of files per directory fairly small, file retrieval becomes relatively fast. As the hashes approximate a random distribution, you should always have a close-to-balanced tree.

Lastly, I store a hash of the document itself in the database as well. This allows me to detect if duplicate files are uploaded, and to determine if a previously-uploaded file has been corrupted in some way.

        steve

--
+--------------- my people are the people of the dessert, ---------------+
| Steve Edberg                                http://pgfsun.ucdavis.edu/ |
| UC Davis Genome Center                            [EMAIL PROTECTED] |
| Bioinformatics programming/database/sysadmin             (530)754-9127 |
+---------------- said t e lawrence, picking up his fork ----------------+

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to