Re: Blob data

Steve Edberg Fri, 22 Jun 2007 02:32:56 -0700

At 12:11 PM +0530 6/22/07, Ratheesh K J wrote:

Hello All,
I want a clarification. We run a forum wherein people send messages with/without attachments. Attachments may contain images, documents, etc. We are actually storing the attachment in a blob column. Sometimes the attachments are big. And today the table size has grown to 40 GB. This has created a headache for any maintenance task, backup, restoration, etc.
I want to know whether this is the right approach. Or should we actually store the attachments in directories and just store the attachment path in the database.
Kindly suggest the best approach so that I can reduce the database size.

Thanks in advance

Yes, storing files - especially non-textual files - in the filesystem instead of the database is generally considered the best practice.

At one point I had created a document management system that stored everything in the database as you are doing; my rationale was that it allowed me to manage permissions using the existing database permissions, and to back up the whole database using mysqldump, vs mysqldump + doing a tar of the files. However, I abandoned this approach for the following reasons:

(1) Storing non-plaintext items (e.g., pictures) in the database makes it bigger and slower without added value - you can't (at least not yet, or in the foreseeable future) do a meaningful search on a blob.

(2) It becomes more difficult to split storage out onto multiple filesystems; e.g., leaving the database files in /var/database, putting the documents themselves into /home/docmanager, etc.

(3) It makes queries on the command line unwieldy; if you have a blob field, doing a select * to check a record's contents can dump a lot of garbage on the screen.

(4) It can make doing incremental backups more difficult; if the documents themselves are relatively static, but the document metadata stored in the database is very dynamic, it becomes simple to do a compact daily database dump + a weekly document directory backup (for example) if the files are not in the database.

What I do is create a unique SHA1 hash when a file is uploaded (e.g., sha1(rand())). The original filename and the 40-character hash are stored in the database, and the document is stored in the filesystem using the hash as the filename. I can optionally compress and encrypt the document as well, storing the encryption key in the database. This gives (for me) adequate document security. An additional advantage is that you can use the filesystem tree if you have a large number of documents. For example, if a document hash is 'f0118e9bd2c4fb29c64ee03abce698b8', you can store the file in the directory tree f0/11/8e/f0118e9bd2c4fb29c64ee03abce698b8 (extending to as many levels as you feel necessary). By keeping the number of files per directory fairly small, file retrieval becomes relatively fast. As the hashes approximate a random distribution, you should always have a close-to-balanced tree.
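
The layout described above could be sketched roughly as follows in Python. This is only an illustration under assumptions, not the code from the system described: the function names (new_file_id, sharded_path, store_upload), the storage root, and the defaults of three levels of two characters each are all made up for the example, and os.urandom stands in for rand() as the source of randomness. The path function works on an identifier of any length, so it handles both a 40-character SHA1 and the shorter example hash quoted above.

```python
import hashlib
import os
from pathlib import Path


def new_file_id() -> str:
    """Return a random 40-character hex identifier (SHA-1 of random bytes)."""
    # Assumption: os.urandom replaces rand() as the randomness source.
    return hashlib.sha1(os.urandom(32)).hexdigest()


def sharded_path(root: Path, file_id: str, levels: int = 3, width: int = 2) -> Path:
    """Map an identifier to root/f0/11/8e/<file_id> (for levels=3, width=2)."""
    if len(file_id) < levels * width:
        raise ValueError("identifier too short for the requested tree depth")
    # Take the first `levels` slices of `width` characters as directory names.
    parts = [file_id[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, file_id)


def store_upload(root: Path, data: bytes) -> str:
    """Write the upload under its identifier; return the id for the database row."""
    file_id = new_file_id()
    target = sharded_path(root, file_id)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return file_id
```

The returned identifier would be saved in the database next to the original filename; to read the file back, the same sharded_path call rebuilds its location from the stored identifier.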

Lastly, I store a hash of the document itself in the database as well. This allows me to detect if duplicate files are uploaded, and to determine if a previously-uploaded file has been corrupted in some way.
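
As a rough sketch of that content-hash check, again in Python and again with invented names (content_hash, is_duplicate, is_intact): the choice of SHA1 for the content hash is an assumption, since any stable digest would serve, and the set of known hashes stands in for a lookup against the database column.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file's bytes in chunks so large attachments need not fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(path: Path, known_hashes: set) -> bool:
    """True if a file with identical content has already been recorded."""
    # known_hashes is a stand-in for a query on the stored content hashes.
    return content_hash(path) in known_hashes


def is_intact(path: Path, recorded_hash: str) -> bool:
    """True if the file on disk still matches the hash recorded at upload time."""
    return content_hash(path) == recorded_hash
```

A periodic job could walk the document tree and call is_intact for each row, flagging any file whose current hash no longer matches the recorded one.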


        steve

--
+--------------- my people are the people of the dessert, ---------------+
| Steve Edberg                                http://pgfsun.ucdavis.edu/ |
| UC Davis Genome Center                            [EMAIL PROTECTED] |
| Bioinformatics programming/database/sysadmin             (530)754-9127 |
+---------------- said t e lawrence, picking up his fork ----------------+

