I made an online file manager using PHP and MySQL some years ago, and am now embedding something similar into my office's database front-end. I decided to store our files in the file system rather than in the database to keep the DB small. One benefit is that restoring a database backup takes less time than it would if I were dealing with the extra gigabytes of embedded files, and the files themselves can be restored individually.
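A minimal sketch of that layout, for illustration only: the file goes to disk and the database keeps just the path and some metadata. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs on its own. The table name, the column names, and the content-hash file naming are my assumptions, not the schema of the system described above.

```python
import hashlib
import shutil
import sqlite3
from pathlib import Path


def open_db(db_path=":memory:"):
    """Open the metadata database (sqlite3 stands in for MySQL here)."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: the file body is NOT stored, only where it lives.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               keywords TEXT NOT NULL,
               original_name TEXT NOT NULL,
               stored_path TEXT NOT NULL,
               size_bytes INTEGER NOT NULL)"""
    )
    return conn


def store_document(conn, storage_dir, source_path, title, keywords):
    """Copy a file into storage_dir and record its path and metadata."""
    source_path = Path(source_path)
    storage_dir = Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)
    data = source_path.read_bytes()
    # Name the stored copy after its content hash so that two uploads
    # with the same filename cannot overwrite each other.
    digest = hashlib.sha256(data).hexdigest()
    stored_path = storage_dir / (digest + source_path.suffix)
    if not stored_path.exists():
        shutil.copyfile(source_path, stored_path)
    cur = conn.execute(
        "INSERT INTO documents "
        "(title, keywords, original_name, stored_path, size_bytes) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, keywords, source_path.name, str(stored_path), len(data)),
    )
    conn.commit()
    return cur.lastrowid


def find_by_keyword(conn, keyword):
    """Return (id, title, stored_path) rows whose keywords contain keyword.

    This is a plain substring match, so "pie" would also match "piece".
    """
    rows = conn.execute(
        "SELECT id, title, stored_path FROM documents "
        "WHERE keywords LIKE ? ORDER BY id",
        ("%" + keyword + "%",),
    )
    return rows.fetchall()
```

With this split, a database dump contains only the small metadata rows, and the storage directory can be backed up or restored file by file with ordinary file system tools.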
As for indexing, many of the document retrieval solutions out there search only on metadata when you do a file search. Business-class scanning systems let you embed user-supplied metadata in your scanned files, so adding your own keywords is an option. In my experience you are better off searching on a few supplied keywords and metadata rather than on the full text of a document, because you end up with more relevant results. The exception is when you are just dying to know how many documents contain the word "pie". If you find that this is the case then you obviously have the free time needed to build some extra indexes... ;)

Thanks,
Rob Brahier
Web Architect
Email: [EMAIL PROTECTED]

-----Original Message-----
From: Steve Folly [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 3:56 PM
To: MySQL MySQL
Subject: MySQL as document storage?

Hi,

(disclaimer - this thread could easily go off topic; I'm interested only in the MySQL aspects of what follows...)
At work we are currently investigating ways of filing all our electronic documents. I know there is commercial software that will do this, but I was wondering whether MySQL would be suitable for this type of thing.

The 'documents' could be literally any binary file. My idea would be to create a table with a blob column for the document itself, plus columns for the document title, reference number, keywords, and other meta-data, and a web-based front-end to search and serve documents.

Although the documents could be any file, the majority would be textual documents (Word documents, PDF, etc). How would one go about indexing such data, since full text searches operate on textual columns?

How to cope with columns exceeding the max packet length? Why is there a max_allowed_packet setting; surely this is low-level stuff that shouldn't affect query and result sizes?

Is storing the actual documents in the database such a good idea anyway? Perhaps store the file in a file system somewhere and just store the filename?

If anyone has experience in doing (or been dissuaded from doing) this kind of application, your thoughts and comments would be appreciated. (If only to tell me "don't be so stupid, it'll never work" :)

Thanks.

--
Regards,
Steve.
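On the packet-length question, one common workaround (not something proposed in this thread) is to split each document into chunks that each fit well under the server's packet limit and store one chunk per row. The sketch below assumes that approach and uses Python's built-in sqlite3 as a stand-in for MySQL so it is self-contained. The table name, column names, and chunk size are hypothetical.

```python
import sqlite3

# Hypothetical chunk size; in practice it would be chosen to sit
# comfortably below the server's configured packet limit.
CHUNK_SIZE = 512 * 1024


def open_chunk_db(db_path=":memory:"):
    """Open a database with one row per document chunk."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS document_chunks (
               doc_id INTEGER NOT NULL,
               seq INTEGER NOT NULL,
               data BLOB NOT NULL,
               PRIMARY KEY (doc_id, seq))"""
    )
    return conn


def save_chunks(conn, doc_id, payload, chunk_size=CHUNK_SIZE):
    """Store payload as numbered chunks; returns how many were written."""
    count = 0
    for seq, start in enumerate(range(0, len(payload), chunk_size)):
        # Each INSERT carries at most chunk_size bytes of document data.
        conn.execute(
            "INSERT INTO document_chunks (doc_id, seq, data) VALUES (?, ?, ?)",
            (doc_id, seq, payload[start:start + chunk_size]),
        )
        count += 1
    conn.commit()
    return count


def load_chunks(conn, doc_id):
    """Reassemble a document by reading its chunks in sequence order."""
    rows = conn.execute(
        "SELECT data FROM document_chunks WHERE doc_id = ? ORDER BY seq",
        (doc_id,),
    )
    return b"".join(row[0] for row in rows)
```

The trade-off is extra rows and a reassembly step on every read, which is one more argument for keeping only the filename in the database.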