I made an online file manager using PHP and MySQL some years ago, and am
now embedding something similar into my office's database front-end.  I
decided to store our files in the file system rather than the database
in order to keep the DB size low.  A benefit of this is it takes less
time to restore a backup of the database than it would if I were dealing
with the extra gigabytes of embedded files (which I can restore on an
individual basis).

As for indexing, a lot of the document retrieval solutions out there
just go by metadata when you do a file search.  Business class scanning
systems offer you the option of embedding user-supplied metadata in your
scanned files so adding your own keywords is an option.  In my
experience you are better off going by just some supplied keywords and
metadata rather than the full text of a document because you end up with
more relevant results.  The exception to this is when you are just dying
to know how many documents contain the word "pie".  If you find that
this is the case then you obviously have the free time needed to build
some extra indexes...  ;)  

Thanks,

Rob Brahier
Web Architect
Email: [EMAIL PROTECTED]

Special Notice: This email transmission may contain material, which is
confidential under Florida statutes and is intended to be delivered only
to the named addressee. This information belongs to our facility and is
legally privileged. Unauthorized dissemination of this information may
be a violation of criminal statutes. The recipient of this information
is prohibited from disclosing, copying, distributing or using this
information except as permitted by current government law governing
privacy information issues. Such information must be destroyed after its
stated need has been fulfilled, unless otherwise prohibited by law. If
this information is received by anyone other than the named addressee,
the recipient should immediately notify us at the address or telephone
number shown and obtain instructions as to the disposal thereof. Under
no circumstances should this material be read, retained, or copied by
anyone other than the named addressee.


-----Original Message-----
From: Steve Folly [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 07, 2004 3:56 PM
To: MySQL MySQL
Subject: MySQL as document storage?

Hi,

(disclaimer - this thread could easily go off topic; I'm interested 
only in the MySQL aspects of what follows...)

At work we are currently investigating ways of filing all our 
electronic documents.

There is commercial software that will do this I know, but I was 
wondering whether MySQL would be suitable for this type of thing.

The 'documents' could be literally any binary file. My idea would be to 
create a table with a blob column for the document itself, and document 
title, reference number, keywords, other meta-data. And a web-based 
front-end to search and serve documents.

Although the documents could be any file, the majority would be textual 
documents (Word documents, PDF, etc). How would one go about indexing 
such data, since full text searches operate on textual columns?

How to cope with columns exceeding the max packet length? Why is there 
a max_packet_length setting; surely this is low-level stuff that 
shouldn't affect query and result sizes?

Is storing the actual documents in the database such a good idea 
anyway? Perhaps store the file in a file system somwhere and just store 
the filename?


If anyone has experience in doing (or been dissuaded from doing) this 
kind of application your thoughts and comments would be appreciated. 
(If only to tell me "don't be so stupid, it'll never work" :)

Thanks.

-- 
Regards,
Steve.



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to