RE: MySQL as document storage?
Steve, I know you've been pointed at an interesting resource, and others are saying they've done this successfully, but I think you should get some balance. I've worked in document management for years, and I think it's a very bad idea. Have you ever had to restore a system from loads of incremental backups? I'd recommend you look at storing the files on the files system in a proper storage array that can be expanded when needed. Backing up and restoring files is much easier in this scenario, as when someone deletes 1 document by mistake or a file gets corrupted, you only have to restore 1 file. How do you do this in your database system? The storage array will give much more flexible backup options. It's also much easier to use a full text indexer such as Lucene on the raw files. In the database, you store all of the document metadata and a unique id. On the file system, name the files as the same as the id. That way, it's easy to get between the file system and the database record, or the full text index and the database record. I have to say that reading the PHP article made me feel very uneasy - it all seemed fundamentally flawed to me. However, good luck if you decide to go that way. I'd be interested to hear if you genuinely feel it has been a success, especially after you've recovered from your first major data loss :-) (This is not meant to be sarcastic) Best regards, Andy -Original Message- From: Steve Folly [mailto:[EMAIL PROTECTED] Sent: 07 January 2004 20:56 To: MySQL MySQL Subject: MySQL as document storage? Hi, (disclaimer - this thread could easily go off topic; I'm interested only in the MySQL aspects of what follows...) At work we are currently investigating ways of filing all our electronic documents. There is commercial software that will do this I know, but I was wondering whether MySQL would be suitable for this type of thing. The 'documents' could be literally any binary file. My idea would be to create a table with a blob column for the document itself, and document title, reference number, keywords, other meta-data. And a web-based front-end to search and serve documents. Although the documents could be any file, the majority would be textual documents (Word documents, PDF, etc). How would one go about indexing such data, since full text searches operate on textual columns? How to cope with columns exceeding the max packet length? Why is there a max_packet_length setting; surely this is low-level stuff that shouldn't affect query and result sizes? Is storing the actual documents in the database such a good idea anyway? Perhaps store the file in a file system somwhere and just store the filename? If anyone has experience in doing (or been dissuaded from doing) this kind of application your thoughts and comments would be appreciated. (If only to tell me don't be so stupid, it'll never work :) Thanks. -- Regards, Steve. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: MySQL as document storage?
On Thursday 08 Jan 2004 12:33, Harald Fuchs wrote: (B In article [EMAIL PROTECTED], (B (B Richard Davies [EMAIL PROTECTED] writes: (B I find at present from reasons I can't work out if the image to be stored (B is a jpg then there is no problem but if it is a tiff them MySQL won't (B store it. (B (B insert into charts values(5,'No (B comment',(LOAD_FILE("/home/richard/fred.jpg"))); (B (B works fine. (B (B insert into charts values(5,'No (B comment',(LOAD_FILE("/home/richard/fred.tiff"))); (B (B inserts NULL instead of the image. fred.jpg and fred.tiff both exist in (B the same directory. (B (B Maybe fred.tiff is larger than max_allowed_packet? (B (BNo it isn't. fred.tiff is just fred.jpg copied and renamed. Apart from the (Bname the files are identical at about 400k and the max_allowed_packet is set (Bin /etc/my.cnf to be 10Meg. (B (BIt is extemely odd but this install of MySQL doesn't seem to like the (Bextension tiff. (B (B-- (BRegards (B (BRichard (B (B (B-- (BMySQL General Mailing List (BFor list archives: http://lists.mysql.com/mysql (BTo unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: MySQL as document storage?
pardon me for asking a stupid question, but when your storing data into MySQL or any database for that matter, isn't data data? Does this have something to do with the BLOB type? ( Which I've never understood anyway ). I'm very new at this and am wanting to store documents that have database fields in them so I am following this discussion closely trying to learn something. Troy Richard Davies [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] On Thursday 08 Jan 2004 12:33, Harald Fuchs wrote: In article [EMAIL PROTECTED], Richard Davies [EMAIL PROTECTED] writes: I find at present from reasons I can't work out if the image to be stored is a jpg then there is no problem but if it is a tiff them MySQL won't store it. insert into charts values(5,'No comment',(LOAD_FILE(/home/richard/fred.jpg))); works fine. insert into charts values(5,'No comment',(LOAD_FILE(/home/richard/fred.tiff))); inserts NULL instead of the image. fred.jpg and fred.tiff both exist in the same directory. Maybe fred.tiff is larger than max_allowed_packet? No it isn't. fred.tiff is just fred.jpg copied and renamed. Apart from the name the files are identical at about 400k and the max_allowed_packet is set in /etc/my.cnf to be 10Meg. It is extemely odd but this install of MySQL doesn't seem to like the extension tiff. -- Regards Richard -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.558 / Virus Database: 350 - Release Date: 1/2/04 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: MySQL as document storage?
On Thursday 08 Jan 2004 16:50, you wrote: (B On Thu, Jan 08, 2004 at 02:47:51PM +, Richard Davies wrote: (B On Thursday 08 Jan 2004 12:33, Harald Fuchs wrote: (B In article [EMAIL PROTECTED], (B (B Richard Davies [EMAIL PROTECTED] writes: (BI find at present from reasons I can't work out if the image to be (Bstored is a jpg then there is no problem but if it is a tiff them (BMySQL won't store it. (B (Binsert into charts values(5,'No (Bcomment',(LOAD_FILE("/home/richard/fred.jpg"))); (B (Bworks fine. (B (Binsert into charts values(5,'No (Bcomment',(LOAD_FILE("/home/richard/fred.tiff"))); (B (Binserts NULL instead of the image. fred.jpg and fred.tiff both exist (Bin the same directory. (B (B Maybe fred.tiff is larger than max_allowed_packet? (B (B No it isn't. fred.tiff is just fred.jpg copied and renamed. Apart from (B the name the files are identical at about 400k and the max_allowed_packet (B is set in /etc/my.cnf to be 10Meg. (B (B Are the file permissions identical? Especially the third set is important. (B (BThey should have been exactly the same as the only thing I did to the file was (Bmv fred.jpg fred.tiff (BI can no longer check this as I deleted the directory. (B (B Mysqld probably runs under an account (mysql) that is not yourself or in (B your group, so the file needs to be world-readable. Use 'ls -l' to show (B these permissions. (B (BYou must be right though as I have just done some extensive testing with a (Brange of 17 files ranging from 100k to 1.7meg and have no problems so long as (BI chmod 666 * (B (B Also, did the copying or renaming change the case of the filename, e.g. (B because you accessed it through a Samba share? (B (BPurely linux system no Samba but a very good thought. (B (BThank you all for you helpful suggestions I still have no idea how the file (Bpermissions got changed but it seems to be the answer. (B (B (B (B Regards, (B (B Fred. (B (B-- (BRegards (B (BRichard (B (B (B-- (BMySQL General Mailing List (BFor list archives: http://lists.mysql.com/mysql (BTo unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: MySQL as document storage?
Original Message - From: Steve Folly [EMAIL PROTECTED] At work we are currently investigating ways of filing all our electronic documents. I don't know the answer, but it's an interesting question. We are currently looking at using more and more SQL (we use MySQL now in places), and we face looking the issue of storing PDF files and such in a back-end. We also have Lotus Domino - which is very good at this sort of thing - but I'd still be interested in the discussion. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: MySQL as document storage?
May be there is an option on the Commercial Market, BasisPlus from OpenText. I worked quite a lot with it but, sincerely, I prefer a MySQL table with references to files on the File System. A lot faster, and cheaper Now there is true that I use to work with sgml files and I still prefer them. I this case, you can even use TEXT columns and - if you use MyISAM tables - you can index them to use full text retrieval. On the other hand, BasisPlus has the ability to import all sort of data and index them, but you have to pay a high price for that... In conclusion, I still think that MySQL is your best choice here. Leo. AFIP-AR. - Original Message - From: Ian O'Rourke [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:06 PM Subject: Re: MySQL as document storage? Original Message - From: Steve Folly [EMAIL PROTECTED] At work we are currently investigating ways of filing all our electronic documents. I don't know the answer, but it's an interesting question. We are currently looking at using more and more SQL (we use MySQL now in places), and we face looking the issue of storing PDF files and such in a back-end. We also have Lotus Domino - which is very good at this sort of thing - but I'd still be interested in the discussion. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: MySQL as document storage?
This article discusses it briefly: http://php.dreamwerx.net/forums/viewtopic.php?t=6 I am using this type of design/technology for quite a few clients. Some storing gigs and gigs of data (images, documents, pdf, anything) over multiple servers. The scalability and performance of a well designed system I think are very close to standard filesystem storage. I have currently written http and ftp gateways to access data stored in mysql dataservers(databases). Quite easy and fast... I have yet to do much with searching and indexing of files (not required) but I would imagine you could have very fast searching features. On Wed, 7 Jan 2004, Steve Folly wrote: Hi, (disclaimer - this thread could easily go off topic; I'm interested only in the MySQL aspects of what follows...) At work we are currently investigating ways of filing all our electronic documents. There is commercial software that will do this I know, but I was wondering whether MySQL would be suitable for this type of thing. The 'documents' could be literally any binary file. My idea would be to create a table with a blob column for the document itself, and document title, reference number, keywords, other meta-data. And a web-based front-end to search and serve documents. Although the documents could be any file, the majority would be textual documents (Word documents, PDF, etc). How would one go about indexing such data, since full text searches operate on textual columns? How to cope with columns exceeding the max packet length? Why is there a max_packet_length setting; surely this is low-level stuff that shouldn't affect query and result sizes? Is storing the actual documents in the database such a good idea anyway? Perhaps store the file in a file system somwhere and just store the filename? If anyone has experience in doing (or been dissuaded from doing) this kind of application your thoughts and comments would be appreciated. (If only to tell me don't be so stupid, it'll never work :) Thanks. -- Regards, Steve. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: MySQL as document storage?
I made an online file manager using PHP and MySQL some years ago, and am now embedding something similar into my office's database front-end. I decided to store our files in the file system rather than the database in order to keep the DB size low. A benefit of this is it takes less time to restore a backup of the database than it would if I were dealing with the extra gigabytes of embedded files (which I can restore on an individual basis). As for indexing, a lot of the document retrieval solutions out there just go by metadata when you do a file search. Business class scanning systems offer you the option of embedding user-supplied metadata in your scanned files so adding your own keywords is an option. In my experience you are better off going by just some supplied keywords and metadata rather than the full text of a document because you end up with more relevant results. The exception to this is when you are just dying to know how many documents contain the word pie. If you find that this is the case then you obviously have the free time needed to build some extra indexes... ;) Thanks, Rob Brahier Web Architect Email: [EMAIL PROTECTED] Special Notice: This email transmission may contain material, which is confidential under Florida statutes and is intended to be delivered only to the named addressee. This information belongs to our facility and is legally privileged. Unauthorized dissemination of this information may be a violation of criminal statutes. The recipient of this information is prohibited from disclosing, copying, distributing or using this information except as permitted by current government law governing privacy information issues. Such information must be destroyed after its stated need has been fulfilled, unless otherwise prohibited by law. If this information is received by anyone other than the named addressee, the recipient should immediately notify us at the address or telephone number shown and obtain instructions as to the disposal thereof. Under no circumstances should this material be read, retained, or copied by anyone other than the named addressee. -Original Message- From: Steve Folly [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 3:56 PM To: MySQL MySQL Subject: MySQL as document storage? Hi, (disclaimer - this thread could easily go off topic; I'm interested only in the MySQL aspects of what follows...) At work we are currently investigating ways of filing all our electronic documents. There is commercial software that will do this I know, but I was wondering whether MySQL would be suitable for this type of thing. The 'documents' could be literally any binary file. My idea would be to create a table with a blob column for the document itself, and document title, reference number, keywords, other meta-data. And a web-based front-end to search and serve documents. Although the documents could be any file, the majority would be textual documents (Word documents, PDF, etc). How would one go about indexing such data, since full text searches operate on textual columns? How to cope with columns exceeding the max packet length? Why is there a max_packet_length setting; surely this is low-level stuff that shouldn't affect query and result sizes? Is storing the actual documents in the database such a good idea anyway? Perhaps store the file in a file system somwhere and just store the filename? If anyone has experience in doing (or been dissuaded from doing) this kind of application your thoughts and comments would be appreciated. (If only to tell me don't be so stupid, it'll never work :) Thanks. -- Regards, Steve. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: MySQL as document storage?
On 7 Jan 2004, at 21:51, [EMAIL PROTECTED] wrote: This article discusses it briefly: http://php.dreamwerx.net/forums/viewtopic.php?t=6 That's an interesting article. Thanks. A similar table design to what I had in mind (hmmm... how different can these things be! :) I like the idea of splitting up binary data into segments so as reduce the load on the server. I assume this works because MySQL doesn't send all rows over the connection when the query completes, only just the ones you ask for? (In this case, a segment at a time?) Steve. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: MySQL as document storage?
On 7 Jan 2004, at 22:24, Rob Brahier wrote: I made an online file manager using PHP and MySQL some years ago, and am now embedding something similar into my office's database front-end. I decided to store our files in the file system rather than the database in order to keep the DB size low. A benefit of this is it takes less time to restore a backup of the database than it would if I were dealing with the extra gigabytes of embedded files (which I can restore on an individual basis). Incremental backups would certainly be smaller. The backup would spot individual files being changed, but being blobs in a table, the one (large) file in the database would have to be backed up for each change? As for indexing, a lot of the document retrieval solutions out there just go by metadata when you do a file search. Business class scanning systems offer you the option of embedding user-supplied metadata in your scanned files so adding your own keywords is an option. In my experience you are better off going by just some supplied keywords and metadata rather than the full text of a document because you end up with more relevant results. The exception to this is when you are just dying to know how many documents contain the word pie. If you find that this is the case then you obviously have the free time needed to build some extra indexes... ;) I think you're right. Thanks for the input. -- Regards, Steve. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]