RE: MySQL as document storage?

2004-01-08 Thread Andy Eastham
Steve,

I know you've been pointed at an interesting resource, and others are saying
they've done this successfully, but I think you should get some balance.
I've worked in document management for years, and I think it's a very bad
idea.  Have you ever had to restore a system from loads of incremental
backups?

I'd recommend you look at storing the files on the files system in a proper
storage array that can be expanded when needed.  Backing up and restoring
files is much easier in this scenario, as when someone deletes 1 document by
mistake or a file gets corrupted, you only have to restore 1 file.  How do
you do this in your database system?  The storage array will give much more
flexible backup options.

It's also much easier to use a full text indexer such as Lucene on the raw
files.

In the database, you store all of the document metadata and a unique id.  On
the file system, name the files as the same as the id.  That way, it's easy
to get between the file system and the database record, or the full text
index and the database record.

I have to say that reading the PHP article made me feel very uneasy - it all
seemed fundamentally flawed to me.

However, good luck if you decide to go that way.  I'd be interested to hear
if you genuinely feel it has been a success, especially after you've
recovered from your first major data loss :-) 
(This is not meant to be sarcastic)

Best regards,

Andy


-Original Message-
From: Steve Folly [mailto:[EMAIL PROTECTED] 
Sent: 07 January 2004 20:56
To: MySQL MySQL
Subject: MySQL as document storage?

Hi,

(disclaimer - this thread could easily go off topic; I'm interested 
only in the MySQL aspects of what follows...)

At work we are currently investigating ways of filing all our 
electronic documents.

There is commercial software that will do this I know, but I was 
wondering whether MySQL would be suitable for this type of thing.

The 'documents' could be literally any binary file. My idea would be to 
create a table with a blob column for the document itself, and document 
title, reference number, keywords, other meta-data. And a web-based 
front-end to search and serve documents.

Although the documents could be any file, the majority would be textual 
documents (Word documents, PDF, etc). How would one go about indexing 
such data, since full text searches operate on textual columns?

How to cope with columns exceeding the max packet length? Why is there 
a max_packet_length setting; surely this is low-level stuff that 
shouldn't affect query and result sizes?

Is storing the actual documents in the database such a good idea 
anyway? Perhaps store the file in a file system somwhere and just store 
the filename?


If anyone has experience in doing (or been dissuaded from doing) this 
kind of application your thoughts and comments would be appreciated. 
(If only to tell me don't be so stupid, it'll never work :)

Thanks.

-- 
Regards,
Steve.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: MySQL as document storage?

2004-01-08 Thread Richard Davies
On Thursday 08 Jan 2004 12:33, Harald Fuchs wrote:
(B In article [EMAIL PROTECTED],
(B
(B Richard Davies [EMAIL PROTECTED] writes:
(B  I find at present from reasons I can't work out if the image to be stored
(B  is a jpg then there is no problem but if it is a tiff them MySQL won't
(B  store it.
(B
(B  insert into charts values(5,'No
(B  comment',(LOAD_FILE("/home/richard/fred.jpg")));
(B
(B  works fine.
(B
(B  insert into charts values(5,'No
(B  comment',(LOAD_FILE("/home/richard/fred.tiff")));
(B
(B  inserts NULL instead of the image.  fred.jpg and fred.tiff both exist in
(B  the same directory.
(B
(B Maybe fred.tiff is larger than max_allowed_packet?
(B
(BNo it isn't.  fred.tiff is just fred.jpg copied and renamed. Apart from the 
(Bname the files are identical at about 400k and the max_allowed_packet is set 
(Bin /etc/my.cnf to be 10Meg.
(B
(BIt is extemely odd but this install of MySQL doesn't seem to like the 
(Bextension tiff.
(B
(B-- 
(BRegards
(B
(BRichard
(B
(B
(B-- 
(BMySQL General Mailing List
(BFor list archives: http://lists.mysql.com/mysql
(BTo unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]

Re: MySQL as document storage?

2004-01-08 Thread Troy T. Hall
pardon me for asking a stupid question, but when your storing data into
MySQL or any database for that matter, isn't data data?  Does this have
something to do with the BLOB type?  ( Which I've never understood anyway ).
I'm very new at this and am wanting to store documents that have database
fields in them so I am following this discussion closely trying to learn
something.

Troy
Richard Davies [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 On Thursday 08 Jan 2004 12:33, Harald Fuchs wrote:
  In article [EMAIL PROTECTED],

  Richard Davies [EMAIL PROTECTED] writes:
   I find at present from reasons I can't work out if the image to be
stored
   is a jpg then there is no problem but if it is a tiff them MySQL won't
   store it.

   insert into charts values(5,'No
   comment',(LOAD_FILE(/home/richard/fred.jpg)));

   works fine.

   insert into charts values(5,'No
   comment',(LOAD_FILE(/home/richard/fred.tiff)));

   inserts NULL instead of the image.  fred.jpg and fred.tiff both exist
in
   the same directory.

  Maybe fred.tiff is larger than max_allowed_packet?

 No it isn't.  fred.tiff is just fred.jpg copied and renamed. Apart from
the
 name the files are identical at about 400k and the max_allowed_packet is
set
 in /etc/my.cnf to be 10Meg.

 It is extemely odd but this install of MySQL doesn't seem to like the
 extension tiff.

 -- 
 Regards

 Richard


 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]




---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.558 / Virus Database: 350 - Release Date: 1/2/04




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: MySQL as document storage?

2004-01-08 Thread Richard Davies
On Thursday 08 Jan 2004 16:50, you wrote:
(B On Thu, Jan 08, 2004 at 02:47:51PM +, Richard Davies wrote:
(B  On Thursday 08 Jan 2004 12:33, Harald Fuchs wrote:
(B   In article [EMAIL PROTECTED],
(B  
(B   Richard Davies [EMAIL PROTECTED] writes:
(BI find at present from reasons I can't work out if the image to be
(Bstored is a jpg then there is no problem but if it is a tiff them
(BMySQL won't store it.
(B   
(Binsert into charts values(5,'No
(Bcomment',(LOAD_FILE("/home/richard/fred.jpg")));
(B   
(Bworks fine.
(B   
(Binsert into charts values(5,'No
(Bcomment',(LOAD_FILE("/home/richard/fred.tiff")));
(B   
(Binserts NULL instead of the image.  fred.jpg and fred.tiff both exist
(Bin the same directory.
(B  
(B   Maybe fred.tiff is larger than max_allowed_packet?
(B 
(B  No it isn't.  fred.tiff is just fred.jpg copied and renamed. Apart from
(B  the name the files are identical at about 400k and the max_allowed_packet
(B  is set in /etc/my.cnf to be 10Meg.
(B
(B Are the file permissions identical? Especially the third set is important.
(B
(BThey should have been exactly the same as the only thing I did to the file was 
(Bmv fred.jpg fred.tiff
(BI can no longer check this as I deleted the directory. 
(B
(B Mysqld probably runs under an account (mysql) that is not yourself or in
(B your group, so the file needs to be world-readable. Use 'ls -l' to show
(B these permissions.
(B
(BYou must be right though as I have just done some extensive testing with a 
(Brange of 17 files ranging from 100k to 1.7meg and have no problems so long as 
(BI chmod 666 *
(B
(B Also, did the copying or renaming change the case of the filename, e.g.
(B because you accessed it through a Samba share?
(B
(BPurely linux system no Samba but a very good thought.
(B
(BThank you all for you helpful suggestions I still have no idea how the file 
(Bpermissions got changed but it seems to be the answer.
(B
(B
(B
(B Regards,
(B
(B Fred.
(B
(B-- 
(BRegards
(B
(BRichard
(B
(B
(B-- 
(BMySQL General Mailing List
(BFor list archives: http://lists.mysql.com/mysql
(BTo unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]

Re: MySQL as document storage?

2004-01-07 Thread Ian O'Rourke
 Original Message -
From: Steve Folly [EMAIL PROTECTED]

 At work we are currently investigating ways of filing all our
 electronic documents.

I don't know the answer, but it's an interesting question. We are currently
looking at using more and more SQL (we use MySQL now in places), and we face
looking the issue of storing PDF files and such in a back-end. We also have
Lotus Domino - which is very good at this sort of thing - but I'd still be
interested in the discussion.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: MySQL as document storage?

2004-01-07 Thread Leonardo Javier Belén
May be there is an option on the Commercial Market, BasisPlus from OpenText.
I worked quite a lot with it but, sincerely, I prefer a MySQL table with
references to files on the File System. A lot faster, and cheaper

Now there is true that I use to work with sgml files and I still prefer
them. I this case, you can even use TEXT columns and - if you use MyISAM
tables - you can index them to use full text retrieval. On the other hand,
BasisPlus has the ability to import all sort of data and index them, but you
have to pay a high price for that...

In conclusion, I still think that MySQL is your best choice here.

Leo. AFIP-AR.

- Original Message -
From: Ian O'Rourke [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 6:06 PM
Subject: Re: MySQL as document storage?


  Original Message -
 From: Steve Folly [EMAIL PROTECTED]
 
  At work we are currently investigating ways of filing all our
  electronic documents.

 I don't know the answer, but it's an interesting question. We are
currently
 looking at using more and more SQL (we use MySQL now in places), and we
face
 looking the issue of storing PDF files and such in a back-end. We also
have
 Lotus Domino - which is very good at this sort of thing - but I'd still be
 interested in the discussion.


 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: MySQL as document storage?

2004-01-07 Thread colbey

This article discusses it briefly:
http://php.dreamwerx.net/forums/viewtopic.php?t=6

I am using this type of design/technology for quite a few clients. Some
storing gigs and gigs of data (images, documents, pdf, anything) over
multiple servers.

The scalability and performance of a well designed system I think are
very close to standard filesystem storage.

I have currently written http and ftp gateways to access data stored in
mysql dataservers(databases).  Quite easy and fast...

I have yet to do much with searching and indexing of files (not required)
but I would imagine you could have very fast searching features.


On Wed, 7 Jan 2004, Steve Folly wrote:

 Hi,

 (disclaimer - this thread could easily go off topic; I'm interested
 only in the MySQL aspects of what follows...)

 At work we are currently investigating ways of filing all our
 electronic documents.

 There is commercial software that will do this I know, but I was
 wondering whether MySQL would be suitable for this type of thing.

 The 'documents' could be literally any binary file. My idea would be to
 create a table with a blob column for the document itself, and document
 title, reference number, keywords, other meta-data. And a web-based
 front-end to search and serve documents.

 Although the documents could be any file, the majority would be textual
 documents (Word documents, PDF, etc). How would one go about indexing
 such data, since full text searches operate on textual columns?

 How to cope with columns exceeding the max packet length? Why is there
 a max_packet_length setting; surely this is low-level stuff that
 shouldn't affect query and result sizes?

 Is storing the actual documents in the database such a good idea
 anyway? Perhaps store the file in a file system somwhere and just store
 the filename?


 If anyone has experience in doing (or been dissuaded from doing) this
 kind of application your thoughts and comments would be appreciated.
 (If only to tell me don't be so stupid, it'll never work :)

 Thanks.

 --
 Regards,
 Steve.


 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: MySQL as document storage?

2004-01-07 Thread Rob Brahier
I made an online file manager using PHP and MySQL some years ago, and am
now embedding something similar into my office's database front-end.  I
decided to store our files in the file system rather than the database
in order to keep the DB size low.  A benefit of this is it takes less
time to restore a backup of the database than it would if I were dealing
with the extra gigabytes of embedded files (which I can restore on an
individual basis).

As for indexing, a lot of the document retrieval solutions out there
just go by metadata when you do a file search.  Business class scanning
systems offer you the option of embedding user-supplied metadata in your
scanned files so adding your own keywords is an option.  In my
experience you are better off going by just some supplied keywords and
metadata rather than the full text of a document because you end up with
more relevant results.  The exception to this is when you are just dying
to know how many documents contain the word pie.  If you find that
this is the case then you obviously have the free time needed to build
some extra indexes...  ;)  

Thanks,

Rob Brahier
Web Architect
Email: [EMAIL PROTECTED]

Special Notice: This email transmission may contain material, which is
confidential under Florida statutes and is intended to be delivered only
to the named addressee. This information belongs to our facility and is
legally privileged. Unauthorized dissemination of this information may
be a violation of criminal statutes. The recipient of this information
is prohibited from disclosing, copying, distributing or using this
information except as permitted by current government law governing
privacy information issues. Such information must be destroyed after its
stated need has been fulfilled, unless otherwise prohibited by law. If
this information is received by anyone other than the named addressee,
the recipient should immediately notify us at the address or telephone
number shown and obtain instructions as to the disposal thereof. Under
no circumstances should this material be read, retained, or copied by
anyone other than the named addressee.


-Original Message-
From: Steve Folly [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 07, 2004 3:56 PM
To: MySQL MySQL
Subject: MySQL as document storage?

Hi,

(disclaimer - this thread could easily go off topic; I'm interested 
only in the MySQL aspects of what follows...)

At work we are currently investigating ways of filing all our 
electronic documents.

There is commercial software that will do this I know, but I was 
wondering whether MySQL would be suitable for this type of thing.

The 'documents' could be literally any binary file. My idea would be to 
create a table with a blob column for the document itself, and document 
title, reference number, keywords, other meta-data. And a web-based 
front-end to search and serve documents.

Although the documents could be any file, the majority would be textual 
documents (Word documents, PDF, etc). How would one go about indexing 
such data, since full text searches operate on textual columns?

How to cope with columns exceeding the max packet length? Why is there 
a max_packet_length setting; surely this is low-level stuff that 
shouldn't affect query and result sizes?

Is storing the actual documents in the database such a good idea 
anyway? Perhaps store the file in a file system somwhere and just store 
the filename?


If anyone has experience in doing (or been dissuaded from doing) this 
kind of application your thoughts and comments would be appreciated. 
(If only to tell me don't be so stupid, it'll never work :)

Thanks.

-- 
Regards,
Steve.



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: MySQL as document storage?

2004-01-07 Thread Steve Folly
On 7 Jan 2004, at 21:51, [EMAIL PROTECTED] wrote:

This article discusses it briefly:
http://php.dreamwerx.net/forums/viewtopic.php?t=6
That's an interesting article. Thanks.  A similar table design to what 
I had in mind (hmmm... how different can these things be! :)

I like the idea of splitting up binary data into segments so as reduce 
the load on the server. I assume this works because MySQL doesn't send 
all rows over the connection when the query completes, only just the 
ones you ask for? (In this case, a segment at a time?)



Steve.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: MySQL as document storage?

2004-01-07 Thread Steve Folly
On 7 Jan 2004, at 22:24, Rob Brahier wrote:

I made an online file manager using PHP and MySQL some years ago, and 
am
now embedding something similar into my office's database front-end.  I
decided to store our files in the file system rather than the database
in order to keep the DB size low.  A benefit of this is it takes less
time to restore a backup of the database than it would if I were 
dealing
with the extra gigabytes of embedded files (which I can restore on an
individual basis).
Incremental backups would certainly be smaller. The backup would spot 
individual files being changed, but being blobs in a table, the one 
(large) file in the database would have to be backed up for each 
change?


As for indexing, a lot of the document retrieval solutions out there
just go by metadata when you do a file search.  Business class scanning
systems offer you the option of embedding user-supplied metadata in 
your
scanned files so adding your own keywords is an option.  In my
experience you are better off going by just some supplied keywords and
metadata rather than the full text of a document because you end up 
with
more relevant results.  The exception to this is when you are just 
dying
to know how many documents contain the word pie.  If you find that
this is the case then you obviously have the free time needed to build
some extra indexes...  ;)

I think you're right.

Thanks for the input.

--
Regards,
Steve.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]