Re: blob versus file

2002-07-04 Thread Kristian Koehntopp

Am Mittwoch, 3. Juli 2002 20:43 schrieb central:
 More specific: Can I efficiently read the bytes x to y from
  any BLOB stored in a MySQL database?

 Why not just add another column, Char(3), that contains the
 file extension? 

That would fix this particular case, but my thought were more 
along a general comparison of BLOB and file APIs. While ext2 
open is dominated by directory lookup times when opening files, 
reiserfs and MySQL perform logarithmically here. On the other 
hand, during read and write phases ext2 and reiserfs as well as 
all other file systems have an API that allows them partial 
object reads and writes as well as seeking within that object. 

The question is, what does the MySQL BLOB API provide and how 
does it relate to performance of applications making use of the 
MySQL BLOB API?

Kristian


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: blob versus file

2002-07-03 Thread Tim Ward

 Grabbing a half-gig video segment out of any
 database

I'm sure you're absolutely right about not putting half gig videos in a
database!

I was thinking more of an application like an access control system, where
there might be tens of thousands of photographs of people, each a jpeg of a
small number of K, or a catalogue, again with thousands of tiny photos.

The experiment I did with 500,000 operating system files was to see whether
smallish blobs (bits of text, in this case, which I didn't need to search or
index) were better stored as disk files or in an SQL Server database. The
database won hands down on all counts.

Tim Ward
Brett Ward Limited - www.brettward.co.uk


---
On August 5th ip.access will be moving to:

CPC1 Telephone: +44 (0) 1223 219000
Capital Park Fax:   +44 (0) 1223 219099
Fulbourn
Cambridge CB1 5XE
United Kingdom
---
This e-mail and the information it contains are
confidential and may be privileged. If you have
received this e-mail in error please notify us
immediately. You should not copy it for any purpose, or
disclose its contents to any other person. All emails
to and from ip.access may be monitored and stored for
audit and other legal reasons.
---


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: blob versus file

2002-07-03 Thread Elizabeth Mattijsen

At 09:42 AM 7/3/02 +0100, Tim Ward wrote:
I was thinking more of an application like an access control system, where
there might be tens of thousands of photographs of people, each a jpeg of a
small number of K, or a catalogue, again with thousands of tiny photos.

The experiment I did with 500,000 operating system files was to see whether
smallish blobs (bits of text, in this case, which I didn't need to search or
index) were better stored as disk files or in an SQL Server database. The
database won hands down on all counts.

Not meaning to put down MySQL, but have you tried this also with a ReiserFS 
filesystem?  I had a similar number of files, about 70 GByte worth on an 
ext2 filesystem.  Moved them to a ReiserFS filesystem and found I only 
needed 51 GByte.  And got a much faster system...

The main gain of MySQL is probably that MySQL packs the data tighter 
together, which in the end causes fewer hard disk head movents and less 
data to be transferred.  And if you use indexes, your directory searches 
are binary chops rather than linear searches.  ReiserFS does that also, but 
at a file system level.

If you need to port your application to the outside world where you don't 
have control over your file systems, MySQL would be the way to go for this 
application.  If you _do_ have control over the filesystem being used and 
you can use ReiserFS and you don't need MySQL specific features, then I 
wouldn't be so sure.


Just my 2 eurocents worth...


Liz


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: blob versus file

2002-07-03 Thread Benjamin Pflugmann

Hello.

On Wed 2002-07-03 at 09:42:52 +0100, [EMAIL PROTECTED] wrote:
[...]
 I was thinking more of an application like an access control system, where
 there might be tens of thousands of photographs of people, each a jpeg of a
 small number of K, or a catalogue, again with thousands of tiny photos.
 
 The experiment I did with 500,000 operating system files was to see whether
 smallish blobs (bits of text, in this case, which I didn't need to search or
 index) were better stored as disk files or in an SQL Server database. The
 database won hands down on all counts.

Well, you did *not* put all 500.000 files in one directory, did you?

I had a similar setting for some years (about 12GB of data in about
1.2 Mio. files at the end) stored in a directory hierachy and had no
particular performance problems (at least once I started to avoid
having more than 5.000 files in one dir ;-).  About 50-100 random
accesses/sec sustained at high time (including all the rest: web
server, database server, etc...).

Btw, this was still under Linux 2.0, Alpha 300MHz? SCSI (which is
beaten performance-wise by any desktop PC nowadays).

I mean not to say, that a SQL database would not have been able to
handle this. But a filesystem is an specialized hierachical database,
and I have yet to see an (more general) SQL database wins hands down
in the preferred domain of the file system - of course, a reasonable
implementation presumed (which ext2 IMHO is).

Bye,

Benjamin.

-- 
[EMAIL PROTECTED]

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: blob versus file

2002-07-03 Thread Kristian Koehntopp

Am Mittwoch, 3. Juli 2002 10:58 schrieb Elizabeth Mattijsen:
 Not meaning to put down MySQL, but have you tried this also
 with a ReiserFS filesystem?  I had a similar number of files,
 about 70 GByte worth on an ext2 filesystem.  Moved them to a
 ReiserFS filesystem and found I only needed 51 GByte.  And got
 a much faster system...

ext2 searches directories linearly. With 500.000 files in a 
directory where each file is subsequently accessed in a 
benchmark, you get quadratic access times.

reiserfs uses a tree structure for directories, giving you almost 
linear (O(n log n)) access. It also packs tails of different 
files into singular blocks, saving much space as well.


Question: Can the MySQL BLOB API access and transfer partial 
blobs. That is, if you want to do the equivalent of a file * 
to a BLOB table, the first 10 bytes or so of each BLOB must be 
read in order to guess the type of the BLOB. Is it possible to 
implement this efficiently using the MySQL API?

More specific: Can I efficiently read the bytes x to y from any 
BLOB stored in a MySQL database?

Kristian


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: blob versus file

2002-07-03 Thread central

Kristian,

Question: Can the MySQL BLOB API access and transfer partial
blobs. That is, if you want to do the equivalent of a file *
to a BLOB table, the first 10 bytes or so of each BLOB must be
read in order to guess the type of the BLOB. Is it possible to
implement this efficiently using the MySQL API?

More specific: Can I efficiently read the bytes x to y from any
BLOB stored in a MySQL database?

Why not just add another column, Char(3), that contains the file extension? 
Then you wouldn't need to read the blob data.
Or you can use ENUM and define the possible blob types. By having a second 
column you can sort and filter on it.
Another column could be the blob size (in bytes) so you know how much blob 
data is being stored.

Mike


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: blob versus file

2002-07-02 Thread Raymond Hamaker

Op maandag 1 juli 2002 14:18, schreef andy:
 Hi there,

 I am wondering if anybody has experiance in saving images to blob in mysql.

 I do save images with 1 K and 4 KB to blob fields while I used to save them
 to file. It seams to me that this is much slower accessing the files. The
 images take a bit (really short but absolutly noticable) to show up on the
 site. Is there a way to improve the performance, and why is this happening?
 I thought the performance might even boost after storing them to blobs.

 Thank you for any comments on that,

 Andy

From the faq on http://www.bitbybit.dk/mysqlfaq/comments.html#ch7_2_0

 Storing images and other binary data

Many people ask how to store and retrieve images and other binary data in the 
database. The question usually pops up along with a statement a la how do I 
retrrieve and show an image on a web page?

There are several reasons why you should consider not storing binary data in 
your database:

* The whole point of storing stuff in a SQL database, is to put some kind 
of ordering and structure on your data, as well as being able to search on 
these data. But how do you search in the binary data of your picture?
* For large data sets, storing binary data will quickly run up the size of 
your database files, making it harder to control the size of the database.
* In order to store binary data in the database, you must continually 
escape and unescape the data to ensure that nothing breaks.

You probably have several reasons why you would want to store your images in 
your database, despite all the statements above. Others have, before you. And 
they have all returned to the same solution: Store a link (e.g. a file path) 
to the image file in the database. Whenever you need the image, use the link 
in whatever program you use to retrieve the file containing the image.

Besides keeping your sanity, you will often find that things work faster this 
way: You have less data to transfer between your application and MySQL, you 
don't need to worry about escaping funny characters. And especially for 
serving HTML pages, you're helping your server decide which data to keep 
cached for faster client retrieval.

There is one good reason why you might want to store the binary data in the 
database: Replication. If you need to ensure that the binary data is 
available on several MySQL hosts, using replication might save your day. That 
said, you should always keep the statements above in mind.
 
HTH 
Ray



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: blob versus file

2002-07-02 Thread Tim Ward


 There are several reasons why you should consider not storing
 binary data in
 your database:

 [snip]

 You probably have several reasons why you would want to store
 your images in
 your database, despite all the statements above. Others have,
 before you. And
 they have all returned to the same solution: Store a link
 (e.g. a file path)
 to the image file in the database. Whenever you need the
 image, use the link
 in whatever program you use to retrieve the file containing the image.

Yes, but, this advice does *not* go on to describe how you cope with the
deletion problem.

If you store data in records in the database a DELETE will delete *all* the
data for the set of rows. If some of the data is lying around in disk files
these obviously don't get deleted by DELETE. So it seems to me that you need
an image file garbage collection process, which every now and then scans the
disk, scans the database and deletes disk files which no longer have a
pointer to them from any reference field in any row in any table in the
database. And of course as the file store and the database could both be
changing under your feet whilst you're doing this it seems to be a good idea
to shut down, or lock, the database server and the processes you've got
adding disk files whilst you're doing the garbage collection. Doesn't sound
to clever to me.

Yes of course you could make your business logic layer clever enough to have
a go at deleting the files when it deletes the database rows. But as you
don't have transactions covering database operations *and* file system
operations things will get out of step sooner or later so you'll still need
the garbage collector.

And some hints as to the viability of, say, hundreds of thousands of
operating system files on the various platforms supported by MySQL would be
helpful. For example many filing systems will lose *lots* of space through
fragmentation like this, which one might hope would not happen with blobs.
And what about the performance implications of lots of files in the same
directory? - NT4 for example *will* handle half a million files in the same
directory, 'cos I've tried it, but it's not terribly practical as some
operations start taking tens of minutes instead of tens of milliseconds.

Really one of the reasons for using a DBMS is to handle all this storage
management crap for you, no?

Tim Ward
Brett Ward Limited - www.brettward.co.uk


---
On August 5th ip.access will be moving to:

CPC1 Telephone: +44 (0) 1223 219000
Capital Park Fax:   +44 (0) 1223 219099
Fulbourn
Cambridge CB1 5XE
United Kingdom
---
This e-mail and the information it contains are
confidential and may be privileged. If you have
received this e-mail in error please notify us
immediately. You should not copy it for any purpose, or
disclose its contents to any other person. All emails
to and from ip.access may be monitored and stored for
audit and other legal reasons.
---


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: blob versus file

2002-07-02 Thread hooker

 Yes, but, this advice does *not* go on to describe how you cope with the
 deletion problem.
 
 If you store data in records in the database a DELETE will delete *all* the
 data for the set of rows. If some of the data is lying around in disk files
 these obviously don't get deleted by DELETE. So it seems to me that you need
 an image file garbage collection process, which every now and then scans the
 disk, scans the database and deletes disk files which no longer have a
 pointer to them from any reference field in any row in any table in the
 database. And of course as the file store and the database could both be
 changing under your feet whilst you're doing this it seems to be a good idea
 to shut down, or lock, the database server and the processes you've got
 adding disk files whilst you're doing the garbage collection. Doesn't sound
 to clever to me.
 
 Yes of course you could make your business logic layer clever enough to have
 a go at deleting the files when it deletes the database rows. But as you
 don't have transactions covering database operations *and* file system
 operations things will get out of step sooner or later so you'll still need
 the garbage collector.
 
 And some hints as to the viability of, say, hundreds of thousands of
 operating system files on the various platforms supported by MySQL would be
 helpful. For example many filing systems will lose *lots* of space through
 fragmentation like this, which one might hope would not happen with blobs.
 And what about the performance implications of lots of files in the same
 directory? - NT4 for example *will* handle half a million files in the same
 directory, 'cos I've tried it, but it's not terribly practical as some
 operations start taking tens of minutes instead of tens of milliseconds.
 
 Really one of the reasons for using a DBMS is to handle all this storage
 management crap for you, no?

Well, sort of, yes, but only when the RDBMS can actually *help*. Storing large
amounts of data is the job of a filesystem, not a database manager. RDBMS
products work when :

*   Each item of data is relatively small
*   That data needs to be broken into pieces, some of which should be individually
identifiable
*   Similar fragments from different items need to be 'related' together

Bulk storage isn't really part of the metaphor.

I'm building a multi-media database which references anything from names and addresses
to 500 Mb video segments. The database stores the smaller items internally (names and
addresses as said, small chunks of text and URLs primarily), and references everything
else via a unique id. The main table relates the id, item type and location, where
for the videos, the location is an external file.

There are two issues in particular, one (as you said above) is the 'how do I delete
something', and the other is 'how do I add a new item'.

Deletion is more than the usual RDBMS delete. I mark a record to be deleted as I don't
want this any more, and an external garbage collector looks for rows with this marker
set and removes any external data, then deletes the row itself. You don't have 
relational
integrity for the whole transaction, but this method handles that well (in my 
experience
anyway) by having the delete process as a multi-stage event (the '(T)' shows the areas
where transactions guarantee atomicity) :

*   mark all associated database entries as to be deleted (T)
*   delete external data referenced by database rows waiting to be deleted
*   delete all related database rows (T)

Insertion involves looking for new files inside a recognised directory tree, the 
structure of which is controlled by either the user or other software modules. Anything
found that the database doesn't currently know about goes into a Waiting table. A human
then tells the system about the newly found file(s). For smaller items where there's no
external data, the standard insert into table values (.) works.

You're quite right about the dangers of huge numbers of files in a directory - most OSs
these days will cope, but few cope well when the number exceeds a couple of thousand.
UNIX visibly degrades at about that point.

Performance is an issue with large blocks of data where ever it comes from. I'd argue 
that these days filesystems are better designed to cope with bulk data transfer than
(most) RDBMSs. Grabbing a half-gig video segment out of any database would put a strain
on the servers ability to respond well to other queries while the transfer is in
progress - deleting it could prove intensive too (depends on what's indexed of course).
After all, database servers also have garbage collection functions even if they are
invisible.

Hope this helps some (or invokes more discussion !).

Regards,


Paul Wilson
Chime Communications


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the 

Re: blob versus file

2002-07-01 Thread Thomas Spahni

Andy,

File Systems are made to store and retrieve files in an efficient way. You
may not expect better performance when you put a database in between. When
speed is an issue you should use MySQL to store and retrieve filenames and
something like ReiserFS, RAID and good hardware to store images.

Thomas Spahni


On Mon, 1 Jul 2002, andy wrote:

 Hi there,
 
 I am wondering if anybody has experiance in saving images to blob in mysql.
 
 I do save images with 1 K and 4 KB to blob fields while I used to save them
 to file. It seams to me that this is much slower accessing the files. The
 images take a bit (really short but absolutly noticable) to show up on the
 site. Is there a way to improve the performance, and why is this happening?
 I thought the performance might even boost after storing them to blobs.
 
 Thank you for any comments on that,
 
 Andy
 
 PS: I was planing to store all the other images (20 - 50 KB) to blobs as
 well (thousands) but after this first experiance I am a bit confused if the
 performance would even more suffer. Would you guys recommend this?
 
 sql, query


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php