> Yes, but, this advice does *not* go on to describe how you cope with the
> deletion problem.
> 
> If you store data in records in the database a DELETE will delete *all* the
> data for the set of rows. If some of the data is lying around in disk files
> these obviously don't get deleted by DELETE. So it seems to me that you need
> an image file garbage collection process, which every now and then scans the
> disk, scans the database and deletes disk files which no longer have a
> pointer to them from any reference field in any row in any table in the
> database. And of course as the file store and the database could both be
> changing under your feet whilst you're doing this it seems to be a good idea
> to shut down, or lock, the database server and the processes you've got
> adding disk files whilst you're doing the garbage collection. Doesn't sound
> too clever to me.
> 
> Yes of course you could make your business logic layer clever enough to have
> a go at deleting the files when it deletes the database rows. But as you
> don't have transactions covering database operations *and* file system
> operations things will get out of step sooner or later so you'll still need
> the garbage collector.
> 
> And some hints as to the viability of, say, hundreds of thousands of
> operating system files on the various platforms supported by MySQL would be
> helpful. For example many filing systems will lose *lots* of space through
> fragmentation like this, which one might hope would not happen with blobs.
> And what about the performance implications of lots of files in the same
> directory? - NT4 for example *will* handle half a million files in the same
> directory, 'cos I've tried it, but it's not terribly practical as some
> operations start taking tens of minutes instead of tens of milliseconds.
> 
> Really one of the reasons for using a DBMS is to handle all this storage
> management crap for you, no?

Well, sort of, yes, but only when the RDBMS can actually *help*. Storing large
amounts of data is the job of a filesystem, not a database manager. RDBMS
products work when:

*       Each item of data is relatively small
*       The data needs to be broken into pieces, some of which must be
        individually identifiable
*       Similar fragments from different items need to be 'related' together

Bulk storage isn't really part of the metaphor.

I'm building a multi-media database which references anything from names and addresses
to 500 MB video segments. The database stores the smaller items internally (names and
addresses as said, small chunks of text and URLs primarily) and references everything
else via a unique id. The main table relates the id, the item type and the location;
for the videos, the location is the path of an external file.
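
For concreteness, here's a rough sketch of the shape of that main table. The names
are illustrative only, not my actual schema (the to_delete flag is explained below),
written as the DDL a Python setup script might feed to MySQLdb:

        import MySQLdb

        CREATE_ITEM_TABLE = """
            CREATE TABLE item (
                id        INT UNSIGNED NOT NULL PRIMARY KEY,
                item_type VARCHAR(16)  NOT NULL,  -- 'text', 'url', 'video', ...
                body      TEXT,                   -- small items stored inline
                location  VARCHAR(255),           -- path of external file, else NULL
                to_delete TINYINT NOT NULL DEFAULT 0
            )
        """

        conn = MySQLdb.connect(db="media")    # connection details elided
        conn.cursor().execute(CREATE_ITEM_TABLE)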

There are two issues in particular: one (as you said above) is 'how do I delete
something', and the other is 'how do I add a new item'.

Deletion is more than the usual RDBMS delete. I mark a record as "I don't want this
any more", and an external garbage collector looks for rows with this marker set,
removes any external data, then deletes the row itself. You don't have relational
integrity for the whole transaction, but this method handles that well (in my
experience, anyway) by making the delete a multi-stage event (the '(T)' marks the
steps where a transaction guarantees atomicity):

        *       mark all associated database entries as "to be deleted" (T)
        *       delete external data referenced by database rows waiting to be deleted
        *       delete all related database rows (T)
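
In Python terms (MySQLdb here, but any DB-API module would do, and the table and
column names are the illustrative ones from the sketch above), the whole thing
boils down to something like:

        import os
        import MySQLdb

        def mark_deleted(conn, item_id):
            # Step 1: flag the row (and any related rows) inside a transaction (T).
            cur = conn.cursor()
            cur.execute("UPDATE item SET to_delete = 1 WHERE id = %s", (item_id,))
            conn.commit()

        def collect_garbage(conn):
            cur = conn.cursor()
            # Step 2: remove the external data the marked rows point at.
            cur.execute("SELECT location FROM item WHERE to_delete = 1")
            for (location,) in cur.fetchall():
                if location and os.path.exists(location):
                    os.remove(location)
            # Step 3: now the rows themselves go, atomically (T).
            cur.execute("DELETE FROM item WHERE to_delete = 1")
            conn.commit()

If the collector dies between steps 2 and 3, the marked rows are still there and the
next run simply finds nothing left on disk to remove, so the process is safe to re-run.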

Insertion involves looking for new files inside a recognised directory tree, the 
structure of which is controlled by either the user or other software modules. Anything
found that the database doesn't currently know about goes into a Waiting table. A human
then tells the system about the newly found file(s). For smaller items where there's no
external data, the standard "insert into <table> values (.....)" works.
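
A sketch of that scan, with the same made-up names as before (a real Waiting table
would carry rather more than just the path):

        import os
        import MySQLdb

        def scan_tree(conn, root):
            cur = conn.cursor()
            for dirpath, dirnames, filenames in os.walk(root):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    # Skip anything the database already knows about.
                    cur.execute("SELECT COUNT(*) FROM item WHERE location = %s",
                                (path,))
                    if cur.fetchone()[0] == 0:
                        cur.execute("INSERT INTO waiting (location) VALUES (%s)",
                                    (path,))
            conn.commit()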

You're quite right about the dangers of huge numbers of files in a directory - most OSs
these days will cope, but few cope well when the number exceeds a couple of thousand.
UNIX visibly degrades at about that point.
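
For what it's worth (my own aside, nothing the original poster described), the usual
workaround is to fan the files out into subdirectories keyed on the id, so no single
directory ever holds more than a few hundred entries:

        import os

        def fanout_path(root, item_id):
            # 256 buckets; e.g. id 70123 -> <root>/eb/70123
            bucket = "%02x" % (item_id % 256)
            return os.path.join(root, bucket, str(item_id))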

Performance is an issue with large blocks of data wherever they come from. I'd argue
that these days filesystems are better designed to cope with bulk data transfer than
(most) RDBMSs. Grabbing a half-gig video segment out of any database would put a strain
on the server's ability to respond well to other queries while the transfer is in
progress, and deleting it could prove intensive too (depending on what's indexed, of
course).
After all, database servers also have garbage collection functions even if they are
invisible.
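
By contrast, serving that same segment from a file can be done in fixed-size chunks,
so memory use stays flat however big the segment is. A generic sketch, not tied to
any particular server:

        CHUNK = 64 * 1024   # 64 KB per read

        def stream_file(path, out):
            # Copy a large file to 'out' without ever buffering the whole thing.
            f = open(path, "rb")
            try:
                while True:
                    block = f.read(CHUNK)
                    if not block:
                        break
                    out.write(block)
            finally:
                f.close()

Called with, say, a socket's file object as 'out', this keeps the transfer off the
database server entirely; the RDBMS only ever handled the path lookup.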

Hope this helps some (or provokes more discussion!).

Regards,


Paul Wilson
Chime Communications

