Taking into account all the responses I've seen so far (down to J. D. Mitchell),
relatively little consideration is being given to the transactional issues.

I suggest that before you settle on a solution where performance is the priority,
you need to examine the issue of "What is the correct behavior?", where "correct
behavior" in this case (i.e., where DBs are concerned) means avoiding inconsistencies.
And of course its corollary, which is "What is the cost of incorrect behavior?"

Putting BLOB assets into the same DB as the data with which they are associated gives you
the simplest implementation of "correct behavior".  From past experience with Oracle
I know that good performance with BLOB assets can be achieved.  I can't speak
specifically to other DBs, but historically the performance problems started with
not having enough control over how table spaces were allocated and managed, as well
as the general failure of the vendor to implement BLOB support well.

I think it is a given that BLOB assets are always associated with other data elements.
Putting BLOB assets onto the file system really splits the data into two DBs
-- BLOBs on the file system and the other, conventional records in the primary DB.
This immediately presents transactional problems.  Without getting into every specific
case, let me generalize some of the issues:

Each file upload to the file system has to be in the same transactional scope as the
associated transaction with the primary DB.  Upload failures (successes are easy)
in all forms -- dropped connection, system failure, etc. -- need to be managed in a
manner which includes rollback and cleanup on the file system as well as rollback
of the transaction with the primary DB.
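To make the cleanup requirement concrete, here is a minimal Java sketch of an upload that
removes its file whenever the DB step fails.  Everything in it is an assumption for
illustration: the class name FileBackedUpload, the DbStep interface (a stand-in for whatever
transaction you run against the primary DB), and the use of random UUIDs as file names.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

/** Hypothetical helper that pairs a file-system write with a primary-DB transaction. */
final class FileBackedUpload {

    /**
     * Stand-in for the primary-DB work. It is assumed to commit before returning
     * normally, and to roll back its own transaction before throwing.
     */
    interface DbStep {
        void run(Path storedFile) throws Exception;
    }

    /** Stores the upload under blobDir, then runs the DB step; cleans up the file on any failure. */
    static Path store(Path blobDir, InputStream upload, DbStep dbStep) throws Exception {
        Path temp = Files.createTempFile(blobDir, "upload-", ".part");
        Path target = blobDir.resolve(UUID.randomUUID().toString());
        try {
            // Write to a temporary name first so a dropped connection never leaves
            // a half-written file under the final name.
            Files.copy(upload, temp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
            dbStep.run(target);
            return target;
        } catch (Exception e) {
            // Compensating action: the file system has no rollback, so delete by hand.
            Files.deleteIfExists(temp);
            Files.deleteIfExists(target);
            throw e;
        }
    }
}
```

Note what the sketch does not cover: if the JVM or the machine dies between the move and
the DB commit, the catch block never runs and an orphaned file remains.  That gap is exactly
the kind of case you end up having to manage yourself once the data is split.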

Furthermore, operations on the primary DB, like backups, need to be in lock-step with
operations on the file-system DB.  One uninformed sysadmin who does a DB backup
without a lock-step backup of the file-system assets, followed by a disk
failure, will ruin your whole day (probably your month; prepare to give up your life for
some time).
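As a rough illustration of what "out of step" looks like after such a restore, the Java
sketch below compares the file names the DB records refer to against the files actually on
disk.  The class name BlobConsistencyCheck and the idea of feeding it two plain sets of
names are my assumptions; how you obtain those sets depends on your schema and layout.
It only detects the damage, it does not repair it.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical check for drift between DB records and file-system BLOBs. */
final class BlobConsistencyCheck {

    /** Outcome of one comparison. */
    static final class Report {
        /** Referenced by a DB record but absent on disk: the document is lost. */
        final Set<String> missing;
        /** Present on disk but referenced by no DB record: leaked storage. */
        final Set<String> orphaned;

        Report(Set<String> missing, Set<String> orphaned) {
            this.missing = missing;
            this.orphaned = orphaned;
        }

        boolean consistent() {
            return missing.isEmpty() && orphaned.isEmpty();
        }
    }

    /** Computes the two set differences between DB references and files on disk. */
    static Report compare(Set<String> referencedByDb, Set<String> presentOnDisk) {
        Set<String> missing = new TreeSet<>(referencedByDb);
        missing.removeAll(presentOnDisk);
        Set<String> orphaned = new TreeSet<>(presentOnDisk);
        orphaned.removeAll(referencedByDb);
        return new Report(missing, orphaned);
    }
}
```

Orphaned files are merely wasted space; missing files are the case that ruins your month.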

Then, as already mentioned, the burden of clustering (and replication) falls to you to
implement.  One solution that has been presented is a clustered file system or network
file system.  The issue here is that any file system that is not on the local disk puts BLOB
assets back into play being slung around the network, with all the same performance
problems you were trying to get away from in the first place.

Having said all that, if I had my druthers, I would put BLOB assets into the primary DB.
This solves all my correctness issues and easily keeps me in the game with respect to
DB clustering, replication and backups.  I would deal with the performance issues by
  1. ensuring that I am designing/configuring my DB BLOB support as efficiently
    as possible.  (I suggest that the reputation of BLOB support in DB's suffers from
    early problems and many people have not gone back to do the due diligence to
    see if the reputation is still warranted.)

  2. implementing caching on the Apache/Tomcat server side to allow Apache to do its thing.
    Caching to the local disk, even with an event mechanism to handle an update to the DB that
    was initiated on a different system, is easier to implement and prove than maintaining
    correctness in the same configuration.  Incorrect caching means you may serve an
    old document.  You can solve this in seconds by flushing the cache and still be out the
    door in time for Happy Hour.  An inconsistent DB means you don't even have the correct
    document to begin with.  Solving this, at the point at which you discover it, will be
    extremely difficult (that's the best case) if not impossible.
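The caching idea in item 2 could look roughly like the Java sketch below: the primary DB
stays the source of truth and the local disk only holds disposable copies.  The class name
LocalBlobCache, the loadFromDb function (a stand-in for the real BLOB read), and the
assumption that document ids are safe to use as file names (e.g. numeric keys) are all
mine, not anything Apache or Tomcat provides.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Function;

/** Hypothetical local-disk cache in front of BLOBs stored in the primary DB. */
final class LocalBlobCache {
    private final Path cacheDir;
    // Stand-in for reading the BLOB from the primary DB (assumption, not a real API).
    private final Function<String, byte[]> loadFromDb;

    LocalBlobCache(Path cacheDir, Function<String, byte[]> loadFromDb) {
        this.cacheDir = cacheDir;
        this.loadFromDb = loadFromDb;
    }

    /** Returns the cached file for the id, filling it from the DB on a miss. */
    synchronized Path get(String documentId) throws IOException {
        // Assumes documentId is a safe file name, such as a numeric primary key.
        Path cached = cacheDir.resolve(documentId);
        if (!Files.exists(cached)) {
            // Fill under a temporary name so a reader never sees a partial file.
            Path temp = Files.createTempFile(cacheDir, "fill-", ".part");
            Files.write(temp, loadFromDb.apply(documentId));
            Files.move(temp, cached, StandardCopyOption.REPLACE_EXISTING);
        }
        return cached;
    }

    /** To be called when an update event for this document arrives from another system. */
    synchronized void invalidate(String documentId) throws IOException {
        Files.deleteIfExists(cacheDir.resolve(documentId));
    }

    /** The "fix it in seconds" option: drop every cached copy. */
    synchronized void flush() throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(cacheDir)) {
            for (Path entry : entries) {
                Files.deleteIfExists(entry);
            }
        }
    }
}
```

The worst outcome of a bug here is a stale copy, and flush() recovers from it because
nothing in the cache directory is authoritative.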

One final solution I would consider is to see if my DB would allow me to "slice" my data.
This could take a couple of different forms, but the gist of it would be that the BLOB
table spaces would be on the local disk/system with Apache/Tomcat and the other
"conventional" data on the DB server.  Perhaps the local disk is holding
only the replication of the BLOB data?  This particular analysis may not bear great
fruit, but it is a stone worth not leaving unturned.

Just an opinion.

-J



Andrew Huntwork wrote:
I'm writing this web app that allows users to upload documents, such as word docs, images, etc, and then to download those documents again on request.  the documents are not searched, interpreted, processed, version controlled, or anything else.  just upload and download.  i wonder if there's a general rule on whether one should stick such things into a db or onto the file system.

i currently favor sticking them in the db.  putting them on the fs seems to interfere with clustering (different files would be on different filesystems).  it's also another thing to back up and generally maintain.  on the other hand putting them in the db puts extra load on the db and the network.  there are a bunch of other issues too.

Any ideas?  Thanks for any help.
