Just throwing in my two cents -

Outside of JBoss, the most common problem with content management systems is how to 
handle large amounts of data, which is the case here.  With a database, you have to 
scale your server up to handle it.  With a filesystem approach, you can take advantage 
of HSM (Hierarchical Storage Management) to manage the large amounts of data.  You 
create/add files to your environment, and at some point they reach a state where the 
file will no longer change (maybe right after creation) and will not be requested 
frequently (ready to be archived).

Usually you can work out a plan that fits your environment.  For instance, files may 
be requested less frequently once 90 days have passed since they were added.  At that 
point you can burn those files to CD or DVD and put them in a jukebox, migrating them 
to a cheaper but slower storage tier.  After 3 years, store them offline on tape. 
This is just an example; your environment will be different.
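
To make the example plan concrete, here is a rough sketch of an age-based rule. The 
tier names and the function name are made up for illustration, and the thresholds are 
just the 90-day and 3-year figures from the example, not a recommendation.

```python
from datetime import date, timedelta

# Thresholds copied from the example plan; both are assumptions to tune per site.
NEARLINE_AFTER = timedelta(days=90)       # move to the CD/DVD jukebox
OFFLINE_AFTER = timedelta(days=3 * 365)   # move to offline tape


def storage_tier(added_on: date, today: date) -> str:
    """Return the tier a file belongs on, judged only by its age."""
    age = today - added_on
    if age >= OFFLINE_AFTER:
        return "tape"
    if age >= NEARLINE_AFTER:
        return "jukebox"
    return "disk"
```

A real HSM policy would probably look at last-access time as well, not only the date 
the file was added.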

The best approach to this is to have two keys in your database: the first is the 
unique identifier of the file you are looking for, and the second is the actual 
pointer to the file.  This way, if you use HSM and move files to CD/DVD, your system 
still maintains the integrity of the unique identifier for lookups, but the pointer can 
change as needed.  You can also archive your files offline on tape and just have a 
pointer that brings up a 'quickfind' page giving the location, vault #, box #, tape #, 
and file needed to find the file stored offline.
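
A minimal sketch of the two-key idea, using an in-memory SQLite table so it runs 
anywhere.  The table name, column names, and the location strings are all 
hypothetical; the only point is that lookups go through the stable identifier while 
the pointer is free to change.

```python
import sqlite3


def make_catalog() -> sqlite3.Connection:
    """Create a throwaway catalog with a stable key and a movable pointer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " file_id TEXT PRIMARY KEY,"   # stable identifier used for lookups
        " location TEXT NOT NULL)"     # current pointer; rewritten on migration
    )
    return conn


def add_file(conn: sqlite3.Connection, file_id: str, location: str) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO documents (file_id, location) VALUES (?, ?)",
            (file_id, location),
        )


def relocate(conn: sqlite3.Connection, file_id: str, new_location: str) -> None:
    """Point an existing identifier at a new storage location."""
    with conn:
        cur = conn.execute(
            "UPDATE documents SET location = ? WHERE file_id = ?",
            (new_location, file_id),
        )
    if cur.rowcount != 1:
        raise KeyError(file_id)


def locate(conn: sqlite3.Connection, file_id: str) -> str:
    row = conn.execute(
        "SELECT location FROM documents WHERE file_id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    return row[0]
```

For example, after `add_file(conn, "doc-0001", "disk:/srv/docs/doc-0001.pdf")`, a 
later `relocate(conn, "doc-0001", "offline:vault=2/box=14/tape=7")` changes what 
`locate` returns without touching the identifier that callers hold.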

There are a lot of content management/HSM products already out there, but I personally 
have not found one in the Open Source world that satisfies my requirements.  Maybe 
you'll have better luck and share with us? :-)

two cents,
-D, CDIA+ 

-----

Guy Rouillier [mailto:[EMAIL PROTECTED]] wrote:
> Just store a reference to a location in the filesystem, and keep the 
> binary files in the filesystem.  You can back up your filesystem
> as easily as you can back up your Oracle logs.

I tried this for a content management system I worked on once.  We had
terrible problems keeping the database and the filesystem in sync.  For
one thing, the database is transactional; the filesystem isn't.  You can
roll back an operation on the database, but not on the filesystem.
Therefore, if your application crashes, you've lost all guarantees of
referential integrity, which a database by itself can provide.
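
To illustrate the kind of mismatch I mean, here is a small sketch (not the code from
that project).  It uses SQLite and a temporary directory purely so it is
self-contained; the table, function names, and the simulated crash are all
assumptions for the demonstration.

```python
import os
import sqlite3
import tempfile


def store_then_fail(conn: sqlite3.Connection, directory: str,
                    name: str, data: bytes) -> str:
    """Write the file, insert its row, then fail before the commit."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)  # the filesystem write takes effect immediately
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO documents (name, path) VALUES (?, ?)",
                (name, path),
            )
            raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        pass  # the row is gone, but nothing undid the file write
    return path


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (name TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    with tempfile.TemporaryDirectory() as directory:
        path = store_then_fail(conn, directory, "report.pdf", b"%PDF-")
        rows = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
        return rows, os.path.exists(path)
```

Calling `demo()` gives zero rows in the table and a file still sitting on disk: an
orphan that the database knows nothing about.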

If you have everything in the database, you can back it up by using the
database's own replication facilities to create a mirror, without
shutting down the application.  But if you keep your PDFs in the
filesystem, the only way to be sure of making a consistent backup is to
shut down your application, because you need to make sure that (a) there
are no files containing data for uncommitted transactions, and (b)
all the files for committed transactions have been written.

Also, most filesystems are notoriously poor at storing huge numbers of
files in a single directory.  Even if you store files in subdirectories
and sub-sub-directories, you'll be limited by the speed at which your
filesystem can traverse directory hierarchies and match filenames.
Different filesystems may or may not be optimised for that sort of
access.  Databases, on the other hand, are designed to quickly store and
retrieve items in tables containing millions of rows.
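
For reference, the usual subdirectory scheme looks roughly like the sketch below.
The function name, the choice of SHA-256, and the two-level, two-character layout
are assumptions picked for illustration; it spreads files out, but every lookup
still pays for the directory traversal described above.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root: str, file_id: str,
                 levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a file id to root/<aa>/<bb>/file_id using a hash prefix."""
    digest = hashlib.sha256(file_id.encode("utf-8")).hexdigest()
    # Take `levels` consecutive chunks of `width` hex characters as directories.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, file_id)
```

With the defaults, each directory level has at most 256 entries, so the number of
files per leaf directory grows much more slowly than with a single flat directory.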

> Every time you do a full database backup, you are going to be 
> backing up that same, **unchanged** 20 GB of PDFs!

In most databases, most of the data remains invariant most of the time.
So this point applies to most databases, not just the ones that store
PDFs.  The answer is not to do full database backups; instead, use the
database's own replication facility, which is designed to do this job
efficiently.

Benjamin


_______________________________________________
JBoss-user mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jboss-user
