Just throwing in my two cents. Outside of JBoss, the most common problem with content management systems is how to handle large amounts of data, which is the case here. With a database, you have to scale your server up to handle it. With a filesystem approach, you can take advantage of HSM (Hierarchical Storage Management) to manage the data. You add files to your environment, and at some point each file reaches a state where it will no longer change (maybe right after creation) and will not be requested frequently. At that point it is ready to be archived.
Usually you can work out a plan for your environment. For instance, say a file is requested less frequently once 90 days have passed since it was added. At that point, you can burn those files to CD or DVD and put them in a jukebox, migrating them to a cheaper, slower storage tier. After 3 years, store them offline on tape. That is just an example; your environment will be different.

The best approach is to have two keys in your database: one is the unique identifier of the file you are looking for, and the second is the actual pointer to the file. This way, if you use HSM and move files to CD/DVD, your system still maintains the integrity of the unique identifier for lookups, but the pointer can change as needed. You can also archive your files offline on tape and just have a pointer that brings up a 'quickfind' page listing the location, vault #, box #, tape #, and file name needed to find the offline file.

There are a lot of content management/HSM products already out there, but I personally have not found one in the Open Source world that satisfies my requirements. Maybe you'll have better luck and share with us? :-)

two cents,
-D, CDIA+

-----

Guy Rouillier [mailto:[EMAIL PROTECTED]] wrote:

> Just store a reference to a location in the filesystem, and keep the
> binary files in the filesystem. You can back up your filesystem
> as easily as you can back up your Oracle logs.

I tried this for a content management system I worked on once. We had terrible problems keeping the database and the filesystem in sync. For one thing, the database is transactional; the filesystem isn't. You can roll back an operation on the database, but not on the filesystem. Therefore, if your application crashes midway through an operation, you lose the guarantee of referential integrity that a database by itself can provide.

If you have everything in the database, you can back it up by using the database's own replication facilities to create a mirror, without shutting down the application.
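To make the rollback point concrete, here is a minimal Python sketch. It assumes SQLite as a stand-in database, and the table names (`documents`, `contents`) and the `fail` flag are hypothetical, chosen only for illustration; none of this is the schema of any system mentioned in this thread. It shows metadata and binary content written in one transaction, so a failure between the two writes leaves neither behind.

```python
import sqlite3


def open_store():
    """Create an in-memory database with hypothetical metadata and content tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE contents (doc_id INTEGER REFERENCES documents(id), body BLOB)"
    )
    return conn


def store_document(conn, doc_id, title, pdf_bytes, fail=False):
    """Write metadata and content atomically; return True if both were committed."""
    try:
        # The connection context manager commits on success and
        # rolls back the whole transaction if an exception escapes.
        with conn:
            conn.execute(
                "INSERT INTO documents (id, title) VALUES (?, ?)", (doc_id, title)
            )
            if fail:
                # Assumption: stands in for an application crash between writes.
                raise RuntimeError("simulated failure between the two writes")
            conn.execute(
                "INSERT INTO contents (doc_id, body) VALUES (?, ?)",
                (doc_id, pdf_bytes),
            )
        return True
    except (RuntimeError, sqlite3.Error):
        return False
```

With the content in a separate file instead, the second write would sit outside the transaction, and nothing would undo the metadata row (or remove the half-written file) automatically.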
But if you keep your PDFs in the filesystem, the only way to be sure of making a consistent backup is to shut down your application, because you need to make sure that (a) there are no files containing data for uncommitted transactions, and (b) all the files for committed transactions have been written.

Also, most filesystems are notoriously poor at storing huge numbers of files in a single directory. Even if you store files in subdirectories and sub-subdirectories, you'll be limited by the speed at which your filesystem can traverse directory hierarchies and match filenames. Different filesystems may or may not be optimised for that sort of access. Databases, on the other hand, are designed to quickly store and retrieve items in tables containing millions of rows.

> Every time you do a full database backup, you are going to be
> backing up that same, **unchanged** 20 GB of PDFs!

In most databases, most of the data remains invariant most of the time. So this point applies to most databases, not just the ones that store PDFs. The answer is not to do full database backups; instead, use the database's own replication facility, which is designed to do this job efficiently.

Benjamin

_______________________________________________
JBoss-user mailing list [EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jboss-user
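For reference, the two-key scheme suggested earlier in the thread (a stable identifier for lookups plus a location pointer that can change as files migrate between storage tiers) could look roughly like the Python sketch below. It assumes SQLite as a stand-in database; the table name `files`, the function names, and the location string format are all hypothetical and not taken from any HSM product.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog mapping stable IDs to current locations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        " file_id TEXT PRIMARY KEY,"  # key 1: stable identifier used for lookups
        " location TEXT NOT NULL)"    # key 2: pointer that changes on migration
    )
    return conn


def register(conn, file_id, location):
    """Record a new file and where it currently lives."""
    with conn:
        conn.execute(
            "INSERT INTO files (file_id, location) VALUES (?, ?)",
            (file_id, location),
        )


def migrate(conn, file_id, new_location):
    """Point an existing identifier at a new storage location."""
    with conn:
        cur = conn.execute(
            "UPDATE files SET location = ? WHERE file_id = ?",
            (new_location, file_id),
        )
    if cur.rowcount != 1:
        raise KeyError(file_id)


def locate(conn, file_id):
    """Return the current location pointer for an identifier."""
    row = conn.execute(
        "SELECT location FROM files WHERE file_id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    return row[0]
```

A file registered as, say, `disk:/archive/report.pdf` can later be migrated to something like `offline:vault=3;box=7;tape=21;file=report.pdf`. Callers keep using the same `file_id`, and only the pointer changes.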
