Re: [jug-discussion] storing blobs on file system or in db

2005-03-16 Thread Jeffrey Peacock





Taking into account all the responses I've seen so far (down to J. D.
Mitchell), relatively little consideration is being given to the
transactional issues.

I suggest that before you settle on a solution where performance is the
priority, you need to examine the question "What is the correct
behavior?", where "correct behavior" in this case (i.e., where DB's are
concerned) means avoiding inconsistencies. And of course its corollary,
which is "What is the cost of incorrect behavior?"

Putting BLOB assets into the same DB as the data with which they are
associated gives you the simplest implementation of "correct behavior".
From past experience with Oracle I know that good performance with BLOB
assets can be achieved. I can't speak specifically to other DB's, but
historically the performance problems started with not having enough
control over how table spaces were allocated and managed, as well as the
general failure of vendors to provide good BLOB support.

I think it is a given that BLOB assets are always associated with other
data elements. Putting BLOB assets onto the file system really amounts
to splitting the data into two DB's -- BLOBs on the file-system and
other, conventional records in the primary DB. Immediately this presents
transactional problems. Without getting into every specific case, let me
generalize some of the issues:

Each file upload to the file-system has to be in the same transactional
scope as the associated transaction with the primary DB. Upload failures
(successes are easy) in all forms -- dropped connection, system failure,
etc. -- need to be managed in a manner which includes rollback and
cleanup on the file-system as well as rollback of the transaction with
the primary DB.
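
To make the cleanup burden concrete, here is a minimal Java sketch of
one way the file-system side could be kept in step with the DB. It is
an illustration only, not anything proposed in the thread: the class
BlobUpload and the DbStep interface are hypothetical names, and DbStep
stands in for whatever JDBC work (insert plus commit) the application
does, assumed to throw if the DB transaction rolls back.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: pairs a file-system BLOB write with a DB transaction. */
class BlobUpload {

    /** Hypothetical stand-in for the primary-DB work; throws if it rolled back. */
    interface DbStep {
        void run() throws Exception;
    }

    static void store(Path finalPath, byte[] data, DbStep dbStep) throws Exception {
        // 1. Write the upload to a temp file beside its final location.
        Path dir = finalPath.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "upload-", ".tmp");
        try {
            Files.write(tmp, data);
            // 2. Run the DB transaction; an exception means it did not commit.
            dbStep.run();
            // 3. Only after the DB step succeeds does the file become visible.
            Files.move(tmp, finalPath, StandardCopyOption.REPLACE_EXISTING);
        } catch (Exception e) {
            // File-system "rollback": remove the partial or orphaned upload.
            Files.deleteIfExists(tmp);
            throw e;
        }
    }
}
```

Note that this only narrows the window. A crash between steps 2 and 3
still leaves a committed DB row with no file behind it, which is
exactly the kind of inconsistency described above.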
  
Furthermore, operations on the primary DB, like backups, need to be in
lock-step with operations on the file-system DB. One uninformed sysadmin
who does a DB backup without a lock-step backup of the file-system
assets, followed by a disk failure, will ruin your whole day (probably
your month; prepare to give up your life for some time).
  
Then, as already mentioned, the burden of clustering (and replication)
falls to you to implement. One solution that has been presented is a
clustered file-system or network file-system. The issue here is that any
file-system that is not on the local disk puts BLOB assets back into
play, being slung around the network with all the same performance
problems you were trying to get away from in the first place.

Having said all that, if I had my druthers, I would put BLOB assets
into the primary DB.
This solves all my correctness issues and easily keeps me in the game
with respect to
DB clustering, replication and backups. I would deal with the
performance issues by

  ensuring that I am designing/configuring my DB BLOB support as
  efficiently as possible. (I suggest that the reputation of BLOB
  support in DB's suffers from early problems, and many people have not
  gone back to do the due diligence to see if the reputation is still
  warranted.)

  implementing caching on the Apache/Tomcat server side to allow Apache
  to do its thing. Caching to the local disk, even with the event
  mechanism to handle an update to the DB that was initiated on a
  different system, is easier to implement and prove than maintaining
  correctness in the same configuration. Incorrect caching means you
  may serve an old document. You can solve this in seconds by flushing
  the cache and still be out the door in time for Happy Hour. An
  inconsistent DB means you don't even have the correct document to
  begin with. Solving this, at the point at which you discover it, will
  be extremely difficult (that's the best case) if not impossible.
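
As a rough illustration of the caching idea, here is a minimal Java
sketch of a local-disk cache sitting in front of the DB. All names are
hypothetical: LocalBlobCache is not an Apache or Tomcat facility, and
BlobSource stands in for "read the BLOB from the primary DB". It also
assumes document ids are safe to use as file names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local-disk cache in front of a DB that holds the BLOBs. */
class LocalBlobCache {

    /** Hypothetical stand-in for reading a BLOB from the primary DB. */
    interface BlobSource {
        byte[] load(String id) throws IOException;
    }

    private final Path cacheDir;
    private final BlobSource db;

    LocalBlobCache(Path cacheDir, BlobSource db) {
        this.cacheDir = cacheDir;
        this.db = db;
    }

    /** Serve from local disk when present; otherwise fetch from the DB and cache. */
    synchronized byte[] get(String id) throws IOException {
        Path cached = cacheDir.resolve(id);
        if (Files.exists(cached)) {
            return Files.readAllBytes(cached);
        }
        byte[] data = db.load(id);
        Files.write(cached, data);
        return data;
    }

    /** To be called by the update event when a document changes elsewhere. */
    synchronized void invalidate(String id) throws IOException {
        Files.deleteIfExists(cacheDir.resolve(id));
    }

    /** The "flush the cache" fix: drop every cached entry. */
    synchronized void flush() throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(cacheDir)) {
            for (Path entry : entries) {
                Files.deleteIfExists(entry);
            }
        }
    }
}
```

The worst case here is a stale read until invalidate or flush runs. The
DB remains the single source of truth, so nothing is lost if the cache
directory is wiped.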

One final solution I would consider is to see if my DB would allow me
to "slice" my data.
This could take a couple of different forms but the gist of it would be
that the BLOB
table spaces would be on the local disk/system with Apache/Tomcat and
the other
"conventional" data on the DB server. Perhaps the local disk is holding
only the replication of the BLOB data? This particular analysis may
not bear great
fruit but it would be worth not leaving that stone unturned.

Just an opinion.

-J



Andrew Huntwork wrote:
I'm writing this web app that allows users to upload documents, such as
word docs, images, etc, and then to download those documents again on
request. the documents are not searched, interpreted, processed,
version controlled, or anything else. just upload and download. i
wonder if there's a general rule on whether one should stick such
things into a db or onto the file system.
  
  
i currently favor sticking them in the db. putting them on the fs
seems to interfere with clustering (different files would be on
different filesystems). it's also another thing to back up and
generally maintain. on the other hand putting them in the db puts
extra load on the db and the network. there are a bunch of other
issues too.
  
  
Any ideas? Thanks for 

Re: [jug-discussion] Off-topic: UNIX usage at local companies?

2004-09-01 Thread Jeffrey Peacock





How do these kids expect to live without Unix!? What foolishness have
they been taught before they get to your class!? That such a question is
even asked by the youth of America is more scary than any outcome on
Nov. 2nd!! If Java is the coffee, then Unix is the water, man! (Somehow
ending that sentence with 'man' just seemed right for that edgy Dennis
Hopper, 'Easy Rider' hysteria I was going for ;-)

Here are a few I know up here in Hell's barren acre:

  http://www.waz-tempe.com
  http://www.foreverliving.com
  http://www.ddcaz.org
  http://www.phx-jug.org # Not exactly a company but still
  http://www.wonkbrew.com
  http://www.etq.com # in Tucson


With Sun and IBM embracing Linux, and webapps everywhere you turn, I
suggest that the real question is "Who does not use Unix?", especially
for the server. That has to be a smaller population. Also, remember that
Mac OS X is Unix, many embedded devices of all sizes use Linux, and even
small to very small devices use RTOS's that have a Unix-like feel to
them.

-J




William H. Mitchell wrote:

  I'm teaching CS 352 (Systems Programming and UNIX) at UA this fall.  Today
a student asked me what local companies I know of that are using UNIX.  I
know of a few examples but I'd like to be able to cite some more.

If you know of a local company that's making use of some flavor of UNIX,
I'd surely appreciate it if you'd drop me a quick note.  Just a company
name would be fine but if you have time to add a sentence or two of
details, that'd be wonderful.

I'm sure this off-topic question is of little interest to a lot of folks so
please reply directly to me.  I'll make the compilation available to
anybody who's interested.

Thanks!

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


  





Re: [jug-discussion] Java File I/O performance

2004-07-21 Thread Jeffrey Peacock
I'd like to run your tests on my systems too.  I have a similar
set of tests, but they compare straight read/write channel I/O
with I/O streams.  You're welcome to them.

Jeffrey Peacock
[EMAIL PROTECTED]

Eddie Dimond wrote:
Randy,
I have a Red Hat 9 Linux machine with a 1.8 GHz P4 that I could run
your tests on.

Eddie Dimond

Daniel Casey wrote:
Randy,
I have a Linux box that I should be able to run your program on if
you want to send it to me.

Daniel

Randolph Kahle wrote:
I have been investigating the performance of Java IO
on different operating systems. The results are so
different that I am concerned I am making some sort of
fundamental mistake in my test.

I would appreciate help in determining if my results
are correct.

The test is simple. I am creating a rather large file
using stream IO and random access IO by writing a 4k
buffer in a loop.
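
For readers who want to try something similar, here is a minimal Java
sketch of the kind of test described. It is not the original program,
which was not posted: the class and method names are hypothetical, and
it assumes the "RWD" and "RWS" labels in the results refer to the
RandomAccessFile modes "rwd" and "rws". It uses try-with-resources, so
it needs Java 7 or later rather than the 1.4 JDK under discussion.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical re-creation of a block-write timing test; not the original code. */
class WriteBenchmark {

    /** Writes `loops` blocks through a FileOutputStream; returns elapsed ms. */
    static long streamWrite(File file, int blockSize, int loops) throws IOException {
        byte[] block = new byte[blockSize];
        long start = System.currentTimeMillis();
        try (FileOutputStream out = new FileOutputStream(file)) {
            for (int i = 0; i < loops; i++) {
                out.write(block);
            }
        }
        return System.currentTimeMillis() - start;
    }

    /** Same loop through a RandomAccessFile opened in "rw", "rwd" or "rws" mode. */
    static long randomWrite(File file, String mode, int blockSize, int loops)
            throws IOException {
        byte[] block = new byte[blockSize];
        long start = System.currentTimeMillis();
        try (RandomAccessFile raf = new RandomAccessFile(file, mode)) {
            for (int i = 0; i < loops; i++) {
                raf.write(block);
            }
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws IOException {
        int blockSize = 4096;
        int loops = 256; // deliberately small; raise it to approach the sizes above
        File file = File.createTempFile("bench", ".dat");
        try {
            System.out.println("Stream IO  [" + streamWrite(file, blockSize, loops) + "]");
            System.out.println("Random RWD [" + randomWrite(file, "rwd", blockSize, loops) + "]");
            System.out.println("Random RWS [" + randomWrite(file, "rws", blockSize, loops) + "]");
        } finally {
            file.delete();
        }
    }
}
```

One thing worth keeping in mind when comparing such numbers: per the
RandomAccessFile documentation, "rwd" requires every content update to
be written synchronously to the storage device, and "rws" additionally
covers metadata, while a plain FileOutputStream makes no such promise.
That may account for part of any gap between the stream and random
access figures, though it is only one possible factor.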

On my Mac OS X box (G4 1 GHz, 4200 RPM disk) I get

  Block Size [4096]
  Loop Size [4096]
  Standard Stream IO[1378]
  Standard Random IO RWD[1333]
  Standard Random IO RWS[2103]

  Block Size [4096]
  Loop Size [65536]
  Standard Stream IO[18595]
  Standard Random IO RWD[20152]
  Standard Random IO RWS[19827]

On my Linux box (700 MHz PIII, 7200 RPM IBM Drive
running Gentoo Linux with EXT3 file system and
hdparm tuning done.)

  Block Size [4096]
  Loop Size [4096]
  Standard Stream IO[194]  -- low probably because of the HD cache
  Standard Random IO RWD[111689]
  Standard Random IO RWS[108569]

  Block Size [4096]
  Loop Size [65536]
  Standard Stream IO[61573]
  Standard Random IO RWD[1760954]
  Standard Random IO RWS[1779748]

I expected the Linux box to have better IO performance
because of the differences in the disk specifications.
(And I see this with the stream IO test). However, the
RandomAccess performance on Linux (Sun Java 1.4.1_05)
is horrible compared to the FileOutputStream performance.

Notice that the ratio of stream IO is about 3x between
Mac and Linux (Linux is 3x slower) but with random IO
Linux is more than 80x slower than on the Macintosh.

Something must be wrong with the Sun JDK 1.4.1_05
implementation on Linux. It should not take 29 minutes
to write a 256 Meg file to disk!

If anyone would like to run the program, let me
know and I'll send it to you via email.

I would be very interested in seeing results on other
Linux configurations and on Windows.

Regards,
Randy




