Oh, how I love a good rumble :)
Jim Starkey wrote:
Jay Pipes wrote:
Jim Starkey wrote:
Ask Bjørn Hansen wrote:
On Nov 9, 2008, at 14:27, Jim Starkey wrote:
Even if it is inconvenient, it is the way to go. Why send a 50MB PDF to the client if the client isn't going to actually request it?
What's the real use case for that? If you don't want it, just don't ask for it - no?
First, the storage engine always has to materialize the blob, even if the record is part of an exhaustive scan and is rejected.
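A minimal sketch of the materialization cost being argued about, using sqlite3 purely for illustration (the thread is about MySQL/Drizzle engines; the table and column names here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, pdf BLOB)")
con.executemany("INSERT INTO docs (title, pdf) VALUES (?, ?)",
                [("report", b"%PDF" + b"x" * 1000),
                 ("memo",   b"%PDF" + b"y" * 1000)])

# Naive pattern: SELECT * drags every blob through the engine, even for
# rows the application immediately throws away.
all_rows = con.execute("SELECT * FROM docs").fetchall()

# Deferred pattern: scan only the cheap columns, then fetch the blob
# for the single row actually wanted.
match = con.execute("SELECT id FROM docs WHERE title = ?", ("report",)).fetchone()
pdf = con.execute("SELECT pdf FROM docs WHERE id = ?", (match[0],)).fetchone()[0]
```

Whether the second pattern actually avoids reading the blob pages depends on the engine's on-disk layout, which is exactly the design question being debated.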
The better solution is to not store the blob in the database, IMHO. Store the metadata about the blob, but not the blob. Use the filesystem for what it was intended. Sure, it may make backups slightly more complex, since you need to back up the filesystem and the database for critical data, but this is a minor inconvenience.
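The metadata-in-the-database, blob-on-the-filesystem pattern being advocated here can be sketched as follows (a minimal illustration with invented names; it deliberately ignores the atomicity, security, and backup-consistency objections raised elsewhere in the thread):

```python
import hashlib, os, sqlite3, tempfile

blob_dir = tempfile.mkdtemp()
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blob_meta "
            "(id INTEGER PRIMARY KEY, path TEXT, size INTEGER, sha256 TEXT)")

def store_blob(data: bytes) -> int:
    """Write the blob to the filesystem; keep only metadata in the database."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(blob_dir, digest)
    with open(path, "wb") as f:
        f.write(data)
    cur = con.execute("INSERT INTO blob_meta (path, size, sha256) VALUES (?, ?, ?)",
                      (path, len(data), digest))
    return cur.lastrowid

def fetch_blob(blob_id: int) -> bytes:
    """Look up the metadata, then read the blob back from the filesystem."""
    path, digest = con.execute("SELECT path, sha256 FROM blob_meta WHERE id = ?",
                               (blob_id,)).fetchone()
    with open(path, "rb") as f:
        data = f.read()
    # Integrity check: the stored hash detects filesystem/database drift.
    assert hashlib.sha256(data).hexdigest() == digest
    return data

bid = store_blob(b"fake 50MB pdf")
```

Note that a crash between the file write and the INSERT (or vice versa) leaves the two stores inconsistent, which is the heart of Starkey's counterargument.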
That's a terrible idea. It was a terrible idea in 1983 when it was
first raised by the DEC Rdb/VMS guys, and it's a worse idea today.
Comparing anything from today to 1983 is a terrible idea.
The application is talking to a SQL database. It doesn't know or care where the database is.
Sure it does. Even if the application only knows "the database is in a cloud", it still knows "the database is in a cloud" and not "the database is on my local filesystem".
How is it going to know where to fetch the blob?
By the metadata about the blob stored, efficiently, in the database.
How is security supposed to work?
Same way.
How is backup supposed to work?
By backing up the blobs stored on the filesystem. I.e. by doing what sysadmins have been doing for years.
How is replication supposed to work?
Replicating blobs is silly IMHO. What purpose does it serve over putting the blobs on a clustered, distributed, mirrored filesystem such as BigTable/GFS/HDFS? Again, why should the database be concerned about blobs at all? What is the benefit of storing a blob in a database?
Why are big things different from small things?
Because they're, uhm, big. Cow in fridge sort of thing. Don't make a bigger fridge or hope for a smaller cow. Just don't put the cow in the fridge when it belongs in the pasture.
Why use a database to keep track of rows when a simple file will suffice?
I don't understand you here. Could you elaborate?
Data is data. All data should be subject to the same availability,
consistency, and durability constraints.
Perhaps this is where we most differ. I don't subscribe to the idea that all data is equal. In fact, the design of the storage engine layer emphasizes this belief: that not all data is the same -- in its importance or its layout.
Yes, big objects need more intelligent handling because small inefficiencies get magnified thousands of times over. But this is not an argument against big objects, it is an argument for intelligent handling of big objects.
Exactly! My point is that the most intelligent way to handle big objects is to not handle them in the db. :)
The MySQL conception of a blob (a pointer embedded in a server record) was moronic on the day it was invented. It hasn't gotten any better in the intervening years.
As opposed to a blob repository such as in Falcon or the BlobStreaming engine? Sorry, but I still don't see this as "more intelligent" than storing the blob on a distributed filesystem.
I think your argument is basically this: We shouldn't improve blob handling because the original implementation was so moronic as to be useless for large blobs.
Well, duh.
No, my argument is don't improve blob handling because we're solving a problem that has already been solved by using a distributed filesystem.
If Drizzle is "in the clouds", then we should take for granted that filesystems such as BigTable/GFS and HDFS are the status quo, and thus the problem is essentially solved and not something we should be spending time on.
-j
(Incidentally, the DEC guys who argued against blobs were the same ones who argued against relational databases. Real men, they believed, used CODASYL databases.)
Second, the program logic may look at non-blob fields to decide whether or not to fetch the blob. For example, both a PDF and HTML translation may be stored, but not all records have HTML translations. So the program selects both and decides on a case by case basis.
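The select-both-then-decide pattern described above looks roughly like this (a hedged sketch with invented table and column names, again using sqlite3 for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, pdf BLOB, html TEXT)")
con.executemany("INSERT INTO docs (pdf, html) VALUES (?, ?)",
                [(b"%PDF-1", "<html>one</html>"),   # has an HTML translation
                 (b"%PDF-2", None)])                 # PDF only

# The application selects both renditions and decides per row: prefer
# the HTML translation when present, fall back to the PDF otherwise.
# Unless the engine fetches blobs lazily, both columns are shipped for
# every row even though only one is used.
bodies = [html if html is not None else pdf
          for pdf, html in con.execute("SELECT pdf, html FROM docs ORDER BY id")]
```

This is the case where lazy, on-demand blob retrieval would pay off: the unused rendition need never cross the wire.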
At least with InnoDB - IIRC - you don't even get the page cache hit from the data then.
And really - in what kind of application is storing blobs that big a good idea?[1]
All modern applications have jpegs and pdfs. It's part of the environment. Some applications even use Word for textual data entry because that's what 99% of the world uses. The minimum pdf is about 50K. The minimum Word document is about 40K. Pdfs produced by crappola scanners are more like 500K. Jpegs from modern cameras are 1 - 2MB.
These are bad ideas only if the database is wretched at storing blobs.
I see your point, but I also don't see the point of prioritizing the performance of blob storage over other things. IMHO, storing BLOBs in the DB is bad architectural design. We shouldn't focus efforts on optimizing for poor application design.
Your reasoning is both circular and wrong. Using a database system for consistent and reliable data storage is an excellent application design, one that we need to foster. Deciding that some data is less worthy than others makes no sense at all.
Case in point -- an application to import the contents of a cell phone into a database. Are you going to argue that the database is OK for the phone book, call history, and text messages but photographs have to be stored somewhere else? And a somewhere else subject to different security, backup, and administration policies? And this is somehow supposed to represent "good" application design?
A single JPEG is worth a thousand words (and a million precious
three-byte binary words).
Grumph!
_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help : https://help.launchpad.net/ListHelp