Hi Jay,

Let's assume for the moment that the BLOB implementation in MySQL was, well, "as it should be".

Which would mean there are no adverse effects to storing BLOBs in a table, the application can handle the large objects without any special considerations, and the whole thing scales.

In other words:

- The programmer does not have to make special arrangements to handle BLOBs.
- The DBA does not have to make special arrangements to get a consistent backup or replicate the data.

If this were the case:

Would you change your mind about whether storing BLOBs in the database is a bad design decision?

So I think Jim's point is this: yes, putting a cow in a (normal) fridge does not work.

But *not* because putting a cow in a fridge is a bad design idea; rather, because the average fridge is not designed for it.

On the other hand, ask a butcher and he will show you his fridge, which is designed for this purpose, and explain its utility :)

I agree that Drizzle should focus core development elsewhere. But we are many in the community... so we can do this as well. And the additional functionality can be made pluggable.

-Paul

On Nov 11, 2008, at 5:56 PM, Jay Pipes wrote:

Oh, how I love a good rumble :)

Jim Starkey wrote:
Jay Pipes wrote:
Jim Starkey wrote:

Ask Bjørn Hansen wrote:

On Nov 9, 2008, at 14:27, Jim Starkey wrote:


Even if it is inconvenient, it is the way to go. Why send a 50MB PDF
to the client if the client isn't going to actually request it?

What's the real use case for that? If you don't want it, just don't
ask for it - no?



First, the storage engine always has to materialize the blob, even if
the record is part of an exhaustive scan and is rejected.

The better solution is to not store the blob in the database, IMHO.
Store the metadata about the blob, but not the blob. Use the filesystem
for what it was intended.  Sure, it may make backups slightly more
complex, since you need to back up the filesystem and the database for
critical data, but this is a minor inconvenience.
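As an illustration only, the "metadata in the database, bytes on the filesystem" layout proposed here might look like the Python sketch below. It assumes SQLite as the database and a local directory standing in for a distributed filesystem; the `blob_meta` table, the `store_blob`/`load_blob` helpers and the content-hash file naming are hypothetical choices, not anything Drizzle or MySQL provides.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: the database holds only metadata about each blob;
# the bytes themselves live in a file under a root directory.
SCHEMA = """
CREATE TABLE IF NOT EXISTS blob_meta (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    mime_type TEXT NOT NULL,
    size      INTEGER NOT NULL,
    sha256    TEXT NOT NULL,
    path      TEXT NOT NULL
)
"""


def store_blob(conn: sqlite3.Connection, root: Path, name: str,
               mime_type: str, data: bytes) -> int:
    """Write the bytes to the filesystem, then record metadata in the DB."""
    digest = hashlib.sha256(data).hexdigest()
    # Content-addressed file name, so identical blobs share one file.
    path = root / digest[:2] / digest
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_bytes(data)
    # The file write and the INSERT are NOT one atomic unit: a crash between
    # them leaves an orphaned file with no metadata row pointing at it.
    with conn:
        cur = conn.execute(
            "INSERT INTO blob_meta (name, mime_type, size, sha256, path) "
            "VALUES (?, ?, ?, ?, ?)",
            (name, mime_type, len(data), digest, str(path)),
        )
    return cur.lastrowid


def load_blob(conn: sqlite3.Connection, blob_id: int) -> bytes:
    """Find the blob's location via its metadata row, then read it from disk."""
    row = conn.execute(
        "SELECT path, sha256 FROM blob_meta WHERE id = ?", (blob_id,)
    ).fetchone()
    if row is None:
        raise KeyError(blob_id)
    path, digest = row
    data = Path(path).read_bytes()
    # Catch a file that was modified or restored out of step with the DB.
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"blob {blob_id} does not match its recorded hash")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        blob_id = store_blob(conn, Path(tmp), "scan.pdf",
                             "application/pdf", b"%PDF-1.4 example")
        print(blob_id, len(load_blob(conn, blob_id)))
        conn.close()
```

Note what the sketch does not give you: the file write and the metadata INSERT are separate operations, so crash and backup consistency between the two stores is left to the application and the sysadmin, which is the trade-off argued over in this thread.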

That's a terrible idea.  It was a terrible idea in 1983 when it was
first raised by the DEC Rdb/VMS guys, and it's a worse idea today.

Comparing anything from today to 1983 is a terrible idea.

The application is talking to a SQL database. It doesn't know or care
where the database is.

Sure it does. Even if the application only knows "the database is in a
cloud", it still knows "the database is in a cloud" and not "the
database is on my local filesystem".

How is it going to know where to fetch the
blob?

By the metadata about the blob stored, efficiently, in the database.

How is security supposed to work?

Same way.

How is backup supposed to
work?

By backing up the blobs stored on the filesystem.  I.e. by doing what
sysadmins have been doing for years.

How is replication supposed to work?

Replicating blobs is silly IMHO.  What purpose does it serve over
putting the blobs on a clustered, distributed, mirrored filesystem such as
GFS/HDFS (or BigTable on top of GFS)? Again, why should the database be concerned about
blobs at all?  What is the benefit of storing a blob in a database?

Why are big things different from small things?

Because they're, uhm, big.  Cow in fridge sort of thing.  Don't make a
bigger fridge or hope for a smaller cow. Just don't put the cow in the
fridge when it belongs in the pasture.

Why use a database to
keep track of rows when a simple file will suffice?

I don't understand you here.  Could you elaborate?

Data is data.  All data should be subject to the same availability,
consistency, and durability constraints.

Perhaps this is where we most differ.  I don't subscribe to the idea
that all data is equal. In fact, the design of the storage engine layer
emphasizes this belief: that not all data is the same -- in its
importance or its layout.

Yes, big objects need more intelligent handling because small
inefficiencies get magnified thousands of times over. But this is not
an argument against big objects, it is an argument for intelligent
handling of big objects.

Exactly!  My point is that the most intelligent way to handle big
objects is to not handle them in the db. :)

The MySQL conception of a blob (a pointer embedded in a server record) was moronic on the day it was invented. It hasn't gotten any better in
the intervening years.

As opposed to a blob repository such as in Falcon or the BlobStreaming
engine?  Sorry, but I still don't see this as "more intelligent" than
storing the blob on a distributed filesystem.

I think your argument is basically this:  We shouldn't improve blob
handling because the original implementation was so moronic as to be
useless for large blobs.

Well, duh.

No, my argument is: don't improve blob handling, because we'd be solving a
problem that has already been solved by using a distributed filesystem.
If Drizzle is "in the clouds", then we should take for granted that
distributed storage such as GFS (with BigTable on top) and HDFS is the status quo, and thus
the problem is essentially solved and not something we should be
spending time on.

-j

(Incidentally, the DEC guys who argued against blobs were the same ones who argued against relational databases. Real men, they believed, used
CODASYL databases.)

Second, the program logic may look at non-blob fields to decide whether or not to fetch the blob. For example, both a PDF and an HTML translation may be
stored, but not all records have HTML translations.  So the program
selects both and decides on a case-by-case basis.
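When the server cannot defer fetching blob contents, one workaround for the PDF/HTML case is to read only the small columns first and fetch a blob in a second query only for the rows that need it. A minimal sketch, assuming SQLite and a hypothetical `documents(id, title, pdf, html)` table; `create_documents_table` and `pick_renditions` are illustrative names only.

```python
import sqlite3


def create_documents_table(conn: sqlite3.Connection) -> None:
    """Hypothetical table: every row has a PDF; only some have HTML."""
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " pdf BLOB NOT NULL,"
        " html BLOB)"
    )


def pick_renditions(conn: sqlite3.Connection) -> dict[int, bytes]:
    """Return each document's HTML if it has one, else its PDF."""
    # Phase 1: read only small values. `html IS NOT NULL` is evaluated in
    # the database, so no blob bytes are returned to the application here.
    rows = conn.execute(
        "SELECT id, html IS NOT NULL FROM documents ORDER BY id"
    ).fetchall()
    chosen: dict[int, bytes] = {}
    for doc_id, has_html in rows:
        # Phase 2: one extra query per row for just the blob we decided on.
        # The column name comes from a fixed pair, so the f-string is safe.
        column = "html" if has_html else "pdf"
        (data,) = conn.execute(
            f"SELECT {column} FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        chosen[doc_id] = data
    return chosen


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_documents_table(conn)
    conn.execute("INSERT INTO documents VALUES (1, 'a', ?, ?)",
                 (b"%PDF a", b"<p>a</p>"))
    conn.execute("INSERT INTO documents VALUES (2, 'b', ?, NULL)", (b"%PDF b",))
    print(pick_renditions(conn))
```

The price is an extra round trip per chosen row. A server that could hand back lazy references to blobs would let the application select both columns in one query and decide case by case, which is roughly the capability being argued for here.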

At least with InnoDB (IIRC), you then don't even take the page cache
hit for the data.

And really, in what kind of application is storing blobs that big a
good idea?[1]

All modern applications have JPEGs and PDFs.  It's part of the
environment. Some applications even use Word for textual data entry because that's what 99% of the world uses. The minimum PDF is about 50K. The minimum Word document is about 40K. PDFs produced by crappola
scanners are more like 500K.  JPEGs from modern cameras are 1-2 MB.

These are bad ideas only if the database is wretched at storing blobs.


I see your point, but I also don't see the point of prioritizing the
performance of blob storage over other things. IMHO, storing BLOBs in
the DB is bad architectural design.  We shouldn't focus efforts on
optimizing for poor application design.

Your reasoning is both circular and wrong. Using a database system for consistent and reliable data storage is an excellent application design, one that we need to foster. Deciding that some data is less worthy than
others makes no sense at all.

Case in point -- an application to import the contents of a cell phone into a database. Are you going to argue that the database is OK for the phone book, call history, and text messages but photographs have to be
stored somewhere else?  And a somewhere else subject to different
security, backup, and administration policies?  And this is somehow
supposed to represent "good" application design?
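The cell-phone example is, at bottom, a transactional argument: if the photos live in the same database as the phone book, one transaction covers both. A minimal sketch, assuming SQLite and hypothetical `contacts`/`photos` tables with an illustrative `import_phone` helper; when any insert fails, neither the contacts nor the photos from that import are kept.

```python
import sqlite3


def create_phone_tables(conn: sqlite3.Connection) -> None:
    """Hypothetical schema: phone book rows and photo blobs in one database."""
    conn.executescript(
        """
        CREATE TABLE contacts (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            number TEXT NOT NULL
        );
        CREATE TABLE photos (
            id       INTEGER PRIMARY KEY,
            filename TEXT NOT NULL UNIQUE,
            jpeg     BLOB NOT NULL
        );
        """
    )


def import_phone(conn: sqlite3.Connection,
                 contacts: list[tuple[str, str]],
                 photos: list[tuple[str, bytes]]) -> None:
    """Import contacts and photos atomically: everything commits or nothing does."""
    # Used as a context manager, the connection commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.executemany(
            "INSERT INTO contacts (name, number) VALUES (?, ?)", contacts)
        conn.executemany(
            "INSERT INTO photos (filename, jpeg) VALUES (?, ?)", photos)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_phone_tables(conn)
    import_phone(conn, [("Ann", "555-0100")], [("ann.jpg", b"\xff\xd8...")])
    # A duplicate photo filename violates UNIQUE, so this whole import is undone.
    try:
        import_phone(conn, [("Bob", "555-0101")],
                     [("bob.jpg", b"\xff\xd8..."), ("bob.jpg", b"\xff\xd8...")])
    except sqlite3.IntegrityError:
        pass
    print(conn.execute("SELECT COUNT(*) FROM contacts").fetchone())
```

The sketch only shows that blob and non-blob rows can share one commit; it says nothing about how efficiently a particular engine stores or reads the JPEG bytes, which is the other half of the debate.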

A single JPEG is worth a thousand words (and a million precious
three-byte binary words).

Grumph!




_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp



--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com



