Re: [Drizzle-discuss] New Protocol Draft

Paul McCullagh Tue, 11 Nov 2008 09:42:27 -0800

Hi Jay,

Let's assume for the moment that the BLOB implementation in MySQL was, well, "as it should be".

That would mean there are no adverse effects to storing BLOBs in a table, the application can handle the large objects without any special considerations, and the whole thing scales.


In other words:

- The programmer does not have to make special arrangements to handle BLOBs.
- The DBA does not have to make special arrangements to get a consistent backup or replicate the data.


If this was the case:

Would you change your mind about whether storing BLOBs in the database is a bad design decision?

So I think Jim's point is this. Yes, putting a cow in a (normal) fridge does not work.

But *not* because putting a cow in a fridge is a bad design idea, but rather because the average fridge is not correctly designed for this.

On the other hand, ask a butcher and he will show you his fridge, which is designed for this purpose, and explain the utility of it :)

I agree that Drizzle should focus core development elsewhere. But we are many in the community... so we can do this as well. And the additional functionality can be made pluggable.


-Paul

On Nov 11, 2008, at 5:56 PM, Jay Pipes wrote:

Oh, how I love a good rumble :)

Jim Starkey wrote:
Jay Pipes wrote:
Jim Starkey wrote:
Ask Bjørn Hansen wrote:
On Nov 9, 2008, at 14:27, Jim Starkey wrote:
Even if it is inconvenient, it is the way to go. Why send a 50MB PDF
to the client if the client isn't going to actually request it?
What's the real use case for that? If you don't want it, just don't
ask for it - no?
First, the storage engine always has to materialize the blob, even if
the record is part of an exhaustive scan and is rejected.
The better solution is to not store the blob in the database, IMHO.
Store the metadata about the blob, but not the blob. Use the filesystem
for what it was intended.  Sure, it may make backups slightly more
complex, since you need to back up the filesystem and the database for
critical data, but this is a minor inconvenience.
That's a terrible idea.  It was a terrible idea  in 1983 when it was
first raised by the DEC Rdb/VMS guys, and it's a worse idea today.
Comparing anything from today to 1983 is a terrible idea.
The application is talking to a SQL database. It doesn't know or care
where the database is.
Sure it does. Even if the application only knows "the database is in a
cloud", it still knows "the database is in a cloud" and not "the
database is on my local filesystem".
How is it going to know where to fetch the
blob?
By the metadata about the blob stored, efficiently, in the database.
How is security supposed to work?
Same way.
How is backup supposed to
work?
By backing up the blobs stored on the filesystem.  I.e. by doing what
sysadmins have been doing for years.
How is replication supposed to work?
Replicating blobs is silly IMHO.  What purpose does it serve over
putting the blobs on a clustered, distributed, mirrored filesystem such as BigTable/GFS/HDFS? Again, why should the database be concerned about
blobs at all?  What is the benefit of storing a blob in a database?
Why are big things different from small things?
Because they're, uhm, big.  Cow in fridge sort of thing.  Don't make a
bigger fridge or hope for a smaller cow. Just don't put the cow in the
fridge when it belongs in the pasture.
Why use a database to
keep track of rows when a simple file will suffice?
I don't understand you here.  Could you elaborate?
Data is data.  All data should be subject to the same availability,
consistency, and durability constraints.
Perhaps this is where we most differ.  I don't subscribe to the idea
that all data is equal. In fact, the design of the storage engine layer
emphasizes this belief: that not all data is the same -- in its
importance or its layout.
Yes, big objects need more intelligent handling because small
inefficiencies get magnified thousands of times over. But this is not
an argument against big objects, it is an argument for intelligent
handling of big objects.
Exactly!  My point is that the most intelligent way to handle big
objects is to not handle them in the db. :)
The MySQL conception of a blob (a pointer embedded in a server record) was moronic on the day it was invented. It hasn't gotten any better in
the intervening years.
As opposed to a blob repository such as in Falcon or the BlobStreaming
engine?  Sorry, but I still don't see this as "more intelligent" than
storing the blob on a distributed filesystem.
I think your argument is basically this:  We shouldn't improve blob
handling because the original implementation was so moronic as to be
useless for large blobs.

Well, duh.
No, my argument is don't improve blob handling because we're solving a
problem that has already been solved by using a distributed filesystem.
If Drizzle is "in the clouds", then we should take for granted that
filesystems such as BigTable/GFS and HDFS are the status quo, and thus
the problem is essentially solved and not something we should be
spending time on.

-j
(Incidentally, the DEC guys who argued against blobs were the same ones who argued against relational databases. Real men, they believed, used
CODASYL databases.)
Second, the
program logic may look at non-blob fields to decide whether or not to fetch the blob. For example, both a PDF and HTML translation may be
stored, but not all records have HTML translations.  So the program
selects both and decides on a case by case basis.
At least with InnoDB - IIRC - then you don't even get the page cache
hit from the data then.
And really - in what kind of application is storing that big blobs a
good idea?[1]
All modern applications have jpegs and pdfs.  It's part of the
environment. Some applications even use Word for textual data entry because that's what 99% of the world uses. The minimum pdf is about 50K. The minimum Word document is about 40K. Pdfs produced by crappola
scanners are more like 500K.  Jpegs by modern cameras are 1 - 2MB.
These are bad ideas only if the database is wretched at storing blobs.
I see your point, but I also don't see the point of prioritizing the
performance of blob storage over other things. IMHO, storing BLOBs in
the DB is bad architectural design.  We shouldn't focus efforts on
optimizing for poor application design.
Your reasoning is both circular and wrong. Using a database system for consistent and reliable data storage is an excellent application design, one that we need to foster. Deciding that some data is less worthy than
others makes no sense at all.
Case in point -- an application to import the contents of a cell phone into a database. Are you going to argue that the database is OK for the phone book, call history, and text messages but photographs have to be
stored somewhere else?  And a somewhere else subject to different
security, backup, and administration policies?  And this is somehow
supposed to represent "good" application design?

A single JPEG is worth a thousand words (and a million precious
three-byte binary words).

Grumph!
_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp




--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com



