Requests for some way to store binary data in the database seem to be  
perennial. I've seen comments (I think Adrian said it "opens a can of  
mutated worms"), but never a real discussion of what the problems  
would be.

There's a recent ticket, #2417, that adds support for "small-ish"  
bits of binary data using a BinaryField. It uses the VARBINARY type  
in MySQL, which is limited to 64k bytes of data. It also subclasses  
CharField, so it will show up in the Admin, which may or may not be a  
good idea.

The main problems I see with dealing with binary types in the  
database involve making sure that you never have too much stuff in  
memory at any time, either while you're loading the file into memory  
or while you're outputting it to the HttpResponse. I solved this in a  
Java webapp by breaking the file into chunks, storing each chunk  
separately in the database, and uploading and downloading only a  
chunk at a time.

Here's what I'm suggesting:

class DatabaseFile(models.Model):
    name = models.CharField(maxlength=80)
    content_type = models.TextField()
    last_modified = models.DateTimeField()
    size = models.IntegerField()
    owner = models.ForeignKey(User, blank=True)

class DatabaseFileChunk(models.Model):
    file = models.ForeignKey(DatabaseFile)
    number = models.IntegerField()
    # implemented with bytea in Postgres and BLOB in MySQL
    content = models.BinaryField()

When a file is uploaded and the developer wants it to go into the  
database, only about 64KB of data (this seems a reasonable chunk  
size) is read into memory at a time and stored in a  
DatabaseFileChunk, numbered consecutively from 0 up to however many  
chunks are needed.

A DatabaseFile would be a file-like object, iterable, and would be  
output to an HttpResponse a chunk at a time, so never more than about  
64k of server memory is used to serve the file from the database.
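The serving side could look something like this sketch, a file-like wrapper that yields stored chunks in order. The names here are illustrative (`fetch_chunk` stands in for a per-chunk database lookup), not proposed API:

```python
class ChunkedFile:
    """Iterable wrapper over a chunked database file: yields one
    chunk's content at a time, in chunk-number order, so an
    HttpResponse consuming it never holds the whole file in memory."""

    def __init__(self, fetch_chunk, num_chunks):
        self.fetch_chunk = fetch_chunk  # e.g. look up chunk `n` for this file
        self.num_chunks = num_chunks

    def __iter__(self):
        for number in range(self.num_chunks):
            yield self.fetch_chunk(number)
```

The point of making it iterable is that the response machinery can pull chunks lazily instead of the view concatenating them up front.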

Yes, this will be slower than having Apache serve the file directly,  
but it has the huge advantage that the file is served as the result  
of a view. That means you can do all kinds of interesting permission  
checking, url mapping, and general futzing around internal to Django,  
without having to interact with whichever web server you're using.
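To illustrate the kind of gatekeeping a view enables, here's a self-contained sketch of an owner-only download view. The stub classes stand in for Django's real HttpResponse/HttpResponseForbidden, and the owner-only policy is just an example; none of the names below are actual API:

```python
class HttpResponse:
    """Minimal stand-in for Django's HttpResponse, which can consume
    an iterator of content pieces."""
    def __init__(self, content_iter, mimetype):
        self.content = b"".join(content_iter)
        self.mimetype = mimetype

class HttpResponseForbidden(Exception):
    """Stand-in for Django's 403 response."""

def serve_database_file(user_id, db_file, chunks_for):
    """Sketch of a view body: check permissions first, then stream the
    file chunk by chunk. db_file mimics a DatabaseFile row; chunks_for
    mimics fetching its DatabaseFileChunk rows in order."""
    if db_file["owner_id"] != user_id:
        raise HttpResponseForbidden()
    return HttpResponse(
        (chunk["content"] for chunk in chunks_for(db_file["id"])),
        mimetype=db_file["content_type"],
    )
```

Because the permission check runs before any chunk is fetched, a rejected request costs one metadata lookup, not a file read.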

In local testing with fairly big BLOBs (images of about 750KB), files  
were served almost instantaneously using Django's development server,  
so I think the performance hit would likely be acceptable to people who  
really want the ability to save files in the database rather than on  
the filesystem. (And I, for one, desperately need that flexibility.  
I'm going to have about 1900 users, and I don't want to have to route  
files to folders, set Apache permissions, etc., when Django has such  
a nice API for handling relations.)

I'm going to try to code this up this afternoon, but please let me  
know if anyone sees huge problems with it.

Thanks,
Todd

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers
-~----------~----~----~----~------~----~------~--~---