On Aug 5, 2006, at 5:25 PM, Ivan Sagalaev wrote:

>
> Todd O'Bryan wrote:
>> Would it be better to expose the file-like object that comes with a
>> file upload, rather than reading the file's whole content into memory
>> (or into the server's file system, if the patch gets checked in)?
>> It's easy to retain backward compatibility by just having a call to
>> FILES['file_upload']['content'] simply call FILES['file_upload']
>> ['file_like_object'].read(), but a developer could, instead, decide
>> how large a file they're willing to allow someone to upload, upload
>> that many bytes, and then raise an exception if the file is bigger,
>> rather than waiting until the whole file is uploaded.
>
> This is not really possible. An input stream contains MIMEd data that
> can contain many files. So to have a _full_ request.FILES you have to
> read it entirely.
>
> In fact you don't want to cancel an upload looking at _files_, you  
> want
> to look at the size of a raw stream. This idea is covered (nicely, I
> believe :-) ) in my proposal in that ticket:
> http://code.djangoproject.com/ticket/2070#change_34

You are, of course, correct. One of the problems with a nice, object-oriented
API is that it encourages you to forget that requests come in as long
streams of bytes.

But I'm not completely getting your idea.

You're suggesting that the developer specify how to handle file  
uploads in some kind of middleware class. Presumably, there would be  
a process_upload() method, which would determine what happens each  
time you upload a file. Is this correct?
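To make sure I understand, here is a minimal sketch of what I imagine such a hook could look like. Everything here is invented for illustration (UploadTooLarge, UploadMiddleware, process_upload() and its signature are all assumptions, not anything that exists in Django):

```python
# Hypothetical sketch: a middleware-style hook that inspects the raw
# request stream before the body is parsed. All names are invented.

class UploadTooLarge(Exception):
    """Raised to abort an upload whose raw stream is too big."""

class UploadMiddleware:
    def __init__(self, max_bytes=10 * 1024 * 1024):
        self.max_bytes = max_bytes

    def process_upload(self, stream, content_length):
        """Called once per request, before the body is read.

        Returning the stream lets parsing proceed; raising aborts the
        request before the whole body is pulled into memory.
        """
        if content_length > self.max_bytes:
            raise UploadTooLarge(
                "request body is %d bytes, limit is %d"
                % (content_length, self.max_bytes))
        return stream
```

The key point being that the decision is made by looking at the raw stream's size, not at individual parsed files.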

The only problem I see with this is that, depending on the context, you
might want to do different things. If I want to store some uploaded files
in the database, that's great, but what if I expect some to be short text
files and don't mind keeping them in memory to process? (I teach
programming in a high school, and can imagine having my students upload a
code file, doing some quick checks on it, and sending it right back with
comments.) And then there's the issue of Oyvind's progress monitor, which
might only be appropriate for files that are expected to be large.

The problem with flexibly handling these things, as you've pointed  
out, is that once you're in a view, the entire stream has already  
been read, packaged in an HttpRequest object, and at that point it's  
too late to do anything except deal with what you've got.

Here's an idea that may be a little less jarring, in that it's not  
very different from what happens now, would not require changing the  
middleware, and doesn't make as many changes to the core:

Requests that only have GET or POST data, that is, those whose
content-type doesn't start with 'multipart', will work just as they
do now.

Requests that have files currently call django.http.parse_file_upload
(header_dict, post_data), and both the mod_python and WSGI handlers have
perfectly good file-like objects that they read() into memory and then
pass as the post_data. Rather than reading the data first, both could
pass the file-like object and rely on parse_file_upload() to read it.

parse_file_upload() is only called by subclasses of HttpRequest, and is
used by both to set a couple of instance variables. That sounds like it
should be an instance method of the HttpRequest class to me. So now we
have parse_file_upload() being called inside an HttpRequest with a
file-like object that hasn't been read into memory yet.
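This is what makes size limits possible at all: once parse_file_upload() holds an unread stream, it can read in chunks and bail out early. A minimal sketch of that chunked read (read_capped and its parameters are my invention, not proposed API):

```python
import io

def read_capped(stream, max_bytes, chunk_size=8192):
    """Read from a file-like object in chunks, bailing out as soon as
    the running total exceeds max_bytes -- unlike stream.read(), which
    slurps the entire body into memory before anyone can object.
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("upload exceeds %d bytes" % max_bytes)
        chunks.append(chunk)
    return b"".join(chunks)
```

With the current read-everything-first approach, a check like this can only run after the damage is done.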

Here's the trick. Before you use the POST or FILES attribute of the
request, set a file_handler attribute, which is a dictionary mapping
expected file upload fields to instances that handle them. Each instance
has an upload() method that takes a FieldStorage object and stores the
file's data however that handler sees fit.

Here's how it would look:

def some_view_with_upload(request):
    request.file_handler = {
        'small_file': FileInMemory(max_size=20 * 1024),
        'medium_file': FileInDatabase(),
        'big_file': FileOnFileSystem(path='blah/blah'),
    }
    # The first access to request.POST or request.FILES calls
    # parse_file_upload(), and each file is routed to its handler.
    s = request.FILES['small_file']   # a file-like object stored in memory
    m = request.FILES['medium_file']  # a file-like object in the db

Obviously, if no file_handler is set, it would just default to the
in-memory handling which currently happens. Also, in addition to having
file-like methods, all the file_handler objects would be expected to
have 'filename', 'content', and 'content-type' dictionary items to be
compatible with current practice.
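To show what I mean, here is a rough sketch of one such handler. This is illustrative only: the upload() signature and the attributes I assume on the FieldStorage-like object (.file, .filename, .type) are my guesses at the protocol, not settled API:

```python
import io

class FileInMemory:
    """Hypothetical handler: buffers an upload in memory, capped at
    max_size, and exposes the dict keys current code expects."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = b""
        self._meta = {}

    def upload(self, field):
        # 'field' stands in for a cgi.FieldStorage-like object with
        # .file, .filename and .type attributes (an assumption here).
        data = field.file.read(self.max_size + 1)
        if len(data) > self.max_size:
            raise ValueError("file larger than %d bytes" % self.max_size)
        self._data = data
        self._meta = {'filename': field.filename,
                      'content': data,
                      'content-type': field.type}

    def __getitem__(self, key):    # backward-compatible dict access
        return self._meta[key]

    def read(self):                # file-like access
        return self._data
```

FileInDatabase and FileOnFileSystem would implement the same upload()/dict protocol but send the bytes elsewhere.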

So, flames anyone?

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers
-~----------~----~----~----~------~----~------~--~---