Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
On Jul 1, 1:02 pm, Ivan Sagalaev <[EMAIL PROTECTED]> wrote: > I've implemented iterable HttpResponse > in the first place out of purely practical reason: web server times out > if a particularly long file was passed into an HttpResponse. And also it > was really slow and was consuming all the memory. Ivan, does this imply that iteration does not work out of the box, i.e. ordinary HttpResponse __iter__() and next() are not used in the expected way? Or were you just using some middleware that consumed the response and customization was required because of that? --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
On Sep 23, 3:54 am, mrts <[EMAIL PROTECTED]> wrote: > +1 for adding a way to say "process_response should never be called on > this response". > > Taking a quick look at the source, HttpResponse seems to support > iteration already: > > def __iter__(self): > self._iterator = iter(self._container) > return self > > def next(self): > chunk = self._iterator.next() > if isinstance(chunk, unicode): > chunk = chunk.encode(self._charset) > return str(chunk) > > I'd expect this is actually used in returning content, i.e. if no > middleware kicks in, passing an iterator to HttpResponse should just > work for streaming? I believe that's correct. However, I'm still not so sure about stopping `process_response` from running for *all* middleware. As mentioned above, some middleware can work and are useful with a streaming response. How about these two ideas: 1) Add a `skipped_middleware` attribute to HttpResponse which is empty by default, and when looping through and executing middleware, skip those defined in there. That way the developer can either subclass `HttpResponse` or simply modify an `HttpResponse` instance before returning and specify exactly which middleware should be skipped? 2) Create an `HttpStreamingResponse` class and add a `can_stream` attribute to each middleware which defaults to `False`. When you need streaming functionality use the provided `HttpStreamingResponse` class which will automatically skip any middleware for which `can_stream` is `False`. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
On Jul 4, 2:26 am, Tai Lee <[EMAIL PROTECTED]> wrote: > > The only thing that might be worth doing in this are is adding a way to > > say "middleware should never be called on this response" and then > > somebody can write their own HttpResponse subclass and be in complete > > control of their destiny. > > Would this disable ALL middleware from running? Or only the > process_response methods on middleware? Either way, this is not ideal > as there are useful middleware that CAN still work with on-demand > content (e.g. the updated GZipMiddleware from the patch). +1 for adding a way to say "process_response should never be called on this response". Taking a quick look at the source, HttpResponse seems to support iteration already: def __iter__(self): self._iterator = iter(self._container) return self def next(self): chunk = self._iterator.next() if isinstance(chunk, unicode): chunk = chunk.encode(self._charset) return str(chunk) I'd expect this is actually used in returning content, i.e. if no middleware kicks in, passing an iterator to HttpResponse should just work for streaming? --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
> The only thing that might be worth doing in this are is adding a way to > say "middleware should never be called on this response" and then > somebody can write their own HttpResponse subclass and be in complete > control of their destiny. Would this disable ALL middleware from running? Or only the process_response methods on middleware? Either way, this is not ideal as there are useful middleware that CAN still work with on-demand content (e.g. the updated GZipMiddleware from the patch). --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
On Jul 1, 7:15 pm, Malcolm Tredinnick <[EMAIL PROTECTED]> wrote: > It's quite a large restriction to say that no middleware should ever try > to examine the contents of the HttpResponse since it might be an > iterator that shouldn't be consumed. You're proposing a bunch of > specific changes for the middleware we have now to work around that, but > you're also imposing a huge functionality constraint on any future > middleware (all middleware will have to work with the possibility that a > non-consumable generator is there). I don't think that restriction true, and even if it were it wouldn't be a significant change from the present functionality, or your suggested functionality of consuming generators immediately. If someone passes a "non-consumable" generator as content (either to be consumed immediately or by 3rd party middleware), or simply had potentially long running code in their views, the end result is the same. A timeout in the browser. Supporting streaming HttpResponse objects at least gives developers a *chance* to avoid these timeouts. 3rd party middleware can continue to access HttpResponse.content directly and consume generators if they like. There doesn't have to be a requirement that ALL middleware fully support streaming HttpResponse. Clearly some middleware which need to manipulate or access the entire content as a string cannot be streamed. But the core Django middleware shouldn't consume content generators unnecessarily and 3rd party middleware developers should have the *option* of allowing content generators to go unmolested. All that's required is a public attribute on HttpResponse that middleware developers can access if they want to be aware of content generators, to fix code Django middleware to be aware of this where possible and where appropriate, and leave the default as it is - for HttpResponse.content to consume content generators. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
Certainly the implementation of the patch is not ideal (by accessing private attributes on HttpResponse), which is why I brought it up for discussion before going any further. I do feel that there is a real benefit and even requirement in some cases to supporting streaming HttpResponse objects. If there was no streaming output, why would we accept generators (or anything other than string) as the content to HttpResponse at all? It would be easier to simply require content to be passed in as text and leave it to the developer to consume any generator before instantiating a HttpResponse object. If there was no streaming output, I don't see a good alternative when there's a need to return a large amount of dynamic content (e.g. CSV export) in response to an HTTP request. Even if users were to subclass HttpResponse on their own, won't they run into the same problem with 3rd party middleware (and even standard Django middleware at present) which ignores the fact that generators can be passed to HttpResponse as content? The other alternative I can think of is even less appealing - to trigger some scheduled offline processing through a web request and notify the user when it's ready to download as static media. The streaming HttpResponse functionality was initially favoured by Adrian when Ivan introduced it. It would be a shame to "just say no" to it, now. Is it really so bad to add a public attribute on HttpResponse that indicates whether or not the content is a static string or a generator, and have the Django standard middleware (especially any that are enabled by default) be aware of this and only enable features that require checking the entire content when a generator has not been used. Features such as: generating an ETag header, compressing the entire content in one hit instead of iteratively (is there a significant benefit to doing this in one hit?), and setting the Content-Length header? The default behaviour by "unaware" middleware is still to consume any content generator, so those middleware that are ignorant of streaming response will continue to work as they do now. But 3rd party developers who are aware of this can support it if they chose, or not, and Django should support it as best it can. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
Malcolm Tredinnick wrote: > The response should just return a copy of the > content when response.content is accessed, which means turning any > iterator into a proper string. Ouch.. I thought it already does just that. Yeah, it's a bug then. Though simple to fix. > But it *does* require that everybody will have to behave this way. If I > am writing a piece of middleware I have to assume the worst case, which > here means that the content is a non-repeatable generator. Now I understand what you don't like. If we simply make HttpResponse to replace its source generator with a string upon exhaustion, the problem will disappear. But it's another bug. I'm talking here along the lines "why don't we make GzipMiddleware not read response.content if it doesn't have to". > It's not something that cannot wait and be > trialled in various approaches after 1.0 Oh, yes, definitely post-1.0. > The > ticket itself already proposes some things that will lead to bugs: for > example, content-length must *always* be set I was thinking about it myself... But don't things like Comet work with infinitely long responses that don't have content length? --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
On Tue, 2008-07-01 at 14:42 +0400, Ivan Sagalaev wrote: > Malcolm Tredinnick wrote: > > Any middleware that examines the content has to pull the content into > > memory in case it's an iterator. If they don't they're buggy because > > they're consuming the content ahead of the web server. > > Agreed. > > > But the default behaviour shouldn't > > require repetitive practices from normal, everyday middleware. > > What exactly you mean by "repetitive practices"? I'm afraid my English > fails me again, sorry! At the moment, lots of pieces of middleware are buggy because they consume the content from response.content and neither the content property nor the middleware itself puts the content back into the response. Those pieces of middleware should, with the current state of the code, be checking if the response was created from an iterator and, if so, be careful to replace the content after use by poking into the internals of the response. Every single piece of middleware that examines response.content needs to do this (this is the bit that requires the repetition). Of course, that's a silly requirement (and requires poking at non-public pieces of the response). The response should just return a copy of the content when response.content is accessed, which means turning any iterator into a proper string. > > If I write a third-party middleware that behaves exactly as the > > content-length middleware does now, it won't work with something that > > can only be used as a generator. Since I won't know ahead of time what > > responses will be give to this middleware I am therefore forced to > > assume such a generator will be passed in. Thus, a change like this is > > saying that from now on, for all time, no middleware can look at the > > content of the response unless it fits into the very narrow profile in > > that ticket (matching one or other content types, etc). > > If a middleware has to look at the content - it will. But why not make > certain standard middleware not required to look at the content? They > will be more useful then, and it doesn't imply, IMHO, that everyone > should do it. No? But it *does* require that everybody will have to behave this way. If I am writing a piece of middleware I have to assume the worst case, which here means that the content is a non-repeatable generator. Since generators would be permitted in that world, I have to treat them as possible in my middleware code. So I cannot examine the content. Otherwise my middleware is crippled as a reusable piece of code: it would have to carry a big warning saying "cannot be used unless you know your responses will never use generators", which means I have to investigate the details of every third-party app I have installed before I can safely use my middleware. That simply is not going to be good practice. I really don't see that it's worth the enormous complexity it brings at the moment to introduce this. It's not something that cannot wait and be trialled in various approaches after 1.0, but rushing to create some public interface and functionality promises right now so that generators become fully supported in HttpResponse is going to lead to bugs. The ticket itself already proposes some things that will lead to bugs: for example, content-length must *always* be set, since chunked transfer-encoding isn't permitted with WSGI, so it's not a question of checking _is_string or not -- which, btw, is an internal attribute that middleware should never access -- but, rather, a question of whether the header has already been set. Malcolm --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
Malcolm Tredinnick wrote: > Any middleware that examines the content has to pull the content into > memory in case it's an iterator. If they don't they're buggy because > they're consuming the content ahead of the web server. Agreed. > But the default behaviour shouldn't > require repetitive practices from normal, everyday middleware. What exactly you mean by "repetitive practices"? I'm afraid my English fails me again, sorry! > If I write a third-party middleware that behaves exactly as the > content-length middleware does now, it won't work with something that > can only be used as a generator. Since I won't know ahead of time what > responses will be give to this middleware I am therefore forced to > assume such a generator will be passed in. Thus, a change like this is > saying that from now on, for all time, no middleware can look at the > content of the response unless it fits into the very narrow profile in > that ticket (matching one or other content types, etc). If a middleware has to look at the content - it will. But why not make certain standard middleware not required to look at the content? They will be more useful then, and it doesn't imply, IMHO, that everyone should do it. No? This got me thinking that the real problem is, in fact, a bit different. Right now middleware applies to all responses indifferently. But it looks like there are at least one more or less common case -- returning a big iterable -- for which most middleware don't make sense anyway and it's better handled as an exception. So may be we should devise a mechanism for such exceptions that won't require a middleware itself to be aware of it. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
On Tue, 2008-07-01 at 14:02 +0400, Ivan Sagalaev wrote: > Malcolm Tredinnick wrote: > > I thought we'd fixed it, but apparently we haven't: if any iterator is > > passed into an HttpResponse, it should be converted to a string > > immediately so that things can index into it without doing > > non-repeatable consumption. > > Malcolm, sorry, that won't work. I've implemented iterable HttpResponse > in the first place out of purely practical reason: web server times out > if a particularly long file was passed into an HttpResponse. And also it > was really slow and was consuming all the memory. Any middleware that examines the content has to pull the content into memory in case it's an iterator. If they don't they're buggy because they're consuming the content ahead of the web server. If somebody has particular requirements like yours, then, absolutely, they should use a subclass of HttpResponse that doesn't do that. And they know the restrictions from that decision. But the default behaviour shouldn't require repetitive practices from normal, everyday middleware. > In fact this ticket appeared out of our conversation with Tai Lee. I > don't think you argument on adding complexity makes sense because not > all middleware should care about this. Why, say, TransactionMiddleware > should be affected? Or some 3rd-party HTMLValidationMiddleware which > won't touch huge iterators because it checks only HTML? This patch just > establishes a good practice to follow: if you expect a "reasonably > short" response in your middleware -- check for it. If I write a third-party middleware that behaves exactly as the content-length middleware does now, it won't work with something that can only be used as a generator. Since I won't know ahead of time what responses will be give to this middleware I am therefore forced to assume such a generator will be passed in. Thus, a change like this is saying that from now on, for all time, no middleware can look at the content of the response unless it fits into the very narrow profile in that ticket (matching one or other content types, etc). Malcolm --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
Malcolm Tredinnick wrote: > I thought we'd fixed it, but apparently we haven't: if any iterator is > passed into an HttpResponse, it should be converted to a string > immediately so that things can index into it without doing > non-repeatable consumption. Malcolm, sorry, that won't work. I've implemented iterable HttpResponse in the first place out of purely practical reason: web server times out if a particularly long file was passed into an HttpResponse. And also it was really slow and was consuming all the memory. In fact this ticket appeared out of our conversation with Tai Lee. I don't think you argument on adding complexity makes sense because not all middleware should care about this. Why, say, TransactionMiddleware should be affected? Or some 3rd-party HTMLValidationMiddleware which won't touch huge iterators because it checks only HTML? This patch just establishes a good practice to follow: if you expect a "reasonably short" response in your middleware -- check for it. --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
On Tue, 2008-07-01 at 00:03 -0700, Tai Lee wrote: > http://code.djangoproject.com/ticket/7581 > > Just posted this ticket with an initial patch (sans documentation > changes and tests). Basically there are several middleware classes > that access HttpResponse.content directly which break streaming > HttpResponse objects that use a generator to yield content > progressively. > > I'd like to get some feedback on the direction and implementation of > the patch and proposed solution before I go any further with tests or > documentation changes. I don't think we should be supporting on-demand generation like this. Go back a year or 18 months in the archives and look at where Brian Harring put up some patches to do that sort of thing. He and I and others spent ages chasing down unintended side-effects and in the end it was considered not worth it. That was tip-of-the-iceberg stuff. Yes, it's probably possible to get it right, but I'm not convinced it's worth it. It's quite a large restriction to say that no middleware should ever try to examine the contents of the HttpResponse since it might be an iterator that shouldn't be consumed. You're proposing a bunch of specific changes for the middleware we have now to work around that, but you're also imposing a huge functionality constraint on any future middleware (all middleware will have to work with the possibility that a non-consumable generator is there). You're also effectively said that only certain content types can be streamed (because a whole bunch of them *will* be affected by consuming middleware). "Just say no" is a cleaner solution here. I thought we'd fixed it, but apparently we haven't: if any iterator is passed into an HttpResponse, it should be converted to a string immediately so that things can index into it without doing non-repeatable consumption. The only thing that might be worth doing in this are is adding a way to say "middleware should never be called on this response" and then somebody can write their own HttpResponse subclass and be in complete control of their destiny. Adding all the complexity to the existing HttpResponse gets a -1 from me. It's too easy to have unintended side-effects and not really in the sweet spot of Django. It also doesn't seem to be a 1.0 requirement. Malcolm --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---
Re: Call for comments: CommonMiddleware (and others) break streaming HttpResponse objects.
Tai Lee wrote: > http://code.djangoproject.com/ticket/7581 > > Just posted this ticket with an initial patch (sans documentation > changes and tests). Basically there are several middleware classes > that access HttpResponse.content directly which break streaming > HttpResponse objects that use a generator to yield content > progressively. > > I'd like to get some feedback on the direction and implementation of > the patch and proposed solution before I go any further with tests or > documentation changes. In GzipMiddleware you don't touch Content-Length if the response is not string. This means that if a response has Content-Length already set it will become incorrect upon gzipping the content. In this case Content-Length can only be deleted from headers, though sending responses with no determined length is a bit ugly... --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers?hl=en -~--~~~~--~~--~--~---