#36656: GZipMiddleware drops content from async streaming responses
-------------------------------+----------------------------------------
     Reporter:  Adam Johnson   |                    Owner:  Adam Johnson
         Type:  Bug            |                   Status:  assigned
    Component:  HTTP handling  |                  Version:  dev
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Unreviewed
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+----------------------------------------
Changes (by Adam Johnson):

 * summary:  `GZipMiddleware` drops content from async streaming responses
     => GZipMiddleware drops content from async streaming responses


New description:

 0bd2c0c9015b53c41394a1c0989afbfd94dc2830 (#33735) expanded
 `GZipMiddleware` to support async streaming responses. However, it does so
 with a faulty `gzip_wrapper()` that compresses each chunk as a separate
 gzip file, rather than as one continuous stream. As a result, only the
 first chunk is decompressed: browsers drop the rest of the response, even
 though they stay connected and download all of the data.
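 To illustrate the failure mode, here is a minimal standard-library sketch.
 It assumes `gzip.compress()` is a fair stand-in for what `gzip_wrapper()`
 does to each chunk, and the chunk contents are made up to mirror the clock
 app below. Concatenated gzip members are still valid gzip, and Python's own
 `gzip.decompress()` reads them all, but a decoder that stops after the first
 member (emulated here with `zlib.decompressobj()`) returns only the first
 chunk and leaves the rest unused.

```python
import gzip
import zlib

# Hypothetical chunks, mirroring the clock example's output.
chunks = [b"<h1>Clock</h1>\n", b"<p>1</p>\n", b"<p>2</p>\n"]

# Faulty pattern (assumed to match gzip_wrapper()): each chunk is compressed
# on its own, so the body is several complete gzip members back to back.
body = b"".join(gzip.compress(chunk, mtime=0) for chunk in chunks)

# Python's gzip module reads every member, so all of the content is there...
assert gzip.decompress(body) == b"".join(chunks)

# ...but a decoder that stops after one member sees only the first chunk;
# the remaining members end up in unused_data.
decoder = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
visible = decoder.decompress(body)
print(visible)  # b'<h1>Clock</h1>\n'
print(decoder.eof, len(decoder.unused_data) > 0)  # True True
```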

 The fix is to compress the whole stream with a single `GzipFile`, as
 `compress_sequence()` already does for sync responses.
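 A minimal sketch of that approach for an async iterable, modelled on the
 structure of `compress_sequence()`. The names `ChunkBuffer` and
 `gzip_stream()` are hypothetical, not Django APIs. The per-chunk
 `flush()` is a design choice of this sketch: it trades some compression
 ratio for getting each chunk to the client promptly, which matters for the
 clock example.

```python
import asyncio
import gzip
import io


class ChunkBuffer(io.BytesIO):
    """Hypothetical helper: read() drains everything written so far."""

    def read(self):
        data = self.getvalue()
        self.seek(0)
        self.truncate()
        return data


async def gzip_stream(chunks):
    """Hypothetical: compress an async iterable of bytes as ONE gzip stream."""
    buf = ChunkBuffer()
    with gzip.GzipFile(mode="wb", compresslevel=6, fileobj=buf, mtime=0) as zfile:
        # GzipFile writes the gzip header on construction; send it right away.
        yield buf.read()
        async for chunk in chunks:
            zfile.write(chunk)
            # Sync-flush so this chunk's compressed bytes are emitted now
            # (choice made in this sketch, at some cost in compression ratio).
            zfile.flush()
            data = buf.read()
            if data:
                yield data
    # Closing the GzipFile writes the final deflate block and the trailer.
    yield buf.read()
```

 Because every piece belongs to the same gzip member, a client can
 decompress the concatenation incrementally without losing anything after
 the first chunk.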

 Additionally, the sync approach starts by yielding an initial chunk before
 any content (in practice, only the gzip header written by `GzipFile`),
 which flushes the headers. I think the async path may need the same, since
 the first content chunk can take an arbitrary amount of time to be
 generated.
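 As a quick check of what that initial chunk contains, this standard-library
 sketch (assuming no filename is embedded in the header) shows that
 `GzipFile` writes the 10-byte gzip header as soon as it is constructed,
 before any content is written. Yielding those bytes immediately is what
 would let an async wrapper send something before the first slow content
 chunk arrives.

```python
import gzip
import io

buf = io.BytesIO()
with gzip.GzipFile(mode="wb", fileobj=buf, mtime=0):
    # Nothing has been written yet, but the constructor has already emitted
    # the gzip header, so there is something to send immediately.
    header = buf.getvalue()

print(len(header), header[:2])  # 10 b'\x1f\x8b'
```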

 To reproduce the issue, run the app below with `uv run --script`, passing
 the `runserver` command. If you comment out `GZipMiddleware` and load the
 page in a browser, you will see the numbers increment every second. If you
 include `GZipMiddleware`, only the "Clock" heading appears, and the rest of
 the response is dropped.

 {{{#!python
 #!/usr/bin/env -S uv run --script
 # /// script
 # requires-python = ">=3.14"
 # dependencies = [
 #     "daphne",
 #     "django",
 # ]
 # ///
 from __future__ import annotations

 import asyncio
 import os
 import sys

 from django.conf import settings
 from django.core.asgi import get_asgi_application
 from django.http import StreamingHttpResponse
 from django.urls import path

 settings.configure(
     # Dangerous: disable host header validation
     ALLOWED_HOSTS=["*"],
     # Use DEBUG=1 to enable debug mode
     DEBUG=(os.environ.get("DEBUG", "") == "1"),
     # Make this module the urlconf
     ROOT_URLCONF=__name__,
     # Use Daphne for async runserver
     INSTALLED_APPS=[
         "daphne",
     ],
     ASGI_APPLICATION=f"{__name__}.app",
     # Only gzip middleware
     MIDDLEWARE=[
         "django.middleware.gzip.GZipMiddleware",
     ],
 )


 async def clock(request):
     async def stream():
         yield "<h1>Clock</h1>\n"
         count = 1
         while True:
             yield f"<p>{count}</p>\n"
             count += 1
             await asyncio.sleep(1)

     return StreamingHttpResponse(stream())


 urlpatterns = [
     path("", clock),
 ]

 app = get_asgi_application()

 if __name__ == "__main__":
     from django.core.management import execute_from_command_line

     execute_from_command_line(sys.argv)
 }}}

-- 
Ticket URL: <https://code.djangoproject.com/ticket/36656#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
