#36293: Extend `django.utils.text.compress_sequence()` to optionally flush data
written to compressed file
-------------------------------+--------------------------------------
     Reporter:  huoyinghui     |                    Owner:  (none)
         Type:  New feature    |                   Status:  closed
    Component:  HTTP handling  |                  Version:  dev
     Severity:  Normal         |               Resolution:  needsinfo
     Keywords:  gzip flush     |             Triage Stage:  Unreviewed
    Has patch:  1              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+--------------------------------------
Comment (by Adam Johnson):

 Thanks for using that example, Jacob.

 However, I don’t think it’s a great template for streaming responses,
 since it pushes one line at a time. That approach is quite wasteful for
 HTTP, as each line ends up in its own packet, and it would also be
 wasteful to load records from the database one at a time.

 A better example would paginate a queryset and yield one page of rows at a
 time. That would be more efficient at every layer, compression included,
 and the same approach should stand for streaming any kind of data: HTML,
 JSON, CSV, ...
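
 For the database side, a rough sketch of what that could look like (the
 model and the `name`/`value` fields are made up for illustration;
 `.iterator(chunk_size=...)` fetches rows from the database in batches
 rather than one at a time):

 {{{#!python
 import csv
 from io import StringIO
 from itertools import batched


 def stream_csv_pages(queryset, page_size=100):
     # Pull rows from the database in chunks and yield one CSV-formatted
     # page of rows per chunk.
     buffer = StringIO()
     writer = csv.writer(buffer)
     rows = queryset.values_list("name", "value").iterator(chunk_size=page_size)
     for batch in batched(rows, page_size):
         buffer.seek(0)
         buffer.truncate()
         writer.writerows(batch)
         yield buffer.getvalue()
 }}}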

 I modified the example to send batches of 100 rows at a time using
 `itertools.batched`:

 {{{#!python
 #!/usr/bin/env uv run --script
 # /// script
 # requires-python = ">=3.14"
 # dependencies = [
 #     "django",
 # ]
 #
 # [tool.uv.sources]
 # django = { path = "../../../django", editable = true }
 # ///
 from __future__ import annotations

 import csv
 import os
 import sys
 from io import StringIO
 from itertools import batched

 from django.conf import settings
 from django.core.wsgi import get_wsgi_application
 from django.http import StreamingHttpResponse
 from django.urls import path

 settings.configure(
     # Dangerous: disable host header validation
     ALLOWED_HOSTS=["*"],
     # Use DEBUG=1 to enable debug mode
     DEBUG=(os.environ.get("DEBUG", "") == "1"),
     # Make this module the urlconf
     ROOT_URLCONF=__name__,
     # Only gzip middleware
     MIDDLEWARE=[
         "django.middleware.gzip.GZipMiddleware",
     ],
 )


 def some_streaming_csv_view(request):
     """A view that streams a large CSV file."""
     # Generate a sequence of rows. The range is based on the maximum
     # number of rows that can be handled by a single sheet in most
     # spreadsheet applications.
     rows = ([f"Row {idx}", str(idx)] for idx in range(65536))
     buffer = StringIO()
     writer = csv.writer(buffer)

     def stream_rows():
         for batch in batched(rows, 100):
             buffer.seek(0)
             buffer.truncate()
             writer.writerows(batch)
             yield buffer.getvalue()

     return StreamingHttpResponse(
         stream_rows(),
         # Show the response in the browser to make it easier to measure.
         content_type="text/plain; charset=utf-8",
         # content_type="text/csv",
         # headers={"Content-Disposition": 'attachment; filename="somefilename.csv"'},
     )


 urlpatterns = [
     path("", some_streaming_csv_view),
 ]

 app = get_wsgi_application()

 if __name__ == "__main__":
     from django.core.management import execute_from_command_line

     execute_from_command_line(sys.argv)
 }}}

 (Note the editable Django install in the script metadata; the path needs
 adjusting to point at your Django checkout.)
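
 To reproduce, something like this should work, assuming uv is installed
 and the script is saved as, say, `script.py` (a placeholder name):

 {{{
 uv run script.py runserver
 }}}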

 I used this cURL command to measure the same stats you were checking:

 {{{
 curl -w "%{size_download} %{time_total}" -o /dev/null -s \
   -H "Accept-Encoding: gzip" --raw "http://127.0.0.1:8000/" \
   | awk '{printf "%.0fKB, %.0fms\n", $1/1024, $2*1000}'
 }}}

 It gave me these results:

  * gzipped, no flushing: 307KB, 44ms
  * gzipped, flushing: 287KB, 52ms
  * no gzipping: 1066KB, 30ms

 This change makes the flushing version *more* efficient than the flushless
 one (73% savings versus 71%), while being a bit (~15%) slower. I think we
 can always expect some slowdown because we’ll send more packets, but the
 per-chunk latency saving is worth it.

 So maybe we should rewrite that example, as well as restore the flushing?
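
 For reference, a rough sketch of what per-chunk flushing means in gzip
 terms (this is not the attached patch; a plain `BytesIO` stands in for the
 streaming buffer that `compress_sequence()` actually uses):

 {{{#!python
 import gzip
 from io import BytesIO


 def compress_sequence_flushing(sequence):
     # Sketch only: compress each chunk, then flush the deflate stream
     # (Z_SYNC_FLUSH) so the bytes for that chunk are emitted immediately
     # rather than sitting in the compressor's buffer.
     buf = BytesIO()
     with gzip.GzipFile(mode="wb", fileobj=buf, mtime=0) as zfile:
         for item in sequence:
             zfile.write(item)
             zfile.flush()  # defaults to zlib.Z_SYNC_FLUSH
             data = buf.getvalue()
             if data:
                 yield data
                 buf.seek(0)
                 buf.truncate()
     # Closing the GzipFile writes the gzip trailer (CRC32 and length).
     yield buf.getvalue()
 }}}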
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:14>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
