#36293: Extend `django.utils.text.compress_sequence()` to optionally flush data
written to compressed file
-------------------------------+--------------------------------------
Reporter: huoyinghui | Owner: (none)
Type: New feature | Status: closed
Component: HTTP handling | Version: dev
Severity: Normal | Resolution: needsinfo
Keywords: gzip flush | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------+--------------------------------------
Comment (by Adam Johnson):
Thanks for using that example, Jacob.
However, I don’t think it’s a great template for streaming responses,
pushing one line at a time. The approach is quite wasteful for HTTP, as
each line is pushed in its own packet. Moreover, it would be wasteful to
load one record at a time from the database.
A better example would paginate a queryset and yield one page of rows at a
time. This would be more efficient at every layer, compression included. And
this should stand for streaming all kinds of data: HTML, JSON, CSV, ...
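As a rough illustration of the page-at-a-time idea, here is a minimal sketch that uses only the standard library. A plain list stands in for the queryset, and `iter_pages` and `stream_csv` are hypothetical names, not Django APIs. With a real queryset, the slicing would look the same, but `len()` would presumably be replaced by `.count()` and the queryset would need a stable ordering.

```python
import csv
from io import StringIO


def iter_pages(rows, page_size=100):
    """Yield successive slices of ``rows``, one page at a time.

    ``rows`` is a plain list here (an assumption for this sketch).
    """
    for start in range(0, len(rows), page_size):
        yield rows[start : start + page_size]


def stream_csv(rows, page_size=100):
    """Yield one chunk of CSV text per page of rows."""
    for page in iter_pages(rows, page_size):
        buffer = StringIO()
        csv.writer(buffer).writerows(page)
        yield buffer.getvalue()


# Hypothetical data standing in for database records.
rows = [[f"Row {idx}", str(idx)] for idx in range(250)]
chunks = list(stream_csv(rows))
print(len(chunks))  # 3 chunks: 100 + 100 + 50 rows
```

Each yielded string could then be handed to a streaming response, so the compressor and the network see one page per chunk rather than one row.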
I modified the example to send batches of 100 rows at a time using
`itertools.batched`:
{{{#!python
#!/usr/bin/env uv run --script
# /// script
# requires-python = ">=3.14"
# dependencies = [
# "django",
# ]
#
# [tool.uv.sources]
# django = { path = "../../../django", editable = true }
# ///
from __future__ import annotations
import csv
import os
import sys
from io import StringIO
from itertools import batched
from django.conf import settings
from django.core.wsgi import get_wsgi_application
from django.http import StreamingHttpResponse
from django.urls import path

settings.configure(
    # Dangerous: disable host header validation
    ALLOWED_HOSTS=["*"],
    # Use DEBUG=1 to enable debug mode
    DEBUG=(os.environ.get("DEBUG", "") == "1"),
    # Make this module the urlconf
    ROOT_URLCONF=__name__,
    # Only gzip middleware
    MIDDLEWARE=[
        "django.middleware.gzip.GZipMiddleware",
    ],
)


def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    # Generate a sequence of rows. The range is based on the maximum number of
    # rows that can be handled by a single sheet in most spreadsheet
    # applications.
    rows = ([f"Row {idx}", str(idx)] for idx in range(65536))
    buffer = StringIO()
    writer = csv.writer(buffer)

    def stream_rows():
        for batch in batched(rows, 100):
            buffer.seek(0)
            buffer.truncate()
            writer.writerows(batch)
            yield buffer.getvalue()

    return StreamingHttpResponse(
        stream_rows(),
        # Show the response in the browser to more easily measure response.
        content_type="text/plain; charset=utf-8",
        # content_type="text/csv",
        # headers={"Content-Disposition": 'attachment; filename="somefilename.csv"'},
    )


urlpatterns = [
    path("", some_streaming_csv_view),
]

app = get_wsgi_application()

if __name__ == "__main__":
    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)
}}}
(Note the editable Django install in the script metadata; the path needs to
be adjusted to point at your Django checkout.)
I used this cURL command to measure the same stats you were checking:
{{{
curl -w "%{size_download} %{time_total}" -o /dev/null -s \
  -H "Accept-Encoding: gzip" --raw "http://127.0.0.1:8000/" \
  | awk '{printf "%.0fKB, %.0fms\n", $1/1024, $2*1000}'
}}}
It gave me these results:
gzipped, no flushing: 307KB, 44ms
gzipped, flushing: 287KB, 52ms
no gzipping: 1066KB, 30ms
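For reference, the savings implied by these measurements can be worked out with a few lines of Python. The inputs are the rounded figures listed above, so the outputs are approximate; the `savings` helper is just an illustrative name.

```python
# Measurements reported above (sizes in KB, times in ms).
uncompressed_kb = 1066
no_flush_kb, no_flush_ms = 307, 44
flush_kb, flush_ms = 287, 52


def savings(compressed_kb, original_kb):
    """Return the fraction of bytes saved by compression."""
    return 1 - compressed_kb / original_kb


no_flush_savings = savings(no_flush_kb, uncompressed_kb)
flush_savings = savings(flush_kb, uncompressed_kb)
extra_ms = flush_ms - no_flush_ms

print(f"no flushing: {no_flush_savings:.0%} saved")  # 71% saved
print(f"flushing:    {flush_savings:.0%} saved")     # 73% saved
print(f"extra time:  {extra_ms}ms")                  # 8ms
```

The 8ms difference is roughly 15% of the flushing time (52ms), or roughly 18% of the flushless time (44ms), depending on which is taken as the baseline.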
This change makes the flushing version compress *better* than the flushless
one (73% savings versus 71%, relative to the uncompressed 1066KB), while
being a bit (~15%) slower. I think we can always expect some slowdown
because we’ll send more packets, but the per-chunk latency saving is worth
it.
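To make the per-chunk latency point concrete, here is a minimal sketch of what flushing changes, using only the standard library's `zlib`. It is not Django's `compress_sequence()` implementation; `compress_chunks` and its `flush_each` parameter are hypothetical names, and the sample data is made up. With a sync flush after each chunk, every yielded piece can be decompressed by the client as soon as it arrives, instead of waiting for the compressor's internal buffer to fill.

```python
import zlib

GZIP_WBITS = zlib.MAX_WBITS | 16  # ask zlib for a gzip container


def compress_chunks(chunks, flush_each):
    """Gzip-compress an iterable of bytes, yielding output as it is produced.

    With ``flush_each`` set, a sync flush after every chunk forces the
    compressor to emit everything it has buffered so far.
    """
    compressor = zlib.compressobj(wbits=GZIP_WBITS)
    for chunk in chunks:
        data = compressor.compress(chunk)
        if flush_each:
            data += compressor.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield data
    yield compressor.flush()  # finish the gzip stream


# Hypothetical payload: 20 chunks of 100 CSV rows each.
chunks = [f"Row {i},{i}\r\n".encode() * 100 for i in range(20)]
flushed = list(compress_chunks(chunks, flush_each=True))
unflushed = list(compress_chunks(chunks, flush_each=False))

# The first flushed piece already decodes to the full first chunk.
decoder = zlib.decompressobj(wbits=GZIP_WBITS)
first = decoder.decompress(flushed[0])
```

Both variants decompress to the same bytes; the difference is when the output becomes available. Whether flushing makes the total larger or smaller depends on the data and chunk size, as the measurements above suggest.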
So maybe we should rewrite that example, as well as restore the flushing?
--
Ticket URL: <https://code.djangoproject.com/ticket/36293#comment:14>