#24242: compress_sequence creates larger content than no compression
-------------------------------+--------------------
     Reporter:  dracos         |      Owner:  nobody
         Type:  Uncategorized  |     Status:  new
    Component:  Uncategorized  |    Version:  1.7
     Severity:  Normal         |   Keywords:
 Triage Stage:  Unreviewed     |  Has patch:  0
Easy pickings:  0              |      UI/UX:  0
-------------------------------+--------------------
 I have a view that is 825157 bytes without gzipping, 35751 bytes gzipped
 as an HttpResponse, but 1010920 bytes gzipped as a StreamingHttpResponse.
 The output of the script given below with some noddy data is:

 {{{
 Normal string: 38890
 compress_string: 18539
 compress_sequence: 89567
 compress_sequence, no flush: 18539
 }}}

 Noddy content perhaps, but in actual use I very much want to use
 StreamingHttpResponse for very large JSON responses (with iterables
 throughout it uses 200MB of memory, as opposed to 2GB with more
 standard code/HttpResponse), and the Python json package yields a
 separate chunk for each key, each value, and the punctuation in
 between, so the gzip middleware flushes after every one of them.
 Flushing that often produces output much larger than not gzipping at
 all, per the figures given at the top. It would seem that many uses of
 StreamingHttpResponse will similarly be flushing regularly at the
 content level. #7581 does mention "some penalty in compression
 performance", but a result worse than no compression at all seems a
 bit much :)
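 The blow-up is easy to reproduce with nothing but the stdlib: each
 flush ends the current deflate block and emits an empty stored block
 as a sync marker, so writing two or three bytes and then flushing
 costs more than the bytes themselves. A minimal sketch of the effect
 (stdlib only; the sizes should land close to the figures above):

 {{{#!python
 import gzip
 import io

 data = [str(i) for i in xrange(10000)]  # 38890 bytes in total

 def gzip_size(flush_each_chunk):
     buf = io.BytesIO()
     zfile = gzip.GzipFile(mode='wb', compresslevel=6, fileobj=buf)
     for chunk in data:
         zfile.write(chunk)
         if flush_each_chunk:
             # Z_SYNC_FLUSH, as compress_sequence does per item
             zfile.flush()
     zfile.close()
     return len(buf.getvalue())

 print 'flush once at close:', gzip_size(False)  # ~18539
 print 'flush every chunk:', gzip_size(True)     # ~89567, bigger than the input
 }}}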

 Should compress_sequence bunch up flushes to provide at least some
 level of compression? Or, if the response is a StreamingHttpResponse,
 should the middleware not bother gzipping at all?
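
 For the bunching option, one possible shape, written against the 1.7
 internals (compress_sequence_batched and the 8KB threshold are mine,
 purely illustrative, not a patch):

 {{{#!python
 from gzip import GzipFile

 from django.utils.text import StreamingBuffer

 # Illustrative only: buffer the sequence into ~8KB batches so the
 # sync-flush overhead is paid once per batch rather than once per
 # JSON token, while still streaming to the client periodically.
 def compress_sequence_batched(sequence, min_batch=8192):
     buf = StreamingBuffer()
     zfile = GzipFile(mode='wb', compresslevel=6, fileobj=buf)
     yield buf.read()  # gzip header
     pending, pending_len = [], 0
     for item in sequence:
         pending.append(item)
         pending_len += len(item)
         if pending_len >= min_batch:
             zfile.write(b''.join(pending))
             zfile.flush()
             pending, pending_len = [], 0
             yield buf.read()
     if pending:
         zfile.write(b''.join(pending))
     zfile.close()
     yield buf.read()
 }}}

 On the noddy data above that should bring the size back close to the
 compress_string figure, at the cost of chunkier streaming.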

 {{{#!python
 from gzip import GzipFile

 from django.utils.six.moves import map
 from django.utils.text import (StreamingBuffer, compress_sequence,
     compress_string)

 # Identical to django.utils.text.compress_sequence (1.7)
 # but with the flush line commented out
 def compress_sequence_without_flush(sequence):
     buf = StreamingBuffer()
     zfile = GzipFile(mode='wb', compresslevel=6, fileobj=buf)
     # Output headers...
     yield buf.read()
     for item in sequence:
         zfile.write(item)
         # zfile.flush()
         yield buf.read()
     zfile.close()
     yield buf.read()

 # Restartable iterable yielding the byte strings '0' through '9999'
 # (Python 2, so str is already bytes)
 class Example(object):
     def __iter__(self):
         return map(str, xrange(10000))

 e = Example()
 print 'Normal string:', len(b''.join(e))
 print 'compress_string:', len(compress_string(b''.join(e)))
 print 'compress_sequence:', len(b''.join(compress_sequence(e)))
 print 'compress_sequence, no flush:', len(b''.join(compress_sequence_without_flush(e)))
 }}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24242>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
