#32472: runserver prematurely closes connection for large response body
-------------------------------+------------------------------------
     Reporter:  David Sanders  |                    Owner:  nobody
         Type:  Bug            |                   Status:  new
    Component:  HTTP handling  |                  Version:  3.1
     Severity:  Normal         |               Resolution:
     Keywords:                 |             Triage Stage:  Accepted
    Has patch:  0              |      Needs documentation:  0
  Needs tests:  0              |  Patch needs improvement:  0
Easy pickings:  0              |                    UI/UX:  0
-------------------------------+------------------------------------

Comment (by David Sanders):

 > This can be a regression in 934acf1126995f6e6ccba5947ec8f7561633c27f.

 @Mariusz, I did review that change while investigating, but came to the
 conclusion that the behavior would have been there before that change as
 well, since the main issue seemed to be a race between sending the data
 and closing the socket.

 @Florian, yeah, I was actually able to reproduce it with a little script
 using just the Python standard library. Since the issue seemed to be a
 race between sending the data and closing the socket, and that behavior
 comes straight from the Python standard library, the issue was in some
 ways inherited from it.
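 The original script wasn't attached, so the following is only a
 hypothetical stand-in built from the standard library. It serves one large
 response from `http.server` with HTTP/1.0 semantics (the server closes the
 connection after writing) and fetches it with `http.client`. The 8 MiB body
 size and the `run_check()` helper are assumptions for illustration. Over a
 direct loopback connection it should receive every byte, and a prematurely
 closed connection would surface as `http.client.IncompleteRead`.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY_SIZE = 8 * 1024 * 1024  # assumed "large" body: 8 MiB


class LargeBodyHandler(BaseHTTPRequestHandler):
    # HTTP/1.0: the server closes the connection once the response is written.
    protocol_version = "HTTP/1.0"

    def do_GET(self):
        body = b"x" * BODY_SIZE
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_body_length(host, port):
    """Return the number of body bytes received from one GET request."""
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request("GET", "/")
        # read() raises http.client.IncompleteRead if the connection closes
        # before Content-Length bytes have arrived.
        return len(conn.getresponse().read())
    finally:
        conn.close()


def run_check():
    """Serve a single request on an ephemeral port and fetch it directly."""
    server = HTTPServer(("127.0.0.1", 0), LargeBodyHandler)
    thread = threading.Thread(target=server.handle_request, daemon=True)
    thread.start()
    try:
        return fetch_body_length(*server.server_address)
    finally:
        thread.join(timeout=30)
        server.server_close()


if __name__ == "__main__":
    print(run_check(), "of", BODY_SIZE, "bytes received")
```

 Putting a forwarding proxy between `fetch_body_length()` and the server is
 the kind of change that would exercise the failure described below.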

 Unfortunately, I can no longer reproduce it, and I don't think anyone else
 will have much luck reproducing it either.

 I had to go deeper down the rabbit hole, and I now believe the premature
 connection closure was actually caused by something between the client
 and the server. I was running Django in a VS Code dev container (see
 https://code.visualstudio.com/docs/remote/containers) using the
 `python:3.9` Docker image as the base. I'd forgotten that VS Code has some
 "magic" when using dev containers, where it will forward a port from the
 host VS Code is running on to a port inside the dev container. I don't
 know exactly how they have that set up (they haven't open-sourced the code
 for dev containers), but judging by what's running inside the dev
 container, I suspect there's a bucket brigade that goes something like
 this: bind the port on the host VS Code is running on, pipe that traffic to
 a script running inside the dev container, and have that script pipe the
 traffic on to the target port inside the dev container.
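 VS Code's forwarding code isn't public, so this isn't how it works. It's a
 hypothetical sketch of one leg of such a bucket brigade, meant to show what
 a leg has to get right: when one side reaches end-of-stream, the leg should
 finish forwarding everything it has already read and only then pass the
 end-of-stream along with a half-close (`shutdown(SHUT_WR)`), instead of
 tearing down both directions at once. The `pump()` and `relay()` names are
 made up for this example.

```python
import socket
import threading


def pump(src, dst, bufsize=65536):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(bufsize)
            if not data:
                break  # src has finished sending
            dst.sendall(data)
    except OSError:
        pass  # a reset on either socket ends this direction
    try:
        # Signal EOF downstream only after every byte read from src has been
        # handed to dst. A full close racing with the other direction is the
        # kind of mistake that could cut a response short.
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass


def relay(a, b):
    """Forward traffic in both directions between connected sockets a and b."""
    threads = [
        threading.Thread(target=pump, args=(a, b)),
        threading.Thread(target=pump, args=(b, a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Both directions are finished, so a full close is now safe.
    a.close()
    b.close()
```

 One thread per direction keeps the sketch short. A real forwarder would
 more likely be built on `selectors` or `asyncio`.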

 I'm guessing there's a race condition or bug in the scripts for that
 bucket brigade, whereby Django closing its connection causes the
 connection to the client to be closed before all of the response data has
 been sent. Since there are likely multiple legs involved in that
 forwarding magic, and the source isn't available, who knows exactly where
 the bug was.

 I partially confirmed this theory by skipping the forwarding magic and
 having my test client script connect straight to the Django port in the
 container; the premature closure didn't seem to occur that way.

 However, at some point I thought it would be a good idea to recreate the
 dev container, and now the issue won't reproduce reliably. I did see it
 for a moment while the container was busy installing extensions, so I'm
 guessing the race condition or bug may be exacerbated by load on the
 container. The container I'd been using before, when it was reliably
 reproducing, had been up for several days. Note to self: add one more tool
 to the debugging toolbox, and try a full system reboot before investing
 time in chasing an issue.

 So, in effect, there was a buggy network link (since there was software
 forwarding involved) between the client and Django. The proposed fix of
 waiting for the client to close the connection inadvertently worked around
 that buggy link behavior.

 That certainly explains the seemingly odd behavior, and why `SO_LINGER`
 didn't do anything to help the situation.
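 For reference, `SO_LINGER` is set by packing a `struct linger`. This sketch
 assumes a POSIX platform where that struct is two C ints (Windows uses two
 unsigned shorts, so the format string would differ), and the helper names
 are hypothetical. Lingering only affects the TCP connection that Django's
 process owns. In this setup that connection presumably ended at the
 forwarding script inside the container, not at the real client, which
 would be consistent with it making no difference.

```python
import socket
import struct

# Assumes the POSIX layout of struct linger: two C ints (l_onoff, l_linger).
LINGER_FORMAT = "ii"


def set_linger(sock, seconds):
    """Make close() wait up to `seconds` for queued data to be sent and acked."""
    value = struct.pack(LINGER_FORMAT, 1, seconds)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, value)


def get_linger(sock):
    """Return (enabled, seconds) as currently set on the socket."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize(LINGER_FORMAT)
    )
    onoff, seconds = struct.unpack(LINGER_FORMAT, raw)
    return bool(onoff), seconds
```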

 ----

 I think the patch proposed in the original comment adds robustness against
 network glitches, since it waits for the client to close the connection
 and so ensures that the response was received, but it has trade-offs that
 might ultimately make it not worthwhile. There could also be HTTP clients
 that don't close the connection because they expect the server to do so,
 so it could do more harm than good.
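 The patch itself isn't reproduced here. As a minimal sketch of the general
 idea, a hypothetical `close_after_peer()` helper could half-close the
 socket so the client sees the end of the response, keep reading until the
 client closes its end, and give up after a timeout. That timeout is where
 the trade-off shows up: a client that waits for the server to close would
 hold the connection open until it expires.

```python
import socket


def close_after_peer(sock, timeout=5.0, bufsize=4096):
    """Half-close `sock`, wait for the peer to close, then close it fully.

    Gives up after `timeout` seconds of silence if the peer never closes.
    """
    try:
        # Send EOF (a FIN on TCP): the response is complete, but keep the
        # read side open so we can notice when the client closes.
        sock.shutdown(socket.SHUT_WR)
        sock.settimeout(timeout)
        while sock.recv(bufsize):
            pass  # discard anything the client still sends
    except OSError:
        # Covers the timeout (socket.timeout is an OSError subclass) and
        # connections the client has already reset.
        pass
    finally:
        sock.close()
```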

 @Mariusz's comment about test failures led me to spend some time looking
 into them before I realized the VS Code mischief, and doing so showed that
 waiting for the client to close the connection can't work when the
 response has no `Content-Length` header (as with a
 `StreamingHttpResponse`). I made a quick and dirty implementation of
 chunked encoding for the dev server to handle this case, and it did work
 in my testing, allowing the HTTP client to know that all the data had been
 received and to close the connection itself. However, HTTP/1.0 clients
 can't use chunked encoding, so for a streaming response to an HTTP/1.0
 client the existing behavior of the server closing the connection would
 need to be maintained.
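 The framing decision described above can be sketched as a hypothetical
 helper. The function name, its return values, and the WSGI-style header
 list are assumptions, not Django's API.

```python
def choose_framing(request_version, headers):
    """Decide how the end of a response body is signalled to the client.

    request_version: e.g. "HTTP/1.0" or "HTTP/1.1" (as found on
        BaseHTTPRequestHandler.request_version).
    headers: WSGI-style list of (name, value) tuples.
    Returns "content-length", "chunked", or "close".
    """
    names = {name.lower() for name, _ in headers}
    if "content-length" in names:
        # The client can count bytes, so it knows when the body is complete.
        return "content-length"
    if request_version == "HTTP/1.1":
        # Unknown length, but HTTP/1.1 clients understand chunked encoding;
        # the server would add "Transfer-Encoding: chunked".
        return "chunked"
    # HTTP/1.0 has no chunked encoding: closing the connection marks the end.
    return "close"
```

 It deliberately ignores details a real server would have to handle, such
 as responses that must not carry a body (HEAD, 204, 304) or a client that
 sends `Connection: close`.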

 I'm going to attach some in-progress work for posterity's sake, and in
 case anyone is interested. Chunked encoding for HTTP/1.1 can be
 implemented fairly minimally and might be worth doing in a separate
 ticket. It's always good to have the dev server behave more like
 production, and I think most production web servers would use chunked
 encoding for a response like that.
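 A minimal sketch of HTTP/1.1 chunked transfer coding, assuming the body is
 available as an iterable of byte strings (as a `StreamingHttpResponse`'s
 `streaming_content` is). Each chunk is its size in hex, CRLF, the data,
 CRLF, and a zero-size chunk marks the end, which is what lets the client
 know it has everything. `iter_chunked()` and `decode_chunked()` are
 hypothetical names, and the decoder skips chunk extensions and doesn't
 handle trailers.

```python
def iter_chunked(chunks):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would be read as the end of the body
        yield b"%X\r\n" % len(chunk) + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # last chunk, with no trailer fields


def decode_chunked(data):
    """Decode a complete chunked body, raising ValueError if it is truncated.

    Chunk extensions are skipped; trailer fields are not supported.
    """
    body = bytearray()
    pos = 0
    while True:
        line_end = data.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("truncated chunk-size line")
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            if data[pos:pos + 2] != b"\r\n":
                raise ValueError("missing final CRLF")
            return bytes(body)
        chunk = data[pos:pos + size]
        if len(chunk) < size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated chunk")
        body += chunk
        pos += size + 2
```

 Because the terminating zero-size chunk is explicit, a truncated stream is
 detectable by the client rather than looking like a short but complete
 response.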

 ----

 Apologies for chasing ghosts with this issue; it was definitely a case of
 a bit too much "magic" in modern development tools.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/32472#comment:4>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
