Hello,

I apologize if this is the wrong list for this bug report; I did not
find a more specific entry in the MAINTAINERS file. I think this is
a kernel issue and not an issue with my distro, but if you disagree I
can redirect this report as appropriate.

I am upgrading some Linux 4.2 servers to Linux 4.4 (Ubuntu Xenial),
and during testing I'm observing very occasional TCP segment
retransmits on the loopback device, which lead to 200ms latency spikes.
I don't observe the issue on non-loopback devices, and I believe
I've narrowed it down to an issue with qdiscs on loopback.
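To put a number on the retransmits without relying on tcpdump, one option is to diff the kernel's system-wide TCP counters around a benchmark run. Below is a minimal sketch, assuming the usual two-line "Tcp:" layout of /proc/net/snmp (a line of field names followed by a line of values). The helper names tcp_snmp_field and count_retrans_around and the SNMP_FILE override are hypothetical and are not part of repro.sh. The counter is host-wide, so any other TCP traffic on the machine is counted too.

```sh
#!/bin/sh
# Hypothetical helpers, not taken from repro.sh.

# tcp_snmp_field FIELD [FILE]
# Print the value of FIELD from the "Tcp:" lines of FILE (default:
# $SNMP_FILE, else /proc/net/snmp). The first "Tcp:" line holds the field
# names and the second the matching values. Prints nothing if FIELD is absent.
tcp_snmp_field() {
    awk -v want="$1" '
        $1 == "Tcp:" && !seen { for (i = 2; i <= NF; i++) idx[$i] = i; seen = 1; next }
        $1 == "Tcp:" && seen  { if (want in idx) print $(idx[want]); exit }
    ' "${2:-${SNMP_FILE:-/proc/net/snmp}}"
}

# count_retrans_around CMD [ARGS...]
# Run CMD, then print how many TCP segments the host retransmitted meanwhile.
count_retrans_around() {
    retrans_before=$(tcp_snmp_field RetransSegs)
    "$@"
    retrans_after=$(tcp_snmp_field RetransSegs)
    echo $((retrans_after - retrans_before))
}
```

Something like `count_retrans_around ab -n 100000 -c 100 http://127.0.0.1/` should print ab's report followed by the retransmit delta for that run.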

It seems that when a queuing discipline other than noqueue is attached
to a loopback device in 4.4+ kernels, packets are (very occasionally)
dropped completely, leading to a retransmit. I'm not sure how this
can happen and I've been trying to figure out what's going on, so if
anyone has any pointers or suggestions I'd very much appreciate them.
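For anyone trying this without the attached script, lo can be switched between noqueue and another qdisc with tc. A minimal sketch, assuming iproute2's tc and root privileges. pfifo is only an example of a non-noqueue qdisc, and the function names and the DRY_RUN switch (which prints each command instead of running it) are hypothetical, not taken from repro.sh.

```sh
#!/bin/sh
# Hypothetical helpers, not taken from repro.sh. Real runs need root.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_lo_qdisc QDISC [PARAMS...]
# "noqueue" deletes the root qdisc so lo falls back to its default;
# anything else is installed (or swapped in) as the root qdisc on lo.
set_lo_qdisc() {
    if [ "$1" = "noqueue" ]; then
        run tc qdisc del dev lo root
    else
        run tc qdisc replace dev lo root "$@"
    fi
}

# Show per-qdisc statistics on lo, including the "dropped" counter.
show_lo_qdisc() {
    run tc -s qdisc show dev lo
}
```

For example: `set_lo_qdisc pfifo limit 1000`, run the benchmark, then `show_lo_qdisc`. A non-zero dropped count there might suggest the qdisc itself is discarding the packets. Deleting the root qdisc when none is attached may make tc report an error.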

I've attached the script I'm using to reproduce the bug and an example
ab run that I believe shows the bug. In particular, the max timings of
around 200ms in the ab output and the TCP segment retransmits (and
sometimes RSTs) in the tcpdump output are what indicate the issue to me.
I have tested on 3.13, 4.2 and 4.4 kernels, and only 4.4 shows the
issue. Furthermore, non-loopback interfaces don't appear to have the
bug. So I ran git diff v4.2..v4.4 drivers/net/loopback.c, and the only
commit that seems to touch loopback.c is e65db2b7. I'm attempting to
revert the change and recompile to see whether that commit triggers the
bug, but I don't understand why that change would break things in this
way, so that's just a guess.
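The 200ms figure is worth spelling out: Linux clamps the TCP retransmission timeout to a 200ms minimum (TCP_RTO_MIN), and the loopback RTT is far below that, so a single segment recovered by a timeout should add roughly 200ms to a request. That is consistent with the 206ms max in the first ab run versus 12ms in the second. A minimal sketch for flagging such runs automatically, assuming ab's "Connection Times (ms)" table, where the last column of the "Total:" row is the max. The function names and the 200ms default threshold are my own choices, not part of repro.sh.

```sh
#!/bin/sh
# Hypothetical helpers, not taken from repro.sh.

# ab_max_total FILE
# Print the "max" column (last field) of the "Total:" row in ab's
# "Connection Times (ms)" table. "Total transferred:" does not match,
# because its first field is "Total", not "Total:".
ab_max_total() {
    awk '$1 == "Total:" { print $NF; exit }' "$1"
}

# ab_has_rto_spike FILE [THRESHOLD_MS]
# Succeed if the slowest request took at least THRESHOLD_MS (default 200,
# the Linux minimum RTO), i.e. the run likely hit a retransmission timeout.
ab_has_rto_spike() {
    ab_max=$(ab_max_total "$1")
    [ -n "$ab_max" ] && [ "$ab_max" -ge "${2:-200}" ]
}
```

Usage might look like `ab -n 100000 -c 100 http://127.0.0.1/ > run.txt; ab_has_rto_spike run.txt && echo "RTO-sized spike"`, which makes it easy to loop the benchmark until a bad run shows up.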

I'm continuing to debug this, but I figured it would be a good idea
to report it here in case someone more familiar with this code knows
what's going on. Please let me know if there is any additional
information I can provide or tests I can run.

Thank you,
-Joey Lynch
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        nginx/1.11.3
Server Hostname:        127.0.0.1
Server Port:            80

Document Path:          /
Document Length:        612 bytes

Concurrency Level:      100
Time taken for tests:   10.636 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      84500000 bytes
HTML transferred:       61200000 bytes
Requests per second:    9401.97 [#/sec] (mean)
Time per request:       10.636 [ms] (mean)
Time per request:       0.106 [ms] (mean, across all concurrent requests)
Transfer rate:          7758.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   1.9      2       8
Processing:     2    9   2.8      9     206
Waiting:        1    8   3.1      8     205
Total:          6   11   1.6     11     206

Percentage of the requests served within a certain time (ms)
  50%     11
  66%     11
  75%     11
  80%     11
  90%     12
  95%     12
  98%     14
  99%     14
 100%    206 (longest request)

Attachment: repro.sh
Description: Bourne shell script

This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        nginx/1.11.3
Server Hostname:        127.0.0.1
Server Port:            80

Document Path:          /
Document Length:        612 bytes

Concurrency Level:      100
Time taken for tests:   7.423 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      84500000 bytes
HTML transferred:       61200000 bytes
Requests per second:    13471.01 [#/sec] (mean)
Time per request:       7.423 [ms] (mean)
Time per request:       0.074 [ms] (mean, across all concurrent requests)
Transfer rate:          11116.21 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       5
Processing:     1    7   0.9      7      12
Waiting:        1    7   0.9      7      12
Total:          4    7   0.6      7      12

Percentage of the requests served within a certain time (ms)
  50%      7
  66%      7
  75%      7
  80%      7
  90%      7
  95%      8
  98%     10
  99%     11
 100%     12 (longest request)
