Hi everyone,

So, this is an interesting problem.  There are 2 parts to this:

1. in the container being destroyed, some socket(s) remain open for a
period of time, which prevents the container from fully exiting until
all its sockets have closed.  While this happens you will see the
'waiting for lo to become free' message repeated.

2. while the previous container is waiting for its sockets to close so
it can exit, any new containers are blocked from starting.

For issue #2, I'm not yet clear on what is blocking the new containers
from starting.  I thought it could be the rtnl_lock, since the exiting
container does loop in the rtnl_unlock path; however, while it's
looping it also calls __rtnl_unlock/rtnl_lock during its periods of
waiting, so the mutex should not be held the entire time.  I'm still
investigating this part of the issue to find what exactly is blocking
the new containers.
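
In case anyone wants to help narrow issue #2 down: one way to capture
where a blocked 'docker run' is actually stuck - assuming you have root
on the host - is to ask the kernel to dump all uninterruptible tasks:

    # dump stack traces of all blocked (D-state) tasks to the kernel log;
    # writing to /proc/sysrq-trigger works regardless of the kernel.sysrq mask
    echo w > /proc/sysrq-trigger
    # then read the traces back out
    dmesg | tail -n 200

If the new container's task shows up there, its stack trace should
point at whatever lock or refcount it is waiting on.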

For issue #1, the sockets are lingering because - at least for the
reproducer case from comment 16 - there is a TCP connection that is
never closed, so the kernel continues to probe the other end until it
times out, which takes around 2 minutes.  For this specific reproducer,
there are 2 easy ways to work around it.  First, for background: the
reproducer uses docker to create a CIFS mount, which uses a TCP
connection.  This is the script run on the 'client' side, where the
hang happens:

    date ; \
    mount.cifs //$SAMBA_PORT_445_TCP_ADDR/public /mnt/ -o vers=3.0,user=nobody,password= ; \
    date ; \
    ls -la /mnt ; \
    umount /mnt ; \
    echo "umount ok"

This is repeated 50 times in the script, but the 2nd (and later) calls
hang (for 2 minutes each, if you wait that long).
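
If you want to watch the hang as it happens, following the kernel log
on the host should show the repeated message, e.g.:

    # follow the kernel log, showing only the relevant message
    # (-w/--follow needs a reasonably recent util-linux dmesg)
    dmesg -w | grep 'waiting for lo to become free'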

A. The reason the TCP connection to the CIFS server lingers is because
it is never closed; the reason it isn't closed is that the script exits
the container immediately after unmounting.  The kernel cifs driver,
during fs unmount, closes the TCP socket to the CIFS server, which
queues a TCP FIN to the server.  Normally, this FIN would reach the
server, which would respond, and the TCP connection would close.
However, since the container starts exiting immediately, that FIN never
reaches the CIFS server - the interface is taken down too quickly.  So,
the TCP connection remains.  To avoid this, simply add a short sleep
between unmounting and exiting the container, e.g. (new line prefixed
with +):

    date ; \
    mount.cifs //$SAMBA_PORT_445_TCP_ADDR/public /mnt/ -o vers=3.0,user=nobody,password= ; \
    date ; \
    ls -la /mnt ; \
    umount /mnt ; \
+    sleep 1 ; \
    echo "umount ok"

That avoids the problem, and allows the container to exit immediately
and the next container to start immediately.
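
If you'd rather not hard-code the delay, something like the following
(assuming the image ships ss from iproute2) could replace the fixed
sleep, waiting only as long as TCP sockets actually remain:

    # wait up to ~5 seconds for all non-listening TCP sockets to drain;
    # 'state connected' matches every TCP state except LISTEN and CLOSED,
    # so once only ss's header line remains we are done
    for i in 1 2 3 4 5; do
        [ "$(ss -tn state connected | wc -l)" -le 1 ] && break
        sleep 1
    done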

B. Instead of delaying long enough for the TCP connection to close, the
kernel's TCP configuration can be changed to time out more quickly.
The specific setting to change is tcp_orphan_retries, which controls
how many times the kernel keeps trying to talk to a remote host over a
closed socket.  Normally this defaults to 0, which causes the kernel to
actually use a value of 8 (retries).  So, change it from 0 to 1 to
actually reduce the retries from 8 to 1 (confusing, I know).  Like (new
lines prefixed with +):

    date ; \
+    echo 1 > /proc/sys/net/ipv4/tcp_orphan_retries ; \
+    grep -H . /proc/sys/net/ipv4/tcp_orphan_retries ; \
    mount.cifs //$SAMBA_PORT_445_TCP_ADDR/public /mnt/ -o vers=3.0,user=nobody,password= ; \
    date ; \
    ls -la /mnt ; \
    umount /mnt ; \
    echo "umount ok"


The 'grep' added there is just so you can verify the value was
correctly changed.  Note this method will take slightly longer than
method A, since it does perform 1 retry, while method A closes the TCP
connection correctly and does no retries.
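
As an aside, if the container image ships sysctl(8), the same change
can be made and verified with:

    # equivalent to the echo/grep pair above
    sysctl -w net.ipv4.tcp_orphan_retries=1
    sysctl net.ipv4.tcp_orphan_retries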


To extrapolate this more generally to situations besides this specific
reproducer, here are some suggestions:

1. close all your connections before closing the container, especially
kernel connections (like the CIFS TCP connection).

2. if you can't close the connections, add a delay as the last action
the container takes, after bringing down all its interfaces (except lo)
but before exiting - e.g. sleep for up to 2 minutes.  The container
will consume a bit more resources during that time, but it should allow
enough time for all TCP connections to close.

3. if you can't do either of those workarounds, reduce the number of
TCP retries that will be done after the container starts exiting by
changing the tcp_orphan_retries param (from inside the container -
don't change this value on your host system).  Change the parameter
right before you close the container, if possible; a sketch combining
suggestions 2 and 3 follows this list.
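
As a rough sketch of suggestions 2 and 3 combined, a hypothetical
entrypoint wrapper could look like this (the write assumes /proc/sys is
not mounted read-only inside the container, and the right sleep length
depends on your workload):

    #!/bin/sh
    # suggestion 3: fewer TCP retries once sockets are orphaned
    echo 1 > /proc/sys/net/ipv4/tcp_orphan_retries
    # run the container's real command
    "$@"
    rc=$?
    # suggestion 2: give lingering TCP connections a moment to close
    # before PID 1 exits and the interfaces are torn down
    sleep 2
    exit $rc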

Those are simply workarounds - I'm still investigating what is blocking
new containers from starting, as well as looking at ways to forcibly
destroy a container's sockets when the container is exiting (e.g. maybe
sock_diag_destroy).
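
For reference, newer kernels already expose a socket-destroy facility
through the SOCK_DESTROY netlink operation, and iproute2's ss can drive
it; assuming a kernel built with CONFIG_INET_DIAG_DESTROY=y, something
like this run inside the container just before it exits might kill the
lingering connection:

    # forcibly destroy any remaining TCP sockets talking to port 445 (CIFS);
    # needs CONFIG_INET_DIAG_DESTROY=y and a recent ss(8)
    ss -K -t dst :445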

Please let me know if any of those, or similar, workarounds help anyone
who is seeing this problem.

https://bugs.launchpad.net/bugs/1711407

Title:
  unregister_netdevice: waiting for lo to become free

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Trusty:
  New
Status in linux source package in Xenial:
  New
Status in linux source package in Zesty:
  New
Status in linux source package in Artful:
  Confirmed
Status in linux source package in bb-series:
  New

Bug description:
  This is a "continuation" of bug 1403152, as that bug has been marked
  "fix released" and recent reports of failure may (or may not) be a new
  bug.  Any further reports of the problem should please be reported
  here instead of that bug.

  --

  [Impact]

  When shutting down and starting containers the container network
  namespace may experience a dst reference counting leak which results
  in this message repeated in the logs:

      unregister_netdevice: waiting for lo to become free. Usage count = 1

  This can cause issues when trying to create a new network namespace
  and thus block a user from creating new containers.

  [Test Case]

  See comment 16, reproducer provided at https://github.com/fho/docker-samba-loop
