Jie Yu created MESOS-9127:
-----------------------------

             Summary: Port mapper CNI plugin might deadlock iptables on the 
agent.
                 Key: MESOS-9127
                 URL: https://issues.apache.org/jira/browse/MESOS-9127
             Project: Mesos
          Issue Type: Bug
          Components: network
    Affects Versions: 1.6.1, 1.5.1, 1.4.1
            Reporter: Jie Yu


Recently, we noticed that if one launches a lot of containers that use the port 
mapper CNI plugin, the iptables will get deadlock. The symptom is like the 
following.

If you do any iptables command on the box, it'll get stuck on acquiring the 
xtables lock:
{noformat}
core@ip-10-0-2-99 ~ $ time iptables -w -t nat -S UCR-DEFAULT-BRIDGE 
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
Another app is currently holding the xtables lock; waiting for it to exit...
^C

real    0m41.349s
user    0m0.001s
sys    0m0.000s
{noformat}

And you'll notice that a lot of iptables and mesos port mapper CNI plugin 
processes on the box

{noformat}
$ ps -fp $(pidof mesos-cni-port-mapper) | wc -l
191
$ ps -fp $(pidof iptables) | wc -l
192
{noformat}

Then, we look into the process that is holding the xtables lock:

{noformat}
$ sudo netstat -p -n | grep xtables
unix  2      [ ]         STREAM                   225083   25048/iptables       
@xtables
$ ps aux | grep 25048
root     25048  0.0  0.0  26184  2512 ?        S    22:41   0:00 iptables -w -t 
nat -S UCR-DEFAULT-BRIDGE
core     31857  0.0  0.0   6760   976 pts/0    S+   23:18   0:00 grep 
--colour=auto 25048
$ sudo strace -s 10000 -p 25048
Process 25048 attached
write(1, "-dport 13839 -m comment --comment \"container_id: 
a074efef-c119-4764-a584-87c57e6b7f68\" -j DNAT --to-destination 
172.31.254.183:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
17282 -m comment --comment \"container_id: 
47880db5-90ad-4034-9c2b-2fd246d42342\" -j DNAT --to-destination 
172.31.254.126:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
28712 -m comment --comment \"container_id: 
3293d5d0-772c-48d2-bafd-1c4b4d56247e\" -j DNAT --to-destination 
172.31.254.130:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
23893 -m comment --comment \"container_id: 
f57de8eb-a3b9-44cb-8dac-7ab261bc8aac\" -j DNAT --to-destination 
172.31.254.149:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
28449 -m comment --comment \"container_id: 
9238dbf0-7b28-4fda-880a-bf0c8f40562a\" -j DNAT --to-destination 
172.31.254.190:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
26438 -m comment --comment \"container_id: 
d307cf58-8972-4de4-ad45-26c29786add0\" -j DNAT --to-destination 
172.31.254.187:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
7682 -m comment --comment \"container_id: 
60f5a61b-f4c0-4846-b2cd-63cd7eb5a4e8\" -j DNAT --to-destination 
172.31.254.177:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
23904 -m comment --comment \"container_id: 
f203ff9e-7b81-4e54-ab44-d45e2a937f38\" -j DNAT --to-destination 
172.31.254.157:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
8359 -m comment --comment \"container_id: 
578cc89c-83bf-46ba-9ae7-c7b89e40e739\" -j DNAT --to-destination 
172.31.254.158:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
28482 -m comment --comment \"container_id: 
70721adb-cd6c-4158-8b11-4ef694999203\" -j DNAT --to-destination 
172.31.254.163:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
2564 -m comment --comment \"container_id: 
70bc9735-8e0a-4b27-9a68-1b1151c9f6ec\" -j DNAT --to-destination 
172.31.254.105:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
6952 -m comment --comment \"container_id: 
a494e0fe-ce6e-4e95-b094-504c7603b6fb\" -j DNAT --to-destination 
172.31.254.159:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
29371 -m comment --comment \"container_id: 
1dac0183-8939-4165-95e8-cf6503cc53ad\" -j DNAT --to-destination 
172.31.254.174:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
6551 -m comment --comment \"container_id: 
148a197b-7247-4a3c-b528-1417cc8ccf3f\" -j DNAT --to-destination 
172.31.254.144:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
16568 -m comment --comment \"container_id: 
e68bb942-ab69-43a5-a308-40b33368d111\" -j DNAT --to-destination 
172.31.254.184:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
5915 -m comment --comment \"container_id: 
08d53ef6-7d3c-4b44-8bb7-923b6a4a1780\" -j DNAT --to-destination 
172.31.254.186:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
11252 -m comment --comment \"container_id: 
894e1455-6c77-4fc2-8414-53f3db7b492a\" -j DNAT --to-destination 
172.31.254.223:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
7583 -m comment --comment \"container_id: 
917b1fc3-12ca-4198-a5f4-13144fbf9096\" -j DNAT --to-destination 
172.31.254.164:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
12695 -m comment --comment \"container_id: 
cfadeb29-3481-4e9e-8eba-e57ce6a4f693\" -j DNAT --to-destination 
172.31.254.153:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
1256 -m comment --comment \"container_id: 
212649ee-a894-4201-bc38-798545ce9e1f\" -j DNAT --to-destination 
172.31.254.132:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
28581 -m comment --comment \"container_id: 
38c46524-9023-450d-a134-24ad14eecbd5\" -j DNAT --to-destination 
172.31.254.202:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
5709 -m comment --comment \"container_id: 
8a510a8e-df16-40b2-a4b7-36605a5f9984\" -j DNAT --to-destination 
172.31.254.194:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport 
6050 -m comment --comment \"container_id: 
94757d96-b6ad-4a57-9e38-c956a2e24fd6\" -j DNAT --to-destination 
172.31.254.152:80\n-A UCR-DEFAULT-B", 4096
{noformat}

So it gets stuck when writing to stdout. That indicates that the pipe is full, 
and the reader end hasn't read from the pipe. Then we identified the reader

{noformat}
$ ps aux | grep a074efef-c119-4764-a584-87c57e6b7f68
core      2355  0.0  0.0   6756  1020 pts/0    S+   23:39   0:00 grep 
--colour=auto a074efef-c119-4764-a584-87c57e6b7f68
root     26486  0.0  0.0  11948  2852 ?        S    22:41   0:00 sh -c        
#!/bin/sh       exec 1>&2       set -x        # The iptables command searches 
for the DNAT rules with tag       # "container_id: <CNI_CONTAINERID>", and if 
it exists goes ahead       # and deletes it.       iptables -w -t nat -S 
UCR-DEFAULT-BRIDGE | sed "/container_id: a074efef-c119-4764-a584-87c57e6b7f68/ 
s/-A/iptables -w -t nat -D/e"
root     26490  0.0  0.0   4340   896 ?        S    22:41   0:00 sed 
/container_id: a074efef-c119-4764-a584-87c57e6b7f68/ s/-A/iptables -w -t nat 
-D/e
$ sudo strace -s 10000 -p 26490
Process 26490 attached
read(0, ^CProcess 26490 detached
{noformat}

The reader which is `sed` is reading from stdin, somehow gets stuck as well.

The corresponding container gets stuck during CNI detach after checking the 
agent log.

After checking the code, we pinpoint the problematic script in our CNI port 
mapper plugin:
https://github.com/apache/mesos/blob/1.6.0/src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper/port_mapper.cpp#L372

{code}
iptables -w -t nat -S %s | sed "/%s/ s/-A/iptables -w -t nat -D/e"
{code}

This command will deadlock if there are a lot of entires in the iptables 
because the `sed` won't process the next line while executing `iptables -w -t 
nat -D ...`. But the executing of `iptables -w -t nat -D ... ` might get stuck 
if the first command `iptables -w -t nat -S %s` didn't finish, because the lock 
is not released. The first command might not finish if it has a lot of output, 
filling the pipe that `sed` hasn't had a chance to process yet.

The correct solution is to write the command to execute in a file, and execute 
the file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to