On 04/03/2022 15:36, Johan Hendriks wrote:
Hello all, I use jails for some testing, but I cannot seem to make it stable. I use vnet jails with a bridge, but when I put some load on it, some jails lose their network connectivity.

My setup is as follows: an haproxy jail with internal IP 10.233.185.20, using binat to make it publicly accessible.
Then a varnish jail and two web servers, all in the 10.233.185.x range.

If I give it a little load with hey (hey -h2 -n 10 -c 20 -z 60s https://wp.test.nl), then during the test the haproxy jail becomes unreachable: it is not pingable from the host machine or from the other jails. Restarting the jails solves it. If I leave the system alone for some time, I have also seen the varnish jail become unresponsive.

If I run tcpdump on the epair${ip}a interface (for example epair20a for haproxy), I do see the packets from the host machine to the jail, but the jail itself is not reachable.

There is nothing in the logs of the host or of the jail itself. I can ping the jail's IP address from inside the jail.
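
The ping check from the host can be repeated for all jails with something like the sketch below. It is only an illustration: the names check_jail, check_all and the PING variable are made up here, and the addresses are taken from the jail.conf further down. It assumes a plain `ping -c 1` is enough to tell whether a jail still answers.

```shell
#!/bin/sh
# Sketch: report which jails still answer ping from the host.
# check_jail, check_all and PING are hypothetical names, not existing tools.
subnet="10.233.185."
PING=${PING:-ping}    # override PING to try the script without a network

check_jail() {
    name=$1
    ip=$2
    # One echo request; only the exit status is used.
    if $PING -c 1 "${subnet}${ip}" > /dev/null 2>&1; then
        echo "$name ${subnet}${ip} reachable"
    else
        echo "$name ${subnet}${ip} UNREACHABLE"
    fi
}

check_all() {
    check_jail varnish01 16
    check_jail web01 18
    check_jail web02 19
    check_jail haproxy 20
}
```

Calling check_all from the host while hey is running should show the moment a jail drops off the bridge.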


I do not think I have a special setup, but I could be doing something wrong.
My jail.conf:

# Global settings applied to all jails.
$domain = "test.nl";
$subdomain = "";

exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;

mount.fstab = "/storage/jails/$name.fstab";

exec.system_user  = "root";
exec.jail_user    = "root";
mount.devfs;
sysvshm="new";
sysvsem="new";
allow.raw_sockets;
allow.set_hostname = 0;
allow.sysvipc;
enforce_statfs = "2";
devfs_ruleset     = "11";

path = "/storage/jails/${name}";
host.hostname = "${name}${subdomain}.${domain}";

# Networking
$uplinkdev        = "vtnet1";
$epid             = "${ip}";
$subnet           = "10.233.185.";
$cidr             = "/24";
$ipv4_addr        = "${subnet}${ip}${cidr}";
vnet;
vnet.interface    = "vnet0";

$epair=epair${ip};
vnet;
#vnet.interface    = "${epair}b";  # default vnet interface
exec.prestart     = "ifconfig bridge0 > /dev/null 2>&1 || ( ifconfig bridge0 create up && ifconfig bridge0 addm $uplinkdev )";
exec.prestart    += "ifconfig ${epair} create up description jail_${name}   || echo 'Skipped creating epair (exists?)'";
exec.prestart    += "ifconfig bridge0 addm ${epair}a           || echo 'Skipped adding bridge member (already member?)'";
exec.created      = "ifconfig ${epair}b name vnet0";
exec.start        = "/bin/sh /etc/rc";
exec.consolelog   = "/var/log/jail/$name.test.nl";
exec.stop         = "/bin/sh /etc/rc.shutdown";
exec.poststop     = "ifconfig bridge0 deletem ${epair}a";
exec.poststop    += "ifconfig ${epair}a destroy";

varnish01 {
    $ip = 16;
    mount.fstab = "";
    path = "/storage/jails/${name}";
}

web01 {
    $ip = 18;
}

web02 {
    $ip = 19;
}

haproxy {
    $ip = 20;
    mount.fstab = "";
    path = "/storage/jails/${name}";
}
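
To make the variable substitution in the Networking block easier to follow, here is a small shell sketch that mirrors it for each jail. It is not part of jail(8); the function name jail_net and the output format are made up for illustration, and it only prints what $epair and $ipv4_addr would expand to for a given $ip.

```shell
#!/bin/sh
# Mirror the jail.conf variable substitutions for one jail.
# jail_net is a hypothetical helper, not part of jail(8).
subnet="10.233.185."
cidr="/24"

jail_net() {
    name=$1
    ip=$2
    epair="epair${ip}"
    # The "a" side is added to bridge0 on the host; the "b" side is
    # the one that exec.created renames to vnet0.
    printf '%s host_if=%s jail_if=%s ipv4_addr=%s\n' \
        "$name" "${epair}a" "${epair}b" "${subnet}${ip}${cidr}"
}

jail_net varnish01 16
jail_net web01 18
jail_net web02 19
jail_net haproxy 20
```

For haproxy this gives epair20a on the host side and 10.233.185.20/24 as the address, which matches the bridge members in the ifconfig output.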

My ifconfig output:

bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 58:9c:fc:10:ff:82
    inet 10.233.185.1 netmask 0xffffff00 broadcast 10.233.185.255
    id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    member: epair20a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 13 priority 128 path cost 2000
    member: epair19a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 53 priority 128 path cost 2000
    member: epair18a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 48 priority 128 path cost 2000
    member: epair16a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 28 priority 128 path cost 2000
    groups: bridge
    nd6 options=9<PERFORMNUD,IFDISABLED>
epair16a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: jail_varnish01
    options=8<VLAN_MTU>
    ether 02:76:32:8e:0e:0a
    groups: epair
    media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
epair18a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: jail_web01
    options=8<VLAN_MTU>
    ether 02:6d:be:b8:36:0a
    groups: epair
    media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
epair19a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: jail_web02
    options=8<VLAN_MTU>
    ether 02:54:fd:77:9a:0a
    groups: epair
    media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
epair20a: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: jail_haproxy
    options=8<VLAN_MTU>
    ether 02:f8:58:06:78:0a
    groups: epair
    media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

This is on both 13-STABLE and 14-HEAD.


For the sake of testing I tried it with FreeBSD 13.0-RELEASE-p7, and this works fine. This is an exact copy of the setup I use on 14-CURRENT and 13-STABLE (I did a ZFS send and receive of the jails and copied jail.conf, pf.conf and so on). I ran the hey command against the 13.0-RELEASE machine multiple times.

hey -h2 -n 10 -c 30 -z 300s https://wp.test.nl

Summary:
  Total:    300.0045 secs
  Slowest:    0.1137 secs
  Fastest:    0.0006 secs
  Average:    0.0090 secs
  Requests/sec:    4627.4504


Response time histogram:
  0.001 [1]    |
  0.012 [977291]    |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.023 [21236]    |■
  0.035 [1125]    |
  0.046 [230]    |
  0.057 [12]    |
  0.068 [18]    |
  0.080 [9]    |
  0.091 [18]    |
  0.102 [30]    |
  0.114 [30]    |


Latency distribution:
  10% in 0.0037 secs
  25% in 0.0046 secs
  50% in 0.0061 secs
  75% in 0.0080 secs
  90% in 0.0096 secs
  95% in 0.0106 secs
  99% in 0.0133 secs

Details (average, fastest, slowest):
  DNS+dialup:    0.0000 secs, 0.0006 secs, 0.1137 secs
  DNS-lookup:    0.0000 secs, 0.0000 secs, 0.0028 secs
  req write:    0.0001 secs, 0.0000 secs, 0.1126 secs
  resp wait:    0.0192 secs, 0.0000 secs, 214.9645 secs
  resp read:    0.0018 secs, 0.0002 secs, 0.1076 secs

Status code distribution:
  [200]    1000000 responses


All is fine on 13.0-RELEASE-p7, even with a higher concurrency. However, if I run it against 14-CURRENT or 13-STABLE, even a 60-second run kills the network connectivity of the jail (haproxy in my case).

regards,
Johan
