Hi, everyone,

one of the systems on which we run our jail based "proServer" product failed
in a very odd way for the second time with a couple of days between the two
incidents.

We run a lot of VIMAGE based jails and bridge them with the physical
interface of the machine.

---------
cloned_interfaces="bridge0 bridge1"

ifconfig_bridge0_name="inet0"
ifconfig_inet0="addm ix0 up"
ifconfig_inet0_alias0="inet 217.29.41.2/24"
ifconfig_inet0_ipv6="inet6 2a00:b580:8000:11:44e8:ab80:816:7869/64 auto_linklocal"

ifconfig_bridge1_name="mgmt0"
ifconfig_mgmt0="addm ix1 up"
ifconfig_mgmt0_alias0="inet 10.5.105.7/16"
ifconfig_mgmt0_ipv6="inet6 auto_linklocal"
---------

The rest is managed by iocage which creates the needed epair(4) interfaces,
for some reason renames them to "vnetX" and adds them as members to
the bridge.
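
Done by hand, I believe this corresponds roughly to the ifconfig steps below.
This is a dry-run sketch that only prints the commands; it is my assumption of
what iocage does, not taken from its source, and the jail name and interface
numbers are hypothetical placeholders.

```shell
#!/bin/sh
# Dry-run sketch: print the ifconfig steps that would attach one VNET jail
# to a bridge. Nothing is executed; all names below are placeholders.
attach_cmds() {
    bridge=$1   # bridge to join, e.g. inet0
    jail=$2     # jail name or jid (hypothetical)
    n=$3        # number used for the vnetN name
    epair=$4    # "a" side as printed by "ifconfig epair create", e.g. epair0a

    peer="${epair%a}b"                       # epair0a -> epair0b
    echo "ifconfig epair create"             # creates both ends of the pair
    echo "ifconfig ${epair} name vnet${n}"   # host side is renamed to vnetN
    echo "ifconfig ${bridge} addm vnet${n} up"
    echo "ifconfig ${peer} vnet ${jail}"     # other end moves into the jail
    echo "ifconfig vnet${n} up"
}

attach_cmds inet0 examplejail 0 epair0a
```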

E.g.
---------
[ry93@ph002 ~]$ ifconfig inet0
inet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:50:51:fe:cc:00
        inet6 fe80::50:51ff:fefe:cc00%inet0 prefixlen 64 scopeid 0x4
        inet6 2a00:b580:8000:11:44e8:ab80:816:7869 prefixlen 64
        inet 217.29.41.2 netmask 0xffffff00 broadcast 217.29.41.255
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: vnet0:69 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 76 priority 128 path cost 2000
[... 50 vnet interfaces following ...]
        member: ix0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 2000
---------

When the system fails

- no jail is reachable from the outside via IP
- no jail is reachable from the host via IP
- the host itself is reachable just fine
- when we `iocage console` into a jail it can reach its own IP addresses but
  nothing "outside"


I tried

- ifconfig ix0 down; ifconfig ix0 up
- ifconfig inet0 down; ifconfig inet0 up # aka bridge0
- iocage stop <jail>; iocage start <jail>

The latter deletes the epair instance connected to the jail and creates a
fresh one, then adds it to the bridge. No change in connectivity ... the start
of the jail takes "forever" because various processes hang waiting for DNS
timeouts (no networking ;-)

There's nothing in /var/log/messages or the dmesg buffer that relates to
networking! Rebooting the host system "fixes" the situation.


Now I'm well aware that this is too little information to draw any definite
conclusions. Hence my first question is: what should I do (commands) when the
situation arises again to gather more evidence?
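
To make the question concrete, this is roughly the snapshot I could take
before the next reboot. It is a sketch only: the choice of commands is my
assumption about what might be relevant, and the output path and the
interface names are taken from this host and may need adjusting.

```shell
#!/bin/sh
# Sketch: save the output of a few standard FreeBSD tools, one file per
# command, while the jails are unreachable. The command selection is a guess.
BRIDGE=${BRIDGE:-inet0}
# /var/tmp is used so the files survive the reboot that "fixes" things.
OUT=${OUT:-/var/tmp/netdiag.$(date +%Y%m%d-%H%M%S)}

capture() {
    # $1 = output file name, remaining arguments = command to run
    name=$1; shift
    if command -v "$1" >/dev/null 2>&1; then
        # A failing command must not stop the remaining captures.
        "$@" >"$OUT/$name.txt" 2>&1 || true
    else
        echo "skipped: $1 not found" >"$OUT/$name.txt"
    fi
}

mkdir -p "$OUT"
capture ifconfig-all ifconfig -a
capture bridge-addr  ifconfig "$BRIDGE" addr   # MAC addresses learned by the bridge
capture netstat-i    netstat -i -d -n          # per-interface errors and drops
capture netstat-m    netstat -m                # mbuf usage
capture vmstat-z     vmstat -z                 # UMA zones, including failed allocations
capture arp          arp -an
capture ndp          ndp -an
capture sysctl-ix0   sysctl dev.ix.0           # driver counters, assuming unit 0
echo "wrote $OUT"
```

Running tcpdump on ix0, inet0 and one of the vnet interfaces at the same time
would presumably show where the packets get lost.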

Or maybe we are just lucky and there is a known problem? Yes, I know VIMAGE is
still considered experimental. We have been running this in production for
months and it looks like it could be related to upgrading host and jails from
10.3 to 11.0 *or* switching the old shell based iocage for Brandon's new
python based version.
I cannot rule out iocage, yet it's not very probable - iocage is not a
Docker-like running service or network component, after all. Once the jails
are up, iocage is done ...

And then there's the chance that it is something with the ix driver and the
way we use the interface ... so for completeness:
---------
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff irq 26 at device 0.0 numa-domain 0 on pci3
ix0: Using MSIX interrupts with 9 vectors
ix0: Ethernet address: 0c:c4:7a:34:ec:ba
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
ix0: promiscuous mode enabled
ix0: link state changed to UP
---------


As usual thanks for any hints,
Patrick
