CPU utilization with kvm / vhost, differences 3.14 / 4.4 / 4.6
Hi,

I'm stumped by a weird development in measured CPU utilization when testing an upgrade path from 3.14.70 to 4.4.14. I'm running, on identical hardware (two 4-core Xeon E5420), an HA (active/standby) pair of firewall/loadbalancer VMs. The OS on the host and in the VM is identical: openSUSE 13.1 userlevel, qemu 1.6.2 KVM, and kernels self-built from vanilla sources. Inside the VM I make pretty heavy use of ipset, iptables, and ipvs. Traffic level is around 100 Mbit/s, mostly ordinary web traffic, translating to around 10 kpps.

For the last X months I have been running this on 3.14.x kernels, currently 3.14.70. As that's nearing its end of support, I'm aiming for an upgrade to 4.4.x, testing with 4.4.14. For testing, I keep the kernel _within_ the VM stable - i.e. 3.14.70 - and upgrade only the host kernel of one of the two machines, to 4.4.14, and, due to the weirdness I'll describe next, to 4.6.4.

What I see, and what is totally unexpected, is a severe variation in the system and IRQ time measured on the host system, and less so inside the VM:

* The host running 3.14.70 shows 0.6 cores system and 0.4 cores IRQ time.
* The host running 4.4.14 shows 2.3 cores system and 0.4 cores IRQ time.
* The same host on 4.6.4 is again back at 0.6 cores system and 0.4 cores IRQ, while the guest (showing as user time outside) is down from the 1 core on the previous two kernels to about 0.6 cores (which I wouldn't complain about).

But my desired target kernel, 4.4.14, clearly uses about 1 1/2 cores more on the same load... (All other indicators and measurements I have show that the load served is pretty much stable across the situations I tested.)

Some details on the networking setup, invariant over the tested kernels (a rough sketch of this plumbing, with placeholder names, follows below):

* The host bonds 4 NICs, half on on-board BNX2 BCM5708, the other half on PCIe-card Intel 82571EB hardware. The bond mode is LACP.
* The host LACP bond is then a member of an ordinary software bridge interface, which also has the tap interface to the VM added. VLAN filtering is active on the bridge.
* Two bridge VLANs are separately broken out and are members of a second-layer bridge with an extra tap interface to my VM. Don't ask why :) but one of these carries about half of the traffic.
* Within the VM, I have another bridge with the VLANs on top and macvlan sprinkled in (keepalived VRRP setup on several legs).
* Host/VM networking is virtio, of course.
* I had to disable (already some time ago, identical in all tests described here) TSO / GSO / UFO on the tap interfaces to my VM, to alleviate severe performance regressions. Different story; mentioning it just for completeness.

Regarding the host hardware, I actually have a third system, software-identical, but with some more cores and purely on BNX2 BCM5719. The 4.4.14-needs-lots-more-system-time symptoms were practically the same there.

To end this tale, let me note that I have NO operational problems with the test using the 4.4.14 kernel, as far as one can know that within some hours of testing. All production metrics (and I have lots of them) are fine - except for that system time usage on the host system...

Anybody got a clue what may be happening? I'm a bit reluctant to jump to 4.6.x or newer kernels, as I like the concept of long-term stable kernels somehow... :)

best regards
  Patrick
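A minimal sketch of the host-side plumbing described above, assuming a reasonably modern iproute2 and using placeholder interface names (eth0..eth3, bond0, br0, tap0 - the real names in my setup differ):

  # LACP bond over the four NICs (two BNX2 ports, two 82571EB ports)
  ip link add bond0 type bond mode 802.3ad
  for nic in eth0 eth1 eth2 eth3; do
      ip link set $nic master bond0
  done

  # VLAN-filtering bridge with the bond and the VM's tap as ports
  ip link add br0 type bridge vlan_filtering 1
  ip link set bond0 master br0
  ip link set tap0 master br0   # tap0: qemu's virtio-net backend

  # offloads disabled on the tap, as mentioned above
  ethtool -K tap0 tso off gso off ufo off

The per-class core figures quoted above come from watching per-core utilization, e.g. with something like:

  mpstat -P ALL 5   # %sys, %irq, %soft per core, 5-second intervals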
Re: Kernel 4.1 hang, apparently in __inet_lookup_established
On Sunday 15 November 2015 16:58:33 Grant Zhang wrote:
> Have you tried the two patches Eric mentioned? One of my 4.1.11
> servers just hung with a very similar stack trace and I am wondering
> whether the aforementioned patches would help.

Sorry, Grant - I'm sticking to 3.14.xx for now.

best regards
  Patrick
Kernel 4.1 hang, apparently in __inet_lookup_established
Dear kernel developers,

I recently started to upgrade my production hosts and VMs from the 3.14 series to 4.1 kernels, starting with 4.1.6. Yesterday, for the second time since I started these upgrades, I experienced one of our webserver VMs hanging.

The first time this happened, the VM hung completely, all 5 virtual cores spinning at 100%; ping still worked, but nothing else, including no virsh console reaction - I had to destroy and restart that VM. No messages were to be found.

Yesterday, when it happened the second time, I found the VM spinning on a single core only, and could still connect to it via ssh - but it stopped accepting apache connections. The core it spun on showed 100% time used in "si" with top, and it produced the messages appended below. The VM did not shut down properly when told to, and had to be destroyed again.

If I read that dmesg output correctly, it spins in __inet_lookup_established, which indeed reads like it has infinite spin potential. But that code itself did not change relative to the 3.14 series we've been running for a long time without these issues - so the root cause would be something else.

For our production systems I'll revert to the 3.14 series, but maybe this report will help somebody understand what's going on.

best regards
  Patrick

dmesg of the hang:

[449302.540017] INFO: rcu_sched self-detected stall on CPU { 4} (t=6000 jiffies g=22900108 c=22900107 q=22617)
[449302.540017] Task dump for CPU 4:
[449302.540017] swapper/4  R  running task    0    0    1  0x0008
[449302.540017]  81831140 88081f403950 810ead0e 0004
[449302.540017]  81831140 88081f403970 810ed288 0083
[449302.540017]  0005 88081f4039a0 81105cc0 88081f414d00
[449302.540017] Call Trace:
[449302.540017]  [] sched_show_task+0xae/0x120
[449302.540017]  [] dump_cpu_task+0x38/0x40
[449302.540017]  [] rcu_dump_cpu_stacks+0x90/0xd0
[449302.540017]  [] rcu_check_callbacks+0x3eb/0x6e0
[449302.540017]  [] ? account_process_tick+0x5c/0x180
[449302.540017]  [] ? tick_sched_handle.isra.18+0x40/0x40
[449302.540017]  [] update_process_times+0x34/0x60
[449302.540017]  [] tick_sched_handle.isra.18+0x31/0x40
[449302.540017]  [] tick_sched_timer+0x3c/0x70
[449302.540017]  [] __run_hrtimer.isra.34+0x4a/0xf0
[449302.540017]  [] hrtimer_interrupt+0xcd/0x1f0
[449302.540017]  [] local_apic_timer_interrupt+0x34/0x60
[449302.540017]  [] smp_apic_timer_interrupt+0x3c/0x60
[449302.540017]  [] apic_timer_interrupt+0x6b/0x70
[449302.540017]  [] ? __inet_lookup_established+0x68/0x130
[449302.540017]  [] ? __inet_lookup_established+0x41/0x130
[449302.540017]  [] tcp_v4_early_demux+0x96/0x150
[449302.540017]  [] ip_rcv_finish+0xb8/0x360
[449302.540017]  [] ip_rcv+0x294/0x3f0
[449302.540017]  [] ? ip_local_deliver_finish+0x140/0x140
[449302.540017]  [] __netif_receive_skb_core+0x52b/0x760
[449302.540017]  [] __netif_receive_skb+0x13/0x60
[449302.540017]  [] netif_receive_skb_internal+0x1e/0x90
[449302.540017]  [] netif_receive_skb_sk+0xc/0x10
[449302.540017]  [] virtnet_receive+0x221/0x7a0
[449302.540017]  [] virtnet_poll+0x1c/0x80
[449302.540017]  [] net_rx_action+0xea/0x2b0
[449302.540017]  [] __do_softirq+0xda/0x1f0
[449302.540017]  [] irq_exit+0x9d/0xb0
[449302.540017]  [] do_IRQ+0x55/0xf0
[449302.540017]  [] common_interrupt+0x6b/0x6b
[449302.540017]  [] ? sched_clock_cpu+0x98/0xc0
[449302.540017]  [] ? native_safe_halt+0x6/0x10
[449302.540017]  [] default_idle+0x9/0x10
[449302.540017]  [] arch_cpu_idle+0xa/0x10
[449302.540017]  [] cpu_startup_entry+0x258/0x310
[449302.540017]  [] start_secondary+0x123/0x130
[449482.570137] INFO: rcu_sched self-detected stall on CPU { 4} (t=24004 jiffies g=22900108 c=22900107 q=97787)
[449482.570148] Task dump for CPU 4:
[449482.570151] swapper/4  R  running task    0    0    1  0x0008
[449482.570156]  81831140 88081f403950 810ead0e 0004
[449482.570165]  81831140 88081f403970 810ed288 0083
[449482.570167]  0005 88081f4039a0 81105cc0 88081f414d00
[449482.570169] Call Trace:
[449482.570171]  [] sched_show_task+0xae/0x120
[449482.570183]  [] dump_cpu_task+0x38/0x40
[449482.570188]  [] rcu_dump_cpu_stacks+0x90/0xd0
[449482.570191]  [] rcu_check_callbacks+0x3eb/0x6e0
[449482.570194]  [] ? account_process_tick+0x5c/0x180
[449482.570199]  [] ? tick_sched_handle.isra.18+0x40/0x40
[449482.570202]  [] update_process_times+0x34/0x60
[449482.570203]  [] tick_sched_handle.isra.18+0x31/0x40
[449482.570205]  [] tick_sched_timer+0x3c/0x70
[449482.570207]  [] __run_hrtimer.isra.34+0x4a/0xf0
[449482.570209]  [] hrtimer_interrupt+0xcd/0x1f0
[449482.570220]  [] local_apic_timer_interrupt+0x34/0x60
[449482.570222]  [] smp_apic_timer_interrupt+0x3c/0x60
[449482.570226]  [] apic_timer_interrupt+0x6b/0x70
[449482.570230]  [] ? __inet_lookup_established+0x60/0x130
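For anyone wanting to narrow down a spin like this themselves: assuming perf is installed in the guest, sampling just the stuck core should make the hot symbol (here __inet_lookup_established) stand out. A rough sketch:

  # CPU 4 is the core shown spinning in "si" by top
  perf top -C 4

  # or record a few seconds and inspect the call chains
  perf record -C 4 -g -- sleep 10
  perf report --stdio | head -40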
Re: [PATCH 2/3] x_tables: Use also dev->ifalias for interface matching
On Monday 12 January 2015 17:22:57 Patrick McHardy wrote:
> On 12.01, Patrick Schaaf wrote:
> > Interfaces come and go through many different actions. There's the
> > admin downing and upping stuff like bridges or bonds. There's stuff
> > like libvirt / KVM / qemu creating and destroying interfaces. In all
> > these cases, in my practice, I give the interfaces useful names so
> > that I can prefix-match them in iptables rules.
> >
> > Dynamically modifying the ruleset for each such creation and
> > destruction would be a huge burden. The base ruleset would need
> > suitable "hooks" where these rules were inserted (ordering matters!).
> > The addition would hardly be atomic (with traditional iptables,
> > unless done by generating a whole new ruleset and restoring). The
> > programs (e.g. libvirt) would need to be able to call out to these
> > specially crafted rule generator scripts. The admin would need to add
> > them as pre/post actions to their static (manual) interface
> > configuration. Loading and looking at the ruleset before bringing up
> > the interface would be impossible.
>
> devgroups seem like the best solution for this.

Could be, technically. Is there devgroup support in libvirt, ifcfg, or whatever other distros use for their static interface configuration? Or do I again have to write pre/post scripts to set devgroups? That wouldn't bother me too much nowadays - I automated that kind of thing for ifcfg-style setups in my production environment a year ago - but it's something an admin must actively manage...

There is other stuff, apart from libvirt, that creates and destroys interfaces on the fly. From my production environment, there's at least keepalived, which creates macvlan interfaces on the fly for VRRP VMAC support. I can configure the name for that, but nothing else, nor can I call a script pre/post for that. And my iptables rules on those boxes _do_ match specially on these interfaces.

Googling around a bit does not immediately turn up any good documentation on devgroups at all (four-year-old iproute2 commits, once I add that as a search term?). It looks very sketchy, although the fundamental idea is clear to me - see the sketch below. (I'm looking through the normal admin-practice lens.)

best regards
  Patrick
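For reference, the devgroup mechanism under discussion looks roughly like this; the group number (42), its name, and the interface name are made-up examples:

  # assign an interface to a device group (this is what libvirt/ifcfg,
  # or a pre/post script, would have to do per interface)
  ip link set dev vnet0 group 42

  # optional: give the group a symbolic name
  echo "42 guests" >> /etc/iproute2/group

  # match the whole group in iptables, independent of interface names
  iptables -A FORWARD -m devgroup --src-group 42 -j ACCEPT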
Re: [PATCH 2/3] x_tables: Use also dev->ifalias for interface matching
On Monday 12 January 2015 08:51:54 Eric Dumazet wrote:
> On Mon, 2015-01-12 at 17:39 +0100, Patrick Schaaf wrote:
> > Not to comment on the ifalias thing, which I think is unnecessary
> > too, but matching on interface names instead of only ifindex is
> > definitely needed, so that one can establish a full ruleset before
> > the interfaces even exist. That's good practice at boot time, but
> > it's also needed for dynamic interface creation during runtime.
>
> Please do not send html messages : Your reply did not reach the lists.

Sigh. Sorry...

> Then, all you mention could have been solved by proper userspace
> support.
>
> Every time you add an interface or change device name, you could change
> firewalls rules if needed. Nothing shocking here.

That is totally impractical, IMO. Interfaces come and go through many different actions. There's the admin downing and upping stuff like bridges or bonds. There's stuff like libvirt / KVM / qemu creating and destroying interfaces. In all these cases, in my practice, I give the interfaces useful names so that I can prefix-match them in iptables rules (an example of such a rule follows below).

Dynamically modifying the ruleset for each such creation and destruction would be a huge burden. The base ruleset would need suitable "hooks" where these rules were inserted (ordering matters!). The addition would hardly be atomic (with traditional iptables, unless done by generating a whole new ruleset and restoring). The programs (e.g. libvirt) would need to be able to call out to these specially crafted rule generator scripts. The admin would need to add them as pre/post actions to their static (manual) interface configuration. Loading and looking at the ruleset before bringing up the interface would be impossible.

Note that I fully agree it's sad that iptables rules waste all that memory for each and every rule! I remember musing about improving that in talks with Harald Welte back in the '90s. A simple match would be perfectly fine for me. Only having ifindex support isn't.

best regards
  Patrick
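To make the prefix-matching practice concrete - "tap-web" is a made-up naming convention here - a single rule loaded at boot covers every present and future interface with that name prefix:

  # "+" is the iptables interface wildcard: this matches tap-web0,
  # tap-web1, ... including interfaces created after the rule is loaded
  iptables -A FORWARD -i tap-web+ -j ACCEPT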