CPU utilization with kvm / vhost, differences 3.14 / 4.4 / 4.6

2016-07-27 Thread Patrick Schaaf
Hi,

I'm stumped by a weird development in measured CPU utilization when testing an 
upgrade path from 3.14.70 to 4.4.14.

I'm running, on identical hardware (two 4-core Xeon E5420 CPUs per host), an HA 
(active/standby) pair of firewall/loadbalancer VMs. The OS on the host and in the 
VM is identical - openSUSE 13.1 userland, qemu 1.6.2 KVM, and kernels self-built 
from vanilla sources. Inside the VM I make pretty heavy use of ipset, iptables, 
and ipvs. Traffic level is around 100 mbit/s, mostly ordinary web traffic, 
translating to around 10 kpps.

For the last X months I have been running this on 3.14.x kernels, currently 
3.14.70. As that's nearing its end of support, I aim for an upgrade to 4.4.x, 
testing with 4.4.14.

For testing, I keep the kernel _within_ the VM stable - i.e. 3.14.70 - and 
upgrade only the host kernel of one of the two machines, first to 4.4.14 and 
then, due to the weirdness I'll describe next, to 4.6.4.

What I see, and what is totally unexpected, is a severe variation in the 
system and IRQ time measured on the host, and less so inside the VM.

The host running 3.14.70 shows 0.6 cores of system time and 0.4 cores of IRQ time.

The host running 4.4.14 shows 2.3 cores of system time and 0.4 cores of IRQ time.

The same host on 4.6.4 is back at 0.6 cores system and 0.4 cores IRQ, while 
the guest (showing as user time outside) is down from the 1 core seen on the 
previous two kernels to about 0.6 cores (which I wouldn't complain about).

But my desired target kernel, 4.4.14, clearly uses about 1.5 cores more for 
the same load... (all other indicators and measurements I have show that the 
load served is pretty much stable across the situations I tested).
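
For reference, figures of this kind can be gathered with sysstat's mpstat - 
just an illustration of the kind of measurement, not necessarily how the 
numbers above were produced:

  # whole-machine averages, six 10-second samples (%usr, %sys, %irq, %soft, ...)
  mpstat 10 6
  # the same, broken down per core
  mpstat -P ALL 10 6

On an 8-core host, 0.6 cores of system time corresponds to roughly 7.5% %sys 
in those terms, and 2.3 cores to just under 30%.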

Some details on the networking setup (invariant over the tested kernels); a 
rough command-level sketch of the host-side plumbing follows the list:
* the host bonds 4 NICs, half on on-board BNX2 BCM5708 hardware, the other
  half on a PCIe Intel 82571EB card. The bond mode is LACP.
* the host LACP bond is then a member of an ordinary software bridge, which
  also has the tap interface to the VM added. VLAN filtering is active on the
  bridge.
* two bridge VLANs are separately broken out and are members of a second-layer
  bridge with an extra tap interface to my VM. Don't ask why :) but one of
  these carries about half of the traffic.
* within the VM, I have another bridge with the VLANs on top and macvlan
  sprinkled in (keepalived VRRP setup on several legs)
* host/VM network is virtio, of course
* I had to disable (already some time ago, identically in all tests described
  here) TSO / GSO / UFO on the tap interfaces to my VM, to alleviate severe
  performance regressions. Different story, mentioning it just for completeness.
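
To make the list above concrete, here is a minimal sketch of the host-side 
plumbing with made-up interface names (the second bridge layer and the actual 
VLAN numbering are omitted; only topology-shaping commands are shown):

  # LACP bond over the four NICs
  ip link add bond0 type bond mode 802.3ad
  ip link set eth0 master bond0
  ip link set eth1 master bond0
  ip link set eth2 master bond0
  ip link set eth3 master bond0
  # bridge with VLAN filtering, bond and guest tap as ports
  ip link add br0 type bridge
  echo 1 > /sys/class/net/br0/bridge/vlan_filtering
  ip link set bond0 master br0
  ip link set vnet0 master br0        # tap interface towards the VM
  # offloads disabled on the tap towards the guest, as mentioned above
  ethtool -K vnet0 tso off gso off ufo off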

Regarding the host hardware, I actually have a third system, software-
identical, but with some more cores and purely on BNX2 BCM5719 NICs. The 
4.4.14-needs-lots-more-system-time symptoms were practically the same there.

To end this tale, let me note that I have NO operational problems with the 
test using the 4.4.14 kernel, as far as one can know that within some hours of 
testing. All production metrics (and I have lots of them) are fine - except 
for that system time usage on the host system...

Anybody got a clue what may be happening?

I'm a bit reluctant to jump to 4.6.x or newer kernels, as I like the concept 
of long term stable kernels somehow... :)


best regards
  Patrick




Re: Kernel 4.1 hang, apparently in __inet_lookup_established

2015-11-16 Thread Patrick Schaaf
On Sunday 15 November 2015 16:58:33 Grant Zhang wrote:
> 
> Have you tried the two patches Eric mentioned? One of my 4.1.11 server
> just hanged with very similar stack trace and I am wondering whether the
> aforementioned patches would help.

Sorry, Grant - I'm sticking to 3.14.xx for now.

best regards
  Patrick



Kernel 4.1 hang, apparently in __inet_lookup_established

2015-09-23 Thread Patrick Schaaf
Dear kernel developers,

I recently started to upgrade my production hosts and VMs from the 3.14 series 
to 4.1 kernels, starting with 4.1.6. Yesterday, for the second time after I 
started these upgrades, I experienced one of our webserver VMs hanging.

The first time this happened, the VM hung completely, with all 5 virtual cores 
spinning at 100%; ping still worked, but nothing else did, including no virsh 
console reaction - I had to destroy and restart that VM. No messages were to 
be found.

Yesterday, when it happened the second time, I found the VM spinning on a 
single core only, and could still connect to it via ssh - but it stopped 
accepting apache connections. The core it spun on showed 100% time spent in 
"si" according to top, and it produced the messages appended below. The VM did 
not shut down properly when told to, and had to be destroyed again.

If I read that dmesg output correctly, it spins in __inet_lookup_established, 
which indeed reads like it has infinite spin potential. But that code itself 
did not change relative to the 3.14 series we've been running for a long time 
without these issues - so the root cause must be something else.

For our production systems I'll revert to the 3.14 series, but maybe this 
report may help somebody understand what's going on.

best regards
  Patrick

dmesg of the hang:

[449302.540017] INFO: rcu_sched self-detected stall on CPU { 4}  (t=6000 
jiffies g=22900108 c=22900107 q=22617)
[449302.540017] Task dump for CPU 4:
[449302.540017] swapper/4   R  running task0 0  1 
0x0008
[449302.540017]  81831140 88081f403950 810ead0e 
0004
[449302.540017]  81831140 88081f403970 810ed288 
0083
[449302.540017]  0005 88081f4039a0 81105cc0 
88081f414d00
[449302.540017] Call Trace:
[449302.540017][] sched_show_task+0xae/0x120
[449302.540017]  [] dump_cpu_task+0x38/0x40
[449302.540017]  [] rcu_dump_cpu_stacks+0x90/0xd0
[449302.540017]  [] rcu_check_callbacks+0x3eb/0x6e0
[449302.540017]  [] ? account_process_tick+0x5c/0x180
[449302.540017]  [] ? tick_sched_handle.isra.18+0x40/0x40
[449302.540017]  [] update_process_times+0x34/0x60
[449302.540017]  [] tick_sched_handle.isra.18+0x31/0x40
[449302.540017]  [] tick_sched_timer+0x3c/0x70
[449302.540017]  [] __run_hrtimer.isra.34+0x4a/0xf0
[449302.540017]  [] hrtimer_interrupt+0xcd/0x1f0
[449302.540017]  [] local_apic_timer_interrupt+0x34/0x60
[449302.540017]  [] smp_apic_timer_interrupt+0x3c/0x60
[449302.540017]  [] apic_timer_interrupt+0x6b/0x70
[449302.540017]  [] ? __inet_lookup_established+0x68/0x130
[449302.540017]  [] ? __inet_lookup_established+0x41/0x130
[449302.540017]  [] tcp_v4_early_demux+0x96/0x150
[449302.540017]  [] ip_rcv_finish+0xb8/0x360
[449302.540017]  [] ip_rcv+0x294/0x3f0
[449302.540017]  [] ? ip_local_deliver_finish+0x140/0x140
[449302.540017]  [] __netif_receive_skb_core+0x52b/0x760
[449302.540017]  [] __netif_receive_skb+0x13/0x60
[449302.540017]  [] netif_receive_skb_internal+0x1e/0x90
[449302.540017]  [] netif_receive_skb_sk+0xc/0x10
[449302.540017]  [] virtnet_receive+0x221/0x7a0
[449302.540017]  [] virtnet_poll+0x1c/0x80
[449302.540017]  [] net_rx_action+0xea/0x2b0
[449302.540017]  [] __do_softirq+0xda/0x1f0
[449302.540017]  [] irq_exit+0x9d/0xb0
[449302.540017]  [] do_IRQ+0x55/0xf0
[449302.540017]  [] common_interrupt+0x6b/0x6b
[449302.540017][] ? sched_clock_cpu+0x98/0xc0
[449302.540017]  [] ? native_safe_halt+0x6/0x10
[449302.540017]  [] default_idle+0x9/0x10
[449302.540017]  [] arch_cpu_idle+0xa/0x10
[449302.540017]  [] cpu_startup_entry+0x258/0x310
[449302.540017]  [] start_secondary+0x123/0x130
[449482.570137] INFO: rcu_sched self-detected stall on CPU { 4}  (t=24004 
jiffies g=22900108 c=22900107 q=97787)
[449482.570148] Task dump for CPU 4:
[449482.570151] swapper/4   R  running task0 0  1 
0x0008
[449482.570156]  81831140 88081f403950 810ead0e 
0004
[449482.570165]  81831140 88081f403970 810ed288 
0083
[449482.570167]  0005 88081f4039a0 81105cc0 
88081f414d00
[449482.570169] Call Trace:
[449482.570171][] sched_show_task+0xae/0x120
[449482.570183]  [] dump_cpu_task+0x38/0x40
[449482.570188]  [] rcu_dump_cpu_stacks+0x90/0xd0
[449482.570191]  [] rcu_check_callbacks+0x3eb/0x6e0
[449482.570194]  [] ? account_process_tick+0x5c/0x180
[449482.570199]  [] ? tick_sched_handle.isra.18+0x40/0x40
[449482.570202]  [] update_process_times+0x34/0x60
[449482.570203]  [] tick_sched_handle.isra.18+0x31/0x40
[449482.570205]  [] tick_sched_timer+0x3c/0x70
[449482.570207]  [] __run_hrtimer.isra.34+0x4a/0xf0
[449482.570209]  [] hrtimer_interrupt+0xcd/0x1f0
[449482.570220]  [] local_apic_timer_interrupt+0x34/0x60
[449482.570222]  [] smp_apic_timer_interrupt+0x3c/0x60
[449482.570226]  [] apic_timer_interrupt+0x6b/0x70
[449482.570230]  [] ? __inet_lookup_established+0x60/0x130


Re: [PATCH 2/3] x_tables: Use also dev->ifalias for interface matching

2015-01-12 Thread Patrick Schaaf
On Monday 12 January 2015 17:22:57 Patrick McHardy wrote:
> On 12.01, Patrick Schaaf wrote:
> >
> > Interfaces come and go through many different actions. There's the admin
> > downing and upping stuff like bridges or bonds. There's stuff like libvirt
> > / KVM / qemu creating and destroying interfaces. In all these cases, in
> > my practise, I give the interfaces useful names to that I can
> > prefix-match them in iptables rules.
> > 
> > Dynamically modifying the ruleset for each such creation and destruction,
> > would be a huge burden. The base ruleset would need suitable "hooks" where
> > these rules were inserted (ordering matters!). The addition would hardly
> > be
> > atomic (with traditional iptables, unless done by generating a whole new
> > ruleset and restoring). The programs (e.g. libvirt) would need to be able
> > to call out to these specially crafted rule generator scripts. The admin
> > would need to add them as pre/post actions to their static (manual)
> > interface configuration. Loading and looking at the ruleset before
> > bringing up the interface would be impossible.
> 
> devgroups seem like the best solution for this.

Could be, technically.

Is there devgroup support in libvirt, ifcfg, or whatever other distros use for 
their static interface configuration? Or do I again have to write pre/post 
scripts to set devgroups? That wouldn't bother me too much nowadays - I 
automated that sort of thing for ifcfg-style setups in my production 
environment a year ago - but it's something an admin must actively manage...

There is other stuff, apart from libvirt, that creates and destroys interfaces 
on the fly. In my production environment there's at least keepalived, which 
creates macvlan interfaces on the fly for VRRP VMAC support. I can configure 
the name for that, but nothing else, nor can I call a script pre/post for 
that. And my iptables rules on those boxes _do_ match specially on these 
interfaces.
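
To illustrate the keepalived case - a sketch with made-up names, VRID and 
address; use_vmac is the keyword that triggers the on-the-fly macvlan 
creation, and the (configurable) interface name is the only handle the 
ruleset gets:

  # fragment of a hypothetical keepalived.conf
  vrrp_instance V1 {
      state BACKUP
      interface eth0
      virtual_router_id 51
      priority 100
      use_vmac vrrp51        # keepalived creates macvlan "vrrp51" on the fly
      virtual_ipaddress {
          192.0.2.10/24
      }
  }
  # so the matching rule ends up being a name/prefix match:
  iptables -A INPUT -i vrrp+ -j ACCEPT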

Googling around a bit does not immediately turn up any good documentation on 
it at all (four-year-old iproute2 commits, once I add that as a search term 
too?). It looks very sketchy (although the fundamental idea is clear to me; 
I'm looking at it through the normal admin-practice lens).
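
For the record, the moving parts appear to be roughly these - a sketch 
assuming the xt_devgroup iptables match and iproute2's interface "group" 
attribute are what's meant, with made-up names and numbers:

  # optionally give group 10 a symbolic name
  echo "10 vmports" >> /etc/iproute2/group
  # tag an interface with the group (would have to happen on every creation)
  ip link set dev vnet0 group vmports
  # match on the group instead of the interface name in the ruleset
  iptables -A FORWARD -m devgroup --src-group 10 -j ACCEPT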

best regards
  Patrick


Re: [PATCH 2/3] x_tables: Use also dev->ifalias for interface matching

2015-01-12 Thread Patrick Schaaf
On Monday 12 January 2015 08:51:54 Eric Dumazet wrote:
> On Mon, 2015-01-12 at 17:39 +0100, Patrick Schaaf wrote:
> > 
> > Not to comment on the ifalias thing, which I think is unneccessary,
> > too, but matching on interface names instead of only ifindex, is
> > definitely needed, so that one can establish a full ruleset before
> > interfaces even exist. That's good practise at boottime, but also
> > needed for dynamic interface creation during runtime.
> 
> Please do not send html messages : Your reply did not reach the lists.

Sigh. Sorry...

> Then, all you mention could have been solved by proper userspace
> support.
> 
> Every time you add an interface or change device name, you could change
> firewalls rules if needed. Nothing shocking here.

That is totally impractical, IMO.

Interfaces come and go through many different actions. There's the admin 
downing and upping stuff like bridges or bonds. There's stuff like libvirt / 
KVM / qemu creating and destroying interfaces. In all these cases, in my 
practice, I give the interfaces useful names so that I can prefix-match them 
in iptables rules.
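
For example (interface names made up; the "+" suffix is iptables' standard 
prefix wildcard):

  # matches vnet0, vnet1, ... regardless of when they come into existence
  iptables -A FORWARD -i vnet+ -o bond0 -j ACCEPT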

Dynamically modifying the ruleset for each such creation and destruction 
would be a huge burden. The base ruleset would need suitable "hooks" where 
these rules get inserted (ordering matters!). The addition would hardly be 
atomic (with traditional iptables, unless done by generating a whole new 
ruleset and restoring it, as sketched below). The programs (e.g. libvirt) 
would need to be able to call out to these specially crafted rule generator 
scripts. The admin would need to add them as pre/post actions to their static 
(manual) interface configuration. And loading and looking at the ruleset 
before bringing up the interface would be impossible.
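
That "whole new ruleset" route would look roughly like this - a sketch with a 
hypothetical generator script; iptables-restore replaces each table atomically:

  # regenerate the complete ruleset from whatever interfaces exist right now
  /usr/local/sbin/generate-ruleset > /tmp/iptables.rules   # hypothetical script
  # apply it in one go - per-table atomic replace
  iptables-restore < /tmp/iptables.rules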

Note that I do fully agree that it's sad that iptables rules waste all that 
memory for each and every rule! I remember musing about improving that in 
talks with Harald Welte back in the '90s. A simple name match would be 
perfectly fine for me. Only having ifindex support isn't.

best regards
  Patrick

